Elliot's Statistics Study Page

Independent vs. Dependent Variables

To tell them apart, create a sentence that uses the word "causes".
In that sentence, the independent variable "causes" the dependent variable.
Example: More players on the DL (disabled list) cause a longer losing streak.
Independent: number of players on the DL
Dependent: length of losing streak

Levels of measurement
Level	Description	Example
Nominal	Categories have names but no numbers	Pitchers, Catchers, Infielders, Outfielders
Ordinal	Categories can be placed in an ordered list but do not have associated numbers. The terms "greater than" and "less than" can be applied.	Freshman, Sophomore, Junior, Senior
Interval-ratio	Categories have equal numeric intervals. Ratio: the scale also has a true zero point.	GPA 2.51 - 3.00, GPA 3.01 - 3.50, GPA 3.51 - 4.00


Types of statistics
Type	Subtype	Description	Example
Descriptive Statistics	Univariate	Describes one variable at a time	Average ERA of Red Sox pitchers
Descriptive Statistics	Bivariate	Describes the association between 2 variables	Average ERA of pitchers by team
Inferential Statistics		Determines whether statistics computed from a sample of a larger population can confidently describe that larger population
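To make the univariate/bivariate distinction concrete, here is a minimal Python sketch. The pitcher records are made-up numbers for illustration only, and `average_era_by_team` is a hypothetical helper name, not part of any library.

```python
from statistics import mean

# Hypothetical (made-up) pitcher records: (team, ERA). Not real data.
pitchers = [
    ("Red Sox", 3.20), ("Red Sox", 4.10), ("Red Sox", 3.90),
    ("Yankees", 3.50), ("Yankees", 4.30),
]

# Univariate: describe one variable (ERA) for one group of pitchers.
red_sox_avg = mean(era for team, era in pitchers if team == "Red Sox")

# Bivariate: describe how ERA is associated with a second variable (team).
def average_era_by_team(records):
    """Group ERAs by team and return {team: average ERA}."""
    by_team = {}
    for team, era in records:
        by_team.setdefault(team, []).append(era)
    return {team: mean(eras) for team, eras in by_team.items()}

print(f"Red Sox average ERA: {red_sox_avg:.2f}")
for team, avg in average_era_by_team(pitchers).items():
    print(f"{team}: {avg:.2f}")
```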

Proportion, percentage, ratio, rate
These all compare parts of something, either to the whole thing or to each other.
Question	Measure	Calculation	Example
How big a part of the whole is something?	Proportion	Number of observations in one category ÷ Number of observations in all categories. Always a number BETWEEN 0 AND 1.	The Red Sox have won ___ of ___ games this season. Proportion won: ___
	Percentage	(Number of observations in one category ÷ Number of observations in all categories) × 100. Always a number BETWEEN 0 AND 100.	The Yankees have won ___ of ___ games this season. Percentage won: ___%
	You can convert between proportion and percentage by multiplying (or dividing) by 100.
	Rate (crude rate)	Number of actual occurrences of a phenomenon ÷ Number of possible occurrences of the phenomenon in some unit of time. Always a number BETWEEN 0 AND 1.	There were ___ cases of swine flu in Sudbury last year. The population of Sudbury is ___. Crude infection rate: ___ Infection rate per 1000: ___
	Rates are often expressed per a power of 10 (per 1,000, per 100,000, and so on) to eliminate decimal points. To do this, multiply the crude rate by that power of 10. For example, a crude rate of 0.526 is the same as 526 (0.526 × 1000) per thousand.
How do two things compare to each other?	Ratio	Number of observations in one category ÷ Number of observations in another category. Always 0 OR GREATER, and unlike a proportion it can be larger than 1.	In Sudbury there are ___ Democrats and ___ Republicans. Ratio of Democrats to Republicans: ___
	Ratios are often expressed per a power of 10 to eliminate decimal points. To do this, multiply the ratio by that power of 10. For example, a ratio of 0.873 is the same as 873 (0.873 × 1000) per thousand.
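The formulas in the table above translate directly into arithmetic. This Python sketch uses hypothetical helper names (`proportion`, `percentage`, `crude_rate`, `per_power_of_ten`, `ratio`) and checks the two "per thousand" conversions worked in the table.

```python
def proportion(in_category, total):
    """Observations in one category / observations in all categories (0 to 1)."""
    return in_category / total

def percentage(in_category, total):
    """A proportion multiplied by 100 (0 to 100)."""
    return proportion(in_category, total) * 100

def crude_rate(actual, possible):
    """Actual occurrences / possible occurrences in some unit of time."""
    return actual / possible

def per_power_of_ten(value, base=1000):
    """Express a crude rate or ratio 'per base' (e.g. per 1,000) to avoid decimals."""
    return value * base

def ratio(category_a, category_b):
    """Observations in one category / observations in another (can exceed 1)."""
    return category_a / category_b

# The worked conversions from the table:
print(round(per_power_of_ten(0.526), 6))  # 526.0 per thousand
print(round(per_power_of_ten(0.873), 6))  # 873.0 per thousand
```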

The Normal Curve

If you collect data on a continuous variable, such as heights, weights, or temperatures, you will often find that, once you have enough data points, the values are described well by the Normal Distribution. The Central Limit Theorem explains part of why the normal curve shows up so often: the averages (means) of sufficiently large random samples are approximately normally distributed, even when the variable itself is not. The proof is a mathematical exercise best left to math geeks.
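A quick simulation can show the Central Limit Theorem at work. This Python sketch assumes a flat (uniform) variable, a sample size of 30, and 10,000 samples; all three are arbitrary choices for illustration. The variable itself is not bell-shaped, but the sample means should pile up in a bell shape, with roughly 68% of them within one standard deviation of their mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30       # data points averaged in each sample (assumed value)
NUM_SAMPLES = 10_000   # how many samples are drawn (assumed value)

# The underlying variable is uniform on [0, 1): flat, not bell-shaped at all.
sample_means = [
    statistics.fmean(random.random() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mu = statistics.fmean(sample_means)
sigma = statistics.pstdev(sample_means)

# If the sample means are roughly normal, about 68% fall within 1 sigma of mu.
within_one_sigma = sum(abs(m - mu) <= sigma for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {mu:.3f}")                 # close to 0.5
print(f"std. dev. of sample means: {sigma:.4f}")         # close to 0.053
print(f"share within 1 sigma: {within_one_sigma:.3f}")   # close to 0.68
```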
The Normal Curve is a function plotted in xy coordinates. The actual equation is complex and not necessary to learn. The key features of the normal distribution are as follows:

It has the shape of a "bell" curve
The curve is symmetric about its midpoint (the left half matches the right half if folded over)
The tails of the curve extend indefinitely and never actually touch the x-axis
The midpoint (peak) of the curve is at the mean of the distribution, designated by μ (mu)
A value named σ (sigma) is the standard deviation of the distribution. When σ is large, the curve is short and wide; when σ is small, the curve is tall and narrow.
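The equation is not needed for this course, but for anyone curious, the standard density formula is f(x) = e^(-(x - μ)² / (2σ²)) / (σ√(2π)). The Python sketch below (`normal_pdf` is a hypothetical helper) checks two of the features listed above: symmetry about μ, and a lower peak when σ is larger.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Symmetric about mu: the same height one unit to the left and one to the right.
print(normal_pdf(3.0, mu=4.0, sigma=1.0) == normal_pdf(5.0, mu=4.0, sigma=1.0))  # True

# The peak (at mu) is lower when sigma is large and higher when sigma is small.
print(normal_pdf(0.0, mu=0.0, sigma=2.0) < normal_pdf(0.0, mu=0.0, sigma=0.5))  # True
```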

The Normal Distribution is a continuous probability distribution. The Normal Curve is its probability density function, and the probability that a value falls within some range equals the area under the curve over that range. The total area under the curve is always 1, no matter what μ and σ are. What matters about the Normal Curve is not the height of any single point on it, but the area (distribution) under various portions of it.
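To see that the total area is 1 whatever μ and σ are, you can add up thin trapezoids under the curve. This sketch uses `statistics.NormalDist.pdf` from Python's standard library (3.8 or later); `area_under_curve`, the ±10σ cutoff, and the step count are assumptions made for illustration.

```python
from statistics import NormalDist

def area_under_curve(mu, sigma, steps=100_000):
    """Approximate the total area under the normal curve with the trapezoid rule.

    Integrates from mu - 10 sigma to mu + 10 sigma; the area outside that
    range is far too small to matter here.
    """
    curve = NormalDist(mu, sigma)
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / steps
    total = 0.5 * (curve.pdf(lo) + curve.pdf(hi))
    total += sum(curve.pdf(lo + i * width) for i in range(1, steps))
    return total * width

for mu, sigma in [(0, 1), (4.0, 0.5), (100, 15)]:
    print(mu, sigma, round(area_under_curve(mu, sigma), 6))  # about 1.0 each time
```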

Some of the kinds of questions you can use the Normal Distribution to answer are as follows (use the above diagram):

What is the probability that some value, A, is greater than X? (X < A < +∞)
What is the probability that some value, A, is less than X? (-∞ < A < X)
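Both questions are answered by the cumulative distribution function (CDF), which gives the area to the left of a point. Here is a minimal sketch using Python's `statistics.NormalDist` (3.8 or later); the helper names `prob_greater_than` and `prob_less_than` are hypothetical.

```python
from statistics import NormalDist

def prob_greater_than(x, mu, sigma):
    """P(x < A < +inf): the area to the right of x."""
    return 1 - NormalDist(mu, sigma).cdf(x)

def prob_less_than(x, mu, sigma):
    """P(-inf < A < x): the area to the left of x."""
    return NormalDist(mu, sigma).cdf(x)

print(prob_less_than(0, 0, 1), prob_greater_than(0, 0, 1))  # 0.5 0.5
print(round(prob_greater_than(1, 0, 1), 3))  # 0.159, one sigma above the mean
```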

Areas under the normal curve

About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Summary: the 68-95-99.7 rule
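The three percentages can be checked against the exact normal curve. This sketch uses Python's `statistics.NormalDist` with μ = 0 and σ = 1; because the rule is stated in standard deviations from the mean, the same areas hold for any μ and σ.

```python
from statistics import NormalDist

# The rule is stated in standard deviations from the mean, so mu = 0, sigma = 1 covers every case.
std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area between mu - k*sigma and mu + k*sigma
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sigma: {area:.4f}")
```

It prints 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% in the rule.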

How to use the normal distribution

Most problems using the normal distribution involve matching up and combining "slices" of area with the number of standard deviations from the mean. The mean (μ) and standard deviation (σ) are usually provided. Here's a diagram of the most important slices:

Using the above diagram, if the average (μ) of Major League pitchers' ERAs is ___ and the standard deviation (σ) is ___, then:

50% of pitchers had an ERA less than ___ (μ)
If a pitcher is selected at random, the chance his ERA is greater than ___ (μ) is 50%
34% of pitchers had an ERA between (μ - 1σ) and (μ)
If a pitcher is selected at random, the chance his ERA is between (μ) and (μ + 1σ) is 34%
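Applying the diagram mostly means turning "μ ± kσ" into actual ERA values. The page's real μ and σ are not shown here, so this sketch assumes a hypothetical μ = 4.00 and σ = 0.50; `slice_boundaries` is a hypothetical helper.

```python
def slice_boundaries(mu, sigma):
    """ERA values at mu - 2 sigma, mu - 1 sigma, mu, mu + 1 sigma, and mu + 2 sigma."""
    return [round(mu + k * sigma, 2) for k in (-2, -1, 0, 1, 2)]

# Hypothetical values for illustration; not the page's actual numbers.
MU_ERA = 4.00
SIGMA_ERA = 0.50

print(slice_boundaries(MU_ERA, SIGMA_ERA))  # [3.0, 3.5, 4.0, 4.5, 5.0]
```

With these assumed numbers, 34% of pitchers would have an ERA between 3.50 and 4.00, and 34% between 4.00 and 4.50.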

The following numbers are important to recognize in a problem:

Area	Range	Same area on the other side
2.5%	-∞ to μ - 2σ	μ + 2σ to +∞
5%	-∞ to μ - 2σ AND μ + 2σ to +∞
13.5%	μ - 2σ to μ - 1σ	μ + 1σ to μ + 2σ
16%	-∞ to μ - 1σ	μ + 1σ to +∞
32%	-∞ to μ - 1σ AND μ + 1σ to +∞
34%	μ - 1σ to μ	μ to μ + 1σ
47.5%	μ - 2σ to μ	μ to μ + 2σ
50%	-∞ to μ	μ to +∞
68%	μ - 1σ to μ + 1σ
84%	-∞ to μ + 1σ	μ - 1σ to +∞
95%	μ - 2σ to μ + 2σ
97.5%	-∞ to μ + 2σ	μ - 2σ to +∞
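Every row in the table comes from adding up the six slices (2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%) between neighboring boundaries. This Python sketch does that addition; `area_between` is a hypothetical helper, and it only accepts boundaries of -∞, -2, -1, 0, 1, 2 (standard deviations from the mean), or +∞.

```python
import math

# Boundaries measured in standard deviations from the mean (mu sits at 0).
BOUNDARIES = [-math.inf, -2, -1, 0, 1, 2, math.inf]
# Percent of the area in each slice between neighboring boundaries (68-95-99.7 rule).
SLICES = [2.5, 13.5, 34.0, 34.0, 13.5, 2.5]

def area_between(lo, hi):
    """Percent of the area between two boundaries, found by adding whole slices."""
    i, j = BOUNDARIES.index(lo), BOUNDARIES.index(hi)
    if i > j:
        raise ValueError("lo must not be to the right of hi")
    return sum(SLICES[i:j])

print(area_between(-math.inf, -1))  # 16.0 (-inf to mu - 1 sigma)
print(area_between(-1, 1))          # 68.0 (mu - 1 sigma to mu + 1 sigma)
print(area_between(-2, math.inf))   # 97.5 (mu - 2 sigma to +inf)
```

Two-tailed rows add two calls, e.g. `area_between(-math.inf, -2) + area_between(2, math.inf)` gives 5.0. These are the rule-of-thumb values; the exact normal curve gives slightly different figures for some rows (the area below μ - 2σ is about 2.28% rather than 2.5%).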

Problems using the normal distribution

The mean batting average (μ) for the American League is now ___.
The standard deviation (σ) of the batting average for the American League is now ___.
50% of the batters had batting averages greater than ___.
If you select a random batter, the probability his average is less than ___ is 50%.
___ of the batters have averages GREATER THAN ???
A randomly selected batter has a ??? chance of having a batting average less than ___.
___ of the batters have averages BETWEEN ???
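Problems like these run the CDF backwards: given a share of batters, find the batting average. Python's `statistics.NormalDist.inv_cdf` does this. The league mean (0.260) and standard deviation (0.030) below are hypothetical, since the page's actual values are not shown, and reading "BETWEEN ???" as a range centered on the mean is an assumption. The helper names are also hypothetical.

```python
from statistics import NormalDist

# Hypothetical league values; the page's actual mean and standard deviation are not shown.
MU_AVG = 0.260
SIGMA_AVG = 0.030
league = NormalDist(MU_AVG, SIGMA_AVG)

def average_exceeded_by(share):
    """Batting average that the given share of batters (0 to 1) is GREATER THAN."""
    return league.inv_cdf(1 - share)

def chance_below(average):
    """Chance that a randomly selected batter's average is less than `average`."""
    return league.cdf(average)

def range_holding(share):
    """Range centered on the mean that holds the given share of batters (an assumption)."""
    tail = (1 - share) / 2
    return league.inv_cdf(tail), league.inv_cdf(1 - tail)

print(round(average_exceeded_by(0.50), 3))         # 0.26, the mean itself
print(round(chance_below(MU_AVG + SIGMA_AVG), 3))  # about 0.841, close to the 84% row
low, high = range_holding(0.95)
print(round(low, 3), round(high, 3))               # roughly mu +/- 1.96 sigma
```

Because `inv_cdf` uses the exact curve, its answers can differ slightly from the rule-of-thumb table: for example, 95% of batters fall within about 1.96σ of the mean rather than exactly 2σ.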