Probability and the Normal Distribution
· Recognize basic probability concepts
· Discern the basics of the Normal Probability Distribution
· Apply the Normal Distribution to solve probability problems
· Solve probability problems using the Central Limit Theorem when the distribution is not Normal
Probability is an everyday part of our lives. We use it to measure uncertainty
and to help us make choices. For example, when the weatherman says that there
is a 90% chance of rain, do you take an umbrella that day? Probability is NOT
the same as betting odds (payoffs). Bookmakers for the Kentucky Derby and
professional sports set odds to ensure they make money, and the odds depend on
the trend in betting. Lower odds do not necessarily mean a competitor is more
likely to win; they mean that more people are betting that it will happen.
Casino tables in Las Vegas set payoffs lower than the true odds so that the
house comes out ahead in the long run.
Solving Probability Problems
|        | Dem | Rep | Ind | Total |
| Male   | 10  | 25  | 5   | 40    |
| Female | 30  | 20  | 10  | 60    |
| Total  | 40  | 45  | 15  | 100   |
1. P[Female]
2. P[Republican]
3. P[Female and Democrat]
4. P[Female or Democrat]
5. P[Female | Democrat] (i.e., the probability of female given Democrat)
6. P[Democrat | Female] (i.e., the probability of Democrat given female)
1. P[Female] = total females / total people = 60/100
2. P[Republican] = total Republicans / total people = 45/100
3. P[Female and Dem] = number of people who are both female and Democrat /
total people = 30/100
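4. P[Female or Dem] = number of people who are female, Democrat, or both /
total people = (10+30+20+10)/100 = 70/100. Equivalently, by the addition rule,
P[Female] + P[Dem] - P[Female and Dem] = 60/100 + 40/100 - 30/100 = 70/100;
the 30 female Democrats are subtracted once so they are not counted twice.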
5. P[Female | Dem] = proportion of Democrats who are female (i.e., the
probability of female given Democrat) = number of female Democrats / total
Democrats = 30/40
|        | Dem |
| Male   | 10  |
| Female | 30  |
| Total  | 40  |
6. P[Dem | Female] = proportion of females who are Democrats (i.e., the
probability of Democrat given female) = number of female Democrats / total
females = 30/60
|        | Dem | Rep | Ind | Total |
| Female | 30  | 20  | 10  | 60    |
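The six answers above can be checked with a short Python sketch. It stores the counts from the gender-by-party table in a dictionary and uses exact fractions so the results match the hand calculations. The names `COUNTS`, `p_or`, `p_gender_given_party`, and so on are illustrative helpers, not from any library; note that `Fraction` prints in lowest terms, so 60/100 appears as 3/5.

```python
from fractions import Fraction

# Counts from the gender-by-party table: rows are gender, columns are party.
COUNTS = {
    "Male":   {"Dem": 10, "Rep": 25, "Ind": 5},
    "Female": {"Dem": 30, "Rep": 20, "Ind": 10},
}

def row_total(gender):
    """Total count for one gender (a row total)."""
    return sum(COUNTS[gender].values())

def col_total(party):
    """Total count for one party (a column total)."""
    return sum(row[party] for row in COUNTS.values())

GRAND_TOTAL = sum(row_total(g) for g in COUNTS)

def p_gender(gender):
    """Marginal probability of a gender: row total / grand total."""
    return Fraction(row_total(gender), GRAND_TOTAL)

def p_party(party):
    """Marginal probability of a party: column total / grand total."""
    return Fraction(col_total(party), GRAND_TOTAL)

def p_and(gender, party):
    """Joint probability: the single cell where the row and column meet."""
    return Fraction(COUNTS[gender][party], GRAND_TOTAL)

def p_or(gender, party):
    """Addition rule: P[A or B] = P[A] + P[B] - P[A and B]."""
    return p_gender(gender) + p_party(party) - p_and(gender, party)

def p_gender_given_party(gender, party):
    """Restrict to the party's column, then divide by that column total."""
    return Fraction(COUNTS[gender][party], col_total(party))

def p_party_given_gender(party, gender):
    """Restrict to the gender's row, then divide by that row total."""
    return Fraction(COUNTS[gender][party], row_total(gender))

answers = {
    "1. P[Female]": p_gender("Female"),
    "2. P[Republican]": p_party("Rep"),
    "3. P[Female and Dem]": p_and("Female", "Dem"),
    "4. P[Female or Dem]": p_or("Female", "Dem"),
    "5. P[Female | Dem]": p_gender_given_party("Female", "Dem"),
    "6. P[Dem | Female]": p_party_given_gender("Dem", "Female"),
}

for question, p in answers.items():
    print(f"{question} = {p} = {float(p):.2f}")
```

Comparing answers 5 and 6 in the output shows that conditioning order matters: P[Female | Dem] and P[Dem | Female] use the same numerator but different denominators.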
The Normal Distribution
If one were to look at a 5'8" tall man, he would probably be considered short.
If one were to look at a 5'7" tall woman, she would probably be considered
tall, despite the fact that she is shorter than the man. Why? It is because we
tend to compare each data point to the mean of its own group. To solve problems
regarding heights, one might think we would need two separate formulas (one for
men and one for women). However, this is not true. The Normal Probability
Distribution allows us to put everything on the same scale. The Normal curve is
a symmetric, bell-shaped curve that peaks in the middle at the mean. The
Standard Normal curve has a mean of zero and a standard deviation of one. Units
on the standard normal curve are measured in standard deviations: about 68% of
the data fall within one standard deviation of the mean in either direction,
about 95% fall within two standard deviations, and about 99.7% fall within
three. As expected, the bulk of the data is close to the mean.
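The 68%, 95%, and 99.7% figures can be reproduced numerically. This minimal sketch uses Python's standard `math.erf`, relying on the identity P(-k < Z < k) = erf(k / sqrt(2)) for a standard normal Z; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z.

    The area within k standard deviations of the mean equals erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

It should print about 0.6827, 0.9545, and 0.9973, which round to the percentages quoted above.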
The Standard Normal Distribution has values mainly between -3 and +3 as
measured by a z-score, where the z-score is the number of standard deviations a
value lies from the mean: z = (value - mean) / standard deviation. Thus, by
converting the distribution of male heights and the distribution of female
heights each to a standard normal distribution, the height of 5'8" for a man is
about one standard deviation below average (z = -1), and the height of 5'7" for
a woman is about two standard deviations above average (z = +2). The woman would
show up to the right of the man on this standard scale.
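The height comparison boils down to one formula, z = (value - mean) / standard deviation, applied with each group's own mean and standard deviation. The text does not state those parameters, so this sketch uses hypothetical values (men averaging 70 inches with a standard deviation of 2, women averaging 63 inches with a standard deviation of 2) chosen only so that the results match the z-scores quoted; they are not real population figures.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# HYPOTHETICAL parameters in inches, picked only to reproduce the text's
# z-scores; real height statistics differ.
MALE_MEAN, MALE_SD = 70, 2      # assumed: men average 5'10", SD 2"
FEMALE_MEAN, FEMALE_SD = 63, 2  # assumed: women average 5'3", SD 2"

man = z_score(68, MALE_MEAN, MALE_SD)        # 5'8" = 68 inches
woman = z_score(67, FEMALE_MEAN, FEMALE_SD)  # 5'7" = 67 inches

print(man, woman)  # -1.0 2.0
print("woman is to the right of the man on the z scale:", woman > man)
```

Once both heights are z-scores, a single comparison works for both groups, which is the point of putting everything on the same scale.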
In simplified form, SAT scores, for example, are computed by comparing one's
raw score with the mean raw score and dividing the difference by the standard
deviation (thus giving the z-score); the z-score is then converted to the SAT
scale, which forces the mean to be 500 and the standard deviation to be 100. If
the mean raw score on the math section was 40 and the standard deviation was 10,
then a person who scored 60 would have scored 20 points higher than the mean,
and since the standard deviation is 10, that person scored 2 standard deviations
above average. On the SAT scale, this translates to 200 points above the average
of 500, so the SAT score would be 700. Because SAT scores are approximately
normally distributed, we know that about 68% of students score between 400 and
600, about 95% score between 300 and 700, and the remaining 5% score below 300
or above 700.
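The raw-to-SAT conversion described above is two linear steps: standardize, then rescale. Here is a minimal Python sketch of that simplified description; the function name `to_sat` is hypothetical, and it models only the procedure as described in the text, not an official scoring method.

```python
def to_sat(raw, raw_mean, raw_sd, sat_mean=500, sat_sd=100):
    """Convert a raw score to the simplified SAT scale described in the text.

    Step 1: standardize the raw score into a z-score.
    Step 2: rescale the z-score to a mean of 500 and a standard deviation of 100.
    """
    z = (raw - raw_mean) / raw_sd
    return sat_mean + sat_sd * z

print(to_sat(60, raw_mean=40, raw_sd=10))  # 700.0 (z = +2)
print(to_sat(30, raw_mean=40, raw_sd=10))  # 400.0 (z = -1)
```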
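One of the great features of using the Normal Distribution is that we can
compute probabilities very easily: we need only the mean and standard deviation
to define the curve completely. If one scored 400, and the mean is 500 with a
standard deviation of 100, the z-score is (400 - 500) / 100 = -1, so the score
is 1 standard deviation below the mean. Since about 68% of scores lie between
400 and 600, the remaining 32% is split evenly between the two tails, so about
16% of students score below 400.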
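Python's standard library can compute the area to the left of a score directly. This sketch assumes the SAT parameters from the text (mean 500, standard deviation 100) and uses `statistics.NormalDist`, available since Python 3.8.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

z = (400 - sat.mean) / sat.stdev  # standardize the score of 400
p_below = sat.cdf(400)            # area under the curve to the left of 400

print(z, round(p_below, 4))  # -1.0 0.1587
```

The result, about 0.1587, is the roughly 16% that the 68% rule gives by hand.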
The Central Limit Theorem
When a distribution is not Normal, we cannot use the Normal curve to compute
probabilities directly. However, if we take samples from the data, the means of
those samples will be approximately normally distributed if the sample size is
large enough. Regardless of the shape of the original distribution, larger
samples make the distribution of sample means look more and more Normal.
As a common rule of thumb, a sample size of at least 30 is usually large enough
for the distribution of sample means to be approximately normal (strongly skewed
data may need more). The mean of this new distribution is the same as the
original mean, but the spread is much smaller. This spread of the distribution
of sample means is known as the standard error, and it is computed from the
standard deviation and the sample size:
standard error = standard deviation / square root (sample size).
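A quick simulation illustrates these Central Limit Theorem claims. As an assumption for illustration only, the population here is a strongly right-skewed exponential distribution with mean 1 and standard deviation 1, far from Normal. The sketch draws 10,000 samples of size 36 and checks that the sample means center on 1, have a spread close to 1 / square root (36), and cluster like a Normal curve (about 68% within one standard error).

```python
import random
import statistics

# Illustrative assumption: an exponential population with rate 1,
# whose mean and standard deviation are both 1.
POP_MEAN = 1.0
POP_SD = 1.0
N = 36           # size of each sample
TRIALS = 10_000  # number of samples drawn

random.seed(1)   # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n draws from the skewed exponential population."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

means = [sample_mean(N) for _ in range(TRIALS)]

standard_error = POP_SD / N ** 0.5       # theory: sigma / sqrt(n) = 1/6
observed_mean = statistics.fmean(means)  # should be close to POP_MEAN
observed_sd = statistics.stdev(means)    # should be close to standard_error

# Share of sample means within one standard error of the population mean;
# for an approximately Normal distribution this should be near 68%.
within_one_se = sum(abs(m - POP_MEAN) < standard_error for m in means) / TRIALS

print(f"mean of sample means: {observed_mean:.3f}")
print(f"SD of sample means:   {observed_sd:.3f} (theory {standard_error:.3f})")
print(f"within 1 SE:          {within_one_se:.1%}")
```

The standard-error formula holds for any sample size; it is the Normal shape of the sample means that needs the larger samples.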
When solving
problems regarding the probability of a sample mean being in a specified range,
only the mean, the standard deviation, and the sample size are needed. Excel
will still do all of the work, but here is an example nonetheless.
If the mean age at GCU is 40 and the standard deviation is 12, then what is the
probability of taking a sample of 36 students and finding the average age to be
over 42? While 42 is only two years higher than 40, we are dealing with a sample
mean here, whose spread is much smaller than that of individual ages, so the
Central Limit Theorem applies. The standard error = 12 / square root (36) =
12 / 6 = 2. So the z-score = (42 - 40) / 2 = 1.00. About 16% (more precisely,
15.87%) of the standard normal distribution lies above a z-score of 1.00, so the
probability of finding a sample average higher than 42 is about 16%.
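The GCU calculation can be reproduced in a few lines of Python with `statistics.NormalDist`; the variable names are illustrative. In Excel, the same tail probability should come from =1-NORM.DIST(42, 40, 2, TRUE), where 2 is the standard error.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, n = 40, 12, 36  # values from the GCU example
threshold = 42                    # sample average we want to exceed

standard_error = pop_sd / math.sqrt(n)       # 12 / 6 = 2.0
z = (threshold - pop_mean) / standard_error  # (42 - 40) / 2 = 1.0
p_above = 1 - NormalDist().cdf(z)            # right-tail area of the standard normal

print(standard_error, z, round(p_above, 4))  # 2.0 1.0 0.1587
```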
Conclusion
These core tools take probability to new heights. For a Normal distribution, a
measure of central tendency (the mean) and a measure of dispersion (the standard
deviation) completely define the distribution, and together they allow one to
solve for any probability regarding that distribution. The only catch is that
the data must be normally distributed; when they are not, the Central Limit
Theorem offers a legitimate way to bend the rules for sample means and make the
data work.