Hypothesis Testing

· Analyze the basics of hypothesis testing as it applies to everyday life
· Determine errors in hypothesis testing and how to choose a significance level
· Solve one-sample hypothesis tests for means

Hypothesis Testing

How can a coin be proved perfectly fair? It cannot, but it can be shown to be unfair. Flipping 5 H / 5 T should not
convince anyone that the coin is fair; likewise, 9 H / 1 T in only ten flips should not be taken as proof that it is unfair.
But if it flips 90 H / 10 T, something may be wrong, while 60 H / 40 T may not be enough to make a judgment.
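
The coin example can be made concrete with an exact binomial test. The sketch below is one possible way to do it, not the only one: it assumes the null hypothesis is a fair coin (probability of heads 0.5) and uses a two-sided p-value defined as the total probability of every outcome no more likely than the one observed. The function name two_sided_binomial_p is made up for this illustration.

```python
from math import comb

def two_sided_binomial_p(heads: int, flips: int, p_fair: float = 0.5) -> float:
    """Exact two-sided p-value under H0: P(heads) = p_fair.

    Adds up the probability of every outcome that is no more likely
    than the observed number of heads.
    """
    def pmf(k: int) -> float:
        return comb(flips, k) * p_fair**k * (1 - p_fair) ** (flips - k)

    observed = pmf(heads)
    # The small tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)

# The four results mentioned in the text.
for heads, flips in [(5, 10), (9, 10), (60, 100), (90, 100)]:
    p = two_sided_binomial_p(heads, flips)
    print(f"{heads} H / {flips - heads} T: p = {p:.4g}")
```

Under these assumptions, 90 H / 10 T gives a vanishingly small p-value, while 60 H / 40 T lands just above the usual 5% cutoff. For ten flips, 9 H / 1 T gives 22/1024, about 0.02, which shows how much a verdict on so small a sample depends on the cutoff chosen.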

If a student improves their SAT score by 10 points using an SAT improvement
course, would they be satisfied? No. But if they improved their ACT score by 10 points, they would be thrilled. Points are
relative. For everyone there is a threshold beyond which they will draw a conclusion with little chance of error.

Stepping on and off a scale, the weights vary. Why? Does someone lose weight merely by stepping on and off? Suppose a skier
weighed 165, cross-country skied for 20 minutes, weighed in at 164, and claimed to have lost a pound. Twenty minutes later
they weigh in at 166 and claim that the skiing does not work anymore. At what point is a weight loss significant enough to
attribute it to exercise and not to the imprecision of the scale? Perhaps 5 pounds in 5 days might be convincing, but
3 pounds in 5 days may reflect shifts in water weight combined with scale imprecision.
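
The scale example can be sketched as a simple test of whether an observed drop exceeds what scale noise alone would produce. Everything numeric here is an assumption made for illustration: readings are treated as normally distributed around the true weight with a standard deviation of 1.5 pounds (SCALE_SD is a made-up value, not a property of any real scale), and water weight is ignored entirely.

```python
from math import sqrt
from statistics import NormalDist

SCALE_SD = 1.5  # hypothetical standard deviation of a single reading, in pounds

def weight_loss_p_value(before: float, after: float, scale_sd: float = SCALE_SD) -> float:
    """One-sided p-value for H0: no true weight change vs H1: weight was lost.

    The difference of two independent readings has standard deviation
    scale_sd * sqrt(2), so the observed drop is compared against that.
    """
    z = (before - after) / (scale_sd * sqrt(2))
    return 1 - NormalDist().cdf(z)

# Drops of 1, 3, and 5 pounds from a starting reading of 165.
for lost in (1, 3, 5):
    p = weight_loss_p_value(165, 165 - lost)
    print(f"{lost} lb lost: p = {p:.3f}")
```

With this assumed noise level, a 1-pound drop is easily explained by the scale, a 3-pound drop is borderline, and a 5-pound drop is hard to attribute to the scale alone. A different assumed standard deviation would move those boundaries.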

A theory about a parameter (the average is ..., the proportion
is ...) can be tested via a hypothesis test. If the results are far enough from the hypothesized number, it can be concluded,
with minimal risk of error, that the hypothesized statement is false and the alternative must be true.
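
As a minimal sketch of a one-sample test for a mean, the example below uses a two-sided z-test, which assumes the population standard deviation is known. The scores, the hypothesized mean of 500, and the standard deviation of 100 are all invented for illustration; when the standard deviation has to be estimated from the sample, a t-test would be the usual choice instead.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known sigma.

    Returns the z statistic and the p-value.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: ten made-up test scores.
scores = [520, 610, 480, 555, 590, 530, 605, 570, 495, 545]
z, p = one_sample_z_test(scores, mu0=500, sigma=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The farther the sample mean falls from the hypothesized value, relative to sigma / sqrt(n), the larger |z| becomes and the smaller the p-value.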

In the judicial system, the null hypothesis (H_{0}) is that the defendant is innocent, while the alternative hypothesis
(H_{1}) is that the defendant is guilty. No one is ever found innocent, because innocence cannot be proven. Either guilt
is proven, or there is not enough evidence to prove it conclusively. Imagine how hard it would be to be proven innocent. All
of the examples of honesty cannot prove innocence, while one example of dishonesty surely proves guilt. This is
why our legal system presumes innocence until guilt is proven, but never declares innocence.

The prosecutor’s job is to gather evidence in the
form of fingerprints, witnesses, blood, etc., and try to convince the jury that there is enough evidence to convict the defendant.
The jury has a significance level that serves as a cutoff for error (i.e., reasonable doubt). The jury either decides that
sufficient evidence exists beyond reasonable doubt and therefore rejects the null and finds the defendant guilty (meaning
the chance of convicting by mistake is very small), or the jury finds that there is not sufficient evidence to convict and,
therefore, finds the defendant not guilty. Note that a not guilty verdict does not mean they find the defendant innocent,
because the evidence is still there, but the evidence was not enough to convince them beyond reasonable doubt (i.e., too great
a risk of sending an innocent person to prison).

The significance value (or p-value) that is seen in the SPSS output measures the risk of error: it is the probability of
seeing evidence at least this strong if the null were true. The significance level that the courts set is the tolerance for
error. If the p-value is less than the significance level, then the risk of error is within the tolerance level and the null
is rejected; in this case it means the court feels very confident in its decision to convict. If the p-value is greater than
the significance level, then the risk is too great and the null is not rejected (thus acquitting the defendant due to lack
of evidence). A p-value of .01, for example, means that if the defendant were innocent, there would be only a 1% chance of
evidence this incriminating; a p-value of .25 means that an innocent defendant would produce such evidence 25% of the time.
It is not the probability that a convicted defendant is innocent. The cutoff is determined by the significance level set
(typically 5% in most studies). Thus, a low p-value is typically the goal in research, so that a claim can be supported
with evidence.
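
The decision rule described above fits in a few lines. This is only a sketch: the function name verdict and its return strings are invented for the courtroom analogy, and the default cutoff of 0.05 is the typical level mentioned in the text rather than a fixed rule.

```python
def verdict(p_value: float, alpha: float = 0.05) -> str:
    """Compare the risk of error (p-value) with the tolerance for error (alpha)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Following the text: reject only when the p-value is below the cutoff.
    if p_value < alpha:
        return "reject H0 (convict)"
    return "fail to reject H0 (acquit)"

# The two p-values used as examples in the text.
for p in (0.01, 0.25):
    print(f"p = {p}: {verdict(p)}")
```

The same p-value can lead to different decisions under different cutoffs, for example verdict(0.03) against verdict(0.03, alpha=0.01), which is why the significance level should be fixed before looking at the evidence.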

Type 1 and Type 2 Errors

Now what bothers people most: convicting an innocent man
or acquitting a guilty man? Both are bad, but our system is predicated on the former being worse. When the null is rejected
but it is true, a Type 1 error has been committed; the probability of such an error is the significance level that is chosen
for the hypothesis test. When the null is not rejected but it is false, a Type 2 error has been committed; the probability
of such an error is generally unknown because it depends on the true value of the parameter, but for a fixed sample size, as
the probability of a Type 1 error increases, the probability of a Type 2 error decreases. Thus, setting the significance
level higher makes a Type 2 error less likely, but makes a Type 1 error more likely.
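
The trade-off between the two errors can be illustrated numerically. The sketch below assumes a one-sided z-test of H0: mean = mu0 against H1: mean > mu0 with known standard deviation, and it assumes a specific true effect so that the Type 2 error probability can actually be computed. The effect size, sigma, and sample size are made-up numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(alpha: float, effect: float, sigma: float, n: int) -> float:
    """Probability of a Type 2 error for a one-sided z-test.

    alpha  : significance level (probability of a Type 1 error)
    effect : assumed true mean minus the hypothesized mean
    sigma  : known population standard deviation
    n      : sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff on the z scale
    # H0 is not rejected when the z statistic falls below the cutoff.
    return std_normal.cdf(z_crit - effect * sqrt(n) / sigma)

# Hypothetical setting: true effect of 5 units, sigma 15, 25 observations.
for alpha in (0.01, 0.05, 0.10):
    beta = type2_error(alpha, effect=5, sigma=15, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

With the sample size held fixed, each increase in alpha produces a smaller beta. Changing the assumed effect or the sample size changes the numbers but not the direction of the trade-off.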

Now since our country does not like to see innocent men go to
jail, we would like the probability of a Type 1 error to be 0, but that would mean everyone must be acquitted, as it is the
only way to guarantee that no innocent person is ever convicted. As a result, the probability of a Type 2 error grows to 100%
because all guilty people go free. And what about the problem of acquitting the guilty? Making the probability of a Type 2
error 0 would ensure that never happens, but the only way to do so is to convict everyone, thus making the probability of a
Type 1 error 100%. It is impossible to eliminate both errors, and eliminating one of the two means the other occurs 100% of
the time. Thus, the best solution is to allow the possibility of both errors (which is what our court system currently does).

The significance level can be raised or lowered
depending on which of the two errors is more serious. In the court system, the Type 1 error is more serious, so the courts
might set the significance level to .01, allowing only a 1% chance of convicting an innocent person. In airport security,
where the null hypothesis is that the passenger is unarmed, a Type 1 error would mean frisking a person in error (because
the alarm sounded) and a Type 2 error would mean letting an armed man walk through. Of course a Type 2 error is worse here,
so the best thing to do is raise the significance level to maybe .10 (10% probability that an unarmed person is searched),
thus making it less likely that an armed person gets by.

Conclusion

These are the basics of hypothesis testing and the
errors associated with it. Research in all forms (academic, medical, business, educational …) involves some form
of hypothesis testing. There is a lot of detail involved in doing it correctly, but everything is predicated on the hypothesis
and the significance level. So set it up right and make the test results meaningful.