
Business Statistics

Hypothesis Testing


          Analyze the basics of hypothesis testing as it applies to everyday life

          Determine errors in hypothesis testing and how to choose a significance level

          Solve one sample hypothesis tests for means

Hypothesis Testing

How can a coin be proved perfectly fair? It cannot; the most a test can do is show that the coin is not fair. Ten flips that come up 5 H / 5 T should not be convincing proof of fairness; likewise 9 H / 1 T, since ten flips is a very small sample. But if 100 flips come up 90 H / 10 T, something is probably wrong, while 60 H / 40 T may not be enough to make a judgment.
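To put rough numbers on the coin example, here is a minimal sketch of an exact two-sided binomial test, assuming the hypothesis being tested is that the coin is fair (probability of heads = 0.5). The function name two_sided_p_value is illustrative only and does not come from the course materials. Whether a given split counts as convincing still depends on the significance level chosen.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact two-sided binomial p-value, assuming H0: P(heads) = 0.5."""
    # Probability of each possible head count if the coin is fair.
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up every outcome that is at least as unlikely as the one observed.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

# The four splits mentioned in the text.
for heads, flips in [(5, 10), (9, 10), (90, 100), (60, 100)]:
    p = two_sided_p_value(heads, flips)
    print(f"{heads} H / {flips - heads} T: p-value = {p:.4g}")
```

A small p-value says the split would be rare for a fair coin; a large one says the split is ordinary and proves nothing either way.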


If a student raises an SAT score by 10 points after taking an SAT improvement course, would the student be satisfied? No. But a 10-point gain on the ACT would be thrilling. Points are relative to the scale they are measured on. Everyone has a threshold beyond which a conclusion can be drawn with little chance of error.


Step on and off a scale and the weights vary. Why? Does someone lose weight merely by stepping on and off? Suppose a skier weighs 165, cross-country skis for 20 minutes, weighs in at 164, and claims to have lost a pound. Twenty minutes later the skier weighs in at 166 and claims that skiing does not work anymore. At what point is a weight loss large enough to attribute it to exercise and not to the inconsistency of the scale? Perhaps 5 pounds in 5 days would be convincing, but 3 pounds in 5 days may be shifts in water weight combined with scale error.


A theory about a parameter (the average is ..., the proportion is ...) can be tested via a hypothesis test. If the sample results are far enough from the hypothesized number, it can be concluded, with minimal risk of error, that the hypothesized value is wrong and the alternative statement is supported.
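As a rough illustration of testing a claim about an average, here is a minimal sketch of a one-sample z test for a mean. It assumes the population standard deviation is known, which is a simplification; the function name and all of the numbers are hypothetical and are not taken from the course.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Two-sided z test of H0: mean = hypothesized_mean (population sd assumed known)."""
    standard_error = population_sd / sqrt(n)
    # How many standard errors the sample mean sits from the hypothesized mean.
    z = (sample_mean - hypothesized_mean) / standard_error
    # Chance of a result at least this far from the hypothesized mean if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: H0 says the average is 100; a sample of 100 averages 103.
z, p = one_sample_z_test(sample_mean=103, hypothesized_mean=100, population_sd=15, n=100)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

The farther the sample mean falls from the hypothesized number, measured in standard errors, the smaller the p-value becomes.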


In the judicial system, the null hypothesis (H0) is that the defendant is innocent, while the alternate hypothesis (H1) is that the defendant is guilty. No one is ever found innocent, because innocence cannot be proven. Either guilt is proven, or there is not enough evidence to prove guilt conclusively. Imagine how hard it would be to be proven innocent. No number of examples of honesty can prove complete innocence, while one example of dishonesty surely proves guilt. This is why our legal system assumes innocence until guilt is proven, but it never declares innocence.


Anyway, the prosecutor’s job is to gather evidence in the form of fingerprints, witnesses, blood, etc., and try to convince the jury that there is enough evidence to convict the defendant. The jury has a significance level that serves as a cutoff for error (i.e., reasonable doubt). The jury either decides that sufficient evidence exists beyond reasonable doubt and therefore rejects the null and finds the defendant guilty (meaning the chance of convicting by mistake is very small), or the jury finds that there is not sufficient evidence to convict and, therefore, finds the defendant not guilty. Note that a not guilty verdict does not mean the jury finds the defendant innocent, because the evidence is still there; the evidence was simply not enough to convince them beyond reasonable doubt (i.e., the risk of sending an innocent person to prison was too great).

The significance value (or p-value) that is seen in the SPSS output measures how surprising the evidence would be if the null were true, while the significance level that the courts set is the tolerance for error. If the p-value is less than the significance level, then the risk of error is within the tolerance level and the null is rejected; in this case it means the court feels very confident in its decision to convict. If the p-value is greater than the significance level, then the risk is too great and the null is not rejected (thus acquitting the defendant due to lack of evidence).

A p-value of .01, for example, means that if the defendant were innocent, evidence at least this strong would turn up only 1% of the time; a p-value of .25 means that such evidence would turn up 25% of the time even for an innocent defendant. The p-value is not the probability that a convicted defendant is innocent. The cutoff is determined by the significance level set (typically 5% in most studies). Thus, a low p-value is typically the goal in research, because it means the evidence is strong enough to reject the null.
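The decision rule described above can be sketched in a few lines. The function name decide is hypothetical; the p-values of .01 and .25 and the 5% level are the ones mentioned in the text.

```python
def decide(p_value, significance_level=0.05):
    """Reject the null only when the p-value falls below the significance level."""
    if p_value < significance_level:
        return "reject H0 (convict)"
    return "do not reject H0 (acquit)"

for p_value in (0.01, 0.25):
    print(f"p-value = {p_value}: {decide(p_value)}")
```

Note that the second outcome is worded as "do not reject" and not as "accept": an acquittal is not a declaration of innocence.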

Type 1 and Type 2 Errors

Now what bothers people most: convicting an innocent man or acquitting a guilty man? Both stink, but our system is predicated on the former being worse. When the null is rejected but it is true, a Type 1 error has been committed; the probability of such an error is the significance level that was chosen for the hypothesis test. When the null is not rejected but it is false, a Type 2 error has been committed; the probability of such an error is usually unknown because it depends on the true value of the parameter, but for a fixed sample size, as the probability of a Type 1 error increases, the probability of a Type 2 error decreases. Thus, the probability of a Type 2 error gets smaller by setting the significance level higher, but then the probability of a Type 1 error increases.
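The tradeoff between the two errors can be seen numerically. The following is a minimal sketch for a one-sided z test; it assumes the true mean sits a known number of standard errors above the hypothesized value, which is exactly the information that is missing in practice. The function name type2_error and the shift of 2 standard errors are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def type2_error(significance_level, shift):
    """P(not rejecting H0) in a one-sided z test when the true mean is
    `shift` standard errors above the hypothesized mean (an assumed value)."""
    standard_normal = NormalDist()
    # The test rejects H0 when the z statistic exceeds this cutoff.
    critical_value = standard_normal.inv_cdf(1 - significance_level)
    # Under the assumed truth, the z statistic is centered at `shift`.
    return standard_normal.cdf(critical_value - shift)

for alpha in (0.01, 0.05, 0.10):
    beta = type2_error(alpha, shift=2.0)
    print(f"Type 1 = {alpha:.2f} -> Type 2 = {beta:.3f}")
```

As the significance level rises from .01 to .10, the computed Type 2 probability falls, which is the tradeoff described above.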


Now since our country does not like to see innocent men go to jail, we would like the probability of a Type 1 error to be 0, but that would mean everyone must be acquitted, as it is the only way to guarantee that no innocent person is ever convicted. As a result, the probability of a Type 2 error grows to 100% because all guilty people go free. And what about the problem of acquitting the guilty? Just make the probability of a Type 2 error 0 so that never happens; but the only way that is possible is to convict everyone, thus making the probability of a Type 1 error 100%. It is impossible to eliminate both errors, and eliminating one of the two means the other occurs 100% of the time. Thus, the best solution is to allow the possibility of both errors, which is what our court system currently does.


The significance level can be raised or lowered depending on which of the two errors is more serious. In the court system, the Type 1 error is more serious, so the courts might set the significance level to .01, allowing only a 1% chance of convicting an innocent person. In airport security, where the null hypothesis is that the passenger is unarmed, a Type 1 error would mean frisking a person in error (because the alarm sounded) and a Type 2 error would mean letting an armed man walk through. Of course a Type 2 error is worse here, so the best thing to do is raise the significance level to maybe .10 (a 10% probability that an unarmed person is searched), thus making it less likely that an armed person gets by.


These are the basics of hypothesis testing and the errors associated with it. Research in all forms (academic, medical, business, educational …) involves some form of hypothesis testing. There is a lot of detail involved in doing it correctly, but everything depends on the hypothesis and the significance level. So set it up right and make the test results meaningful.


Whitt's Consulting * Reading * PA