Hypothesis Tests: Six Steps (Plus One)
Copyright © 2010–2012 by Stan Brown, Oak Road Systems
See also: Inferential Statistics: Basic Cases and Top 10 Mistakes of Hypothesis Tests
The name I give to each step is just a convenient reference. Write down the step numbers, not these step names, when doing your hypothesis test.
Always state the hypotheses in symbols. When it’s helpful, particularly for two-population tests, state them in English as well.
H0 and H1, the null hypothesis and the alternative hypothesis, are always identical except for the relational symbol. Case 0 through 3 hypotheses will always look like one of these three possibilities:
| One-tailed (<) | Two-tailed (≠) | One-tailed (>) |
|---|---|---|
| H0: (parameter) = (number) | H0: (parameter) = (number) | H0: (parameter) = (number) |
| H1: (parameter) < (number) | H1: (parameter) ≠ (number) | H1: (parameter) > (number) |
(parameter) is μ or μd for numeric data, p for binomial data. Use the right one! (When the parameter is μd you must define d before writing the hypotheses.)
The ≠ test is called a two-tailed test because it tests for a difference in either direction, above or below; the others are one-tailed tests. See One-Tailed or Two-Tailed Hypothesis Test? for help deciding whether you need a one-tailed or two-tailed test.
The (number) is what you are testing for — the claim. Never use sample data here!
Example: Suppose management claims that the average deposit is $200 with standard deviation $45, and your random sample of 100 deposits has a mean of $189.56. Here are your hypotheses:
H0: μ = 200, management’s claim is correct
H1: μ ≠ 200, management’s claim is wrong
(This example will continue through the other steps.)
Notice that the words add something; they’ll be helpful when you come to write your conclusion in Step 6. If all you said in words was “the mean is not 200”, that wouldn’t be helpful enough to be worth the ink it takes.
For Cases 4 and 5, you compare parameters of two populations:
Case 4 (numeric data): Define Population 1 and Population 2, then

| One-tailed (<) | Two-tailed (≠) | One-tailed (>) |
|---|---|---|
| H0: μ1 = μ2 | H0: μ1 = μ2 | H0: μ1 = μ2 |
| H1: μ1 < μ2 | H1: μ1 ≠ μ2 | H1: μ1 > μ2 |

Case 5 (binomial data): Define Population 1 and Population 2, then

| One-tailed (<) | Two-tailed (≠) | One-tailed (>) |
|---|---|---|
| H0: p1 = p2 | H0: p1 = p2 | H0: p1 = p2 |
| H1: p1 < p2 | H1: p1 ≠ p2 | H1: p1 > p2 |
For Cases 6 and 7, there are no parameters and you give H0 and H1 in words. Case 6 hypotheses are something like
H0: The (state the model) model is good
H1: The model is bad
For Case 7, which tests the statistical significance of a two-way table, the hypotheses vary more, but one common formulation is
H0: (row variable) is independent of (column variable)
H1: (row variable) is not independent of (column variable)
However it’s phrased, H0 is always “nothing special going on here” and H1 is always “something is happening.”
State α here. Usually it will be given to you in the problem, but if you have to pick an α yourself then you pick lower α when the consequences of a Type I error are more severe, and you pick higher α when the consequences of a Type I error are less severe.
α = 0.05 is the most common choice, especially in a business context. α = 0.01 and α = 0.001 are less common, but one application is research involving human medicine.
(2) α = 0.05
Always write α as a decimal, never as a percentage.
This step isn’t numbered, because it can occur at different places in the sequence. For Cases 0 through 4, you can check requirements at this point; for Cases 5 through 7, it’s easier to check requirements after calculating the p-value in Steps 3–4.
(RC) n > 30, therefore the sampling distribution is ND.
For the specific requirements by data type, see Inferential Statistics: Basic Cases. Caution: test requirements for Case 2 by using po, the number specified in the hypotheses; for Case 5, use p̂, the blended proportion on your TI-83/84 output screen.
Remember, a hypothesis test decides whether the evidence contradicts the null hypothesis. The test statistic (z, t, or χ²) measures how big the difference is between H0 and the actual data. Then the p-value says how unlikely it is to get the data you got, if H0 is actually true.
Your book treats this as two steps, but the TI-83/84 calculator does both at the same time. Show your work: write down the screen name, all inputs, and all outputs that don’t duplicate inputs.
See also: What Does the p-Value Mean?
Continuing with the example:
(3–4) Z-Test: μo=200, σ=45, x̅=189.56, n=100, μ≠μo
outputs: z=−2.32, p=0.0203
By convention, we always round the test statistic to two decimal places and the p-value to four. Caution! Watch for powers of 10 (E minus whatever) and never write something daft like “p-value = 5.6212”.
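If you want to check the calculator's numbers away from the TI-83/84, the same Z-Test can be sketched in a few lines of Python. This is only an illustration, not part of the required work: the function name `z_test` and its `tail` argument are made up for this sketch, and it assumes σ is known, as in the deposit example.

```python
from math import sqrt
from statistics import NormalDist

def z_test(mu0, sigma, xbar, n, tail="two"):
    """Return (z, p) for a one-sample z test with known sigma.

    tail is "two" (H1 uses ≠), "left" (H1 uses <), or "right" (H1 uses >).
    These names are an assumption of this sketch, not calculator syntax.
    """
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        p = 2 * std_normal.cdf(-abs(z))
    elif tail == "left":
        p = std_normal.cdf(z)
    elif tail == "right":
        p = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, p

# Inputs from the deposit example: μo=200, σ=45, x̅=189.56, n=100, μ≠μo
z, p = z_test(mu0=200, sigma=45, xbar=189.56, n=100, tail="two")
print(f"z={z:.2f}, p={p:.4f}")  # z=-2.32, p=0.0203
```

The format strings do the conventional rounding: two decimal places for the test statistic and four for the p-value.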
Though not required, it can be helpful to sketch the sampling distribution, as shown. Label the axis x̅ or p̂ as appropriate. Show the number from H0 at the center of the axis, and show the sample statistic an appropriate distance to left or right. Shade the proper region(s) and give the p-value above, not on the axis.
If you wish, you can show the z or t axis in addition to the x̅ or p̂ axis. In this case, make sure you have the proper numbers on the proper axes.
There are two and only two correct things you can say here:
p < α. Reject H0 and accept H1
or
p > α. Fail to reject H0.
These are standard language, so don’t get creative. You can add the numbers if you like, as in p < α (0.0203 < 0.05), but the symbols are required.
(5) p < α. Reject H0 and accept H1.
If p < α, this sample is too unusual; it’s further away from H0 than we can expect from random chance; the data cast too much doubt on H0. We say that the result is statistically significant.
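The decision rule is mechanical enough to write as a tiny function. The sketch below is just an illustration; `decide` is a hypothetical name, and the check on α reflects the rule that α is written as a decimal, never as a percentage.

```python
def decide(p, alpha):
    """Return the Step 5 decision, using the standard wording."""
    if not 0 < alpha < 1:
        # Catches alpha entered as a percentage, such as 5 instead of 0.05.
        raise ValueError("alpha must be a decimal between 0 and 1")
    if p < alpha:
        return "p < α. Reject H0 and accept H1."
    # An exact tie (p equal to α) essentially never occurs with real data;
    # this sketch treats it as a failure to reject.
    return "p > α. Fail to reject H0."

print(decide(0.0203, 0.05))  # the deposit example
```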
Write your conclusion in English, mentioning the significance level. If you failed to reject H0, write your conclusion in neutral language. Please see Proper Conclusions to Your Hypothesis Tests.
When you reject H0 and accept a two-tailed H1, you can draw a further conclusion than just “different from”. See p < α in Two-Tailed Test: What Does It Tell You?.
(6) At the 0.05 level of significance, the true mean of all deposits is different from $200 and management’s figures are incorrect. In fact, the true mean of all deposits is less than $200.
You should understand Type I and Type II errors. A Type I error is rejecting H0 when it’s actually true, and a Type II error is failing to reject H0 when it’s actually false. If H0 is “the defendant is innocent”, then a Type I error would convict an innocent person and a Type II error would let a guilty person go free.
These are not errors in the sense of mistakes, but they do represent incorrect results. If your significance level (Step 2) is 0.05, and you do everything right, still in the long run you’ll reject a true H0 about one time in twenty (0.05 = 1/20). The problem, of course, is that you don’t know which one out of twenty.
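You can watch the “one time in twenty” happen with a small simulation. The sketch below makes several assumptions that are not part of the example: the population of deposits is normal, H0 really is true (μ = 200, σ = 45), and we repeat the same two-tailed z test on 2000 fresh random samples of 100. The function name and its defaults are invented for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(trials=2000, n=100, mu0=200, sigma=45,
                      alpha=0.05, seed=1):
    """Fraction of samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 is exactly right.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p = 2 * std_normal.cdf(-abs(z))   # two-tailed p-value
        if p < alpha:
            rejections += 1               # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Rejected a true H0 in {rate:.1%} of samples")
```

The printed rate should land close to α, though not exactly on it, because the simulation is itself a random sample of possible samples.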
In homework problems and on quizzes and exams, don’t compute a confidence interval (CI) unless the problem specifically asks for it.
In real life, including the ESP Lab and Field Project, a CI is often a good idea. If you rejected H0 and accepted H1, the CI tells you the size of the effect, which can help to gauge whether the statistically significant result is also practically significant. On the other hand, if you failed to reject H0, you can’t draw a conclusion but at least with a CI you can get some idea of the maximum size of the effect.
If you do a CI, what’s the appropriate confidence level? Unless you have a specific reason to choose a different level, it would be 1 minus the two-tailed α, expressed as a percent. If you did a one-tailed test (> or <) at the .01 significance level, the two-tailed α would be .02 and the corresponding confidence level would be 1 − .02 = .98, or 98%.
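Here is a minimal sketch of that rule in Python, together with a z interval for the deposit example. The function names are hypothetical, and the interval formula x̅ ± z·σ/√n is the usual one for a known σ; it is an assumption of this sketch rather than something worked out on this page.

```python
from math import sqrt
from statistics import NormalDist

def confidence_level(alpha, tails):
    """Confidence level that matches a test at significance level alpha.

    tails is 1 for a one-tailed test (< or >) and 2 for a two-tailed test (≠).
    """
    two_tailed_alpha = alpha if tails == 2 else 2 * alpha
    return 1 - two_tailed_alpha

def z_interval(xbar, sigma, n, level):
    """Confidence interval for μ when σ is known (assumed formula)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# One-tailed test at the .01 level corresponds to a 98% confidence level.
print(f"{confidence_level(0.01, tails=1):.0%}")

# Deposit example: two-tailed test at α = 0.05, so a 95% interval.
level = confidence_level(0.05, tails=2)
lo, hi = z_interval(xbar=189.56, sigma=45, n=100, level=level)
print(f"{level:.0%} CI: {lo:.2f} to {hi:.2f}")
```

For the deposit example the whole interval lies below $200, which agrees with the further conclusion in Step 6 that the true mean is less than $200.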
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/