Hypothesis Test by Confidence Interval
Copyright © 2011–2012 by Stan Brown, Oak Road Systems
A 95% CI is the flip side of a 0.05 two-tailed HT. More generally, a 1−α CI is the flip side of an α two-tailed HT.
H0 always contains a number. If that number is inside the CI, then H0 is consistent with the sample data and you fail to reject H0. If that number is outside the CI, then H0 is not consistent with your sample and you reject H0 and accept H1.
Applies to: Numeric and binomial data, for one or two populations. It doesn’t apply to the other cases because we don’t know how to calculate confidence intervals for those cases.
Here’s an example with numeric data.
A machine is supposed to be turning out something with a mean
value of 100.00 and standard deviation of 6.00, and you take a random sample
of 36 objects produced by the machine. Suppose your sample mean is
98.00; then your 95% confidence interval is 96.04 to 99.96. (Verify
this by doing a ZInterval yourself.)
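If you don't have a calculator handy, here is a minimal Python sketch of the same ZInterval computation. The function name `z_interval` is just an illustrative choice, and the sketch assumes the population standard deviation is known, as it is in this example.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / n ** 0.5            # critical z times standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(98.00, 6.00, 36)
print(f"{low:.2f} to {high:.2f}")  # 96.04 to 99.96
```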
Now, can you make any conclusion about whether the machine is working properly? Well, you’re 95% confident that its true mean output is somewhere between 96.04 and 99.96, and the target of 100.00 is outside that interval. Put another way, if the true mean really were 100.00, there would be less than a 5% chance of getting a sample mean as far from 100.00 as yours. 5% is 0.05, so you conclude that, at the 0.05 significance level, the machine is not behaving properly.
Let’s examine the possibilities a bit more systematically. You’re taking just one sample of 36, but consider the implications of some possible sample means:
| x̅ | 95% confidence interval | CI consistent with μ=100? | p-value (HT for μ≠100, α=0.05) | HT consistent with μ=100? |
|---|---|---|---|---|
| 97.00 | 95.04 to 98.96 | NO | 0.0027 | NO |
| 98.00 | 96.04 to 99.96 | NO | 0.0455 | NO |
| 98.04 | 96.08 to 100.00 | borderline | 0.0500 | borderline |
| 98.10 | 96.14 to 100.06 | YES | 0.0574 | YES |
| 99.00 | 97.04 to 100.96 | YES | 0.3173 | YES |
What do you see here? When μo (hypothetical population mean from H0) is outside the 95% CI, the p-value is less than 0.05. This is the reason we say the symbol for the confidence level is 1−α, because of this very correspondence between a two-tailed HT at significance level α and a CI at confidence level 1−α.
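To check the table for yourself, here is a short Python sketch that recomputes each interval and p-value for the machine example (μo = 100, σ = 6, n = 36). The helper name `ci_and_p` is hypothetical, and the sketch assumes a z procedure with known σ.

```python
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 100.00, 6.00, 36, 0.05
STD_ERR = SIGMA / N ** 0.5                    # 6 / sqrt(36) = 1.00
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

def ci_and_p(sample_mean):
    """Return (CI low, CI high, two-tailed p-value) for one sample mean."""
    low = sample_mean - Z_CRIT * STD_ERR
    high = sample_mean + Z_CRIT * STD_ERR
    z = (sample_mean - MU0) / STD_ERR
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return low, high, p_value

for xbar in (97.00, 98.00, 98.04, 98.10, 99.00):
    low, high, p = ci_and_p(xbar)
    inside = low <= MU0 <= high   # is 100 inside the 95% CI?
    reject = p < ALPHA            # does the HT reject H0 at 0.05?
    print(f"{xbar:6.2f}  {low:6.2f} to {high:6.2f}  "
          f"inside CI: {inside!s:5}  p = {p:.4f}  reject H0: {reject}")
```

For every row except the borderline one, "inside CI" and "reject H0" come out as opposites, which is the correspondence the table illustrates. The 98.04 row sits right on the edge, so its result depends on digits beyond the two decimal places shown.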
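You can generalize this to any two-tailed test at any significance level: if the number in H0 is outside the 1−α CI, then the p-value is less than α and you reject H0; if that number is inside the 1−α CI, then the p-value is greater than α and you fail to reject H0.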
Here’s an example from De Veaux, Velleman, and Bock, Intro Stats (Pearson Addison Wesley, 2009), page 541.
The baseline seven-year risk of heart attacks for diabetics is 20.2%. In 2007 a NEJM study reported a 95% confidence interval equivalent to 20.8% to 40.0% for the risk among patients taking the diabetes drug Avandia.
What did this confidence interval suggest to the FDA about the safety of the drug?
The FDA is 95% confident that the true risk of heart attack with Avandia is 20.8% to 40.0%. But that entire range is above the baseline rate of 20.2%. Therefore, the FDA is at least 95% confident that there is an increased risk over the baseline with Avandia. You don’t have a p-value, but if you computed it (from po=.202, x=27, n=89) you’d find it was less than 0.05. You’re at least 95% confident that the risk is increased, because if the true probability were 20.2% there’d be less than a 0.05 chance of getting a sample like the one they got.
Think of it this way. The researcher who computes the 95% CI for the risk is saying, “I don’t know the true risk exactly, but I’m 95% confident it’s between 20.8% and 40.0%. I can say that that interval is consistent with the sample I got. If the true risk was outside that interval, then the sample I actually got (27 out of 89) would be less than 5% likely to occur. But my sample did occur. That tells me that at the 5% or 0.05 significance level my sample is inconsistent with any risk proportion outside that CI, including the baseline of 20.2%.”
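Here is a rough Python sketch of that p-value computation, using po = .202, x = 27, n = 89 from the text. It assumes the usual normal-approximation one-proportion z test and interval; the function names are illustrative, and the published interval may have been computed by a slightly different method, so the endpoints need not match to the last digit.

```python
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-tailed p-value; the standard error comes from p0 in H0."""
    p_hat = x / n
    std_err = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

def one_prop_z_interval(x, n, confidence=0.95):
    """Normal-approximation CI; the standard error comes from p-hat."""
    p_hat = x / n
    std_err = (p_hat * (1 - p_hat) / n) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z_crit * std_err, p_hat + z_crit * std_err

p_value = one_prop_z_test(27, 89, 0.202)
low, high = one_prop_z_interval(27, 89)
print(f"p-value = {p_value:.4f}")        # comes out below 0.05
print(f"95% CI = {low:.3f} to {high:.3f}")
```

The baseline 0.202 falls below the lower end of the interval and the p-value is below 0.05, so the CI and the HT tell the same story here.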
What about one-tailed tests? Good question! A one-tailed test addresses only one direction, so α = 0.05 for a one-tailed test is equivalent to α = 0.10 for a two-tailed test, which matches up with a 90% CI, not a 95% CI. In general, the α of a one-tailed test is effectively doubled when you convert to a two-tailed test.
Therefore, a one-tailed HT at significance level α is equivalent to a CI at confidence level 1−2α: first double your α and then subtract from 1.
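A small Python sketch can make the doubling rule concrete. It reuses the machine example (μo = 100, σ = 6, n = 36) with a hypothetical sample mean of 98.30 and the one-tailed alternative μ < 100; both the sample mean and the function names are made up for illustration.

```python
from statistics import NormalDist

def matching_confidence_level(alpha, tails):
    """Confidence level of the CI that corresponds to an HT at this alpha."""
    return 1 - alpha if tails == 2 else 1 - 2 * alpha

def z_interval(sample_mean, sigma, n, confidence):
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin

MU0, SIGMA, N, ALPHA = 100.00, 6.00, 36, 0.05
xbar = 98.30  # hypothetical sample mean, not from the original example

# One-tailed p-value for H1: mu < 100 (left tail only)
z = (xbar - MU0) / (SIGMA / N ** 0.5)
p_one_tailed = NormalDist().cdf(z)

level = matching_confidence_level(ALPHA, tails=1)  # 0.90
ci90 = z_interval(xbar, SIGMA, N, level)
ci95 = z_interval(xbar, SIGMA, N, 0.95)

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"90% CI: {ci90[0]:.2f} to {ci90[1]:.2f}, contains 100? {ci90[0] <= MU0 <= ci90[1]}")
print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f}, contains 100? {ci95[0] <= MU0 <= ci95[1]}")
```

With this sample mean the one-tailed test rejects H0 at 0.05 and the 90% CI excludes 100, but the 95% CI still contains 100. That is why the 90% interval, not the 95% one, is the right partner for a one-tailed test at α = 0.05.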
The principle in the box works for Cases 0, 1, and 2, both numeric and binomial data. For numeric data, the CI and HT are exactly equivalent, as I showed you with the table of CI and p-values for the machine.
But for binomial data, the CI and HT are only approximately equivalent. The reason is that with binomial data, the HT uses a standard error derived from po in the null hypothesis, but the CI uses a standard error derived from p̂, the sample proportion. Since they use different standard errors, right around the borderline they might get different answers. But when the hypothetical po is a good distance outside the CI, as it was in the drug example, the p-value will definitely be less than α.
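To see the two standard errors disagree near the borderline, here is a Python sketch that keeps the same sample (27 out of 89) but tests it against a hypothetical po of 0.21, a value chosen only because it sits just inside the 95% CI. The normal-approximation formulas are assumed, and the function names are illustrative.

```python
from statistics import NormalDist

def p_value_two_tailed(x, n, p0):
    """HT: the standard error is built from p0, the value in H0."""
    std_err_h0 = (p0 * (1 - p0) / n) ** 0.5
    z = (x / n - p0) / std_err_h0
    return 2 * (1 - NormalDist().cdf(abs(z)))

def conf_interval(x, n, confidence=0.95):
    """CI: the standard error is built from p-hat, the sample proportion."""
    p_hat = x / n
    std_err_sample = (p_hat * (1 - p_hat) / n) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z_crit * std_err_sample, p_hat + z_crit * std_err_sample

x, n, alpha = 27, 89, 0.05
p0 = 0.21  # hypothetical borderline value, not from the study

low, high = conf_interval(x, n)
p_value = p_value_two_tailed(x, n, p0)
print(f"95% CI: {low:.4f} to {high:.4f}; p0 inside? {low <= p0 <= high}")
print(f"p-value = {p_value:.4f}; reject H0 at 0.05? {p_value < alpha}")
```

Here po is inside the CI, yet the HT rejects H0, because the standard error based on 0.21 is smaller than the one based on p̂. Move po farther from the edge of the interval and the two methods agree again.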
Next week you’ll make inferences about two populations, Cases 3, 4, and 5. It’s only variations on this week’s theme; just remember that you’ll be testing a difference between two populations, so the number in H0 is usually 0 and you check whether 0 is inside the CI for the difference.
One-tailed versus two-tailed tests are the same deal as with one population.
Here again, the principle in the box is exact for numeric data but approximate for binomial data. The reason is the same: HT and CI use the same standard error for the numeric data cases, but different standard errors for two-population binomial data.
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/