revised 24 Nov 2011

# Proper Conclusions to Your Hypothesis Tests

Summary: As a statistician, you have an ethical obligation to make your results as easy as possible to understand, and as hard as possible to misinterpret.

Avoid common errors when stating conclusions and interpreting them. Make sure you understand what you are doing, and explain it to others in their own language.

See also: What Does the p-Value Mean?

## When p < α, you reject H0 and accept H1.

If p < α, you have shown that your sample results were unlikely to arise by chance if H0 is true. The data are statistically significant.

You therefore reject H0 and accept H1. Since the sample you actually have is an unlikely sample if H0 is true, you conclude that H0 is most likely false. If H0 is most likely false, its opposite H1 is most likely true. You accept H1, but you don’t say you have proved it to a certainty. There’s always that p-value chance that the sample results could have occurred when H0 is true. That’s why we say we “accept” H1, not that we have “proved” H1.

Compare to a jury verdict of “guilty”. It means the jury is convinced that the probability (p) that the defendant is innocent is less than a reasonable doubt (α). It doesn’t mean there is no chance he’s innocent, just that there is very little chance.

Example:

Suppose your null is “the average package contains the stated net weight,” your alternative is “the average package contains less than the stated net weight,” and your significance level is 0.05.

If p = 0.0241, which is < α, you reject H0 and accept H1. You conclude “the average package does contain less than the stated net weight (p = 0.0241)” or “the average package does contain less than the stated net weight, at the 0.05 significance level.”
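As a sketch of how a one-tailed test like this could be computed — the stated weight, sample mean, σ, and sample size below are made-up numbers for illustration, and the normal-approximation z-test is just one common way to get a p-value:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def left_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-tailed (left) z-test of H0: mean = mu0 against H1: mean < mu0.
    Assumes a known population sigma, which is hypothetical here."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = normal_cdf(z)  # area in the left tail, below z
    if p < alpha:
        return p, "reject H0, accept H1"
    return p, "fail to reject H0"

# Hypothetical data: stated net weight 500 g, 36 packages sampled
p, decision = left_tailed_z_test(sample_mean=496.7, mu0=500, sigma=10, n=36)
```

With these invented numbers, p comes out a little under 0.05, so the decision is to reject H0 and accept H1 — and the conclusion would be stated as a fact at the 0.05 significance level, as described above.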

Don’t say the average package “might” be less than the stated weight or “appears to be” less than the stated weight. When you reject H0, state the alternative as a fact, within the stated significance level.

See also: p < α in Two-Tailed Test: What Does It Tell You? for one-tailed interpretation of a two-tailed test.

## When p > α, you fail to reject H0.

If p > α, you have shown that random chance could account for your results if H0 is true. You don’t know that random chance is the explanation, just that it’s a possible explanation. The data are not statistically significant.

You therefore fail to reject H0 (and don’t mention H1 in step 5). The sample you have could have come about by random selection if H0 is true, but it could also have come about by random selection if H0 is false. In other words, you don’t know whether H0 is actually true, or it’s false but the sample data just happened to fall not too far from H0.

Compare to a jury verdict of “not guilty”. That could mean the defendant is actually innocent, or that the defendant is actually guilty but the prosecutor didn’t make a strong enough case.

Example:

Suppose your null is “the average package contains the stated net weight,” your alternative is “the average package contains less than the stated net weight,” and your significance level is 0.05.

If p = 0.0788, which is > α, you fail to reject H0 and conclude “at the 0.05 significance level, it’s impossible to say whether the average package contains less than the stated net weight” or “the data are insufficient to reach a conclusion about the average package contents, at the 0.05 significance level.”

Don’t say the average package “might” be anything, or “could” be anything. Don’t say “we can’t prove it’s under weight” or “we can’t prove it’s okay.” Both of those are true, but they’re only half the truth and they lead the reader to a wrong conclusion. (The most effective way to lie is to tell only part of the truth.)

Sometimes when p > α, people say “the data fail to disprove the null hypothesis” or “the data are not inconsistent with the null hypothesis”. In terms of the example, that would be “there’s insufficient evidence to show that the average package is under weight.” While technically correct, those forms are easily misinterpreted as “the null hypothesis is true”, and are not acceptable in this class. If p > α, you need to make a completely neutral statement that cannot easily be misinterpreted, such as “The data don’t lead to a conclusion either way.” (Your book, alas, is not always careful on this point.)
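If it helps to see both cases side by side, here is a small sketch that turns a p-value into a conclusion worded along the lines recommended above (the exact phrasings are just one acceptable choice):

```python
def state_conclusion(p, alpha=0.05):
    """Word a hypothesis-test conclusion so it can't easily be misread.
    The two template sentences follow the guidance in this article."""
    if p < alpha:
        return "reject H0 and accept H1, at the %g significance level" % alpha
    # p > alpha: a completely neutral statement, never "H0 is true"
    return "fail to reject H0; the data don't lead to a conclusion either way"

state_conclusion(0.0241)  # p < alpha: state H1 as a fact at that level
state_conclusion(0.0788)  # p > alpha: neutral statement only
```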

There’s an important matter of logic here. Lack of evidence against something is not evidence for it. You don’t have enough evidence to disprove H0, but you also don’t have evidence in favor of H0. To take a silly but colorful example, if you don’t tell me you are not a space monkey from the planet Zargop, that doesn’t mean you are!

If p > α, you don’t know whether the null hypothesis is actually true or false, within your stated significance level α.

Though you never accept H0, there’s one special circumstance you should know about. When your p-value is very large and your sample is large — or when yours is not the first experiment to yield a high p-value on the same hypotheses — you do begin to suspect that H0 is probably true. This is how science works: a high p-value on one experiment merely fails to disprove H0, but when the experiment is replicated multiple times and H0 is never rejected, it begins to look like H0 is true — always with that possibility of a later experiment overturning it after all.

In the classroom setting, though large p-values do come up occasionally, usually a large p-value means you made a mistake somewhere or that you did not pick hypotheses worth testing. For instance, if your H0 is that a coin comes up heads at least ¼ of the time, a few dozen flips of an ordinary fair coin will probably yield a p-value very close to 1.
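You can check that coin-flip claim with an exact binomial calculation. The 48 flips and 24 heads below are hypothetical numbers chosen as typical for a fair coin; the p-value is the lower-tail probability at the boundary value p₀ = ¼:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly with math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical: 48 flips of a fair coin come up heads 24 times.
# H0: heads at least 1/4 of the time; H1: heads less than 1/4 of the time.
# One-tailed p-value, evaluated at the boundary p0 = 0.25:
p_value = binom_cdf(24, 48, 0.25)
```

Since 24 heads is far above the 12 expected under p₀ = 0.25, the lower-tail p-value is very close to 1 — nowhere near rejecting H0, just as the paragraph above says.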

This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.

For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/