Confidence Intervals for Two Populations
Copyright © 2002–2013 by Stan Brown, Oak Road Systems
Confidence intervals for two populations are easy enough to calculate on your TI-83. But one or both endpoints can be negative, and that means you have to write your interpretation carefully. Don’t just say “difference”; specify which population’s mean or proportion is larger or smaller. You must also distinguish between mean difference (for paired data) and difference in means (for unpaired data).
Study these examples of confidence intervals for two populations, and you’ll learn how to write your interpretations like a pro!
See also:
Triage: Which Inferential Stats Case Should I Use?
Inferential Statistics: Basic Cases
Page 425 of Johnson & Kuby’s Just the Essentials of Elementary Statistics 3/e presents an example of heights of randomly selected men and women at a college, and asks you to estimate the difference in average height as a 95% CI. (Men’s and women’s heights are normally distributed.)
| Sample | Mean, x̅ | Standard Deviation, s | Sample Size, n |
|---|---|---|---|
| Female, pop. 2 | 63.8" | 2.18" | 20 |
| Male, pop. 1 | 69.8" | 1.92" | 30 |
You have independent samples here: you get one number from each individual. The data type is numeric (height), so you have Case 4, difference of independent means.
With independent means, you check requirements for each sample separately.
All requirements for Case 4 are met.
The TI-83 computes μ1 − μ2, so you need to decide which will be population 1 and which will be population 2. I like to avoid negative signs, so unless there’s a good reason to do otherwise I take the sample with the larger mean as sample 1; in this case that’s the men.
Whichever way you decide, write it down: pop 1 = ________, pop 2 = ________.
On your calculator, press [STAT] [◄] and scroll up or down to find 0:2-SampTInt. Enter the sample statistics and use Pooled:No.
Conclusion: With 95% confidence, the average man at that college is between 4.8″ and 7.2″ taller than the average woman, or μM−μF = 6.0″±1.2″. (You would probably present one or the other of those forms, not both.)
(6.0 is the difference of sample means and is the center of the confidence interval: x̅1−x̅2 = 69.8−63.8 = 6.0. Equivalently, (4.7837+7.2163)/2 = 6.0.)
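If you want to check the calculator's arithmetic away from the TI-83, here is a minimal Python sketch of an unpooled two-sample t interval computed from the summary statistics in the table. It assumes that Pooled:No corresponds to the Welch-Satterthwaite approximation for degrees of freedom. The function names (`t_pdf`, `t_cdf`, `t_critical`, `welch_interval`) are illustrative only, and the critical value is found numerically because Python's standard library has no t distribution.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 for the left half."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(conf, df):
    """Two-sided critical value t*, found by bisection on the CDF.
    The search range 0..100 is an assumption that suits ordinary cases."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_interval(mean1, s1, n1, mean2, s2, n2, conf=0.95):
    """CI for mu1 - mu2 from summary statistics, variances not pooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (usually not a whole number)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(conf, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Population 1 = men, population 2 = women, as chosen in the text
low, high = welch_interval(69.8, 1.92, 30, 63.8, 2.18, 20)
print(f"{low:.4f} to {high:.4f}")
```

The endpoints should land very close to the 4.7837 and 7.2163 quoted above. Swapping the two populations in the call flips the signs of both endpoints but leaves the written conclusion unchanged.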
Remark: The difference from the case of dependent means is subtle but important. With dependent means (paired data), the CI is about the average difference in measurements of a single randomly selected individual or matched pair. But with independent means (unpaired data), the CI is about the difference between the averages for two different populations.
Dabes & Janik’s Statistics Manual (1999) page 264 shows heart rate for a simple random sample of six subjects before and after drinking coffee:
| Person | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before | 78 | 64 | 70 | 71 | 70 | 68 |
| After | 83 | 66 | 77 | 74 | 75 | 71 |
You have numeric data, and you’re getting two numbers from each subject. Therefore this is Case 3, mean difference for paired data. (Before-and-after studies are classic examples of paired data.)
With Case 3, you check requirements on the d’s, not the original data. Define d as After−Before, and compute:
| Person | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before | 78 | 64 | 70 | 71 | 70 | 68 |
| After | 83 | 66 | 77 | 74 | 75 | 71 |
| d = A−B | 5 | 2 | 7 | 3 | 5 | 3 |
(Could you define d as Before−After? Certainly! Your d’s and your confidence interval would then all have the opposite signs from mine, but your written conclusion would be identical because you never describe negative differences when interpreting a CI. I usually define d so that I have as few negative signs as possible in the data, but that’s purely personal preference.)
Since the sample size is below 30, you need to use MATH200A part 2 to check for outliers and MATH200A part 4 to verify that the data are normally distributed. In fact there are no outliers, and the data are close enough to normal (r=.9638). SRS is stated in the problem, so all requirements are met.
To compute a 95% CI, enter the d’s in L1, then press [STAT] [◄] [8].
Conclusion: With 95% confidence, the mean increase in heart rate for all people after drinking coffee is between 2.2 and 6.1 beats per minute. (Notice that this is the mean difference μd, not a difference in means μA−μB. With paired data you are predicting the mean difference between two measurements taken from one randomly selected individual.)
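As a cross-check on the calculator, the same paired interval can be sketched in a few lines of Python. This is only an illustration: the critical value 2.5706 is taken from a standard t table (95% confidence, df = 5) rather than computed, and the function name `paired_interval` is made up for this sketch.

```python
import math
import statistics

before = [78, 64, 70, 71, 70, 68]
after = [83, 66, 77, 74, 75, 71]

# Work with the differences, not the original measurements
d = [a - b for a, b in zip(after, before)]

T_STAR = 2.5706  # assumption: table value for 95% confidence with df = n - 1 = 5

def paired_interval(diffs, t_star):
    """CI for the mean difference mu_d, given the list of differences."""
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d - t_star * se, mean_d + t_star * se

low, high = paired_interval(d, T_STAR)
print(f"{low:.1f} to {high:.1f}")   # 2.2 to 6.1
```

Notice that the original Before and After columns never enter the formula directly; only the d’s do, which is why the requirements are checked on the d’s.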
Now let’s alter the data a bit to bring up a new concept. (The d’s are still normally distributed with no outliers.)
| Person | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before | 78 | 64 | 70 | 71 | 70 | 68 |
| After | 79 | 62 | 73 | 70 | 71 | 67 |
| d = A−B | 1 | −2 | 3 | −1 | 1 | −1 |
Notice that some heart rates declined after the people drank coffee. Now when you compute a 95% CI, the lower endpoint is negative: the interval runs from about −1.8 to +2.1.
How should you interpret a negative endpoint in the interval? Remember that you are computing a CI for the quantity After−Before. You could follow the earlier pattern and say “With 95% confidence, the mean increase in heart rate for all individuals after drinking coffee is between −1.8 and +2.1 beats per minute,” but only a mathematician would love a statement that talks about an increase being negative. Instead, you draw attention to the fact that the change might be a decrease or an increase, as follows.
Conclusion: With 95% confidence, the mean change in heart rate for all individuals after drinking coffee is between a decrease of 1.8 and an increase of 2.1 beats per minute. Since it’s obviously very important to get the direction right, be sure to check your conclusion against your H1 (if any) and your original definition of d.
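The wording rule can be written down as a small decision procedure. The sketch below is just one way to express it in Python; the function name `describe_change` and its default unit are invented for illustration, and it assumes the interval is for After−Before.

```python
def describe_change(low, high, unit="beats per minute"):
    """Phrase a CI for a change (After - Before) without any negative numbers."""
    if low >= 0:
        # Both endpoints positive: the whole interval is an increase
        return f"an increase of between {low:.1f} and {high:.1f} {unit}"
    if high <= 0:
        # Both endpoints negative: the whole interval is a decrease
        return f"a decrease of between {-high:.1f} and {-low:.1f} {unit}"
    # Endpoints straddle zero: name the direction of each endpoint
    return f"between a decrease of {-low:.1f} and an increase of {high:.1f} {unit}"

print(describe_change(-1.8, 2.1))
# between a decrease of 1.8 and an increase of 2.1 beats per minute
print(describe_change(2.2, 6.1))
# an increase of between 2.2 and 6.1 beats per minute
```

If you had defined d as Before−After instead, the words "increase" and "decrease" would have to trade places, which is why it pays to write down your definition of d.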
Remark 1: Though it’s correct to present the CI as a point estimate and margin of error, it’s probably not a good idea because that form is so easy to misinterpret. If you say “With 95% confidence, the mean increase in heart rate for all individuals is 0.2±1.9 beats per minute,” many people won’t notice that the margin of error is bigger than the point estimate, and they’ll come to the false conclusion that you have established an increase in heart rate after drinking coffee. As statistics mavens we have a responsibility to present our results clearly, so that people draw the right conclusions and not the wrong ones.
Remark 2: Remember that the CI occupies the middle of the distribution while the HT looks at the tails. If 0 is inside the CI, it can’t be in either tail. Therefore, from this confidence interval you know that testing the null hypothesis μd = 0 at the 0.05 level (0.05 = 1−95%) would fail to reject H0: this experiment failed to find a significant difference in heart rate after drinking coffee. (See Confidence Interval and Hypothesis Test (Two Populations).)
Remember the difference between “no significant difference found” and “no difference exists”. Since 0 is in the CI, you can’t say whether there is a difference. The correct statement, “I don’t know whether there is a difference,” is different from the incorrect “There is no difference.”
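To see the link between the interval and the test in numbers, here is a short Python sketch using the altered d’s. It assumes a two-tailed test of μd = 0 at the 0.05 level, and it takes the critical value 2.5706 from a standard t table (df = 5) rather than computing it.

```python
import math
import statistics

d = [1, -2, 3, -1, 1, -1]   # altered After - Before differences from the table
T_STAR = 2.5706             # assumption: table value, 95% confidence, df = 5

mean_d = statistics.mean(d)
se = statistics.stdev(d) / math.sqrt(len(d))

# Confidence interval: does it contain 0?
low, high = mean_d - T_STAR * se, mean_d + T_STAR * se
zero_in_ci = low <= 0 <= high

# Hypothesis test of H0: mu_d = 0, two-tailed, alpha = 0.05
t_stat = mean_d / se
reject_h0 = abs(t_stat) > T_STAR

print(f"CI: {low:.1f} to {high:.1f}; 0 inside the CI: {zero_in_ci}")
print(f"t = {t_stat:.2f}; reject H0: {reject_h0}")
```

Both checks compare the same two quantities, the sample mean difference and t* times the standard error, so they can never disagree: 0 is inside the interval exactly when the test fails to reject H0.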
The following data are from Dabes & Janik’s Statistics Manual (1999) page 269. Men and women were polled in a systematic sample on whether they favored legalized abortion, and the results were as follows:
| Sample | Number in Favor, x | Sample Size, n |
|---|---|---|
| Females, pop. 1 | 60 | 100 |
| Males, pop. 2 | 40 | 80 |
Find a 98% confidence interval for the difference in level of support between women and men.
You have binomial data: each person either supports legalized abortion or not. (Obviously this example is oversimplified.) Binomial data with two populations is Case 5, difference of proportions.
For Case 5, you need to test requirements against each sample separately, not against the combined samples.
You need p̂1 and p̂2 for the tests. Usually it’s easier to let the calculator find p̂1 and p̂2 for you and then check requirements, but this time the numbers are so easy that there’s no need to wait. Support among the sample of women is 60/100 = 60%, and among the sample of men is 40/80 = 50%. So let’s define population 1 = women, population 2 = men, and therefore p̂1 = 0.6 and p̂2 = 0.5.
All requirements for a Case 5 CI are met.
On the TI-83 or TI-84, press [STAT] [◄] and scroll up to find B:2-PropZInt.
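If you’d like to see what the calculator is doing, the following Python sketch computes a two-proportion z interval from the counts in the table. It assumes the usual unpooled standard error for a confidence interval; the function name `two_prop_interval` is illustrative and not part of any calculator or library.

```python
import math
from statistics import NormalDist

def two_prop_interval(x1, n1, x2, n2, conf=0.98):
    """CI for p1 - p2 from the number of successes and size of each sample."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference in sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical z for the requested confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * se
    return (p1 - p2) - margin, (p1 - p2) + margin

# Population 1 = women (60 of 100), population 2 = men (40 of 80)
low, high = two_prop_interval(60, 100, 40, 80)
print(f"{low:.4f} to {high:.4f}")
```

The lower endpoint comes out negative and the upper endpoint positive, which is exactly the situation that makes the interpretation tricky.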
Two-population confidence intervals can be tricky to interpret, particularly when the two endpoints have different signs and particularly for Case 5, two population proportions. You can reason it out in words, or use algebra.
In words, remember that the confidence interval is the estimated difference p1−p2, which is the estimated amount by which the proportion in the first population exceeds the proportion in the second population. So a negative endpoint for your CI means that the first proportion is lower than the second, and a positive endpoint means that the first proportion is larger.
Using algebra, begin with the calculator’s estimate of p1−p2:
−0.0729 ≤ p1−p2 ≤ +0.27292 (98% conf.)
Add p2 to all three parts of the inequality, and you have
p2−0.0729 ≤ p1 ≤ p2+0.27292 (98% conf.)
That’s a little easier to work with. The 98% confidence bounds on p1 (level of women’s support) are p2−0.0729 (7.3 percentage points below men’s support) and p2+0.27292 (27.3 percentage points above men’s support).
Conclusion: You are 98% confident that support for legalized abortion among females is somewhere between 7.3 percentage points lower and 27.3 percentage points higher than among males.
Remark: It would be equally valid to turn that around and say you’re 98% confident that support among males is somewhere between 27.3 percentage points lower and 7.3 percentage points higher than among females.
Johnson & Kuby’s Just the Essentials of Elementary Statistics 3/e presents another example on page 427. What is the difference (if any) in academic performance between fraternity members and nonmembers? Forty members of each population were randomly selected, and their cumulative GPA recorded as an indication of performance. The results were as follows:
| Sample | x̅ | s | n |
|---|---|---|---|
| Fraternity members, pop. 1 | 2.03 | 0.68 | 40 |
| Independents, pop. 2 | 2.21 | 0.59 | 40 |
Here you have numeric data, two independent samples. (You know it’s independent samples, unpaired data, because each member of the sample gives you just one number.) This is Case 4, difference of independent means.
Each sample was random, and each sample size is >30. All requirements for Case 4 are met.
The CI is −0.46 to +0.10, with 95% confidence.
To interpret this, remember that the TI-83 computes a CI for μ1−μ2, and we defined population 1 as fraternity and population 2 as independent. The calculator is telling you that
−0.46 ≤ μ1−μ2 ≤ +0.10 (95% conf.)
or, adding μ2 to all three parts,
μ2−0.46 ≤ μ1 ≤ μ2+0.10 (95% conf.)
Conclusion: the true difference in academic performance, as measured by GPA, is somewhere between 0.46 worse and 0.10 better for fraternity members relative to nonmembers, with 95% confidence.
You could also write a somewhat longer form: with 95% confidence, the average fraternity member’s academic performance, as measured by GPA, is somewhere between 0.46 worse and 0.10 better than the average independent’s performance.
Remark: Don’t be fooled by the fact that the CI is mostly below zero. You really cannot conclude that fraternity members probably have lower academic performance. Remember that the 95% CI is the result of a process that captures the true population mean (or difference, in this case) 95 times out of 100. But you can’t know where in that interval the true mean (or difference) lies. If you could, there would be no point to having a CI!
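The "95 times out of 100" idea can be illustrated with a small simulation. The Python sketch below is purely hypothetical: it pretends that the sample statistics above are the true population values, assumes normally distributed GPAs, and uses 1.99 as an approximate 95% t value for degrees of freedom near 78. None of those choices comes from the textbook example.

```python
import math
import random
import statistics

random.seed(1)               # fixed seed so the run is repeatable

MU1, SIGMA1 = 2.03, 0.68     # hypothetical "true" values, population 1
MU2, SIGMA2 = 2.21, 0.59     # hypothetical "true" values, population 2
TRUE_DIFF = MU1 - MU2
N, TRIALS = 40, 2000
T_STAR = 1.99                # assumption: approximate 95% t value, df near 78

def interval_from_new_samples():
    """Draw fresh samples from both populations and build one 95% CI."""
    s1 = [random.gauss(MU1, SIGMA1) for _ in range(N)]
    s2 = [random.gauss(MU2, SIGMA2) for _ in range(N)]
    diff = statistics.mean(s1) - statistics.mean(s2)
    se = math.sqrt(statistics.variance(s1) / N + statistics.variance(s2) / N)
    return diff - T_STAR * se, diff + T_STAR * se

hits = 0
for _ in range(TRIALS):
    low, high = interval_from_new_samples()
    if low <= TRUE_DIFF <= high:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals captured the true difference")
```

Roughly 95% of the simulated intervals should contain the true difference, but the true difference sits in a different spot within each one. That is the sense in which a single interval tells you nothing about where inside it the truth lies.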
Remark 2: Even though zero is within the CI, you must not say that there is no difference in performance between members and nonmembers. The difference might indeed be zero, but it might also be anywhere between 0.46 in one direction and 0.10 in the other. There’s even a 5% chance that the true difference lies outside those limits. Always bear in mind the difference between insufficient evidence for and evidence against. (You may hear that said as “lack of evidence for is not evidence against.”)
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/