revised Jul 27, 2008

How Big a Sample Do I Need?

Copyright © 2007–2008 by Stan Brown, Oak Road Systems

Summary: 

When you estimate a population parameter, you compute a confidence interval after taking sample data. Based on the confidence level that you preselect, and characteristics of your sample or population, you compute a margin of error.

But before you perform the study, how can you decide how big a sample you need so that your confidence interval will have your desired margin of error or less?

The answer is that you take the formula for the margin of error, rearrange it algebraically to solve for the sample size, compute, and round up. This page shows the formulas for some common cases, with examples.

See also: 

Inferential Statistics Cases
For more advanced treatment of more cases:

Case 0: One population mean, known σ

If you know the standard deviation σ of the population, and you want to estimate the mean μ to within a given margin of error E in a 1−α confidence interval, here’s how to find the required sample size n:

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}$$
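Spelled out, the rearrangement takes two steps: multiply both sides by √n ÷ E to isolate √n, then square both sides. This is only the algebra behind the formula above, written out in LaTeX:

```latex
\begin{align*}
E &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{E} \\
n &= \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}
\end{align*}
```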

Example 1: You want to estimate the average hourly output of a machine to within ±1.5, with 90% confidence. Based on historical data, you have reason to believe that the standard deviation of the machine’s hourly output is 6.2. How large a sample do you need?

Solution: Note first that this is not a realistic situation. It’s pretty unlikely that you would know the standard deviation of a population but not know the mean of that population. However, statistics texts always begin with this case because it’s the simplest way to demonstrate the principles. You leave Perfectland and enter Realityville in the other cases. With that said—

Comments | Computation
It’s good practice to start any problem by writing down what you know and what you need, with symbols. | Given: E = 1.5, σ = 6.2, 1−α = 0.90. Wanted: sample size n
The formula wants z_{α/2}. How do you compute it? Begin by finding α/2. | 1−α = 0.90 ⇒ α = 0.10 ⇒ α/2 = 0.05. Since α/2 = 0.05, z_{α/2} = z_{0.05}.
z_{rtail} is the critical z, or the z score that divides the normal curve leaving a right-hand tail with an area of rtail. You compute it on your TI-83/84/89 by using invNorm. (See z Function (Critical z).) | z_{0.05} = invNorm(1−0.05) ≈ 1.6449
Now you have all the pieces. Don’t use the rounded value of z_{α/2}, but use [2nd (-) makes ANS] to keep full precision. | n = [ z_{α/2} × σ ÷ E ]² = (ANS×6.2÷1.5)² = 46.2227... → 47

Answer: Given a population standard deviation of 6.2 units per hour, if you have a sample size ≥47 the margin of error in a 90% confidence interval will be ≤1.5 units per hour.
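If you would rather check Example 1 off the calculator, here is a minimal Python sketch of the same computation. The function name `sample_size_known_sigma` is made up for this illustration; `NormalDist().inv_cdf` from Python's standard `statistics` module plays the role of invNorm.

```python
import math
from statistics import NormalDist

def sample_size_known_sigma(margin, sigma, confidence):
    """Sample size for estimating a mean when sigma is known (Case 0)."""
    alpha = 1 - confidence
    # Critical z for a right-hand tail of alpha/2, like invNorm(1 - alpha/2).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    raw_n = (z * sigma / margin) ** 2
    # Always round a sample size up.
    return math.ceil(raw_n)

# Example 1: E = 1.5, sigma = 6.2, 90% confidence.
n_example1 = sample_size_known_sigma(1.5, 6.2, 0.90)
print(n_example1)
```

Because the full-precision z is carried through the computation, this mirrors the advice to use ANS rather than the rounded 1.6449.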

Why do we round up? After computing 46.2227, why not report a sample size of 46? Well, the computation shows that a sample size of exactly 46.2227... would give a margin of error of exactly 1.5. If you go slightly lower, to 46, the margin of error will be slightly higher than 1.5. Since the sample size must be a whole number, 46 or 47, and your margin of error must not exceed 1.5, you have to choose the slightly higher number 47, which will give a margin of error slightly less than 1.5.
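You can confirm the rounding argument numerically by plugging n = 46 and n = 47 back into the margin-of-error formula. This is a small sketch; the helper name `margin_of_error` is an assumption for illustration only.

```python
import math
from statistics import NormalDist

def margin_of_error(n, sigma, confidence):
    """Margin of error E = z * sigma / sqrt(n) for a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

e46 = margin_of_error(46, 6.2, 0.90)
e47 = margin_of_error(47, 6.2, 0.90)
# n = 46 overshoots the 1.5 target slightly; n = 47 comes in under it.
print(round(e46, 4), round(e47, 4))
```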

Case 1: One population mean, unknown σ

Note:  Many basic statistics courses skip the material in this section and estimate sample sizes using a z distribution, so the material in this section might be an advanced extra for you. Check your course requirements.

This is the realistic case for estimating a population mean. Usually you don’t know the standard deviation of the population, so you have to use Student’s t distribution instead of the normal (z) distribution. You estimate the standard deviation of the population from the standard deviation of a sample obtained in a prior study or a small pilot study. Here is the formula for sample size:

$$E = t_{df,\,\alpha/2}\,\frac{s}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{t_{df,\,\alpha/2}\,s}{E}\right)^{2}$$

There’s a certain element of Catch-22 in this formula for n. You don’t know n, so you don’t know the degrees of freedom df either and you can’t compute the critical t for the formula. How do you get around this?

Use what NIST/SEMATECH calls an iterative method. First compute the formula using zα/2 instead of t. Then, when you have a preliminary sample size determined by (ab)using z in this way, recompute the formula using that sample size minus 1 for df. The two numbers should not be very different, since t is generally not very different from z; but if they are, you can use the second number to compute t once again.

See also:  Sample Sizes Required in NIST/SEMATECH e-Handbook of Statistical Methods (link verified 2007-08-23): scroll down to “More often we must compute the sample size with the population standard deviation being unknown”

Let’s illustrate the method using a modified form of the previous example.

Example 2: You want to estimate the average hourly output of a machine to within ±1.5, with 90% confidence. A small pilot study finds that the sample standard deviation of the machine’s hourly output is 6.2. How large a sample do you need?

Solution: Use z instead of t to make a preliminary estimate, then recompute with t.

Comments | Computation
Marshal your data. | Given: E = 1.5, s = 6.2, 1−α = 0.90. Wanted: sample size n
The formula wants t_{df,α/2}, but we approximate with z_{α/2}. Begin by finding α/2. | 1−α = 0.90 ⇒ α = 0.10 ⇒ α/2 = 0.05. Since α/2 = 0.05, z_{α/2} = z_{0.05}.
z_{0.05} is the critical z score that divides the normal distribution such that the area of the right-hand tail is 0.05, and therefore the area of the left-hand tail is 1−0.05. (See z Function (Critical z).) | z_{0.05} = invNorm(1−0.05) ≈ 1.6449
Now you have all the pieces you need for the preliminary sample size. Don’t use the rounded value of z_{α/2}, but use [2nd (-) makes ANS] to keep full precision. | n = [ z_{α/2} × s ÷ E ]² = (ANS×6.2÷1.5)² = 46.2227... → 47
Your preliminary sample size is 47, and next you use that to compute t. df = n−1 = 46, so you need t_{46,0.05}. See Critical t on the TI-83/84/89 for this computation. | t_{46,0.05} = 1.67866
Now recompute the formula using the t value. Remember, always round sample size up. | n = [ t_{df,α/2} × s ÷ E ]² = (ANS×6.2÷1.5)² = 48.142... → 49

Answer: Given a sample standard deviation of 6.2 units per hour, if you have a sample size ≥49 the margin of error in a 90% confidence interval will be ≤1.5 units per hour.
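The iterative method of Example 2 can be sketched in Python as follows. Python's standard library has no Student's t distribution, so this sketch builds a rough critical-t finder from the t density by Simpson's rule and bisection; that numerical shortcut and all the function names are assumptions of this illustration, not part of the NIST/SEMATECH method. If SciPy is installed, `scipy.stats.t.ppf` would be the more usual tool.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(right_tail, df):
    """Critical t leaving right_tail (< 0.5) in the right tail, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail:
            lo = mid  # tail still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size_unknown_sigma(margin, s, confidence, max_rounds=20):
    """Start from the z-based size, then recompute with t until n settles."""
    right_tail = (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - right_tail)
    n = math.ceil((z * s / margin) ** 2)          # preliminary size using z
    for _ in range(max_rounds):
        t = t_critical(right_tail, n - 1)         # df = n - 1
        new_n = math.ceil((t * s / margin) ** 2)
        if new_n == n:
            break
        n = new_n
    return n

# Example 2: E = 1.5, s = 6.2, 90% confidence.
n_example2 = sample_size_unknown_sigma(1.5, 6.2, 0.90)
print(n_example2)
```

For Example 2 the loop starts at 47, moves to 49 on the first t pass, and stays at 49 when t is recomputed with df = 48, so one extra pass is enough here.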

Remark: The sample size of 49 is a bit larger than the Case 0 sample size of 47. This makes sense. When you don’t know the standard deviation of the population, you have to use the t distribution. Student’s t is more spread out than z, so the confidence intervals are a bit wider, so you have to use a larger sample to keep the confidence interval to the same width.

Case 2: One population proportion

For binomial data with true proportion p, the population standard deviation is σ = √(p(1−p)). Even though you don’t know p, the value p̂(1−p̂) from your sample will be quite close to the true value p(1−p) in the population, because the product p(1−p) doesn’t vary much as p varies.

Therefore you can use a z function, and the formulas are the same as Case 0 with √(p̂(1−p̂)) substituted for σ:

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Rightarrow\quad n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^{2}$$

p̂ is your prior estimate for p. This may look like cheating, but it’s not because p(1−p) varies a lot less than p on its own. For instance, suppose the true population proportion is 45% but your estimate is 35%. True p(1−p) is 0.45×0.55 = 0.2475, and your estimate is 0.35×0.65 = 0.2275. The difference between 0.2475 and 0.2275 is a lot less than the difference between 0.45 and 0.35.

If you don’t have any credible estimate, use p̂ = (1−p̂) = 0.5. This is the conservative procedure because the product p̂(1−p̂) takes its highest value when p̂ = 0.5. The conservative procedure may give you a sample size larger than necessary, but you can be sure your sample won’t be too small, forcing you to throw out your survey and start over.
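To see both claims at a glance, you can tabulate the product for a range of proportions. This is a tiny illustrative sketch; the name `spread` is just a label chosen here for the product p(1−p).

```python
def spread(p):
    """The product p(1 - p) that appears in the sample-size formula."""
    return p * (1 - p)

candidates = [i / 10 for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
for p in candidates:
    print(f"p = {p:.1f}   p(1-p) = {spread(p):.4f}")

# The product peaks at p = 0.5, which is why 0.5 is the conservative choice.
largest = max(candidates, key=spread)
# A 0.10 error in p moves the product by only about 0.02.
gap = spread(0.45) - spread(0.35)
print(largest, round(gap, 4))
```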

Example 3: What percent of the voters would vote for your candidate if the election were held today? You want 95% confidence in your answer, with a margin of error no more than 3.5%. Last month’s poll showed your candidate had 42% support. How many voters do you need to survey?

Comments | Computation
Marshal your data. Caution! 3.5% is 0.035 not 0.35. | Given: 1−α = 0.95, E = 0.035, p̂ = 0.42. Wanted: sample size n
To find z_{α/2}, first find α/2. | 1−α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025
z_{α/2} = z_{0.025}, the critical z for a right-hand tail area of 0.025. See z Function (Critical z). | z_{0.025} = invNorm(1−0.025) = 1.9600
Now you have all the pieces. Don’t use the rounded value of z_{α/2}, but use [2nd (-) makes ANS] to keep full precision. Remember, always round sample size up. | n = p̂(1−p̂) ( z_{α/2} ÷ E )² = 0.42×0.58×(ANS÷0.035)² = 763.9015... → 764

Answer: To find a 95% CI with a margin of error no more than ±3.5 percentage points, where the true population proportion is around 42%, you must survey ≥764 people.

Example 4: Suppose you’re planning your first poll, and you have no idea of your candidate’s level of support. How big a sample would you need to be sure of a margin of error no more than 3.5% in a 95% CI?

Solution: Compute z_{α/2} = 1.9600 as in the previous example. But this time use p̂ = 0.5 since you have no estimate for p.

n = p̂(1−p̂) ( z_{α/2} ÷ E )² = 0.5×0.5×(ANS÷0.035)² = 783.971... → 784

Answer: To find a 95% CI with a margin of error no more than ±3.5 percentage points, where you have no idea of the true population proportion, you must survey ≥784 people.
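Examples 3 and 4 differ only in the prior estimate, so one small function covers both. Here is a minimal Python sketch; the function name `sample_size_proportion` and its default of 0.5 for a missing prior are choices made for this illustration.

```python
import math
from statistics import NormalDist

def sample_size_proportion(margin, confidence, p_hat=0.5):
    """Sample size for one population proportion (Case 2).

    p_hat is the prior estimate; leaving it at 0.5 gives the
    conservative (largest) sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)

# Example 3: prior estimate of 42% support.
n_example3 = sample_size_proportion(0.035, 0.95, p_hat=0.42)
# Example 4: no prior estimate, so fall back on 0.5.
n_example4 = sample_size_proportion(0.035, 0.95)
print(n_example3, n_example4)
```

Note that the margin is passed as 0.035, not 3.5 or 0.35, matching the caution in Example 3.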

Case 5: Difference of two population proportions

When you’re comparing two population proportions, it’s perfectly legitimate to have different-sized samples. The formula for margin of error, below left, is just an extension of the formula for one population proportion.

But when you’re planning sample size, you can’t solve one equation for two variables n1 and n2. (If you had a reason to choose some particular value for one of them, you could solve for the other one.) You can solve for sample size if you decide to use the same size for both samples.

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\Rightarrow\quad n = n_1 = n_2 = \left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]\left(\frac{z_{\alpha/2}}{E}\right)^{2}$$
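For readers who want the intermediate steps: once you set n1 = n2 = n, the two fractions share a denominator, and the rest is the same algebra as for one proportion. In LaTeX, with p̂1 and p̂2 standing for the prior estimates:

```latex
\begin{align*}
E &= z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n} + \frac{\hat{p}_2(1-\hat{p}_2)}{n}}
   = z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)}{n}} \\
E^{2} &= z_{\alpha/2}^{2}\,\frac{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)}{n} \\
n &= \left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]\left(\frac{z_{\alpha/2}}{E}\right)^{2}
\end{align*}
```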

For the reasons given above, if you have any prior estimates for the population proportions p1 and p2 you should use them; otherwise use 0.5.

Example 5: You’d like to know how your candidate’s support differs between men and women. You know that overall support is 42%. How many of each sex must you survey to answer the question with 90% confidence and a margin of error no more than 3%?

Comments | Calculations
Marshal your data. (Caution! 3% is 0.03 not 0.3.) | Given: 1−α = 0.90, E = 0.03. Wanted: sample size n = n1 = n2
Do you have an estimate of p1 and p2? Yes, since the overall support is 42% you expect that men’s and women’s support is not too different from that. (You do expect p1 and p2 are somewhat different, or you wouldn’t be doing the survey. But remember from one population proportion that p(1−p) doesn’t vary much when p varies.) | Prior: p̂1 = 0.42 and 1−p̂1 = 0.58; p̂2 = 0.42 and 1−p̂2 = 0.58
Compute z_{α/2} in the usual way. | 1−α = 0.90 ⇒ α = 0.10 ⇒ α/2 = 0.05. z_{0.05} = invNorm(1−0.05) ≈ 1.6449
Finish, using the unrounded value of z. Always remember to round sample sizes up. | [p̂1(1−p̂1) + p̂2(1−p̂2)] [z_{α/2}÷E]² = (0.42×0.58+0.42×0.58)×(ANS÷0.03)² = 1464.60... → 1465

Answer: To find a 90% CI for the difference in your candidate’s support between men and women, with margin of error no more than 3%, you must survey at least 1465 men and at least 1465 women.

Remark: You might wonder why the samples must be so large. After all, to estimate one population proportion to ±3% in a 90% CI, with prior estimate p̂ = 42%, a sample of 733 is enough. (Check it!) Why do you need over 2900 people in two groups for the same margin of error?

The answer is that it’s not the same margin of error. If you surveyed 733 men and 733 women you’d have confidence intervals of ±3% for each, but that’s an overall margin of error of as much as ±6%: the true proportion might be near the bottom of one group’s interval and near the top of the other group’s, or vice versa. (It’s not quite that simple, but that’s the basic idea.) To bring down that margin of error, you have to increase the sample size.
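The "Check it!" and the parenthetical caveat can both be made concrete with a short computation. This Python sketch uses hypothetical helper names; it finds the single-proportion sample size for p̂ = 0.42, then the margin of error for the difference when each group has that many people. The two margins combine as the square root of the sum of their squares, so the margin for the difference comes out about 1.4 times each group's margin rather than double.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)   # critical z for a 90% CI

def single_proportion_size(margin, p_hat):
    """Sample size for one proportion at 90% confidence."""
    return math.ceil(p_hat * (1 - p_hat) * (Z90 / margin) ** 2)

def difference_margin(n, p1_hat, p2_hat):
    """Margin of error for p1 - p2 with n people in each group."""
    return Z90 * math.sqrt((p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)) / n)

n_single = single_proportion_size(0.03, 0.42)
margin_small = difference_margin(n_single, 0.42, 0.42)   # too wide
margin_large = difference_margin(1465, 0.42, 0.42)       # meets the 3% target
print(n_single, round(margin_small, 4), round(margin_large, 4))
```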

Example 6: Let’s modify the previous example. Suppose you have reason to believe your candidate appeals more strongly to women, with a gap of about 10 percentage points. That means you estimate men’s support at 37% and women’s support at 47%. p̂1 = 0.37, 1−p̂1 = 0.63, p̂2 = 0.47, 1−p̂2 = 0.53. Your required sample size becomes

z_{0.05} = invNorm(1−0.05) ≈ 1.6449

[p̂1(1−p̂1) + p̂2(1−p̂2)] [z_{α/2}÷E]² = (0.37×0.63+0.47×0.53)×(ANS÷0.03)² = 1449.57... → 1450

Answer: For a 90% CI with margin of error ≤3%, when you think one population’s proportion is 37% and the other’s is 47%, you need a sample of at least 1450 from each group.
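Examples 5 and 6 can be reproduced with one function that assumes equal sample sizes in the two groups. As with the earlier sketches, the name `sample_size_two_proportions` and the 0.5 defaults are choices made for this illustration.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(margin, confidence, p1_hat=0.5, p2_hat=0.5):
    """Per-group sample size for a difference of two proportions (Case 5).

    Assumes both samples have the same size n = n1 = n2.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    combined = p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)
    return math.ceil(combined * (z / margin) ** 2)

# Example 5: both groups assumed near the overall 42% support.
n_example5 = sample_size_two_proportions(0.03, 0.90, 0.42, 0.42)
# Example 6: men estimated at 37%, women at 47%.
n_example6 = sample_size_two_proportions(0.03, 0.90, 0.37, 0.47)
print(n_example5, n_example6)
```

The two answers differ by only 15 people per group, which again shows how little the prior estimates matter when they are anywhere near the middle of the range.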


This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.

For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/