Practice Problems for Statistics
Copyright © 2002–2012 by Stan Brown, Oak Road Systems
Summary:
These are practice problems to help you prepare for the final
exam.
Solutions are provided, but make a
genuine effort to work any given problem on your own before you turn
to the solution.
Don’t Panic!: This sheet is much longer than the exam
will be, and some problems are harder than the problems you will meet
on the exam.
How to use: Don’t necessarily make
it your goal to work every problem. But do at least look at every one
and make sure that you can set it up correctly. Your success on the
final exam hinges on your ability to identify which type of problem
you are facing.
See also:
Review Guide for MATH200, Statistics
Section A: Concept Questions
Write your answer to each question. There’s no work to
be shown. Don’t bother with a complete sentence if you can
answer with a word, number, or phrase.
1. Two events A and B are disjoint. Is it
possible for those same events to be independent as well? Give an
example, or explain why it’s impossible.
2. Gasoline pumped from a supplier’s
pipeline is supposed to have an octane rating of 87.5. To test this, a
random
sample was taken on 13 consecutive days and the octane measured in a
lab.
(a) The data would best
be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which two tests must you perform on your sample data before
doing the analysis mentioned above? (In other words, how would you
make sure that the sample meets the requirements?)
3. The two main types of data are qualitative and quantitative.
Give the shorter name for each, and give an example of each.
4. The probability of rolling a 6 on an honest die is 1/6. If you
roll an honest die ten times and none of the rolls comes up 6, is the
probability of rolling a 6 on the next roll less than 1/6, equal to
1/6, or greater than 1/6? Explain why.
5. In a large elementary school, you select two age-matched groups
of students. Group 1 follows the normal schedule. Group 2 (with
parents’ permission) spends 30 minutes a day learning to play a
musical instrument. You want to show that learning a musical
instrument makes a student less likely to get into trouble. You
consider a student in trouble if s/he was sent to the principal’s
office at any time during the year.
(a) Write your hypotheses, in symbols.
(b) Identify either the case number or the specific
TI-83 test you would use.
6. Imagine rolling five standard dice. You compute the probability
of rolling no 3s, one 3, and so on up to five 3s. Is this a binomial
probability distribution? With reference to the definition of a
binomial PD, why or why not?
7. Over the course of many statistical experiments, which one of
these values for the significance level would enable you to prove the
most results?
A. 5%
B. 1%
C. 0.1%
D. Significance level has no effect on how likely you are to prove a hypothesis.
8. A key step in hypothesis testing is computing a p-value and
comparing it to your preselected α. After you do that, which of the
following conclusions would be possible, depending on the specific
values of p and α?
(Write the letter of each correct answer; there may
be more than one.)
A. Accept H0, reject H1
B. Reject H0, accept H1
C. Fail to accept H0, no conclusion
D. Fail to reject H0, no conclusion
9. Distinguish disjoint events, mutually exclusive events,
and complementary events. Give an example of each.
10. When is a histogram an appropriate graphical method of
presentation?
11. For what type of events does P(A or B) =
P(A) + P(B)? Give an example.
12. In a χ² goodness-of-fit test, which of the
following is/are true?
(A question with this many technical alternatives will not be
on the exam. Just use it to test your own understanding of χ².)
A. The hypotheses are stated in words rather than relating some
population parameter to a number.
B. The null hypothesis is always some variation on “the observed
sample matches the model reasonably closely.”
C. The alternative hypothesis is always some variation on “our model
is good.”
D. Instead of a p-value, we compare the value of χ²
to α to draw a conclusion.
E. Degrees of freedom equals the number of cells in our model.
F. If the difference between our observed results and our expected
results could likely have occurred by random chance, we reject the null
hypothesis.
13. What are the two types of numeric data called? Explain the
difference, and give an example of each.
14. Suppose the null hypothesis is that a machine is producing
the allowed 1% proportion of defectives
(H0: p = 0.01).
Your experiment could end
in one of several conclusions, depending on your sample data. List
the letters of all possible conclusions from those below.
(The actual conclusion would depend on α, the choice
of H1, and the calculated p-value. Not all possible conclusions
are listed below.)
A. The machine is producing exactly the acceptable proportion of defectives.
B. The machine is producing no more defectives than acceptable.
C. The machine is producing too many defectives.
D. Unable to prove anything either way.
15. How can you avoid making a Type I error in a hypothesis
test?
16. Which one or more of the following describe the p-value in an
experiment?
A. the probability of a correct decision
B. the probability that we are right to reject the null hypothesis
C. the probability that rejecting the null hypothesis is an error
D. the probability that our sample results could have been obtained by random selection if the null hypothesis is true
E. the probability of a Type I error
F. the probability of a Type II error
17. Data are gathered and a computation is done to answer the
question “As near as we can tell, how much does the average high-school
student spend on lunch?” This computation would be part of
A. hypothesis test
B. sample size
C. confidence interval
D. none of the above
18. Linear correlation coefficients must lie between what two
values? What value indicates “no linear correlation”? Does this mean
no correlation at all?
19. “Four out of five dentists surveyed recommend Trident
sugarless gum for their patients who chew gum.” Which of these is the
correct symbol for “four out of five dentists surveyed”?
μ
π
σ
p
p̂
po
x
x̄
s
20. A poll concludes that 26.9% of TC3 students are satisfied with
the food service. What is the type of the original data gathered?
21. For what sort of data would you typically prefer a pie chart? Why?
22. The mean is usually the best measure of center of numerical
data. But under certain circumstances the mean is not representative
and you prefer a different measure of center. Which circumstances, and
which measure of center?
23. Usually you make what you want to prove the alternative
hypothesis, not the null hypothesis. Why?
24. A company wishes to claim,
“People who eat
our shredded wheat for breakfast every day for a month lose more than
ten points on their cholesterol.” One or more of the following
state the null and alternative hypotheses correctly. Which one(s)?
A. H0 > 10; H1 ≤ 10
B. H0: x̄ > 10; H1: x̄ ≤ 10
C. H0: μ > 10; H1: μ ≤ 10
D. H0: x > 10; H1: x ≤ 10
E. H0 = 10; H1 > 10
F. H0: x̄ = 10; H1: x̄ > 10
G. H0: μ = 10; H1: μ > 10
H. H0: x = 10; H1: x > 10
I. H0 ≤ 10; H1 > 10
J. H0: x̄ ≤ 10; H1: x̄ > 10
K. H0: μ ≤ 10; H1: μ > 10
L. H0: x ≤ 10; H1: x > 10
25. Which of the following is a Type I error?
A. failing to reject the null hypothesis when it is true
B. failing to reject the null hypothesis when it is false
C. rejecting the null hypothesis when it is true
D. rejecting the null hypothesis when it is false
26. Compare an experiment and an observational study.
27. Our symbol for level of confidence in a confidence interval is
α
α/2
1–α
z(α/2)
E
(If none of these, supply the correct symbol.)
28. You gather a random sample of selling prices of
2006 Honda Civics.
Which selection on your TI-83 would be used to test the claim “In
the U.S., 2006 Honda Civics sell, on average, for more than
$2,000”?
A. Z-Test
B. T-Test
C. 1-PropZTest
D. 1-PropTTest
E. χ²-Test
F. none of these
29. Compare descriptive and inferential statistics, and give an
example of each.
30. You find that your maximum error of estimate (margin of
error) is ±3.3 at a confidence level of 95%. At 90% confidence,
what would be the maximum error of estimate?
A. more than 3.3
B. 3.3
C. less than 3.3
D. can’t say without more information.
31. Compare “sample” and “population”; give an example.
32. You take a random sample of Lamborghini owners and a random
sample of Subaru owners. Which selection on your TI-83 would be used to
answer the question “How much more do Lamborghini owners spend per
year on maintenance than Subaru owners?”
A. ZInterval
B. TInterval
C. 2-SampZInt
D. 2-SampTInt
E. 2-PropZInt
F. none of these
33. You believe that more than 25%
of high-school students experienced strong peer pressure to have sex. To
test this belief, you survey 500 randomly selected graduating seniors
nationwide and find that 150 of them say that they did feel such
pressure.
(a) The data would best
be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which two tests must you perform on your sample data before
doing the analysis mentioned above? (In other words, how would you
make sure that the sample meets the requirements?)
Section B: Problems
Show your work for all problems. Round probabilities to four
decimal places and test statistics (t, z, χ²) to two.
For hypothesis tests, check requirements and show all six numbered
steps.
| Red die \ White die | 1 | 2 | 3 | 4 | 5 | 6 | Red die total |
| 1 | 547 | 587 | 500 | 462 | 621 | 690 | 3407 |
| 2 | 609 | 655 | 497 | 535 | 651 | 684 | 3631 |
| 3 | 514 | 540 | 468 | 438 | 587 | 629 | 3176 |
| 4 | 462 | 507 | 414 | 413 | 509 | 611 | 2916 |
| 5 | 551 | 562 | 499 | 506 | 658 | 672 | 3448 |
| 6 | 563 | 598 | 519 | 487 | 609 | 646 | 3422 |
| White die total | 3246 | 3449 | 2897 | 2841 | 3635 | 3932 | 20000 |
34. Skip this problem: it uses techniques we did not study.
In 1850, the Swiss astronomer Wolf rolled
two dice 20,000 times to determine
whether they were biased. His data are shown in the table above. (For example,
there were 2841 rolls when the white die came up 4. There were 611
rolls when the white die came up 6 and the red die came up 4.)
(a) What is P(2 on red | 4 on white)?
(b) What is P(5 on white and 1 on red)?
(c) What is P(5 on white or red)?
(d) At the 0.05 significance level, is the white die biased?
(Hint: what would you expect if the white die is not biased?)
35. You are testing the assertion, “Judge Judy is more
friendly to plaintiffs than Judge Wapner was.” Since it would be
tedious to tabulate the hundreds or thousands of decisions each judge
has handed down, you randomly select 32 of each judge’s decisions.
Judge Judy’s average award to plaintiffs was $650 (standard
deviation = $250) and Judge Wapner’s was $580 (standard
deviation = $260).
Assume that the amounts are normally distributed without outliers.
Using a significance level of 0.05, can you
conclude that Judge Judy does indeed give higher awards on
average?
36. Weights of frozen turkeys at one large market were normally
distributed with a mean of 14.8 pounds and a standard deviation of 2.1
pounds. If there were 10,000 turkeys in the market, how many choices
would a shopper have who wanted a bird 20.5 pounds or larger? (Hint:
begin by figuring the percentage or proportion of turkeys in that
weight range.)
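Once you have tried problem 36 by hand or on your TI-83, a short Python sketch like the one below can check your setup. It is only an illustration: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the function name `expected_count_at_least` is made up for this example.

```python
from statistics import NormalDist

def expected_count_at_least(cutoff, mean, sd, population):
    """Expected number of items at or above `cutoff` in a normal population."""
    # Right-tail proportion: P(X >= cutoff) = 1 - P(X < cutoff)
    tail = 1 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)
    return tail * population

if __name__ == "__main__":
    # Values from problem 36: mean 14.8 lb, SD 2.1 lb, 10,000 turkeys
    print(round(expected_count_at_least(20.5, 14.8, 2.1, 10_000)))
```

The two steps mirror the hint: find the proportion in the weight range first, then scale it up to the number of turkeys.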
37. (from Johnson & Kuby’s Just the
Essentials of Elementary Statistics 2/e
problem 9.26) “The addition of a new accelerator is claimed to
decrease the drying time of latex paint by more than 4%. Several test
samples were conducted with the following percentage decrease in
drying time:
“5.2 6.4 3.8 6.3 4.1 2.8 3.2 4.7
“If we assume that the percentage decrease in drying time is
normally distributed”
(a) Test the claim, at the .05 level.
(b) “Find the 95% confidence interval for the true mean decrease
in the drying time based on this sample.”
38. Of a certain breed of rabbits, 28% are born with long hair.
Assume that the distribution is random, and consider a litter of five
rabbits.
(a) What is the probability that none of the rabbits in the
litter have long hair?
(b) What is the probability that one or more in a litter have
long hair?
(c) What is the probability that four or five of them have long
hair?
(d) What is the average number (mean) of long-haired rabbits
you expect in a litter of five?
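After you have set up problem 38 yourself, you can check your numbers with a sketch like this one. It assumes the binomial model applies (five independent births, each with probability 0.28 of long hair), and the helper name `binom_pmf` is chosen just for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.28  # litter of five, 28% long-haired (problem 38)

p_none = binom_pmf(0, n, p)                                # part (a)
p_one_or_more = 1 - p_none                                 # part (b), complement rule
p_four_or_five = binom_pmf(4, n, p) + binom_pmf(5, n, p)   # part (c)
mean_long_haired = n * p                                   # part (d), binomial mean

print(f"{p_none:.4f} {p_one_or_more:.4f} {p_four_or_five:.4f} {mean_long_haired:.1f}")
```

Part (b) uses the complement because "one or more" is everything except "none", which is quicker than adding five separate probabilities.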
39. An aptitude test is known to have a mean score of 37.5 with
standard deviation of 3.5. A company administers this test to
applicants, and requires a standard score of z = 1.5 or
better. For Jane to be considered, she needs at least what test score
on the aptitude test?
40. A survey asked a number of professionals, “Which of the
following is your most common choice for breakfast?” Using the
following data from a random survey, determine whether doctors choose breakfasts
in different proportions from other self-employed professionals, to
a .05 significance level.
| | Cereal | Pastry | Eggs | Other | No breakfast | Total |
| Doctors | 85 | 22 | 47 | 60 | 17 | 231 |
| Others | 185 | 90 | 160 | 135 | 35 | 605 |
| Total | 270 | 112 | 207 | 195 | 52 | 836 |
41. Suppose that the mean adult male height is
5′10″ (70″) and the standard
deviation is 1.4″.
(a) If a particular man’s z-score is −1.2,
what is his actual height to the nearest 0.1″?
(b) Using the Empirical Rule, what percentile is a height of
68.6″?
(c) By the empirical rule, what proportion of adult men are
shorter than 72.8″?
| life, hr | count |
| 500–650 | 6 |
| 650–800 | 18 |
| 800–950 | 60 |
| 950–1100 | 89 |
| 1100–1250 | 29 |
| 1250–1400 | 17 |
42. The length of life of a random sample of incandescent
light bulbs was obtained, and the results are in the table
above.
(a) Plot a histogram of the data.
(b) What is the size of the sample, with its proper symbol?
(c) What are the mean, standard deviation, and
variance? (Use the proper symbols and round to one decimal place.)
(d) What is the relative frequency of the 1100–1250
class?
43. One way to set speed limits is to observe a random sample of
drivers. The speed limit is set at the 85th percentile, which is the
speed such that 85% of drivers are going slower and 15% are going
faster. What speed corresponds to that 85th percentile, assuming
drivers’ speeds are normally distributed with
μ = 57.6 and σ = 5.2 mph?
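Problem 43 is an inverse-normal question: you know the area to the left and want the value. As a check on your own work, here is a minimal sketch assuming Python 3.8 or later; the function name `normal_percentile` is invented for this example.

```python
from statistics import NormalDist

def normal_percentile(percentile, mean, sd):
    """Value below which `percentile` percent of a normal distribution falls."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile / 100)

if __name__ == "__main__":
    # Problem 43: speeds are normal with mean 57.6 mph and SD 5.2 mph
    print(f"{normal_percentile(85, 57.6, 5.2):.1f} mph")
```

A quick sanity check: the 50th percentile should come back as the mean, and any percentile above 50 should land above the mean.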
44. You’re planning a survey to see what fraction of people who
live in Virgil would take the bus if the county added a route between
Greek Peak and downtown Cortland via routes 392 and 215.
(a) You think the
answer is only about 20% of them. If you need 90% confidence in an
answer to within ±4%, how many people will you need to
survey?
(b) What if you have no idea of the answer? How
many would you need to survey then?
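For checking problem 44 after you have worked it, the sketch below assumes the usual normal-approximation formula for a proportion, n = p(1 − p)(z/E)², rounded up to the next whole person. The function name `sample_size_for_proportion` and its parameters are illustrative, and results found by other methods may differ slightly because of rounding in z.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p_estimate=0.5):
    """Smallest n that keeps a proportion's margin of error within `margin`.

    Uses n = p(1 - p) * (z / E)^2, rounded up. With no prior estimate,
    p_estimate = 0.5 gives the most conservative (largest) sample size.
    """
    # Critical value z(alpha/2) for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(p_estimate * (1 - p_estimate) * (z / margin) ** 2)

if __name__ == "__main__":
    print(sample_size_for_proportion(0.04, 0.90, 0.20))  # part (a): prior estimate 20%
    print(sample_size_for_proportion(0.04, 0.90))        # part (b): no prior estimate
```

Always round the sample size up, never to the nearest integer: rounding down would leave the margin of error slightly larger than requested.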
45. A 1992 study showed the mean cost for all homes in Sassafras
County to be $70,000 with standard deviation $5,500.
This year, you survey 35 randomly selected homes
that were sold, and you find a mean of $72,050.
(a) Compute the value of the test statistic for the mean
of this sample.
(b) Compute the value of P(x̄ ≥ 72,050),
the probability of getting a sample mean this large or larger, if the
mean price for all Sassafras County houses is still $70,000.
(c) At the .05 level, has the mean price of a Sassafras County house
increased since the study was done? (Use your answer from (b); you
don’t need to do the full hypothesis test.)
46. Some popular fast-food items were compared for calories and fat, and
the results are shown below:
| Calories (x) | 270 | 420 | 210 | 450 | 130 | 310 | 290 | 450 | 446 | 640 | 233 |
| Fat (y) | 9 | 20 | 10 | 22 | 6 | 25 | 7 | 20 | 20 | 38 | 11 |
(a) Make a scatter plot on your
TI-83. Do you expect a positive, negative, or zero correlation?
Why?
(b) Find the correlation coefficient and the equation of the
line of best fit and write them down. Round to four decimal places and
use proper symbols.
(c) Give the value of the y intercept and interpret its
meaning.
(d) Using the regression equation or your TI-83 graph, how
many grams of fat would you predict for an item of 310 calories?
Explain why this is different from the actual data point (310
calories, 25 grams).
(e) What is the value of the residual for the data point
(310,25)?
(f) What is the value of the coefficient of determination in
this regression? What does it mean?
(g) The decision point for n = 11 is 0.602. What if
anything can you say about the correlation for all fast
foods?
47. The thicknesses of aluminum plates produced by a company are normally
distributed with a mean of 2.0 mm and a standard deviation of
0.1 mm. If 6% of the plates are too thick, what is the cutoff
point between “too thick” and “acceptable”?
48. Many people took a physical fitness course.
Seven of them were
randomly selected and were tested for how many sit-ups they could do.
The same seven were re-tested after the course. From the data below,
can you conclude that improvement took place among the general run of
people who took the course? Use α = 0.01.
| | Anne | Bill | Chance | Deb | Ed | Frank | Grace |
| Before | 29 | 22 | 25 | 29 | 26 | 24 | 31 |
| After | 30 | 26 | 25 | 35 | 33 | 36 | 32 |
49. You’re auditing a bank. The bank’s internal
accountants tell you that the average deposit is $189.56 and the
standard deviation is $45.00. You take a random sample of 400 deposits
and find an average of $200.00. How likely is it, if the bank’s
accountants are correct, that you would get a random sample of that
size with a mean of $200.00 or more?
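Problem 49 asks about the sampling distribution of the mean, not about individual deposits. If you want to check your setup afterward, here is a sketch that assumes the sample mean is approximately normal with standard error σ/√n (reasonable for n = 400); the function name `prob_sample_mean_at_least` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_least(xbar, mu, sigma, n):
    """P(sample mean >= xbar) when the population has mean mu and SD sigma."""
    # The sample mean varies less than single values: SD shrinks by sqrt(n)
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu=mu, sigma=standard_error).cdf(xbar)

if __name__ == "__main__":
    # Problem 49: claimed mean $189.56, SD $45.00, sample of 400, observed $200.00
    print(prob_sample_mean_at_least(200.00, 189.56, 45.00, 400))
```

The key design point is dividing σ by √n before computing the tail area; using σ itself would answer a different question, namely how likely a single deposit of $200 or more is.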
| Unit size | Entire US | Nebraska |
| Studio/efficiency | 18.2% | 75 |
| 1 bedroom | 18.2% | 60 |
| 2 bedrooms | 40.4% | 105 |
| 3 bedrooms | 18.2% | 45 |
| Over 3 bedrooms | 5.0% | 15 |
| Total | 100.0% | 300 |
50. (adapted from Johnson & Kuby’s
Just the Essentials of Elementary Statistics 2/e
problem 11.15)
A survey was taken nationally to see what
size vacation home people preferred. A separate survey was taken in
Nebraska. Both were random samples.
Do the Nebraska results differ significantly (0.05 level)
from the national results?
51. An experiment was designed to test the effectiveness of a short
course that teaches diabetic self care. Fifty diabetic patients were
enrolled in the course, and fifty others served as a control group.
(Patients were randomly assigned between the two groups.)
Six months after the course, blood sugar levels were tested and
results obtained as follows:
Diabetic course group: mean = 6.5, standard deviation = 0.7
Control group: mean = 7.1, standard deviation = 0.9
At a significance level of 0.01, does the diabetic course succeed
in lowering patients’ blood sugar?
52. (Johnson & Kuby’s Just the
Essentials of Elementary Statistics 2/e problem
9.36)
“A study in the journal PAIN, October
1994, reported on six patients with chronic myofascial pain syndrome.
The mean duration of pain had been 3.0 years for the 6 patients and
the standard deviation had been 0.5 year. Test the hypothesis that the
mean pain duration of all patients who might have been selected for
this study [meaning, of all persons who suffer from this condition]
was greater than 2.5 years. Use α = 0.05.”
Assume that the sample is a random sample, normally distributed with
no outliers.
53. In a survey of working parents, 200 men and 200 women were
randomly selected and
asked, “Have you refused a promotion because it would mean less time
with your family?” Of the men, 60 said yes; 48 of the women said yes.
(a) Obviously more men in the sample refused promotions. But
can you conclude at the 0.05 significance level that a higher
percentage of all working men have refused promotions, versus
the percentage of all working women?
(b) In an English sentence, state
a 95% confidence interval for the difference in percentages of men and
women who refuse promotions.
54. Ten thousand students take a test, and their scores are
normally distributed. If the middle 95% of them score between 70 and 130, what
are the mean and standard deviation?
55. An insurance company advertises that 75% of its claims are settled
within two months of being filed. The state insurance commission
thinks the percentage is less than 75, and sets out to prove it. First a
small study is done. For this preliminary study, the commissioner can
live with a 5% chance of making a Type I error. The commission staff
randomly selects 65 claims, and finds out that 40 were settled within
two months. Based on this study, can you say that less than 75% of
claims are settled within two months?
56. Skip this problem: it uses techniques we did not study.
A shoe store gets its shoes from just two companies,
40% from A and 60% from B. 2.5% of
pairs from Brand A are mislabeled, and 1.5% of pairs from Brand B are
mislabeled. Find
the probability that a randomly selected pair of shoes in the store is
mislabeled.
57. Ten randomly selected men compared two brands of razors.
Each man shaved one side of his face with brand A and the other side
with brand B. (They flipped coins to decide which razor to use on
which side.)
Each tester assigned a “smoothness score” of 1 to 10 to each side
after shaving. The scores are as shown below. Determine whether
there is a difference in smoothness performance between the two
razors, using α = 0.10.
| Man | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| A score | 7 | 8 | 3 | 5 | 4 | 4 | 9 | 8 | 7 | 4 |
| B score | 5 | 6 | 3 | 4 | 6 | 5 | 6 | 7 | 3 | 4 |
58. In August 2009, the
National Geographic News Web site reported that 90% of
US currency was tainted with cocaine.
(a) If you drew a random sample of two bills, what is the
chance that exactly one of them is tainted with cocaine?
(b) You have ten bills, and you’ve been told that 90% of
these ten bills are tainted with cocaine. If you draw two of the ten
bills at random, what is the chance that exactly one of your two is
tainted with cocaine?
59. Fifteen farms were randomly selected from a large agricultural
region. Each farm’s yield of wheat per acre was measured. For the
15 farms, the mean yield per acre was 85.5 bushels and the standard
deviation was 10.0 bushels. Find a 90% confidence interval for the
mean yield per acre for all farms in this region, assuming yield per
acre is normally distributed and there were no outliers in the sample.
60. You draw five cards from a
deck, without replacement, and record the number of aces you drew.
Then you replace the five cards and shuffle the deck thoroughly.
If you repeat this experiment many times, is the number of aces in
five cards drawn a binomial distribution? Why or why not?
61. In a survey of 300 people from Tompkins County, 128 of
them preferred to rent or stream a movie on Saturday night rather than
watch broadcast or cable TV.
In Cortland County, 135 of 400 people surveyed preferred a
movie. You’re interested in the difference of proportion in
movie renters for Tompkins County over Cortland County.
(a) What is the point estimate for that difference?
(b) Find the 98% confidence interval for the difference in the two
proportions for all residents of the counties.
(c) What is the maximum error of estimate, at the 98% confidence
level?
| | Germinated | Didn’t |
| Untreated | 80 | 20 |
| Treated | 135 | 15 |
62. Two batches of seeds were randomly drawn from the
same lot, and one batch was given a special treatment. Consider the
data for germination shown above. At significance level 0.05, does the
treatment make any difference in how likely seeds are to
germinate?
63. A booster rocket has six
gaskets, each with a 97% reliability rating. If any gasket fails, the
launch fails and the rocket will explode. (This actually happened to
the space shuttle Challenger.) Assuming that the gaskets
hold or fail independently, what is the chance of an explosion?
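For problem 63, the useful idea is "at least one fails" equals "not all hold." A small sketch for checking your answer, taking the problem's independence assumption at face value; the function name `prob_any_failure` is illustrative only.

```python
def prob_any_failure(reliability, components):
    """P(at least one of several independent components fails)."""
    # Independence lets us multiply: P(all hold) = reliability ** components
    p_all_hold = reliability ** components
    return 1 - p_all_hold

if __name__ == "__main__":
    # Problem 63: six gaskets, each 97% reliable
    print(f"{prob_any_failure(0.97, 6):.4f}")
```

Working through the complement avoids adding up the separate cases of exactly one failure, exactly two failures, and so on.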
What’s New
- 6 Sep 2011: Mark problems to be skipped because they
require probability rules not covered in class.
- (intervening changes suppressed)
- 11 Nov 2007: new document, mostly an amalgamation of the old
separate documents for descriptive and inferential statistics
This page is used in instruction at
Tompkins Cortland Community College in Dryden, New
York; it’s not an official statement of the College. Please visit
www.tc3.edu/instruct/sbrown/
to report errors or ask to copy it.
For updates and new info, go to
http://www.tc3.edu/instruct/sbrown/stat/