These solutions show about the same level of work I expect from you, though I add quite a bit of extra commentary. Please see Show Your Work for the what, why, and how.
(adapted from Freedman, Pisani, Purves Statistics [Norton, 2007], pages 150–151)
Answer: No, because this was an observational study.
Alternative solution: This was not an experiment, there could have been lurking variables, there were no controlled treatments, etc.
Remark: In fact there was a lurking variable: age. Younger people tended to be better educated than older people (as they still do), and employers seemed to prefer to fill vacancies with younger people. Within a given age group, the association between education and time without a job was much weaker.
| Brand | Cost per serving | Sodium per serving, mg |
|---|---|---|
| Amy’s Black Bean | $3.03 | 780 |
| Banquet Cheese | $1.28 | 1500 |
| Patio Cheese | $1.07 | 1570 |
| Banquet Beef | $1.27 | 1330 |
| El Charrito Beef | $1.53 | 1370 |
| Patio Beef | $1.05 | 1700 |
| Healthy Choice Chicken | $2.34 | 440 |
| Lean Cuisine Chicken | $2.47 | 520 |
| Weight Watchers Chicken | $2.09 | 660 |
2 (points: 2+3+1=6) The March 2000 Consumer Reports reported
the cost and sodium content of several brands of supermarket enchiladas.
(a) Plot a scatter diagram on your TI. Is the association
positive or negative?
(b) Compute the correlation coefficient and write it down with its symbol.
(c) The decision point for n=9 is 0.666. What, if anything, can you conclude about a linear relationship between sodium content and cost for all supermarket enchiladas, assuming this selection was random?
(adapted from Johnson & Kuby Elementary Statistics [Thomson, 2004], page 153, problem 3.42)
(a) Costs in L1, sodium in L2. Set up the Stat Plot screen as
shown, press [ZOOM] [9], and get the graph shown below, which is
a clear negative association and reasonably close to linear.
(b) LinReg(ax+b) L1,L2 yields
r = −0.8789669359 →
r = −0.8790
Common mistake: Students often leave out the minus sign. It’s not an optional decoration! Always look at the plotted points before you do any computations, and then you’ll know whether r is positive or negative.
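If you want to check the calculator's r away from the TI, the sketch below recomputes it from the table using the usual sums-of-deviations formula. It is not part of the original solution: the function name pearson_r and the list names cost and sodium are my own choices for illustration.

```python
import math

# Data copied from the enchilada table:
# cost per serving in dollars, sodium per serving in mg.
cost = [3.03, 1.28, 1.07, 1.27, 1.53, 1.05, 2.34, 2.47, 2.09]
sodium = [780, 1500, 1570, 1330, 1370, 1700, 440, 520, 660]


def pearson_r(x, y):
    """Sample correlation coefficient r (hypothetical helper, not a TI function)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(cost, sodium)
print(f"r = {r:.4f}")  # should agree with the calculator's rounded value
```

The sign comes straight out of the cross-product sum, so a negative association in the plot shows up as a negative r here too.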
(c) |r| = 0.8790 is greater than the decision point. Therefore there is a linear association in the population, and it is negative. The more expensive enchiladas tend to have less salt.
See also: Decision Points for Correlation Coefficient
Common mistake: It’s not enough just to say that there is a linear association; you must state the direction also.
Common mistake: On the other hand, don’t say too much. You know that the correlation coefficient of the population is negative, but you don’t know what number. It could be greater than, equal to, or less than −0.8790.
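The reasoning in part (c) can be written out as a small decision rule. This is only a sketch: the function name linear_association is hypothetical, and treating |r| exactly equal to the decision point as "no conclusion" is my assumption, since the solution only covers the "greater than" case.

```python
def linear_association(r, decision_point):
    """Compare |r| with the decision point and report the direction, if any.

    Assumption: |r| must be strictly greater than the decision point
    before anything is concluded about the population.
    """
    if abs(r) <= decision_point:
        return "no conclusion"
    # The sign of r gives the direction of the association.
    return "positive" if r > 0 else "negative"


# Values from this problem: r from part (b), decision point for n = 9.
print(linear_association(-0.8789669359, 0.666))
```

Note that the rule reports only the existence and direction of a linear association, not the value of the population correlation coefficient.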
Remark: Though we don’t learn how to do it in this course, the population correlation coefficient can be estimated. See Inferences about Linear Correlation if you’re interested.
| Age, yr | Price, $000 |
|---|---|
| 3 | 14.9 |
| 6 | 14.0 |
| 4 | 12.0 |
| 11 | 5.5 |
| 9 | 9.8 |
| 4 | 11.0 |
| 10 | 9.0 |
| 11 | 7.6 |
| 8 | 9.0 |
| 5 | 15.0 |
3 (points: 3+2+2=7) The table shows the age and asking price for
randomly selected Honda Accords that were listed on AutoTrader.com on
Sept. 8, 2002.
(a) Find the equation of the line of best fit and write it down
with correct symbols.
(b) Predict the average asking price for all 10-year-old Honda Accords on AutoTrader.com as of that date.
(c) What is the numerical value of the coefficient of determination? Write a sentence to interpret its meaning for someone who has not studied statistics.
(adapted from Johnson & Kuby Elementary Statistics [Thomson, 2004], page 166, problem 3.62)
(a) Ages in L1, prices in L2. LinReg(ax+b) L1,L2,Y1.
The coefficients are a = −0.8878680801 and b =
17.08386337, so the least-squares regression line has the equation
ŷ = −0.8879x + 17.0839
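As a cross-check on the LinReg(ax+b) output, here is a minimal sketch that computes the least-squares slope and intercept directly from the table. The function name least_squares and the list names age and price are illustrative assumptions, not anything from the calculator.

```python
# Data copied from the Honda Accord table:
# age in years, asking price in thousands of dollars.
age = [3, 6, 4, 11, 9, 4, 10, 11, 8, 5]
price = [14.9, 14.0, 12.0, 5.5, 9.8, 11.0, 9.0, 7.6, 9.0, 15.0]


def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y-hat = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


a, b = least_squares(age, price)
print(f"y-hat = {a:.4f}x + {b:.4f}")
```

The slope is in thousands of dollars per year of age, because the prices were entered in thousands.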
(b) Press [TRACE] [▲]. Enter 10 and read off the answer,
8.205182568 in thousands or $8205.
Common mistake: Notice that the original price data are in thousands. Besides, a price of $8 for a car makes no sense. Always check your answers for reasonableness, and if you get an unreasonable answer, check back to see what you’re missing.
Remark: We usually round averages to one more decimal place than the original data. The original data have one decimal place, so strictly speaking we should round 8.205182568 to 8.21 thousands or $8210. But when you have scale factors like “thousands” it seems more natural to round to the nearest integer.
Alternative solution: ŷ = (10)(−0.8878680801) + 17.08386337 = 8.2051826 → 8.205 in thousands, so the predicted average price is $8205.
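The same substitution, with the units conversion made explicit, might look like the sketch below. The coefficients are the unrounded values from part (a); the function name predict_price_thousands is hypothetical.

```python
# Unrounded coefficients from part (a); prices are in thousands of dollars.
a = -0.8878680801
b = 17.08386337


def predict_price_thousands(age_years):
    """Predicted average asking price, in thousands of dollars."""
    return a * age_years + b


y_hat = predict_price_thousands(10)
# Multiply by 1000 to undo the "thousands" scale factor before reporting.
print(f"{y_hat:.3f} thousand dollars, or about ${y_hat * 1000:,.0f}")
```

Using the unrounded a and b, rather than −0.8879 and 17.0839, avoids carrying rounding error into the prediction.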
(c) Going back to the output of the LinReg(ax+b),
read off the coefficient of determination, R² = 0.7276626063
→ R² = 0.7277
Interpretation: about 73% of the variation in asking price is associated with age of the vehicle. The other 27% is other factors including lurking variables and sampling error. One major lurking variable is the number of miles on the car. Others include the presence or absence of optional equipment, rust, mechanical condition of the car, and the owner’s urgency in selling.
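To see where the 73% comes from, the sketch below computes R² as one minus the ratio of unexplained variation (around the regression line) to total variation (around the mean price). It is an illustration under my own naming, not the calculator's method; the lists age and price repeat the table so the block runs on its own.

```python
# Data copied from the Honda Accord table (price in thousands of dollars).
age = [3, 6, 4, 11, 9, 4, 10, 11, 8, 5]
price = [14.9, 14.0, 12.0, 5.5, 9.8, 11.0, 9.0, 7.6, 9.0, 15.0]

n = len(age)
mean_age = sum(age) / n
mean_price = sum(price) / n

# Least-squares slope and intercept, as in part (a).
sxx = sum((x - mean_age) ** 2 for x in age)
sxy = sum((x - mean_age) * (y - mean_price) for x, y in zip(age, price))
slope = sxy / sxx
intercept = mean_price - slope * mean_age

# Total variation in price, and the part left over after fitting the line.
ss_total = sum((y - mean_price) ** 2 for y in price)
ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(age, price))

r_squared = 1 - ss_residual / ss_total
print(f"R^2 = {r_squared:.4f}")
print(f"about {r_squared:.0%} of the variation in price is associated with age")
```

For a straight-line fit like this one, the same number is the square of the correlation coefficient r.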
Common mistake: Say “associated with” rather than “explained by” or “caused by”. Since this is an observational study, we don’t know that change in age causes change in price.
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat5008c/