revised Oct 7, 2008

Quiz with Solutions: Chapter 4 (30 min)

These solutions show about the same level of work I expect from you, though I add quite a bit of extra commentary. Please see Show Your Work for the what, why, and how.

1 (points: 2) During the Great Depression of 1929–1933, an association was found: better-educated people tended to have shorter spells of unemployment. Can we conclude that higher education levels were responsible for reducing the length of time a person was likely to be unemployed? In a few words, why or why not?

(adapted from Freedman, Pisani, Purves Statistics [Norton, 2007], pages 150–151)

Answer: No, because this was an observational study.

Alternative solution: This was not an experiment, there could have been lurking variables, there were no controlled treatments, etc.

Remark: In fact there was a lurking variable: age. Younger people tended to be better educated than older people (as they still do), and employers seemed to prefer to fill vacancies with younger people. Within a given age group, the association between education and time without a job was much weaker.

                           Per serving
Brand                      Cost     mg Na
Amy’s Black Bean           $3.03      780
Banquet Cheese             $1.28     1500
Patio Cheese               $1.07     1570
Banquet Beef               $1.27     1330
El Charrito Beef           $1.53     1370
Patio Beef                 $1.05     1700
Healthy Choice Chicken     $2.34      440
Lean Cuisine Chicken       $2.47      520
Weight Watchers Chicken    $2.09      660

2 (points: 2+3+1=6) The March 2000 Consumer Reports reported the cost and sodium content of several brands of supermarket enchiladas.
(a) Plot a scatter diagram on your TI. Is the association positive or negative?

(b) Compute the correlation coefficient and write it down with its symbol.

(c) The decision point for n=9 is 0.666. What if anything can you conclude about a linear relationship between sodium content and cost for all supermarket enchiladas, assuming this selection was random?

(adapted from Johnson & Kuby Elementary Statistics [Thomson, 2004], page 153, problem 3.42)

(a) Costs in L1, sodium in L2. Set up the Stat Plot screen as shown, press [ZOOM] [9], and get the graph shown below, which is a clear negative association and reasonably close to linear.

 

[Figures: Stat Plot setup screen; scatter plot]

(b) LinReg(ax+b) L1,L2 yields r = −0.8789669359 → r = −0.8790

Common mistake: Students often leave out the minus sign. It’s not an optional decoration! Always look at the plotted points before you do any computations, and then you’ll know whether r is positive or negative.
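If you want to check the calculator's value away from the TI, the following is a minimal Python sketch of the same computation. The function name pearson_r and the list names are illustrative choices, not part of the quiz; the data are transcribed from the enchilada table.

```python
from math import sqrt

# Data transcribed from the enchilada table:
# cost in dollars per serving, sodium in mg per serving.
cost = [3.03, 1.28, 1.07, 1.27, 1.53, 1.05, 2.34, 2.47, 2.09]
sodium = [780, 1500, 1570, 1330, 1370, 1700, 440, 520, 660]


def pearson_r(xs, ys):
    """Sample correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(cost, sodium)
print(f"r = {r:.4f}")  # prints "r = -0.8790"
```

The sign comes out of the arithmetic automatically, which is a useful cross-check against what the scatter plot shows.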

(c) |r| = 0.8790 is greater than the decision point. Therefore there is a linear association in the population, and it is negative. The more expensive enchiladas tend to have less salt.

See also:  Decision Points for Correlation Coefficient

Common mistake:  It’s not enough just to say that there is a linear association; you must state the direction also.

Common mistake:  On the other hand, don’t say too much. You know that the correlation coefficient of the population is negative, but you don’t know what number. It could be greater than, equal to, or less than −0.8790.
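The decision-point rule used in part (c) can be sketched in a few lines of Python. The function name interpret_r is hypothetical, and the "no conclusion" branch is an assumption about how the rule treats a value of |r| at or below the decision point; the quiz itself only shows the case where |r| is greater.

```python
def interpret_r(r, decision_point):
    """Apply the decision-point rule to a sample correlation coefficient."""
    if abs(r) > decision_point:
        # State the direction as well as the existence of the association.
        direction = "positive" if r > 0 else "negative"
        return f"linear association in the population, and it is {direction}"
    # Assumption: at or below the decision point we draw no conclusion.
    return "no conclusion about a linear association in the population"


print(interpret_r(-0.8790, 0.666))
```

Note that the function reports only the existence and direction of the association, never a numerical value for the population correlation coefficient.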

Remark:  Though we don’t learn how to do it in this course, the population correlation coefficient can be estimated. See Inferences about Linear Correlation if you’re interested.

Age, yr    Price, $000
   3          14.9
   6          14.0
   4          12.0
  11           5.5
   9           9.8
   4          11.0
  10           9.0
  11           7.6
   8           9.0
   5          15.0

3 (points: 3+2+2=7) The table shows the age and asking price for randomly selected Honda Accords that were listed on AutoTrader.com on Sept. 8, 2002.
(a) Find the equation of the line of best fit and write it down with correct symbols.

(b) Predict the average asking price for all 10-year-old Honda Accords on AutoTrader.com as of that date.

(c) What is the numerical value of the coefficient of determination? Write a sentence to interpret its meaning for someone who has not studied statistics.

(adapted from Johnson & Kuby Elementary Statistics [Thomson, 2004], page 166, problem 3.62)

(a) Ages in L1, prices in L2. LinReg(ax+b) L1,L2,Y1. The coefficients are a = −0.8878680801 and b = 17.08386337, so the least-squares regression line has the equation

ŷ = −0.8879x + 17.0839
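As a cross-check on the calculator, here is a minimal Python sketch of the least-squares computation. The function name least_squares and the list names are illustrative; the age and price pairs are the ten cars from the table, read so that they reproduce the coefficients quoted above.

```python
# Ages in years and asking prices in thousands of dollars (ten Honda Accords).
age = [3, 6, 4, 11, 9, 4, 10, 11, 8, 5]
price = [14.9, 14.0, 12.0, 5.5, 9.8, 11.0, 9.0, 7.6, 9.0, 15.0]


def least_squares(xs, ys):
    """Return slope a and intercept b of the least-squares line y-hat = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares(age, price)
print(f"y-hat = {a:.4f}x + {b:.4f}")  # prints "y-hat = -0.8879x + 17.0839"
```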

(b) Press [TRACE] and move the cursor onto the regression line. Enter 10 and read off the answer, 8.205182568 in thousands or $8205.

[Figure: tracing on the regression line]

Common mistake: Notice that the original price data are in thousands. Besides, a price of $8 for a car makes no sense. Always check your answers for reasonableness, and if you get an unreasonable answer check back to see what you’re missing.

Remark: We usually round averages to one more decimal place than the original data. The original data have one decimal place, so strictly speaking we should round 8.205182568 to 8.21 thousands or $8210. But when you have scale factors like “thousands” it seems more natural to round to the nearest integer.

Alternative solution: ŷ = (10)(−0.8878680801) + 17.08386337 = 8.2051826 → 8.205 in thousands, so the predicted average price is $8205.
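The same substitution, including the conversion from thousands to dollars, might look like this in Python. The function name predicted_price_thousands is a hypothetical helper; the coefficients are the unrounded values from part (a).

```python
# Unrounded regression coefficients from part (a).
a = -0.8878680801
b = 17.08386337


def predicted_price_thousands(age_years):
    """Predicted average asking price, in thousands of dollars."""
    return a * age_years + b


y_hat = predicted_price_thousands(10)
dollars = round(y_hat * 1000)  # the price data are in thousands
print(f"{y_hat:.3f} thousand, or ${dollars}")  # prints "8.205 thousand, or $8205"
```

Using the unrounded coefficients avoids compounding rounding error; the rounded equation would give a slightly different final digit.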

(c) Going back to the output of the LinReg(ax+b), read off the coefficient of determination, R² = 0.7276626063 → R² = 0.7277

Interpretation: about 73% of the variation in asking price is associated with the age of the vehicle. The other 27% is associated with other factors, including lurking variables and sampling error. One major lurking variable is the number of miles on the car. Others include the presence or absence of optional equipment, rust, mechanical condition of the car, and the owner’s urgency in selling.

Common mistake: Say “associated with” rather than “explained by” or “caused by”. Since this is an observational study, we don’t know that change in age causes change in price.
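To see where the 73% comes from, the sketch below computes R² directly as one minus the ratio of leftover variation (around the regression line) to total variation (around the mean price). This is one standard way to compute it, not necessarily how the calculator does; the variable names are illustrative and the data are the ten cars from the table.

```python
# Ages in years and asking prices in thousands of dollars (ten Honda Accords).
age = [3, 6, 4, 11, 9, 4, 10, 11, 8, 5]
price = [14.9, 14.0, 12.0, 5.5, 9.8, 11.0, 9.0, 7.6, 9.0, 15.0]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
a = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))
     / sum((x - x_bar) ** 2 for x in age))
b = y_bar - a * x_bar

# Variation left over around the line, and total variation around the mean.
sse = sum((y - (a * x + b)) ** 2 for x, y in zip(age, price))
sst = sum((y - y_bar) ** 2 for y in price)

r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.4f}")  # prints "R^2 = 0.7277"
```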


This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.

For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat5008c/