Guide to Chapter 3
Copyright © 2008–2012 by Stan Brown, Oak Road Systems
This is your guide to what’s important in the chapter, with comments on some things that the chapter leaves out or doesn’t explain well. Page numbers refer to Sullivan, Michael, Fundamentals of Statistics 3/e (Pearson Prentice Hall, 2011), which is equivalent to the “second custom edition” for TC3.
Always check Corrections to Sullivan’s Fundamentals of Statistics, 3rd Edition for known mistakes in the textbook.
1-VarStats handout: Sample Statistics on TI-83/84 for mean, s.d., variance, and five-number summary. Use this instead of the formulas in your book.
MATH200A: Use MATH200A part 2 to make boxplots. These will show outliers and can be traced to show the five-number summary.
Announce: Sleep Lab is due next week and will use material from Chapter 2 and Chapter 3. Staple it before you come to class.
General advice on formulas: The book has a lot of them, but your TI-83/84 does almost all the work for you. Look at a formula so that you understand what it’s telling you, but don’t memorize it and don’t use it in computations.
117–118 The mean is computed the same way whether it’s a sample mean or a population mean; only the symbol of the result is different (x̄ or μ).
Again, use the calculator, not the formulas. See Sample Statistics on TI-83/84 or page 130 of your book. Practice with Example 1.
You must show your work. Write down 1-VarStats and then the list that you enter on the calculator screen.
118 Note the convention: In this class, we will always round means (and standard deviations, when we meet them) to one more decimal place than the sample data.
118–119 Parts (b) and (c) are there to remind you that samples will vary from the population and from each other (sampling error). A different sample of four would most likely have a different mean.
119 Note the mean as the center of gravity of the histogram. This will help you understand the term “resistant” later.
119–120 definition of median
If you want the median and no other statistics, you can compute it the way the book says. But if you want other statistics at the same time, which usually you do, see Sample Statistics on TI-83/84.
120 Example 3 uses the same data as Example 1; just read the median off your calculator’s display.
121–122 Use the dot plot on page 122 to visualize the difference between mean as center of gravity, and median as “half the data below, half the data above” without regard to their values. You should then be able to see why the median is resistant and the mean is not.
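To see resistance in numbers as well as on the dot plot, here is a minimal Python sketch (Python is used only for illustration; in class you’ll use the TI-83/84). The quiz scores are hypothetical, invented for this example: replacing the top score with an extreme value drags the mean up but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical quiz scores (invented for illustration, not from the textbook)
scores = [70, 72, 75, 78, 80]
# Same scores, but the top one replaced by an extreme value
scores_extreme = [70, 72, 75, 78, 180]

print(mean(scores), median(scores))                  # 75 75
print(mean(scores_extreme), median(scores_extreme))  # 95 75
```

The median only cares which value is in the middle, so it doesn’t move; the mean balances all the values, so one far-out value pulls it a long way.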
122 Your book implies it but doesn’t come right out and say it until the next page: For skewed data, the median is usually a better choice than the mean.
Understand what resistant means, because you’ll meet that word again several more times this semester.
122 Look at Table 4 and Figure 7. You can think of extreme values as pulling the mean away from the median, just as one person with a very high score pulls up the class average on a quiz. As your book says, this rule generally holds for continuous data. The article cited in the footnote is linked from our Web page under Chapter 3 optional extras.
123 This distribution is roughly bell shaped.
Caution! Mean and median close together does tell you that the distribution is most likely symmetric, but not which variety of symmetric. To say a distribution is bell shaped, you have to look at the histogram.
123 definition of mode
124 If two or more values are tied for greatest frequency, your book says that the distribution has two or more modes.
124–125 Qualitative data can have a mode, but can’t have a mean or median.
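As a quick check of these two points, Python’s `statistics.multimode` (Python 3.8 and later) returns every value tied for the greatest frequency, and it works on qualitative labels, where a mean or median makes no sense. Both data sets below are hypothetical.

```python
from statistics import multimode

# Hypothetical quantitative data: 3 and 7 are tied for greatest frequency
data = [1, 3, 3, 5, 7, 7, 9]
print(multimode(data))    # [3, 7]  (two modes)

# Hypothetical qualitative data: a mode exists, but a mean or median would not
colors = ["red", "blue", "red", "green", "blue", "red"]
print(multimode(colors))  # ['red']
```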
125 Remember to test yourself with the “Concepts and Vocabulary” section, and maybe even some “Skill Building”. If you don’t understand something, go back and figure it out. This is a good idea for every section of the textbook.
Why do we care about dispersion? Because more dispersion means less consistency and predictability.
132 The range is easy to compute but has two problems: it’s not resistant and it doesn’t use all the data.
132–136 solution: The variance does use all the data. It isn’t resistant, but in larger data sets a single extreme value has less effect on it.
Note: The variance is frankly not important for a practical statistics course like ours; it becomes important in a mathematical statistics course. For our purposes, look at it as a stepping stone to the standard deviation.
133 Why does ∑(x − x̄) = 0?
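One way to answer: split the sum and use the definition of the mean, x̄ = (∑x)/n. The deviations above the mean exactly cancel the deviations below it, which is the same “center of gravity” idea from page 119.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
  = 0
```

This is also why the variance squares the deviations: left unsquared, they would always add up to 0.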
133 Have a look at the formula. (You won’t use any of the formulas in this section to solve problems.) Variance is the average of squared deviations from the mean.
133–134 Skip Example 3. (Or just look at the computation process, page 134 top left.)
134 Variance is a good measure of spread, but with one problem: its units are the square of the original units. What do square dollars or square quiz points or square pounds mean? This will be solved on page 136, but first ...
134–135 Unlike the means, the sample variance s² and the population variance σ² are not computed the same way. The difference is in the denominator: σ² divides by N (population size), but s² divides by n−1 (sample size minus 1).
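Here is a minimal Python sketch of that difference, for understanding only (on tests, use the calculator). It uses hypothetical data, computes the sum of squared deviations once, divides it both ways, and cross-checks against Python’s `statistics.pvariance` (population) and `statistics.variance` (sample).

```python
from statistics import pvariance, variance

# Hypothetical data (not from the textbook)
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
xbar = sum(data) / n                         # mean: 5.0
ss = sum((x - xbar) ** 2 for x in data)      # sum of squared deviations: 32.0

pop_var = ss / n          # σ², treating the data as a whole population
samp_var = ss / (n - 1)   # s², treating the data as a sample

print(pop_var, pvariance(data))   # both equal 4
print(samp_var, variance(data))   # both about 4.571
```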
135–136 Skip Example 4.
136 ff The standard deviation is our go-to measure of spread. Like the variance, it uses all the data, and in larger data sets a single extreme value has less effect on it (though it still isn’t resistant), but it’s also measured in the same units as the original data.
136 It’s true that population standard deviation is the square root of population variance, and sample standard deviation is the square root of sample variance. In the olden days of the twentieth century, you had to compute the variance and then take the square root to get the s.d. Now we have calculators that compute the s.d. directly, and if we want the variance we get it by squaring the s.d. (Don’t round the s.d. before squaring.)
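Why that parenthetical warning matters, as a small Python sketch on hypothetical whole-number data (so, by our convention, s.d. and variance are rounded to one decimal place). Rounding the s.d. first and then squaring gives a visibly different variance.

```python
from statistics import stdev

# Hypothetical whole-number sample (not from the textbook)
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = stdev(data)                 # sample s.d. at full precision, about 2.138
var_right = s ** 2              # square first, round at the end
var_wrong = round(s, 1) ** 2    # round to 2.1 first, then square: about 4.41

print(round(var_right, 1))      # 4.6
print(round(var_wrong, 1))      # 4.4  (wrong)
```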
137 Work through Example 6 on your calculator. (See data on pages 132 and 136, and see Sample Statistics on TI-83/84 or textbook page 130 for the procedure.)
Caution! Your calculator doesn’t know whether you have a whole population or just a sample, so it gives you both standard deviations and depends on you to pick the right one.
rounding convention: Round the standard deviation or variance to one decimal place more than the data, just as you did with the mean. The book uses this convention but does not state it.
138 In interpreting the standard deviation, remember that it’s about consistency and predictability. If test scores have a low s.d., it means that most people scored close to average (which probably means it wasn’t a very good test). If historical prices of a stock have a high s.d., it means the stock is highly volatile, and if forced to sell on short notice you’re about equally likely to make a large profit or take a big loss.
138 The Empirical Rule or 68–95–99.7 Rule is an important interpretation of the standard deviation. But don’t use it where it doesn’t apply. The Empirical Rule applies only to bell-shaped distributions, also known as normal distributions.
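The Empirical Rule itself is just arithmetic on the mean and the s.d. This minimal Python sketch lists the three intervals; the mean of 100 and s.d. of 15 are hypothetical, and the percentages mean something only if the distribution is bell shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean ± k·sd for k = 1, 2, 3.
    Meaningful only for a bell-shaped (normal) distribution."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

approx_percent = {1: 68, 2: 95, 3: 99.7}

# Hypothetical bell-shaped data with mean 100 and s.d. 15
for k, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {approx_percent[k]}% of the data between {low} and {high}")
```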
139–140 Work through Example 8 carefully.
140–141 Chebyshev’s Inequality takes the place of the Empirical Rule when a distribution isn’t known to be bell shaped. Be aware that it exists, but don’t spend any time on it because no problems will be assigned.
155 The z-score or standard score tells you where a data value stands within its sample or population. It uses the standard deviation as a yardstick.
z-scores will be important right through Chapter 10, so make sure you understand them thoroughly and can compute them.
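The computation is one line: z = (x − x̄)/s for a sample, or z = (x − μ)/σ for a population. A minimal Python sketch with hypothetical test numbers:

```python
def z_score(x, center, spread):
    """Standard score: how many standard deviations x lies above (+)
    or below (-) the mean. Pass x̄ and s for a sample, μ and σ for a population."""
    return (x - center) / spread

# Hypothetical: a test with mean 70 and s.d. 10
print(z_score(85, 70, 10))  # 1.5  (1.5 s.d. above the mean)
print(z_score(60, 70, 10))  # -1.0 (1 s.d. below the mean)
```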
156 Know how to interpret percentiles. You will compute percentiles in the Sleep Lab and in Chapter 7.
157 Be able to interpret quartiles, but let the calculator compute them for you. See Sample Statistics on TI-83/84.
Your calculator’s quartiles may differ slightly from the textbook’s. Different authors compute quartiles in different ways, but all interpret them the same way.
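To see how quartile methods can differ, this Python sketch computes Q1 and Q3 three ways for the same hypothetical data. The first function takes the medians of the lower and upper halves, leaving out the overall median when n is odd; it is assumed here to match the TI-83/84, so check it against your own calculator. The other two are the `exclusive` and `inclusive` methods of Python’s `statistics.quantiles` (Python 3.8+).

```python
from statistics import median, quantiles

def quartiles_median_of_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves.
    When n is odd, the overall median is left out of both halves.
    (Assumed to match the TI-83/84; verify on your calculator.)"""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return median(lower), median(upper)

# Hypothetical data (not from the textbook)
data = [2, 4, 6, 8, 10, 12, 14, 16]

print(quartiles_median_of_halves(data))          # (5.0, 13.0)
q_excl = quantiles(data, n=4, method="exclusive")
q_incl = quantiles(data, n=4, method="inclusive")
print(q_excl[0], q_excl[2])                      # 4.5 13.5
print(q_incl[0], q_incl[2])                      # 5.5 12.5
```

The numbers differ a little, but every method is interpreted the same way: about a quarter of the data lie below Q1 and about a quarter lie above Q3.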
Be aware that Q1=P25 and Q3=P75. Q2=P50=M.
158–159 The IQR is a resistant measure of spread. You don’t get it directly from your calculator, but you get the quartiles so you compute it as Q3−Q1. Show your work!
159 Just below the table, notice what it means to describe the distribution: give its shape, center, and spread.
159–160 Understand what the formulas are telling you, but don’t use them to find outliers. Instead, use the boxplot as described in MATH200A Program part 2. Caution! If there’s an outlier, don’t just say “there’s an outlier” — give the exact number(s). In statistics, always be as specific as possible.
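To understand what the formulas are telling you: IQR = Q3 − Q1, the lower fence is Q1 − 1.5(IQR), the upper fence is Q3 + 1.5(IQR), and any value outside the fences is an outlier. Here is a minimal Python sketch; the data are hypothetical, and Q1 = 16 and Q3 = 23.5 are the medians of the lower and upper halves of that data.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences from the quartiles, using 1.5 × IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Every value that falls outside the fences, listed specifically."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical data; Q1 = 16 and Q3 = 23.5 (medians of the lower and upper halves)
data = [12, 15, 17, 18, 20, 21, 22, 25, 48]
print(outlier_fences(16, 23.5))       # (4.75, 34.75)
print(find_outliers(data, 16, 23.5))  # [48]
```

Notice that the result names the outlier, 48, rather than just reporting that there is one.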
160 Why check for outliers? They may be errors, or they may be important information you didn’t anticipate. As we’ll see in Chapters 9–11, if you have a fairly small data set and it has outliers, you can’t perform the usual procedures of inferential statistics.
The five-number summary is nothing but a new name for statistics you’ve already computed.
164–165 On your calculator, see Sample Statistics on TI-83/84 to get the five-number summary; it’s simply the second panel of output of 1-VarStats.
Practice with Example 1 on your calculator. You should get the same answers, correct to two decimal places.
165–166 To make a boxplot (also called a box-and-whisker plot), use MATH200A part 2 and not the procedure in your book. Verify that your boxplot matches the one in the book, including the outlier.
Notice that step 5 mentions the specific outlier. Always be specific. (You can use [TRACE] on the boxplot to find the value of each outlier.)
168 MATH200A part 2 can compare two or three data sets. Try this example with the program.
148 Remember from section 2.2 that you can group discrete or continuous data.
As the book says, you get approximations only, but usually quite good (better for larger data sets).
The formulas are similar to the earlier ones, but each data point xᵢ is multiplied by the number of times it occurs, fᵢ. Of course you won’t be using the formulas.
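Actually the x’s aren’t data points but class midpoints. Caution! Compute the midpoint correctly: averaging a class’s own lower and upper limits, (high+low)/2, is wrong. Use the book method (add consecutive lower class limits and divide by 2) or the equivalent, low + ½(class width).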
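Here is a minimal Python sketch of the grouped-data mean, using a hypothetical frequency table with classes 0–9, 10–19, 20–29, and 30–39. The midpoints come from the lower limits and the class width, and each midpoint is weighted by its frequency. On the calculator, the equivalent is 1-VarStats with a midpoint list and a frequency list.

```python
# Hypothetical frequency table (not from the textbook): classes 0-9, 10-19, 20-29, 30-39
lower_limits = [0, 10, 20, 30]
freqs = [3, 5, 8, 4]
class_width = lower_limits[1] - lower_limits[0]   # 10

# Midpoint = lower limit + half the class width (same as averaging consecutive
# lower limits), NOT (lower + upper limit)/2, which would give 4.5, 14.5, ...
midpoints = [low + class_width / 2 for low in lower_limits]

n = sum(freqs)                                    # total frequency: 20
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
print(midpoints, grouped_mean)                    # [5.0, 15.0, 25.0, 35.0] 21.5
```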
148–149 Work Example 1 with your calculator. See Sample Statistics on TI-83/84 if you need directions, or look at the bottom of page 154.
You must show your work. Write down 1-VarStats and then the lists that you enter on the calculator screen.
149–150 weighted mean (ex: GPA) — weights replace frequencies
ex: three cars get 20 mpg, two get 22 mpg, one gets 24 mpg; does x̄ = 22?
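A weighted mean is the same computation as a grouped mean, with weights in place of frequencies. This minimal Python sketch lets you check your answer to the mpg question; the GPA numbers are hypothetical, assuming A = 4 and B = 3 grade points, weighted by credit hours.

```python
def weighted_mean(values, weights):
    """Sum of weight × value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# The mpg question above: 3 cars at 20, 2 at 22, 1 at 24
mpg = weighted_mean([20, 22, 24], [3, 2, 1])
print(round(mpg, 2))   # 21.33, not 22

# Hypothetical GPA: an A (4 points) in a 3-credit course, a B (3 points) in a 4-credit course
gpa = weighted_mean([4, 3], [3, 4])
print(round(gpa, 2))   # 3.43
```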
150 Variance and s.d. for grouped data have similar changes in formula, but you won’t use the formulas. When you computed the mean a couple of pages back you already had the s.d. Compare your answer with Example 3.
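For the curious, here is the grouped-data s.d. as a minimal Python sketch on a hypothetical frequency table. Each squared deviation from the mean is multiplied by its frequency; the sample s.d. divides by n − 1 and the population s.d. by n, where n is the total frequency. Listing each midpoint as many times as its frequency gives the same answer, which is why 1-VarStats with a frequency list works.

```python
from math import sqrt
from statistics import pstdev, stdev

midpoints = [5, 15, 25, 35]   # hypothetical class midpoints
freqs = [3, 5, 8, 4]          # hypothetical class frequencies

n = sum(freqs)
xbar = sum(f * m for f, m in zip(freqs, midpoints)) / n
ss = sum(f * (m - xbar) ** 2 for f, m in zip(freqs, midpoints))
s = sqrt(ss / (n - 1))        # sample s.d.
sigma = sqrt(ss / n)          # population s.d.

# Cross-check: list each midpoint as many times as its frequency
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]
print(round(s, 2), round(stdev(expanded), 2))
print(round(sigma, 2), round(pstdev(expanded), 2))
```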
Caution! The five-number summary isn’t meaningful for grouped data because you need the actual data, not the class midpoints. The same applies to the boxplot, which is just a picture of the five-number summary.
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/