MATH200A Program —
Statistics Utilities for TI-83/84
Copyright © 2008–2010 by Stan Brown, Oak Road Systems
Professor Marvel’s MATH200 classes will use this program, but any statistics student should find it helpful.
Contents:
MATH200A Program Overview:
Getting the Program |
Using the Program
See also: Troubles? See TI-83/84 Troubleshooting.
See also: Advanced students might also want to download the program described in MATH200B Program — Extra Statistics Utilities for TI-83/84.
MATH200A Program Overview
You need two programs, the main MATH200A and a subprogram called MATH200Z. There are three methods to get the programs into your calculator:
[2nd x,T,θ,n makes LINK]
[►] [ENTER], and then on hers press
[2nd x,T,θ,n makes LINK] [3], select MATH200A,
then press [►] [ENTER]. Repeat for MATH200Z.
If you have a TI-83 Plus or Silver Edition, the above program versions are fine for you and you should ignore this Special Note. But if you have the original TI-83, without a Plus or Silver designation, then this note applies to you.
Instead of the MATH200A and MATH200Z programs, you need M20083A and M20083Z. If transferring them from a colleague’s calculator, check the program names carefully. If you’re getting the programs from the MATH200A.ZIP file, look for them in the subfolder called For_Original_TI83.
The two versions are functionally identical, but the “original TI-83” version uses all capital letters for prompts and displays because the original TI-83 couldn’t handle lower case in programs. (ρ and σ are replaced with RHO and σx for the same reason.) This Web page shows all screens from the TI-83 Plus or TI-84 version, because most students have a calculator that can handle it.
Press the [PRGM] key. If you can see MATH200A
in the menu, press its number; otherwise, scroll to it and press
[ENTER]. When the program name appears on your home screen,
press [ENTER] to run it. Check to make sure you have the
latest version, as shown on the splash screen, then press
[ENTER].
The menu at right shows what the program can do:
Histograms etc: make a histogram or polygon, or overlay both, for a frequency distribution, a relative frequency distribution, a probability distribution, or a simple list of numbers.
Box-whisker: plot a box-whisker diagram, showing any outliers.
Binomial prob: compute probability or cumulative probability of a binomial distribution.
Binom PD histo: plot a histogram of a binomial distribution.
Normality chk: test whether your sample is drawn from a normal distribution.
Sample size: find the necessary sample size for a given confidence level and margin of error for binomial data (one or two populations) or numeric data (with population standard deviation σ known or unknown).
GoF test: test goodness of fit for categorical data in one population.
The program is protected so that you don’t edit it accidentally. If you’d like to see the source code, please look at MATH200A.PDF in the downloadable MATH200A.ZIP file.
If you should ever need to break out of the program
before finishing the prompts, press [ON] [1].
Each procedure leaves its results in variables in case you want to use them for further computations. For details, please see the separate document MATH200A Program — Technical Notes.
Making a histogram or
frequency polygon using native TI-83/84 commands is kind of tedious,
especially setting up the WINDOW screen.
This part of the MATH200A program automates the process.
The program can create a
histogram or polygon, or both in overlay, for these
distributions:
MATH200 students: Use this part of the MATH200A program
for histograms of grouped and ungrouped data in section 2.2 and for
discrete probability histograms in section 6.1 of Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
Ignore frequency polygons.
To use the program, put the class midpoints or ungrouped data in a statistics list and the frequencies (if applicable) in another. (Your book might use the term class marks instead of class midpoints; they mean the same thing.)
Then press [PRGM], select
MATH200A, and press
1:Histograms etc. The program will prompt you for the
necessary information and will check
silently to make
sure your inputs are in valid form. Then it will ask whether you want a
histogram, polygon, or both, and will produce your desired graph. (The
program uses an algorithm to ensure that there are an appropriate
number of dots vertically on the screen.)
Restrictions: If you have frequency data, the data list and frequency list must be the same length and the class widths in the data list must all be the same; the program checks this. The TI-83 and TI-84 won’t let you make a histogram with more than 47 classes, and the program also checks this. Finally, the program also won’t let you make a grouped frequency histogram with just one or two classes, because that’s silly.
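If you'd like to see those restrictions spelled out, here is a minimal Python sketch of the same checks. It is not the program's actual code: the function name `check_grouped` and the tolerance used to compare class widths are assumptions made for illustration. The sample lists are the class midpoints and frequencies from the ages table just below.

```python
def check_grouped(midpoints, freqs, max_classes=47):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if len(midpoints) != len(freqs):
        problems.append("data list and frequency list differ in length")
    if not 3 <= len(midpoints) <= max_classes:
        problems.append("need at least 3 and at most 47 classes")
    # Differences between neighboring midpoints are the class widths.
    widths = [b - a for a, b in zip(midpoints, midpoints[1:])]
    if any(abs(w - widths[0]) > 1e-9 for w in widths):
        problems.append("class widths are not all equal")
    return problems

midpoints = [25, 35, 45, 55, 65, 75, 85]
freqs = [34, 58, 76, 187, 254, 241, 147]
print(check_grouped(midpoints, freqs))  # []
```

An empty list means every check passed, which is what you want before plotting.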
| Class Boundaries | Class Midpoints | Frequency |
|---|---|---|
| 20 ≤ x < 30 | 25 | 34 |
| 30 ≤ x < 40 | 35 | 58 |
| 40 ≤ x < 50 | 45 | 76 |
| 50 ≤ x < 60 | 55 | 187 |
| 60 ≤ x < 70 | 65 | 254 |
| 70 ≤ x < 80 | 75 | 241 |
| 80 ≤ x < 90 | 85 | 147 |
The grouped frequency distribution at right shows the ages reported by Roman Catholic nuns, from Johnson & Kuby, Elementary Statistics 9/e (Thomson, 2004), page 67. Show the data as a histogram and as a frequency polygon.
Solution:
Your class marks (class midpoints) are 25, 35, up through 85, and your class
width is 10. On the STAT EDIT screen, enter those
class midpoints in one list and the frequencies in another.
Here the class midpoints have been entered in L5 and the
frequencies in L6.
Run the MATH200A program and select 1:Histograms etc.
On the Data Arrangement screen, select [3] for grouped
frequency distribution.
Specify your list of class midpoints (class marks) and your frequency
list.
Next, select which plots you want. In the illustration,
I’m selecting both plots. If you’re not doing frequency
polygons in your class, select [1] for just the histogram.
The output is shown at right, once with the histogram and polygon on
the same screen, and once with just the
histogram.
You can trace the histogram by pressing
[TRACE].
This lets you see the class boundaries and number of data points in each
class.
Press [◄] and [►] to
move through the classes. To suppress the tracing
information, press [GRAPH] again.
To trace the polygon, if you selected both, press
[TRACE] [▼].
This lets you see the class midpoints and number of data points in each
class, instead of the class boundaries.
Why press [▼]? When you press
[TRACE], the calculator starts with a trace of Stat Plot 1.
The up or down cursor key moves between plots. Since the frequency
polygon is Stat Plot 2, you are tracing it when you see P2 in the
upper left corner, as shown in the illustration.
| 11 | 15 | 14 | 12 |
| 9 | 8 | 7 | 5 |
| 6 | 11 | 10 | 10.5 |
| 12 | 11 | 13 | 2 |
| 6 | 4 | 13.5 |
Suppose you want to make a histogram of the class performance on a 15-point quiz, where the scores are shown at left. You don’t want to bother to group these 19 scores by hand, so you enter them in a statistics list such as L1 and let the program do the grouping for you. (An alternative graphical display is the box-whisker plot.)
Run the MATH200A program and select 1:Histograms etc.
When the program asks your data arrangement, select
[1] for a plain list of numbers.
Then enter the name of the list that contains your numbers.
For a plain list of numbers, the program needs to know how you want
to group your data, so it asks you to specify the lower bound of the first
class as well as the class width. Since 0 is the lowest possible grade,
that’s the obvious lower bound for this example. The quiz has a possible
maximum score of 15, and 10% of that is 1.5 points. This is a good
class width because grades of D, C, B, and A will each be one bar of the
histogram.
At this point you may see a pause as the program computes the
number of classes and places each data point into a class. (It uses
statistics lists LD for the class midpoints (class marks) and
LF for the computed frequencies.)
For a simple list of numbers, the program will make only a histogram, not a frequency polygon.
In first looking at that histogram,
you might think there are four Ds, three Cs, two Bs,
and one A. But when you check this by pressing [TRACE] you see
that’s not correct. The highest class is the 15 to 16.5 class,
since someone had a perfect score of 15. (Remember: when a value
is right on a class boundary, it is always assigned to the higher
class.) So the top two classes in the histogram represent As
(three students), the next lower is the three Bs, the next lower
(shown at right) is four Cs, the next lower is two Ds, and the rest
are Fs.
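To see where each score lands, you can reproduce the grouping off the calculator. This is only a sketch of the idea, assuming the usual rule that a class includes its lower boundary; it is not the program's own code, and the variable names are made up for illustration.

```python
import math
from collections import Counter

scores = [11, 15, 14, 12, 9, 8, 7, 5, 6, 11, 10, 10.5,
          12, 11, 13, 2, 6, 4, 13.5]
lower, width = 0, 1.5

# floor() puts a value that sits right on a boundary into the higher class.
counts = Counter(math.floor((x - lower) / width) for x in scores)

for k in sorted(counts):
    lo = lower + k * width
    print(f"{lo:4.1f} to {lo + width:4.1f}: {counts[k]}")
```

The last five classes come out as 2, 4, 3, 2, and 1 scores: two Ds, four Cs, three Bs, and three As split across the top two bars.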
| Children per family | Freq. |
|---|---|
| 0 | 9 |
| 1 | 6 |
| 2 | 10 |
| 3 | 2 |
| 4 | 2 |
| 5 | 1 |
You have recorded the numbers of children in 30 randomly selected families that used a community center in a given week, and you want to show a picture of the discrete distribution. There are only a few different values (numbers of children per family), so you choose an ungrouped frequency distribution.
Enter the data in two lists such as L3 and L4, run
the MATH200A program, and select 1:Histograms etc.
For Data Arrangement, select [2], ungrouped distribution.
Enter your data list and frequency list as usual.
When prompted, select whether you want a histogram, a frequency polygon, or both. I’ve selected a histogram, and the results are shown below.
The vertical line in the histogram is the y
axis — you can remove it, if you want, by pressing
[2nd ZOOM makes FORMAT] and selecting AXES OFF. Also in the
histogram, notice that the vertical rows of dots run through the
center of each bar rather than along the edges. This reminds you that
you should label the bars of an ungrouped frequency distribution under
the centers, not the edges as you would label a grouped histogram.
If you want to trace the ungrouped frequency histogram, follow the same procedure as for tracing a grouped frequency histogram.
If you have a relative frequency distribution or probability distribution, you plot it in almost the same way as a frequency distribution. The main difference is that the relative frequencies or probabilities must add up to 1, and the program checks this for you.
(There’s a special case: the binomial probability distribution. For this, please see Histogram of a Binomial Distribution below.)
| Number of dice alike | Probability |
|---|---|
| 1 | 720/7776 |
| 2 | 5400/7776 |
| 3 | 1500/7776 |
| 4 | 150/7776 |
| 5 | 6/7776 |
Here’s an example of a general discrete probability distribution, drawn from the rainy-day game Yahtzee. In Yahtzee you roll five dice and try to make various combinations. Shown at right are the probabilities for number of dice alike when you roll five standard six-sided dice. (Thanks to Paul Sperry for help with the probabilities.) You can make a histogram of this probability distribution.
Notice, by the way, that you’re more than twice as likely to roll three of a kind as to roll “none of a kind” or all five different: P(3) = 1500/7776, and P(1) = 720/7776. And you’re over seven times as likely to roll two of a kind (either two the same and three all different, or two pairs with the fifth die different): P(2) = 5400/7776. Of course, Yahtzee isn’t just about the initial roll. You get two tries to improve your combinations by re-rolling some of your dice.
As before, put the x’s (1–5 this time) in one list and
the p’s in another. Run the MATH200A program and select 1:Histograms etc.
The data arrangement this time is a
probability distribution, so specify [4] and then enter your
data list and probability list.
The finished probability histogram is shown at right. (For a
probability distribution, the program automatically makes a histogram,
with no option to make a polygon.) Notice that the
bar for two of a kind is much higher than any of the others.
If it wasn’t obvious from the numbers, you can see
from the histogram that this distribution is skewed right.
If you like, you can press the [TRACE] button and see
the numerical value of the probability for each outcome. For example,
P(2) = 0.6944, meaning that when you roll five dice you have
almost a 70% chance of getting two of a kind.
What about two of a kind or better? This is a classic “at least” problem, and the complement is your friend. P(1) is 0.0926 and 1−P(1) = 0.9074. You have better than a 90% chance of getting at least two of a kind when rolling five dice.
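The arithmetic in this example is easy to check with exact fractions. The following Python sketch is just a cross-check of the numbers above, not part of the MATH200A program; the dictionary name `probs` is an arbitrary choice.

```python
from fractions import Fraction

# Probabilities for "number of dice alike" from the table above.
probs = {1: Fraction(720, 7776), 2: Fraction(5400, 7776),
         3: Fraction(1500, 7776), 4: Fraction(150, 7776),
         5: Fraction(6, 7776)}

total = sum(probs.values())      # must be exactly 1 for a valid distribution
at_least_two = 1 - probs[1]      # complement of "none of a kind"

print(total)                          # 1
print(round(float(probs[2]), 4))      # 0.6944
print(round(float(at_least_two), 4))  # 0.9074
```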
Summary: A modified box-whisker plot is a
quick graphical representation of a data set. It plots the
five-number summary (minimum, first quartile, median, third
quartile, maximum) and also shows outliers if the data set
contains any. The 2:Box-whisker part of the MATH200A program makes a modified
box-whisker diagram for one data set or compares two or three data
sets by stacking box-whisker plots.
MATH200 students: Use this part of the MATH200A program
for section 3.5 of Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
You will also use it with small samples in
sections 9.2, 10.3, 11.1, and 11.2.
| 11 | 15 | 14 | 12 |
| 9 | 8 | 7 | 5 |
| 6 | 11 | 10 | 10.5 |
| 12 | 11 | 13 | 2 |
| 6 | 4 | 13.5 |
Here again is the set of quiz scores. You’ve already seen them graphed as a histogram, but a box-whisker plot is another way to get a sense of the shape of the data.
Enter the data in any statistics list and run the MATH200A program.
Select 2:Box-whisker and you see a prompt for the number of samples. You
can make a box-whisker plot of a single data set, or you can
compare two or three data sets. The
quiz scores are a single sample, so you choose [1].
Then the program asks whether you have a plain list of numbers
or an ungrouped frequency distribution.
Here you have a plain list of numbers, so you choose data arrangement
[1].
Finally, the program asks you for your data list.
Caution: Never make a boxplot of a grouped frequency distribution. Only a simple list of numbers or an ungrouped frequency distribution is suitable for a box-whisker plot. If you have a grouped frequency distribution, a box-whisker plot won’t be accurate and you should be using a histogram.
The box-whisker plot now appears (below left). You can see at a glance that it has no outliers, and that it’s slightly skewed left.
You can also trace the box-whisker, to see the five-number
summary and the values of any outliers. Press the [TRACE] key
and then use [◄] and [►] to move
left and right. In the illustration below right you can see that the
median quiz score was 10.5.
| 11 | 15 | 14 | 12 |
| 9 | 8 | 7 | 5 |
| 6 | 11 | 10 | 10.5 |
| 12 | 11 | 13 | 2 |
| 6 | 4 | 13.5 | 22 |
What would an outlier look like on a box-whisker plot? Any outliers
show up as isolated points separate from the main diagram.
For example, take the same
set of data, but append a twentieth data point, 22. Now make the plot
and you’ll see the result at right. The [TRACE] key and arrow
keys will display the values of any outliers as well as the
five-number summary.
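If you want to check by hand which points count as outliers, the sketch below may help. It rests on two assumptions that are not stated on this page: that quartiles are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), and that an outlier is any point more than 1.5 × IQR beyond a quartile. The function name is made up for illustration.

```python
from statistics import median

def five_number_and_outliers(data):
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])             # lower half, middle value excluded
    q3 = median(xs[len(xs) - half:])   # upper half, middle value excluded
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return (xs[0], q1, median(xs), q3, xs[-1]), outliers

quiz = [11, 15, 14, 12, 9, 8, 7, 5, 6, 11, 10, 10.5,
        12, 11, 13, 2, 6, 4, 13.5]

print(five_number_and_outliers(quiz))         # ((2, 6, 10.5, 12, 15), [])
print(five_number_and_outliers(quiz + [22]))  # ((2, 6.5, 10.75, 12.5, 22), [22])
```

With the original 19 scores the upper fence is 21, so nothing is flagged; adding the score of 22 moves the fence to 21.5 and the new point falls outside it.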
Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008), page 163, shows data for two groups of rats. One group was sent into space; the control group was treated the same except for the space flight. Their red blood cell mass was measured in milliliters.
| Flight | Control |
|---|---|
| 8.59 6.87 7.00 6.39 7.43 9.79 9.30 8.64 7.89 8.80 7.54 7.21 6.85 8.03 | 8.65 7.62 7.33 7.14 8.40 8.55 9.88 6.99 7.44 8.58 9.14 9.66 8.70 9.94 |
Plotting the two data sets as two boxplots on the same screen is a good way to get a sense of whether there is a difference between the samples.
Put the flight group in one statistics list and the control group in
another. Then run the MATH200A program, select 2:Box-whisker, and select
2:Compare 2 smpl.
For each sample you’ll be asked your data arrangement.
(That lets one sample be a plain list and the other an
ungrouped frequency distribution, but in this example both are plain
lists.)
Enter the names of the two lists, as shown at right.
The results are shown at right. You can see that the flight group, as
a group, had lower blood-cell mass than the control group, even though
some individuals from the two groups had equal blood-cell mass. Look
particularly at the medians: the median for the control group is about
equal to the third quartile of the flight group, meaning that about
three quarters of the flight rats had blood-cell mass lower than the
median of the control group.
Is that enough to say that space flight lowers blood-cell mass in rats in general? Not yet. Later in the course, you’ll learn how to use a two-sample t test to tell whether space flight lowers blood-cell mass in rats — whether there is a difference between the populations of all space rats and all earthbound rats.
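You can confirm the comparison of the median and third quartile numerically. This sketch assumes the median-of-halves quartile rule (an assumption, since the page does not say how quartiles are computed), and the helper name `quartiles` is hypothetical.

```python
from statistics import median

flight = [8.59, 6.87, 7.00, 6.39, 7.43, 9.79, 9.30,
          8.64, 7.89, 8.80, 7.54, 7.21, 6.85, 8.03]
control = [8.65, 7.62, 7.33, 7.14, 8.40, 8.55, 9.88,
           6.99, 7.44, 8.58, 9.14, 9.66, 8.70, 9.94]

def quartiles(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs), median(xs[len(xs) - half:])

print("flight :", quartiles(flight))
print("control:", quartiles(control))
```

Under that rule the flight group's third quartile is 8.64 and the control group's median is about 8.565, which is the near-match described above.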
Summary: If you have a fixed number of trials n, and each trial has only two outcomes (called success and failure), and the probability of success p is the same on each trial, then you have a binomial probability distribution. This part of the program computes the probability of a specific number of successes or the probability of a range of numbers of successes.
MATH200 students: Use this part of the MATH200A program
to replace the computations in section 6.2 objective 2 of
Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
TI-89 users: Please see Binomial Probability Distribution on TI-89.
The program always asks you the number of trials n, the probability p of success on one trial, and the number of successes from and to. If you want a specific number of successes rather than a range, then from and to will be the same number.
Example 1: Larry’s batting average is .260. If he’s at bat four times, what is the probability that he gets exactly two hits?
Solution:
n = 4, p = 0.260, x = 2
Note: Some textbooks use r for number of successes, rather than x.
Here you want the probability of exactly two successes, so
FROM and TO will both be 2. In other words, you
are computing the probability of 2 through 2 successes.
Run the MATH200A program, select 3:Binomial prob, and specify
n = 4, p = .26, from = 2, to = 2.
The input and output screens are shown at right.
Answer: P(2) = 0.2221
Caution: Sometimes the probability is very small and the calculator reports it in scientific notation, such as 5.4189E-6. The exponent is not a decoration, and you must not report the probability as simply 5.4189. Probabilities are never greater than 1!
Conventionally, we round probabilities to four decimal places. If the probability is smaller than 0.0001 (smaller than 1E-4), either show it to two significant figures, such as 1.3×10⁻⁶, or report it as <0.0001.
Example 2: Larry’s batting average is .260. If he’s at bat four times, what is the probability that he strikes out all four times?
Solution: Four out of four strikeouts means no hits, so you’ll put these values into the program:
n = 4, p = 0.260, from = 0, to = 0
The output screen is shown at right.
Answer: P(0) = 0.2999
Example 3: Suppose 65% of the registered voters in Dryden are Republicans. In a random sample of ten registered voters, what’s the probability of fewer than six Republicans?
Solution: “Fewer than six” is
zero through five.
n = 10, p = 0.65, 0 ≤ x ≤ 5
Run the MATH200A program, select 3:Binomial prob, and enter
n = 10, p = .65, from = 0, to = 5
Answer: 0.2485
There’s about one chance in four of getting fewer than six Republicans in a random sample of ten registered voters.
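All three examples can be cross-checked with the binomial formula P(x) = C(n, x) pˣ (1−p)ⁿ⁻ˣ, summed over the range of successes. The Python sketch below does that; the function name `binom_range` is made up here and mirrors the program's from/to prompts only loosely.

```python
from math import comb

def binom_range(n, p, lo, hi):
    """Probability of lo through hi successes (inclusive) in n trials."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(lo, hi + 1))

print(round(binom_range(4, 0.26, 2, 2), 4))   # Example 1: 0.2221
print(round(binom_range(4, 0.26, 0, 0), 4))   # Example 2: 0.2999
print(round(binom_range(10, 0.65, 0, 5), 4))  # Example 3: 0.2485
```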
Summary: The previous section showed how to compute the binomial probability of a particular x or a range of x. But for an overview of a distribution, a histogram is most helpful. This part of the program creates one for you and calculates the mean and standard deviation of the distribution.
MATH200 students: Use this part of the MATH200A program to
replace the computations in section 6.2 objective 3 and create
the histogram in section 6.2 objective 4 of
Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
Example: Suppose 65% of registered voters in Dryden are Republicans. (a) Plot the binomial probability histogram for the random variable “number of Republicans in a random sample of ten registered voters”. (b) Interpret the mean and standard deviation of the distribution.
Solution: n = 10, p = 0.65. Run
the MATH200A program and select 4:Binom PD histo.
Enter the values of n and p when prompted. The program responds by
drawing the histogram and displaying μ and σ.
Interpreting the histogram: The bars are each one unit wide, running from 0 to n (0 to 10, for this example), with the dots marking each integer and the y axis marking the minimum possible value of successes, 0. You can see that the most likely result for a sample of ten registered voters is seven Republicans, with six being just slightly less likely. Eight and five are next most likely, then nine and four. A sample of ten is quite unlikely to contain ten or three Republicans, and fewer than three Republicans are extremely unlikely.
Interpreting the mean: The mean of the distribution is 6.5, meaning that if you took a whole lot of random samples of ten registered voters (with replacement), on average a sample would contain 6.5 Republicans.
Interpreting the standard deviation: The standard deviation is about 1.5. While this distribution is not a normal distribution (bell curve), it’s not extremely different from one, and therefore you can say that the Empirical Rule (68–95–99.7 Rule) is not too far wrong.
2σ = about 3, so you would expect roughly 95% of samples
to contain 6.5±3 Republicans. 6.5±3 is 3.5 to 9.5, but
the integers within that range are 4 to 9, so the standard
deviation tells you that roughly 95% of samples of ten
registered voters would have four to nine Republicans. (An equivalent
statement is that there’s about a 95% chance that a sample of
ten voters will contain four to nine Republicans.) If you use the
3:Binomial prob part of the MATH200A program to compute the actual probability,
you find that P(4 ≤ x ≤ 9) = 0.9605.
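The mean and standard deviation come from the standard binomial formulas μ = np and σ = √(np(1−p)). Here is a short Python sketch, offered only as a cross-check and not as the program's own method, that computes them along with the exact probability of four to nine Republicans.

```python
from math import comb, sqrt

n, p = 10, 0.65
mu = n * p                       # mean of a binomial distribution
sigma = sqrt(n * p * (1 - p))    # standard deviation

# Exact probability of 4 through 9 successes.
p_4_to_9 = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(4, 10))

print(round(mu, 4), round(sigma, 4), round(p_4_to_9, 4))  # 6.5 1.5083 0.9605
```

The exact 0.9605 is close to the 95% that the Empirical Rule suggests, even though this distribution is not truly normal.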
As with other histograms, you can press the [TRACE]
key and use the [◄] and [►] keys
to display the probability of each bar.
The two screen shots show values from the histogram.
The first picture shows that P(5) is 0.1536: there’s a 15.36% probability that a random sample of ten registered voters will contain five Republicans.
The second picture shows that P(0) is “2.7585E”.
Unfortunately, the trace doesn’t show the negative exponent, so
you don’t know whether the probability is 2.7585E-4 or
2.7585E-94; but you do know that it’s small, less than 0.001. If
it’s important to know the exact value, use the 3:Binomial prob
part of the program.
By plotting data on your TI calculator, you can easily see how close they are to a normal distribution. The special quantile plot or normal probability plot asks what the distribution would look like if it were normal, and plots that against the actual distribution. The closer the points seem to be to a straight line, the more nearly normal the original distribution.
The most common application is in inferential statistics: with a small sample (less than about 30), you need to make sure that the population is normally distributed before you perform a Student’s t test.
MATH200 students: Use this part of the MATH200A program
with section 7.4 of Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
You will also use it with small samples in
sections 9.2, 10.3, 11.1, and 11.2.
Example: Consider these vehicle weights (in pounds):
2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200
Construct a plot to decide whether these vehicle weights seem to be normally distributed.
Solution: Put the data in any statistics list,
then press [PRGM], scroll down to MATH200A, and
press [ENTER] twice. Select 5:Normality chk.
The program will make the plot and display the sample size n and correlation coefficient r, as shown at right. If the points are clearly linear (close to a straight line), you know that the sample came from a normal distribution; if they are clearly not near a straight line, you know that the sample is non-normal.
Sometimes it’s hard to decide whether the points are close enough to a straight line. For those cases, the program gives you the correlation coefficient r. r is a measure of how close the points lie to a straight line, with 1 being perfectly linear and 0 being completely non-linear. A good rule of thumb is that r ≥ 0.9 usually means the plot is linear and the original points are normally distributed. But always look first at the plot, and follow your eyes: r is there only for the doubtful cases.
For this example, the program computed a correlation coefficient of 0.9936. If it weren’t already obvious from the plot, this would tell us that the original data are approximately normal.
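To get a feel for what the plot and r are measuring, you can pair each sorted data value with the z-score a normal distribution would put at that rank, then compute the ordinary correlation. The sketch below is an illustration only. The plotting positions (i − 0.375)/(n + 0.25) are an assumption, one common choice among several, and may not be what the program or the calculator uses, so the result need not agree with the program's r to every digit.

```python
from math import sqrt
from statistics import NormalDist, mean

weights = [2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200]

def normal_plot_r(data):
    """Correlation between sorted data and assumed expected normal scores."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting positions; other conventions exist.
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
          for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

r = normal_plot_r(weights)
print(round(r, 3))
```

For these vehicle weights the sketch gives an r well above the 0.9 rule of thumb, in line with the conclusion above.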
Summary:
Before you start gathering data, you plan for how large a
sample you will need. This depends on your desired margin of error
and confidence level, and on the type of the data and your prior
estimates. This part of the MATH200A program pulls all this together for you
for the most common cases.
See also: How Big a Sample Do I Need? gives the statistical concepts with examples of calculation “by hand”.
MATH200 students: Use this part of the MATH200A program
to find necessary sample sizes in sections 9.1, 9.3, and 11.3
of Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
Selection 2, finding sample size for numeric
data with unknown σ, is not required.
When you run the MATH200A program and select
6:Sample size, you are prompted first for the type of data, then for the
margin of error, prior estimate, and confidence level.
The program can find the
necessary sample size for binomial data for one or two populations, or
for numeric data for one population whether the population standard
deviation σ is known or not. (At present the program
can’t find sample size for numeric data for two populations.)
Caution: There are other criteria for sample size. For example, with numeric data your sample should be at least 30 unless you know that the population is normally distributed. Always check that your sample will meet the requirements before you begin gathering data.
The key here is whether you know the standard deviation of the population or not. If you don’t know σ, as usually you don’t, then you use s, but the computation of sample size is different.
Example 1—numeric data with known σ.
You want to estimate the average hourly output of a
machine to within ±1.5, with 90% confidence. Based on
historical data, you have reason to believe that the standard
deviation of the machine’s hourly output is 6.2. How large a
sample do you need?
Solution: Note first that this is not a realistic situation. It’s pretty unlikely that you would know the standard deviation of a population but not know the mean of that population. However, statistics texts always begin with this case because it’s the simplest way to demonstrate the principles. You leave Perfectland and enter Realityville in the other cases. With that said—
Marshal your data: 1−α = .90,
E = 1.5, and σ = 6.2.
Run the MATH200A program and select
6:Sample size, then 1:Num known σ. Enter σ=6.2,
E=1.5, and C-Level=.9 or 90.
The output screen echoes back your inputs and tells you the
critical z for that confidence level as well as the minimum necessary
sample size.
Conclusion: If the standard deviation of the population is 6.2, then to get a 90% confidence interval about the population mean μ with a margin of error no greater than 1.5, you need a sample of at least 47.
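The calculation behind this result is the textbook formula n = (z·σ/E)², rounded up to the next whole number. Here is a minimal Python sketch of it; the function name `n_known_sigma` is hypothetical, and the code is a cross-check rather than the program's own source.

```python
from math import ceil
from statistics import NormalDist

def n_known_sigma(sigma, E, conf):
    """Critical z and minimum sample size for estimating a mean, sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # two-tailed critical z
    return z, ceil((z * sigma / E) ** 2)           # always round up

z, n = n_known_sigma(6.2, 1.5, 0.90)
print(round(z, 4), n)  # 1.6449 47
```

The unrounded value is about 46.2, and rounding up rather than to the nearest integer is what guarantees the margin of error is no greater than 1.5.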
Example 2—numeric data with unknown σ
This is the realistic case for estimating a population
mean. Usually you don’t know the standard deviation of the
population, and you make a small pilot study to estimate it or you use
a prior estimate.
The MATH200A program uses the method shown in
Sample
Sizes Required in NIST/SEMATECH e-Handbook of Statistical
Methods (link verified 2009-12-28).
Here’s a modified form of the previous example. You want to estimate the average hourly output of a machine to within ±1.5, with 90% confidence. A small pilot study finds a sample standard deviation of the machine’s hourly output is 6.2. How large a sample do you need?
Solution:
Marshal your data: 1−α is .90,
E is 1.5, and s (not σ) is 6.2.
Run the MATH200A program and select
6:Sample size and then 2:Num unknown σ.
Enter s and E, then .9 or 90 for C-Level as before.
There may be a delay, because the program must compute one
or more inverse t numbers.
The output screen echoes back your inputs and tells you the
critical t as well as the minimum necessary
sample size.
Conclusion: If you don’t know the standard deviation of the population, but you have a prior sample with a standard deviation of 6.2, then to get a 90% confidence interval about the population mean μ with a margin of error no greater than 1.5, you need a sample of at least 49.
Why the difference from the previous example? In the previous case you knew the population standard deviation, and here you don’t. Since an additional parameter is unknown, it takes a larger sample to get the same level of precision for an estimate of μ.
The program helps you find the sample size for estimating a population proportion p (some books use π) for binomial data. Use a prior estimate p̂ in this computation if you have one; if you have no idea of the population proportion then use 0.5.
Example 3—binomial data for one population, with prior estimate
What percent of the voters would vote for your
candidate if the election were held today? You want 95% confidence in
your answer, with a margin of error no more than 3.5%. Last
month’s poll showed your candidate had 42% support. How many
voters do you need to survey?
Solution: Here you have 1−α = 0.95,
E = 0.035 (careful! not 0.35), and p̂ = 0.42.
(When you have a prior estimate, use it; otherwise use 0.5.)
Run the MATH200A program and select
6:Sample size, then 3:Binomial.
Enter your prior estimate, your margin of error, and your confidence
level, and you get the output screen shown at right.
Conclusion: If the true population proportion is somewhere in the neighborhood of 42%, and you want a 95% confidence interval with a margin of error no more than 3.5%, you’ll need a sample of at least 764.
Example 4—binomial data for one population, with no prior estimate
Suppose you didn’t have any idea of the proportion of the
population that planned to vote for your candidate? In that case,
you’d use 0.5 for your prior estimate, and you’d have the
output shown at right. The necessary sample size would rise to
784.
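Examples 3 and 4 both follow from the standard formula n = p̂(1−p̂)(z/E)², rounded up. The sketch below checks the two answers; the function name `n_one_proportion` is made up for illustration and this is not the program's own code.

```python
from math import ceil
from statistics import NormalDist

def n_one_proportion(p_hat, E, conf):
    """Minimum sample size for estimating one population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(p_hat * (1 - p_hat) * (z / E) ** 2)

print(n_one_proportion(0.42, 0.035, 0.95))  # Example 3: 764
print(n_one_proportion(0.50, 0.035, 0.95))  # Example 4: 784
```

Because p̂(1−p̂) is largest when p̂ = 0.5, using 0.5 always gives the most conservative (largest) sample size.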
With binomial data for two populations, you compute necessary sample size to estimate the difference between the two proportions. This is different from the sample size needed to estimate the proportion in either population on its own.
It’s not actually necessary to have the two sample sizes equal, but that’s the only way it’s possible to compute them when neither one is specified up front. Just as before, use prior estimates for the population proportions if you have them, and otherwise use 0.5.
Example 5—binomial data for difference of two population proportions, with prior estimates
Suppose you’d like to know how your candidate’s
support differs between men and women. You know that overall support
is 42%, and you think the candidate is 10 percentage points more
popular among women versus men. How many of each sex must you
survey to answer the question with 95% confidence and a margin of
error no more than 3½%?
Do you have an estimate of p1 and p2? Yes, since the overall support is 42% you expect that men’s and women’s support is not too different from that. If women are 10 percentage points higher, then you estimate women at 47% and men at 37%.
Solution: Here you have 1−α = 0.95,
E = 0.035, p̂1 = 0.47, and
p̂2 = 0.37.
Run the MATH200A program and select
6:Sample size, then 4:2 pop binomial.
Enter your prior estimates, your margin of error, and your confidence
level, and you get the output screen shown at right.
Conclusion: If the true population proportions are somewhere in the neighborhood of 47% and 37%, and you want a 95% confidence interval on the difference with a margin of error no more than 3½%, you’ll need a sample of at least 1513 women and 1513 men.
Example 6—binomial data for difference of two population proportions, with no prior estimates
And suppose this was the first poll and you had no idea of your
candidate’s support among women and men? In that case you use
0.5 for both p̂1 and p̂2, with the results shown at right.
With no prior estimate, to get a 95% confidence interval on the
difference in support, with only a 3½% margin of
error, you’d need 1568 women and 1568 men.
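The two-population sample sizes in Examples 5 and 6 can be checked the same way. This sketch assumes equal sample sizes and the standard formula n = (p̂1(1−p̂1) + p̂2(1−p̂2))(z/E)², rounded up; the function name `sample_size_two_proportions` is hypothetical, and the program's own code may differ.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, margin, conf_level):
    """Per-group n for a CI on p1 - p2, assuming both groups have the same n.

    Assumes n = (p1*q1 + p2*q2) * (z/E)^2; an illustration only.
    """
    # Two-tailed critical z for the confidence level.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Sum of the two binomial variance terms.
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(variance_sum * (z / margin) ** 2)

print(sample_size_two_proportions(0.47, 0.37, 0.035, 0.95))  # prior estimates
print(sample_size_two_proportions(0.50, 0.50, 0.035, 0.95))  # no prior estimates
```

With these inputs the sketch gives 1513 and 1568 per group, the same figures as in the two examples.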
Summary:
The goodness-of-fit or GoF test determines
how well a multinomial model (more than two categories) matches the sample data.
This part of the MATH200A program computes the χ² test statistic and the
p-value and graphs the distribution. (The TI-84 has a GoF test,
but it still makes you do part of the computation by hand.)
MATH200 students: Use this part of the MATH200A program
with section 12.1 of Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008).
TI-89 users: Please see Testing Goodness of Fit on TI-89.
To perform a goodness-of-fit test, first
put your model in L1 and your observed numbers in L2.
Caution: the model is usually percents or ratios, but the
observed numbers are always the actual counts in whole numbers, never
percents.
Then run the MATH200A program and select 7:GoF test.
The program will ask you to confirm
that you’ve filled the two lists, and then it will perform all
computations for the χ² goodness of fit and show the results on
a graph.
The expected numbers are left in L3, and you should verify that they meet the requirements for a χ² test: none of them <1, and no more than 20% of them <5.
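To see what goes into L3 and how the requirements check works, here is a short Python sketch. It assumes each expected number is the sample total times that category's share of the model, which is why percents or ratios work equally well in L1. The function names and the 3:1 data are invented for illustration and are not from the program.

```python
def expected_counts(model, observed):
    """Scale the model (L1) to the sample size to get expected counts (L3)."""
    total = sum(observed)
    model_sum = sum(model)
    return [total * m / model_sum for m in model]

def requirements_met(expected):
    """True if no expected count is < 1 and at most 20% of them are < 5."""
    below_1 = sum(1 for e in expected if e < 1)
    below_5 = sum(1 for e in expected if e < 5)
    # 5 * below_5 <= count is the same as below_5 <= 20% of the count.
    return below_1 == 0 and 5 * below_5 <= len(expected)

# Hypothetical numbers, for illustration only: a 3:1 model, 100 observations.
expected = expected_counts([3, 1], [70, 30])
print(expected, requirements_met(expected))
```

Note that the check is applied to the expected numbers, not the observed ones.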
Example 1—shifting populations from Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008), Example 3, pages 555–556.
| Region | 2000 Census | current sample |
|---|---|---|
| Northeast | 19.0% | 274 |
| Midwest | 22.9% | 303 |
| South | 35.6% | 564 |
| West | 22.5% | 359 |
“An urban economist wonders if the distribution of residents in the United States is different today than it was in 2000. In 2000,” the proportions were as shown in the table. “The economist randomly selects 1500 households and obtains the frequency distribution shown. ... Conduct the appropriate test to determine if the distribution of residents in the United States is different today from the distribution in 2000, using the α = 0.05 level of significance.”
Solution: H0 is that the model is still good, and H1 is that the distribution of U.S. population has changed. The model is the percentages shown, and you are determining whether the current sample is enough different from the model for you to conclude that the model is no longer correct.
Put the percentages in L1; there’s no need to convert
them to decimals but if you do then you must convert all of them. The
current sample is the Observed numbers, and they go in L2.
Caution: The Observed numbers must always be actual counts; never convert them to percentages.
Caution: Some problems will give you total observations, but you never enter the totals in your calculator.
Run the MATH200A program, select 7:GoF test, and press [9] to confirm
that you have data in the correct lists.
The results screen is shown at right. The χ² test statistic is
8.25, with three degrees of freedom (four categories minus one). The
p-value is 0.0410.
Conclusion: Since p<α, you reject H0 and accept H1. At the 0.05 level of significance, you conclude that the regional distribution of U.S. residents is different today from what it was in 2000.
Note: Your book rounds its calculations at various stages, so its χ² test statistic may be slightly less accurate than yours. Your book also uses tables to look up p-values, where your calculator uses a more accurate method of computation. So don’t worry if your numbers don’t match the book’s exactly — yours are better.
The screen also reminds you to check L3, the
Expected numbers, to make sure that the requirements for a χ²
test are met. In fact, there’s useful information in L3 and
L4, as shown at right.
L3 contains the expected numbers. Always check them after the computation to make sure that none of them are below 1 and no more than 20% of them are below 5. Here, all are well above 5.
L4 shows how far off the Observed numbers are, taking into account the weights of Expected numbers in the model. Or you can say that L4 shows each category’s contribution to χ². You can see that the most important deviation is in the second category (4.78), and the least important is in the first category (0.42). The total of L4 is the χ² test statistic, which you’ve already seen has a value of 8.25 for this test.
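The numbers in L3 and L4 for this example can be reproduced with a few lines of Python. This is only a sketch of the standard χ² arithmetic, not the program's code; the helper `chi2_right_tail_3df` is a made-up name, and it uses a closed form that is valid only for exactly three degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

model = [19.0, 22.9, 35.6, 22.5]   # L1: 2000 Census percentages
observed = [274, 303, 564, 359]    # L2: actual counts in the current sample

n = sum(observed)
# L3: expected counts, scaling the model to the sample size.
expected = [n * m / sum(model) for m in model]
# L4: each category's contribution (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1

def chi2_right_tail_3df(x):
    """Right-tail area of the chi-square distribution with exactly 3 df."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_right_tail_3df(chi2)
print([round(e, 1) for e in expected])
print([round(c, 2) for c in contributions])
print(round(chi2, 2), df, round(p_value, 4))
```

The expected counts come out as 285, 343.5, 534, and 337.5, and the contributions add up to the χ² of 8.25 quoted above, with the second category contributing the most.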
Example 2—equal frequencies from Sullivan, Michael, Fundamentals of Statistics (Pearson Prentice Hall, 2008), Example 4, pages 556–558.
| Sample of Birth Records | |
|---|---|
| Day of Week | Freq- uency |
| Sunday | 57 |
| Monday | 78 |
| Tuesday | 74 |
| Wednesday | 76 |
| Thursday | 71 |
| Friday | 81 |
| Saturday | 63 |
“An obstetrician wants to know whether or not the proportion of children born each day of the week is the same. She randomly selects 500 birth records and obtains the data shown.” Clearly the frequencies in the sample vary from day to day, but do they vary enough that you can say babies in general are born with different frequencies on different days, at the 0.05 level of significance?
Solution: H0 is that babies are born with equal frequency on all days of the week, and H1 is that they are not. You’re not given a numerical model because “equal frequencies” means that all model numbers are the same. Since there are seven categories, your model is seven 1’s in L1. The Observed numbers go in L2, as usual.
The results are shown at right. Here df = 6 because there
are seven categories. The χ² test statistic is 6.18, and the
p-value is 0.4029. That is greater than α, and so you fail to
reject H0.
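As a cross-check on these results, the sketch below repeats the computation in Python with a model of seven 1's. It is an illustration of the standard arithmetic rather than the program's code; `chi2_right_tail_even_df` is a hypothetical helper that relies on a closed form valid only when the degrees of freedom are even, as they are here.

```python
from math import exp, factorial

model = [1] * 7                              # L1: equal frequencies
observed = [57, 78, 74, 76, 71, 81, 63]      # L2: births, Sunday to Saturday

n = sum(observed)
# Every expected count is n/7, because all model numbers are the same.
expected = [n * m / sum(model) for m in model]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

def chi2_right_tail_even_df(x, df):
    """Right-tail chi-square area for even df:
    exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1."""
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

p_value = chi2_right_tail_even_df(chi2, df)
print(round(chi2, 2), df, round(p_value, 4))
```

Any seven equal numbers in the model would give the same expected counts, which is why seven 1's are enough.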
Writing non-conclusions is problematic when p>α in a χ² test. Strictly speaking, the conclusion should be in neutral language as usual: you can’t determine from the data whether babies are born with equal frequency on all the days of the week or not. But traditionally, the conclusion is often stated in some words equivalent to “the data do not rule out the model”.
This is the scientific method. If the experiment is repeated multiple times and H0 is never rejected, we begin to have more and more confidence that H0 is actually true. We can’t accept H0 from a single experiment, but the more times it’s not rejected, the more we believe that it may never be rejected.
28 Dec 2009: Add “MATH200A Program” to the document title, and make a few small text changes for clarity.
11 Nov 2009 (program version 4): This program and the companion MATH200B Program — Extra Statistics Utilities for TI-83/84 replace the old division into descriptive and inferential statistics. It’s not strictly logical, but it simplifies things for my students because everything required is now in a single menu. Additional changes:
1:Histograms etc and 2:Box-whisker now ask data arrangement first, then data and frequency lists with more appropriate prompts.
6:Sample size asks for the margin of error and confidence level last instead of before the data type.
7:GoF test now checks that the Observed numbers are whole numbers, and it no longer uses L5.

Here’s part of the earlier history of the pages now in this page and the programs now part of the MATH200A program:
HISTNPGN on 20 Jan 2008 and documented in Frequency Polygons on TI-83/84. On 7 Jun 2008 the program was moved to a new document, Frequency Histograms and Polygons the Easy Way on TI-83/84; on 21 Sep 2008 the program was rewritten to accommodate lists of numbers as well as frequency distributions and was renamed HISTNPG2. When the program was consolidated into the MATH200A program, I tightened up the error checking and no longer allowed a frequency polygon for a simple list of numbers.
BINOMPRB program was created, and that program has now become the 3:Binomial prob selection in the MATH200A program.
4:Binom PD histo were written for this Web page on 6 Dec 2008.
5:Normality chk part of the MATH200A program started as the TI-83 program NORMCHEK in October 2007.
MULTINOM.
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/ti83/