Triage: Which Inferential Stats Case Should I Use?
Copyright © 2007–2012 by Stan Brown, Oak Road Systems
Summary:
How do you know which hypothesis test or confidence interval to use?
This page leads you through a series of decisions to a specific
numbered case, cross-referenced to Inferential Statistics Cases.
The interactive version is much easier
to use. Please visit http://www.tc3.edu/instruct/sbrown/stat/sbrown/stat/castriag.htm.
If you print this page, it will appear
in a more compact form but you’ll lose the interactivity.
See also:
For a chart with many more tests, see Harvey Motulsky’s
Intuitive Biostatistics: Choosing a Statistical Test (accessed 2009-12-28).
Start Here
What type of data do you have?
- Numeric — Each individual contributes a number (discrete or continuous); typical inferences are about means — go to Node 100
- Binomial — Each person answers a yes/no question, or each individual either has or doesn’t have a particular trait; inferences are about proportions — go to Node 200
- Categorical — Each individual has a non-numeric trait with multiple possible answers, like hair color or marital status — go to Node 300
Node 100. Numeric data
What population parameter are you trying to make inferences about?
- Mean(s) — go to Node 110
- Standard deviation(s) or variance(s) — go to Node 150
Node 110. Numeric data, pop. mean(s)
How many samples or populations are there?
- One — This includes the case where you have a fixed reference point in a different population. Example: “In 1990 the mean household income was $39,045. A recent survey of 500 households found a mean of ...” The recent survey is a sample, but the 1990 value is not a sample, just a number to test against. — go to Node 120
- Two, paired data — go to Case 3
Caution: In paired data, you get two numbers from each
individual or from each “team” (twins, husband/wife, etc.)
- Two, unpaired data — go to Case 4
Caution: In unpaired data, you have two unrelated groups,
and you get one number from each person in each group.
- Three or more — go to Case 8
Requirements: 1. Samples are independent. 2. Data are
normally distributed. 3. All populations have the same σ.
(The test is robust, so moderate departures from requirements 2 and 3
are okay, especially if sample sizes are equal or nearly equal.)
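The paired/unpaired distinction above changes how the data are arranged before any test is run. The sketch below illustrates that with made-up numbers: all data values and variable names are hypothetical, and the unpooled standard error used for the unpaired groups is an assumption chosen for illustration, not necessarily the procedure Case 4 prescribes.

```python
import math
import statistics

# Hypothetical paired data: two numbers from each of five individuals.
before = [72, 65, 80, 58, 77]
after = [75, 70, 79, 64, 82]

# Paired: reduce each individual's two numbers to one difference,
# then treat the differences as a single sample.
diffs = [a - b for a, b in zip(after, before)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
t_paired = mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical unpaired data: two unrelated groups, one number per person.
# The groups need not be the same size, and there is no way to form differences.
group_a = [72, 65, 80, 58, 77]
group_b = [75, 70, 79, 64, 82, 68]

# Assumption: unpooled standard error for the difference of two means.
se_unpaired = math.sqrt(
    statistics.variance(group_a) / len(group_a)
    + statistics.variance(group_b) / len(group_b)
)
t_unpaired = (statistics.mean(group_b) - statistics.mean(group_a)) / se_unpaired

print(f"paired t = {t_paired:.3f}, unpaired t = {t_unpaired:.3f}")
```

If you cannot match each number in one group to exactly one number in the other, the data are unpaired.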
Node 120. Numeric data, one pop. mean
Do you know the standard deviation of the population?
- No — go to Case 1
- Yes — go to Case 0
Caution: Do you really know the standard deviation of
the population? When “a survey found a mean of 800 and a
standard deviation of 45”, that’s a sample standard
deviation just like it’s a sample mean.
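To make the choice at this node concrete, here is a minimal sketch of how the test statistic differs depending on whether σ is known. The function name `one_sample_statistic`, the sample values, and the reference mean are all hypothetical; the formulas are the standard one-sample z and t statistics.

```python
import math
import statistics

def one_sample_statistic(sample, mu0, sigma=None):
    """Return ("z", statistic) if the population sigma is supplied,
    otherwise ("t", statistic) using the sample standard deviation."""
    n = len(sample)
    xbar = statistics.mean(sample)
    if sigma is not None:
        # Population standard deviation known: z statistic.
        return "z", (xbar - mu0) / (sigma / math.sqrt(n))
    # Only the sample standard deviation is available: t statistic, n - 1 df.
    s = statistics.stdev(sample)
    return "t", (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample with mean 800, tested against a reference mean of 780.
sample = [790, 845, 760, 810, 795]
kind_known, stat_known = one_sample_statistic(sample, mu0=780, sigma=45)
kind_unknown, stat_unknown = one_sample_statistic(sample, mu0=780)
print(kind_known, round(stat_known, 3))
print(kind_unknown, round(stat_unknown, 3))
```

Passing `sigma=45` is appropriate only if 45 really is the population value; a standard deviation reported by the survey itself belongs in the second branch.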
Node 150. Numeric data, pop. standard deviation(s) or variance(s)
How many populations are there?
- One — go to Case 1S
Requirement: Population must be normally distributed, not
just roughly normal.
- Two — go to Case 4S
Requirements: 1. Samples are independent.
2. Populations must be normally distributed, not
just roughly normal.
Node 200. Binomial (yes/no) data
How many samples or populations are there?
- One — go to Case 2
Caution: This includes the case where you have a fixed
reference point in a different population. Example: “In 1990,
68% of Americans felt pessimistic about their financial future. A
recent survey of 1500 Americans found that 1089 of
them ...” The recent survey is a sample, but the 1990
value is not a sample, just a number to test against.
- Two — go to Case 5
- Three or more — go to Case 7
This is a 2-way table, testing homogeneity or independence.
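The one-sample example above (68% in 1990 versus 1089 of 1500 in the recent survey) can be worked through as a sketch. The quoted sentence is cut off, so this assumes the 1089 respondents are the ones who felt pessimistic; the variable names are illustrative, and the formula is the standard one-proportion z statistic computed under the reference proportion.

```python
import math

p0 = 0.68   # 1990 reference value: a fixed number, not a sample
n = 1500    # size of the recent survey (the only sample)
x = 1089    # assumption: number in the recent survey who felt pessimistic

p_hat = x / n                        # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)    # standard error under the reference proportion
z = (p_hat - p0) / se

print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}")
```

Only one sample enters the calculation, which is why this situation counts as one population rather than two.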
Node 300. Categorical data
How many populations are there?
- One — go to Node 350
- Two or more — go to Case 7 (test of homogeneity)
Node 350. Categorical data for one population
How many variables are there?
- One — go to Case 6
Here you have one row or column of numbers, representing the
number of individuals with each value of the trait. For example, if
the trait is hair color then you would have an observed number of
blonds, an observed number of brunets, an observed number of redheads,
and so on. You test that against a model of expected percentages or
ratios.
- Two — go to Case 7 (independence)
Here you have a two-way table of one population. The rows
represent levels of one trait, such as educational level, and the
columns represent a second trait, such as marital status.
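The whole sequence of decisions on this page can be summarized as a small lookup function. This is only a sketch: the function name `triage` and its parameter names are hypothetical, and the returned labels are the case numbers given at each node above.

```python
def triage(data_type, parameter=None, samples=None, paired=None,
           sigma_known=None, variables=None):
    """Follow the page's decision nodes and return the case label."""
    if samples is None or samples < 1:
        raise ValueError("samples must be a positive number of samples/populations")

    if data_type == "numeric":                       # Node 100
        if parameter == "mean":                      # Node 110
            if samples == 1:                         # Node 120
                return "Case 0" if sigma_known else "Case 1"
            if samples == 2:
                return "Case 3" if paired else "Case 4"
            return "Case 8"
        if parameter == "sd":                        # Node 150
            if samples == 1:
                return "Case 1S"
            if samples == 2:
                return "Case 4S"
            raise ValueError("Node 150 covers only one or two populations")
        raise ValueError("parameter must be 'mean' or 'sd' for numeric data")

    if data_type == "binomial":                      # Node 200
        if samples == 1:
            return "Case 2"
        if samples == 2:
            return "Case 5"
        return "Case 7"

    if data_type == "categorical":                   # Node 300
        if samples >= 2:
            return "Case 7 (test of homogeneity)"
        if variables == 1:                           # Node 350
            return "Case 6"
        if variables == 2:
            return "Case 7 (independence)"
        raise ValueError("variables must be 1 or 2 for one population")

    raise ValueError("data_type must be 'numeric', 'binomial', or 'categorical'")


# The household-income example: one sample, sigma not known.
print(triage("numeric", parameter="mean", samples=1, sigma_known=False))
# The hair-color example: one population, one variable.
print(triage("categorical", samples=1, variables=1))
```

The function only picks the case; the requirements and cautions listed at each node still have to be checked by hand.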
What’s New
- 21 May 2011: Add text explaining choices for data type,
one or two populations, paired or unpaired data, and number of
variables;
remove requirements that duplicate Inferential Statistics Cases
- (intervening changes suppressed)
- Nov 2007: initial version on the Web
This page is used in instruction at
Tompkins Cortland Community College in Dryden, New
York; it’s not an official statement of the College. Please visit
www.tc3.edu/instruct/sbrown/
to report errors or ask to copy it.
For updates and new info, go to
http://www.tc3.edu/instruct/sbrown/stat/