Probabilities and Cancer Screening
(Extra Credit Assignment)
Copyright © 2003–2008 by Stan Brown, Oak Road Systems
Summary: Surprisingly, most doctors are very bad at explaining probabilities to patients. Specifically: if you get a positive test result, how likely is it that you actually have the condition being tested for? In this assignment you’ll do an exercise to learn how to answer such questions for breast cancer (based on mammogram screening) and colorectal cancer (based on the hemoccult test). The purpose is to help you make better sense of risks when your doctor gives you probabilities.
Directions: If you wish to do this optional assignment, don’t hand in this sheet but work Problems 1–3 on separate paper. Show any calculations, as usual.
A mammogram is an X-ray picture of the breast, looking for possible tumors. Here are the data for women aged 40 to 50 in a particular region of the country who do not have any symptoms:
The probability that [any] one of these women has breast cancer is 0.8 percent. If a woman has breast cancer, the probability is 50 percent that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 7 percent that she will still have a positive mammogram. Imagine a woman who has a positive mammogram. What is the probability that she actually has breast cancer? [1]
The hemoccult test looks for blood in the stool, and is a method of detecting colorectal cancer. For a particular region of the country, here are data for people over 50 who have no symptoms:
The probability that [any] one of these people has colorectal cancer is 0.3 percent. If a person has colorectal cancer, the probability is 50 percent that he will have a positive hemoccult test. If a person does not have colorectal cancer, the probability is 3 percent that he will still have a positive hemoccult test. Imagine a person (over age 50, no symptoms) who has a positive hemoccult test in your screening. What is the probability that this person actually has colorectal cancer? [2]
Problem 1: Based on the data above, if someone has a positive hemoccult test, what is the probability (in percent) that they actually have colorectal cancer?
Be sure to answer this one before you read further! You can answer freely: I’m interested in seeing some rational basis for your answer, not whether the answer is right.
Okay, got your answer? If not, please stop reading and make an honest stab at it.
We’ll calculate the correct answer below, and you’ll see how easy it is if you approach it right. But first, you might want to know that the most common answer (from 7 out of 24 doctors) was 50%, more than 10 times the correct answer; and the second most common answer was almost as bad.
In principle you could have computed the correct answer by using the laws of conditional probability. But Gigerenzer’s point is that it’s a lot easier to do with natural frequencies. Here’s how it works. Calculations are shown on the left; on the right are statements as you might explain them to someone who does not understand conditional probabilities very well.
| 0.3% = .003; .003×10,000 = 30 | “Thirty out of every 10,000 people have colorectal cancer. |
| 50% × 30 = 15 | “Of those 30 people with colorectal cancer, 15 will have a positive hemoccult test. |
| 10,000−30 = 9970 | “Of the remaining 9970 people without colorectal cancer, |
| 3% = .03; .03×9970 = 299 | 299 will still have a positive hemoccult test. |
| | “Now imagine a sample of people (over age 50, no symptoms), who have positive hemoccult tests. ... How many of these people actually have colorectal cancer? _____ out of _____” [3] |
You should now see your way clear to completing the answer: | |
| 15+299 = 314 | Out of 10,000 people who are screened, about 314 will have a positive result, and about 15 of them actually have colorectal cancer. |
| 15/314 = 0.048 | Therefore, a person with no symptoms but a positive hemoccult result has a 4.8% chance of colorectal cancer. |
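The bookkeeping in the table can also be sketched in a few lines of Python. This is only an illustration: the function name `natural_frequencies` and its parameter names are made up for this example, and it rounds to whole people the way the table does.

```python
def natural_frequencies(prevalence, positive_if_cancer, positive_if_healthy,
                        population=10_000):
    """Count people in each group, rounded to whole people as in the table.

    Returns (positives who have cancer, all positives).
    """
    with_cancer = round(prevalence * population)
    true_positives = round(positive_if_cancer * with_cancer)
    without_cancer = population - with_cancer
    false_positives = round(positive_if_healthy * without_cancer)
    return true_positives, true_positives + false_positives


# Hemoccult data from the text: 0.3% have cancer; 50% and 3% test positive.
true_pos, all_pos = natural_frequencies(0.003, 0.50, 0.03)
print(f"{true_pos} out of {all_pos} positives have cancer "
      f"({true_pos / all_pos:.1%})")
# prints: 15 out of 314 positives have cancer (4.8%)
```

The default of 10,000 people is just a convenient round number; any population large enough to avoid fractions of a person would serve.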
You remember that many doctors gave an answer of 50% when the question was posed in terms of probabilities. But when it was written in terms of frequencies, as it was above, a majority were able to give the correct answer. The lesson here is that some probability problems are much easier to solve in frequencies, especially when conditional probabilities are involved.
It can be even easier to interpret with a frequency tree. [4]
Listing all categories in this way, with numbers of people, makes it a snap to compute the probabilities.
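A frequency tree can also be written out as indented text. The sketch below is one possible layout for the hemoccult data, not necessarily the one the author drew; the function name `frequency_tree` and its parameters are invented for this example.

```python
def frequency_tree(prevalence, positive_if_cancer, positive_if_healthy,
                   population=10_000):
    """Return the tree as a list of text lines, with whole-number counts."""
    sick = round(prevalence * population)
    healthy = population - sick
    sick_pos = round(positive_if_cancer * sick)
    healthy_pos = round(positive_if_healthy * healthy)
    return [
        f"{population} people screened",
        f"  {sick} with cancer",
        f"    {sick_pos} test positive",
        f"    {sick - sick_pos} test negative",
        f"  {healthy} without cancer",
        f"    {healthy_pos} test positive",
        f"    {healthy - healthy_pos} test negative",
    ]


# Hemoccult data from the text.
for line in frequency_tree(0.003, 0.50, 0.03):
    print(line)
```

Adding the two "test positive" leaves gives everyone with a positive result, and the first of those leaves is the part that actually has cancer.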
When you work a problem in terms of frequencies, you can always put it back into probability notation:
P(cancer | positive) = 15/314 = 4.8%
This may help you see how the first group of doctors made their mistake. They were given the probability that a person with colorectal cancer gets a positive test result:
P(positive | cancer) = 50%
and they mistakenly assumed that P(A|B) and P(B|A) are the same thing. The more unlikely one event is, the more different P(A|B) and P(B|A) are, and the harder it seems to be to think about the probabilities.
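To see how far apart P(A|B) and P(B|A) can be, here is a small sketch that applies Bayes' rule directly to probabilities, without rounding to whole people. The function name is made up for this example, and only the 0.3% row uses the hemoccult data; the other two prevalences are hypothetical values chosen for contrast.

```python
def p_cancer_given_positive(prevalence, p_pos_given_cancer, p_pos_given_healthy):
    """Bayes' rule: P(cancer | positive) from the three given probabilities."""
    true_pos = prevalence * p_pos_given_cancer
    false_pos = (1 - prevalence) * p_pos_given_healthy
    return true_pos / (true_pos + false_pos)


# P(positive | cancer) stays at 50% throughout; only the prevalence changes.
# 0.3% is the hemoccult figure; 30% and 3% are hypothetical.
for prevalence in (0.30, 0.03, 0.003):
    posterior = p_cancer_given_positive(prevalence, 0.50, 0.03)
    print(f"prevalence {prevalence:.1%}: "
          f"P(cancer | positive) = {posterior:.1%}")
```

With the hemoccult figures the result is about 4.8%, in agreement with the frequency calculation, and it falls steadily as the condition becomes rarer even though P(positive | cancer) never moves from 50%.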
A word on rounding: If you followed the calculations, you noticed that any decimal results were quietly rounded to whole numbers. This avoids getting into the whole “.2 of a person” side issue, and usually makes little or no difference to the final answer. Anyway, the probabilities tell you that the numbers will be distributed about this way. The actual numbers will of course vary somewhat from one group of 10,000 people to the next.
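The group-to-group variation can be illustrated with a small simulation. This is a rough sketch under the assumption that each person's cancer status and test result are independent random draws; the function name `simulate_group` and the seed are arbitrary choices for this example.

```python
import random


def simulate_group(rng, population=10_000, prevalence=0.003,
                   positive_if_cancer=0.50, positive_if_healthy=0.03):
    """Simulate one screened group; return (true positives, all positives)."""
    true_pos = false_pos = 0
    for _ in range(population):
        has_cancer = rng.random() < prevalence
        p_positive = positive_if_cancer if has_cancer else positive_if_healthy
        if rng.random() < p_positive:
            if has_cancer:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos, true_pos + false_pos


rng = random.Random(2008)  # arbitrary seed, so reruns give the same groups
for group in range(1, 6):
    true_pos, all_pos = simulate_group(rng)
    print(f"group {group}: {true_pos} of {all_pos} positives have cancer")
```

Each simulated group should land near 15 out of 314 without matching it exactly.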
People often find it easier to think about numbers that end in zeroes. The exact value of .03×9970 is 299.1, which properly is rounded to 299. But if instead you round it to 300, many people will find it easier to think about. The error is small, and in fact the probability 15/315 is almost equal to 15/314.
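A quick check of how little the rounding matters, as a sketch: the first value uses no rounding at all, and the other two use 299 and 300 false positives. The variable names are made up for this example.

```python
# No rounding: work directly with the probabilities from the text.
exact = (0.003 * 0.50) / (0.003 * 0.50 + 0.997 * 0.03)
rounded_299 = 15 / (15 + 299)  # whole people, as in the table
rounded_300 = 15 / (15 + 300)  # the friendlier round number

for label, value in [("exact", exact), ("299", rounded_299), ("300", rounded_300)]:
    print(f"{label:>5}: {value:.4f}")
```

All three values round to 4.8%.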
Problem 2: Draw the frequency tree for positive mammograms and breast cancer.
Problem 3: Go back to the data on breast cancer. You must give a woman, who has no symptoms, the news that her mammogram came up positive. What is the probability that she has breast cancer? Answer the question first in terms of frequencies and then as a probability.
Please see Recommended Books for this and other book suggestions.
This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.
For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/