|why people need to understand statistics|
A recent article in the New York Times, Do We Understand What Makes Us Healthy?, gives a nice overview of the field of epidemiology, but points out that people (including doctors and scientists) often misinterpret or misunderstand its results. For instance, there were studies showing that hormone replacement therapies helped women avoid a whole bunch of problems. Those studies were not incorrect, but they were short-term studies. What we found out later was that, in the long term, these therapies caused other problems that were possibly greater than the ones they avoided.
It is important (and of growing importance every day) for people to understand how statistics works in order to make intelligent choices. It should be clear, for example, to anyone starting a new drug that a few years of research done to test the drug cannot rule out the possibility that it will have a negative long-term side effect. You have to consider this as a possibility.
This all reminded me of my favorite "misunderstanding statistics" story.
Consider these results of two real studies:
* A group wanted to figure out which kinds of schools were good, so they compared all schools and looked at those for which a very large percentage of the students scored high on national exams or got into top universities. They found that SMALL schools (those with few students) were the ones that had the highest percentages of students excelling.
* Another group wanted to see where cancer rates were high. So they looked at every 10 square miles of the USA and counted up the percentage of people living there who had developed cancer. They found -- surprisingly -- that RURAL areas were the areas having the highest cancer rates.
What should we conclude from these studies?
You might be tempted to say that "small schools are better" and "people are more likely to get cancer if they live in a rural area"...but you'd be wrong.
Don't worry. Lots of people make this mistake, including people who should know better (such as Bill Gates, who based his donations to schools on this incorrect conclusion about the first study).
First, let me convince you that these conclusions would be incorrect.
You could do the same studies but ask the opposite questions: which schools have the highest percentages of low-performing students? Which areas have the lowest cancer rates?
As I'll explain in a moment, you would find, again, that small schools are much more likely to have high percentages of students doing poorly and that rural areas are much more likely to have a low percentage of people having cancer!
Consider this thought experiment: Suppose I take 100 people and tell them each to flip a coin 37 times and record the percentage of flips that result in heads. Suppose I take another 100 people and tell them also to flip a coin and record the percentage of heads, but tell them to only flip the coin TWICE.
Now, the question is, if I look for people who had a HIGH percentage of heads, where will I find them? In the group that only flipped twice. In fact, about 1/4 of them will have gotten 100% heads (getting heads two times in a row is not so rare). In the group that flipped 37 times, almost none of them will have gotten 100% heads (getting heads 37 times in a row would be a very rare event). But it does not make sense to conclude from this that heads are more likely if you flip few times. Heads are still no more likely than tails. (The group that just flipped twice will also have a lot of people who got 100% tails while almost nobody in the other group will have gotten that!)
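If you'd like to try the coin experiment yourself, here is a rough Python sketch of it. The function names and the random seed are my own choices for illustration; the only things taken from the story above are the two groups of 100 people, one flipping twice and one flipping 37 times.

```python
import random

def fraction_heads(num_flips, rng):
    """Flip a fair coin num_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

def count_extremes(num_people, num_flips, rng):
    """Count how many people saw 100% heads and how many saw 100% tails."""
    results = [fraction_heads(num_flips, rng) for _ in range(num_people)]
    all_heads = sum(1 for r in results if r == 1.0)
    all_tails = sum(1 for r in results if r == 0.0)
    return all_heads, all_tails

# Exact chance of a perfect run of heads with a fair coin.
p_all_heads_2 = 0.5 ** 2    # two flips: 1 in 4
p_all_heads_37 = 0.5 ** 37  # 37 flips: far less than one in a billion

rng = random.Random(0)  # arbitrary seed, only so reruns give the same numbers
few = count_extremes(100, 2, rng)    # the group that flips twice
many = count_extremes(100, 37, rng)  # the group that flips 37 times

print(f"2 flips:  {few[0]} people all heads, {few[1]} people all tails")
print(f"37 flips: {many[0]} people all heads, {many[1]} people all tails")
```

The coin is equally fair in both groups. The only thing that changes is the number of flips, and that alone decides where the extreme percentages show up.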
The point is that when the "sample size is small" (when you're looking at a small school or an area where few people live) there is a much higher chance of seeing extremely unusual populations. That's why you can't prove something using one small scientific experiment...you need to repeat it many times. (Or, another way to think about it: if there is a square mile where only one man lives and he has cancer, then there's a 100% cancer rate there. If there's a square mile where only one woman lives and she doesn't have cancer, there's a 0% cancer rate there. But a square mile in a city with thousands of people living in it is extremely unlikely to have as high -- or as low -- a cancer rate.)
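The same effect can be sketched for the cancer study. In the simulation below, every number is made up for illustration (a 5% underlying rate, 200 "rural" areas of 20 people, 200 "urban" areas of 2,000 people); none of it comes from the real study. The one important assumption is that everyone has exactly the same risk, wherever they live.

```python
import random

def observed_rates(num_areas, population, true_rate, rng):
    """Return the observed case rate in each of num_areas equally sized areas."""
    rates = []
    for _ in range(num_areas):
        cases = sum(rng.random() < true_rate for _ in range(population))
        rates.append(cases / population)
    return rates

# All numbers here are hypothetical, chosen only to illustrate the effect.
TRUE_RATE = 0.05        # identical underlying risk for every person
rng = random.Random(1)  # arbitrary seed, only so reruns give the same numbers

rural = observed_rates(200, 20, TRUE_RATE, rng)    # few residents per area
urban = observed_rates(200, 2000, TRUE_RATE, rng)  # many residents per area

print(f"rural areas: lowest {min(rural):.1%}, highest {max(rural):.1%}")
print(f"urban areas: lowest {min(urban):.1%}, highest {max(urban):.1%}")
```

Both the highest and the lowest observed rates should come from the sparsely populated areas, even though living there changes nobody's risk in this simulation.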
Anyway, I'm running out of time -- I've got to get back to my conference. I hope nobody concludes from this that statistics is scary, wrong or useless. Statistics is a very powerful, useful and important tool...but you have to use it correctly!