Grading on the curve: the central limit theorem.

Sometimes there are mathematical truths that transcend reality, facts that follow from pure logic and would be true in any imaginable universe. This post is about such a fact that is also useful. We will also look at what it means to “grade on the curve” and talk about a ubiquitously important mathematical distribution (“the curve”), more properly called the normal distribution or the Gaussian distribution after the incomparable Carl Friedrich Gauss. The normal distribution is shown at the top of the post.

The normal curve has a large central hill and tails off in both directions. The position, on the x-axis, of the peak of the central hill is the average value that sampling this distribution yields (in the picture above, the average is zero). The average is denoted by the symbol μ. The degree to which the numbers obtained by sampling this distribution jump around is quantified by the standard deviation, multiples of which appear on either side of the average, denoted by the symbol σ. These two symbols, μ for average and σ for standard deviation, are ubiquitous in statistics. When μ=0 and σ=1, we say the normal distribution is a standard normal distribution. The area under a chunk of the curve is a probability: the chance that a number sampled according to the distribution lands in the range covered by that chunk.
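For readers who want the formula behind the picture, here is the normal density, a standard fact written out in LaTeX; the second line is the area interpretation from the paragraph above:

```latex
% Normal density with average \mu and standard deviation \sigma:
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2/(2\sigma^2)}
% Probability of sampling a value between a and b (area under the curve):
P(a \le X \le b) \;=\; \int_a^b f(x)\,dx
```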

The standard deviation σ lets us measure the spread of the curve. When σ=1, the curve has the classical “bell” shape. When σ is smaller than one, the distribution becomes taller and narrower. When σ is larger than one, the hill flattens and spreads out. The total area under the curve is always one — which is what forces the curve to get tall when it is narrow and short when it is wide.
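As a quick sanity check, here is a minimal Python sketch (assuming numpy and scipy are available; neither appears in the original post) verifying that the total area is one for several values of σ, and that about 68% of the area sits within one σ of the average:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# The total area under the normal density is 1, whatever sigma is.
for sigma in (0.5, 1.0, 2.0):
    area, _ = quad(norm(loc=0, scale=sigma).pdf, -np.inf, np.inf)
    print(f"sigma = {sigma}: total area = {area:.6f}")

# For the standard normal, the area within one sigma of the average
# is the probability of sampling a value in [-1, 1].
p = norm.cdf(1) - norm.cdf(-1)
print(f"P(-1 <= X <= 1) = {p:.4f}")  # about 0.6827
```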


Grading on the curve: careful what you ask for!

“Grading on the curve” has come to mean, in many classrooms, “please show mercy!” The original, technical meaning is based on the partition of the normal curve shown at the top of the post. If we assume that students get problems right or wrong in proportion to their skill, then it is not unreasonable to assume that their scores will closely follow some kind of normal distribution. If we take the students’ scores and (i) subtract the average of the scores and then (ii) divide by the standard deviation of the scores, that turns the scores into (approximately) a standard normal distribution. The students with a score at least 2σ above the mean get an A. Those at least σ above average (but too low for an A) get a B. The students within σ of average get a C. The grades D and F are given symmetrically with B and A, respectively. Since we are currently in a period of grade inflation, this traditional curve may actually seem rather harsh.
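For concreteness, here is a small Python sketch of that procedure (the scores are invented, and the handling of the exact boundary cases is one reasonable choice, not the post’s):

```python
import numpy as np

def curve_grades(scores):
    """Assign letter grades by the traditional normal-curve cutoffs."""
    scores = np.asarray(scores, dtype=float)
    # Step (i): subtract the average; step (ii): divide by the standard deviation.
    z = (scores - scores.mean()) / scores.std()
    grades = []
    for value in z:
        if value >= 2:
            grades.append("A")   # at least 2 sigma above average
        elif value >= 1:
            grades.append("B")   # at least 1 sigma above average
        elif value >= -1:
            grades.append("C")   # within 1 sigma of average
        elif value >= -2:
            grades.append("D")   # symmetric with B
        else:
            grades.append("F")   # symmetric with A
    return grades

# Hypothetical test scores out of 100.
print(curve_grades([55, 62, 70, 71, 74, 76, 80, 85, 93]))
```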

[Figure: distribution of the number of heads when flipping twenty coins]

The division of the curve into grade zones can be modified to create other curves that are less harsh. The mathematical question, whichever division we choose, is why the normal curve is appropriate at all. Suppose we flip twenty coins and count the number of heads. Then we get a number between 0 and 20, but the numbers in the middle are much more common. If we perform the “flip a coin 20 times” experiment thousands of times, we get the outcome shown above. It is very close to a normal distribution, one with μ=10 and σ=√5. The distribution for flipping coins is the well-known binomial distribution. The first clue that something like the central limit theorem exists was that, as you flip more and more coins, the binomial distribution gets closer and closer to a normal distribution. This means that, when we flip many coins, we may use the normal distribution to estimate the number of heads.
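Here is a short Python simulation of the twenty-coin experiment (a sketch; numpy is assumed, and the exact output varies run to run):

```python
import numpy as np

rng = np.random.default_rng()
trials = 100_000

# Each experiment: flip 20 fair coins and count the heads.
heads = rng.binomial(n=20, p=0.5, size=trials)

print(f"observed average: {heads.mean():.3f}   (theory: 10)")
print(f"observed std dev: {heads.std():.3f}   (theory: sqrt(5) = {np.sqrt(5):.3f})")

# The normal curve puts about 68% of its area within one sigma of the
# average; the discrete coin counts come out somewhat higher because
# whole numbers of heads lump probability together.
sigma = np.sqrt(5)
inside = np.mean((heads >= 10 - sigma) & (heads <= 10 + sigma))
print(f"fraction within one sigma: {inside:.3f}")
```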

If we have any experiment that can come out one of two ways, then we can use the same math that we use for flipping coins. Inject sick rats with a serum and record whether each rat gets better or not — that’s a “coin flip” type experiment and, if the rats are kept apart (and so are independent), then a large number of rats can be modeled with the normal distribution. What leads to the central limit theorem is the fact that most other types of independent experiments have this same property: adding up or averaging many of them leads to a normal-distribution situation.

What does the central limit theorem say?

If we do an experiment that has a random numerical outcome many times, and if the different instances of the experiment don’t affect one another (are independent), then the average of all the outcomes is very close to having a normal distribution. That is the central limit theorem. It also explains why student scores might follow a normal distribution: the different students are a collection of independent experiments. Cheating by copying from another student creates a lack of independence — and that lack is one way that cheating is detected — but since most students are honest, average scores do follow the shape of the normal distribution.
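A small Python sketch shows the theorem in action; the exponential distribution below is just an arbitrary skewed, non-normal choice, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng()

# A skewed, decidedly non-normal experiment: an exponential waiting time.
# Average 50 independent runs of it, and repeat that 10,000 times.
averages = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# The central limit theorem predicts these averages are close to normal
# with mean 1 and standard deviation 1/sqrt(50).
print(f"mean of averages: {averages.mean():.3f}   (theory: 1.000)")
print(f"std of averages:  {averages.std():.3f}   (theory: {1/np.sqrt(50):.3f})")

# Crude text histogram: a near-symmetric bell centered at 1,
# even though the individual draws are badly skewed.
counts, edges = np.histogram(averages, bins=15)
for c, left in zip(counts, edges):
    print(f"{left:5.2f} | " + "#" * (60 * c // counts.max()))
```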

Other than grading things, why does this matter?

There is a huge number of different statistical distributions. The link leads to most of the famous ones, but there is another issue. When we design an experiment and the measurements we will take on it, it is somewhere between a little tricky and impossible to figure out which statistical distribution is the right one to describe the outcomes. If we do an experiment several times, however, no matter what the distribution is, the average of a measurement over multiple experiments follows (approximately) a normal distribution. This lets us assess, compare, and understand outcomes even if we are not smart enough to figure out the actual distribution for our measurement. This is a subtle point, but an incredibly important one.
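In practice this gets used constantly. Whatever distribution the data came from, the sample mean of n independent measurements is approximately normal with standard deviation σ/√n, which yields the familiar 95% interval. A minimal Python sketch (the data below are invented):

```python
import numpy as np

def approx_95_interval(measurements):
    """CLT-based 95% interval for the true average of the measurements.

    Works (approximately) no matter what distribution produced the data,
    provided the measurements are independent and n is reasonably large.
    """
    x = np.asarray(measurements, dtype=float)
    mean = x.mean()
    stderr = x.std(ddof=1) / np.sqrt(len(x))  # estimated sigma / sqrt(n)
    return mean - 1.96 * stderr, mean + 1.96 * stderr

# Hypothetical measurements from an experiment with unknown distribution.
data = [3.1, 2.7, 3.4, 2.9, 3.8, 3.0, 2.6, 3.3, 3.5, 2.8]
low, high = approx_95_interval(data)
print(f"95% interval for the average: ({low:.2f}, {high:.2f})")
```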

Occupy Math wants to be clear: if we can figure out the actual statistical distribution for an experimental measurement, that gives us more power. The central limit theorem is a universal “Plan B” that lets us advance in spite of our ignorance. Universal truths are usually a bit limited, and the central limit theorem is a good example. On the other hand, the central limit theorem is so useful that many generalizations of it have been constructed, addressing questions like “what if the experiments are not independent?”

Occupy Math’s editor raised an important point. Suppose that students taking a test have an effective study group. This means they will get better grades, but it also destroys the “independence” hypothesis. This is an example where a lack of independence may be a good thing. The post may have made it seem like independence is good, but what is actually going on is that independence makes the math easy. That is one kind of “good”; other kinds may matter more.


Final words on statistics

In many ways, statistics is one of the fields inside mathematics. It is closely related to probability theory, but it differs from the rest of math in a couple of ways. The biggest difference is that statisticians have a different attitude from most mathematicians: they deal with probability and uncertainty, while the rest of math has more certainty than any other intellectual endeavor. Another difference is that the majority of mathematicians are theorists while the majority of statisticians do applied work. Six Sigma, quality control, quality assurance, planning, and forecasting are just a few of the fields that use statistics. Experimental science is a huge consumer of statistics.

If you are going to need statistics done on data you are going to gather, your life will probably be much easier if you talk to a statistician while planning how to gather your data. This is related to the statistical discipline of design of experiments. Occupy Math hopes this post has given you some perspective on what sort of power statistics has and what it can be used for. Comments? Thoughts? Requests? Please comment!

I hope to see you here again,
Daniel Ashlock,
University of Guelph,
Department of Mathematics and Statistics
