An Introduction to the Bell Curve

After I give a test in any of my classes, one thing that I like to do is make a graph of all the scores. I typically write down 10-point ranges such as 60-69, 70-79, and 80-89, then put a tally mark for each test score in that range. Almost every time I do this, a familiar shape emerges. A few students do very well and a few do very poorly. Most of the scores end up clumped around the mean score. Different tests may result in different means and standard deviations, but the shape of the graph is nearly always the same. This shape is commonly called the bell curve.
Why call it a bell curve? The bell curve gets its name quite simply because its shape resembles that of a bell. These curves appear throughout the study of statistics, and their importance cannot be overemphasized.
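The tallying I describe above is easy to try for yourself. Here is a minimal sketch in Python; the scores are made up for illustration, and the function name tally_scores is my own choice rather than anything standard.

```python
from collections import Counter

def tally_scores(scores, width=10):
    """Count how many scores fall in each range of the given width."""
    # 73 // 10 * 10 == 70, so a score of 73 lands in the 70-79 range.
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))

# Hypothetical test scores, invented for this example.
scores = [52, 61, 64, 68, 70, 71, 73, 75, 75, 77, 78, 79, 82, 84, 85, 88, 93]

# Print one row of tally marks per 10-point range.
for low, count in tally_scores(scores).items():
    print(f"{low}-{low + 9}: {'|' * count}")
```

With these particular scores the longest row of tally marks is the 70-79 range, and the rows get shorter on either side of it.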

What Is a Bell Curve?

To be technical, the kinds of bell curves that we care about the most in statistics are actually called normal probability distributions. For what follows we’ll just assume the bell curves we’re talking about are normal probability distributions. Despite the name “bell curve,” these curves are not defined by their shape. Instead, an intimidating-looking formula is used as the formal definition for bell curves.
But we really don’t need to worry too much about the formula. The only two numbers that we care about in it are the mean and standard deviation. The bell curve for a given set of data has its center located at the mean. This is where the highest point of the curve, or “top of the bell,” is located. A data set’s standard deviation determines how spread out our bell curve is. The larger the standard deviation, the more spread out the curve.
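For the curious, the formula in question is the standard density of a normal distribution. The symbols are the conventional ones rather than anything specific to this post: μ stands for the mean and σ for the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

Notice that μ and σ are the only two quantities that change from one bell curve to another. The value of f(x) is largest at x = μ, and a larger σ flattens and widens the curve.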

Important Features of a Bell Curve

There are several features of bell curves that are important and distinguish them from other curves in statistics:
  • A bell curve has one mode, which coincides with the mean and median. This is the center of the curve where it is at its highest.
  • A bell curve is symmetric. If it were folded along a vertical line at the mean, both halves would match perfectly because they are mirror images of each other.
  • A bell curve follows the 68-95-99.7 rule, which provides a convenient way to carry out estimated calculations:
    • Approximately 68% of all of the data lies within one standard deviation of the mean.
    • Approximately 95% of all the data is within two standard deviations of the mean.
    • Approximately 99.7% of the data is within three standard deviations of the mean.
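The three percentages in the rule are rounded values, and they can be checked with the error function from Python's standard math module. This is a small sketch; the function name fraction_within is my own.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution this fraction equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

The results round to 0.6827, 0.9545, and 0.9973, which is where the 68, 95, and 99.7 come from.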

An Example

If we know that a bell curve models our data, we can use the above features of the bell curve to say quite a bit. Going back to the test example, suppose we have 100 students who took a statistics test with a mean score of 70 and a standard deviation of 10.
The standard deviation is 10. Subtracting 10 from the mean and adding 10 to it gives us 60 and 80. By the 68-95-99.7 rule we would expect about 68% of 100, or 68 students, to score between 60 and 80 on the test.
Two times the standard deviation is 20. Subtracting 20 from the mean and adding 20 to it gives us 50 and 90. We would expect about 95% of 100, or 95 students, to score between 50 and 90 on the test.
Three times the standard deviation is 30, so a similar calculation with 99.7% tells us that effectively everyone scored between 40 and 100 on the test.
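The arithmetic in this example can be written as a short Python sketch. The numbers are the ones from the example; the names RULE and expected_counts are just my own labels.

```python
# Rounded shares from the 68-95-99.7 rule, keyed by number of standard deviations.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def expected_counts(mean, sd, n):
    """Return (low, high, expected number of values) for 1, 2 and 3 standard deviations."""
    return [(mean - k * sd, mean + k * sd, share * n) for k, share in RULE.items()]

# 100 students, mean score 70, standard deviation 10.
for low, high, count in expected_counts(70, 10, 100):
    print(f"about {count:.1f} students between {low} and {high}")
```

These are estimates from the rule, not guarantees about any particular class.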

Uses of the Bell Curve

There are many applications for bell curves. They are important in statistics because they model a wide variety of real-world data. As mentioned above, test results are one place where they pop up. Here are some others:
  • Repeated measurements of a piece of equipment
  • Measurements of characteristics in biology
  • Approximating chance events such as flipping a coin several times
  • Heights of students at a particular grade level in a school district
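To see the coin-flipping item in action, here is a rough sketch that compares the exact chance of getting between 40 and 60 heads in 100 fair flips with the estimate from a bell curve. The choice of 100 flips and the 40-60 range are my own, and the half-unit adjustment (a continuity correction) is a common convention rather than something discussed above. It needs Python 3.8 or later for math.comb.

```python
import math

def binomial_prob(n, lo, hi, p=0.5):
    """Exact probability of between lo and hi successes (inclusive) in n trials."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def normal_cdf(x, mean, sd):
    """Area under the bell curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_approx(n, lo, hi, p=0.5):
    """Bell-curve estimate of the same probability, with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    # Widen the range by half a unit on each side, since head counts are whole numbers.
    return normal_cdf(hi + 0.5, mean, sd) - normal_cdf(lo - 0.5, mean, sd)

exact = binomial_prob(100, 40, 60)
approx = normal_approx(100, 40, 60)
print(f"exact: {exact:.4f}, bell curve estimate: {approx:.4f}")
```

Here the mean is 50 heads and the standard deviation is 5, so 40 to 60 is two standard deviations either side of the mean, and both answers come out close to 95%.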

When Not to Use the Bell Curve

Even though there are countless applications of bell curves, they are not appropriate in every situation. Some statistical data sets, such as equipment failure or income distributions, have different shapes and are not symmetric. Other times there can be two or more modes, such as when several students do very well and several do very poorly on a test. These applications require the use of other curves that are defined differently than the bell curve. Knowing how the data in question was obtained can help to determine whether or not a bell curve should be used to represent it.
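One quick, informal check for a lack of symmetry is to compare the mean with the median, since for a bell curve the two coincide. The sketch below uses made-up income figures, and the helper name mean_median_gap is my own. A large gap is only a hint that the data is skewed, not a formal test.

```python
import statistics

def mean_median_gap(data):
    """Return (mean, median, mean minus median) for a list of numbers."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return mean, median, mean - median

# Hypothetical annual incomes in thousands, invented for this example.
incomes = [22, 25, 27, 30, 31, 33, 36, 40, 48, 250]

mean, median, gap = mean_median_gap(incomes)
print(f"mean: {mean}, median: {median}, gap: {gap}")
```

In this invented data set a single very large income pulls the mean well above the median, which is the kind of lopsided shape a bell curve cannot represent.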
