Ways to Find the Average

Given a set of data, one question that naturally arises is "What is the center?" In statistics the center is known as the average, and there are several different ways to measure this feature of a data set.

The Mode

The mode of a data set is the value that occurs most frequently. This measurement is crude, yet it is very easy to calculate. Suppose that a history class of eleven students scored the following (out of 100) on a test: 60, 64, 70, 70, 70, 75, 80, 90, 95, 95, 100.
We see that 70 is in the list three times, 95 occurs twice, and each of the other scores is listed only once. Since 70 appears in the list more often than any other score, it is the mode. If two values tie for the highest frequency, then the data is said to be bimodal.
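The counting described above can be sketched in a few lines of Python. The function name modes is only an illustrative choice, and returning an empty list when every value is distinct is one convention among several; some tools report every value as a mode in that case.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), sorted; [] if no value repeats."""
    counts = Counter(values)          # maps each value to how often it occurs
    if not counts:
        return []
    top = max(counts.values())        # the highest frequency in the data
    if top == 1:
        return []                     # all values distinct: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

scores = [60, 64, 70, 70, 70, 75, 80, 90, 95, 95, 100]
print(modes(scores))           # [70]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```

Returning a list rather than a single number keeps the bimodal case from being hidden.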

The Mean

The mean of a data set is what is most often meant by the common use of the word average. It is found by first adding all the data values, then dividing by the number of values. The sum of the test scores above is 869, and there are eleven scores. Thus the mean is 869/11 = 79.
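As a quick check of the arithmetic, here is a minimal Python sketch of the same calculation on the test scores. The variable names are illustrative only.

```python
scores = [60, 64, 70, 70, 70, 75, 80, 90, 95, 95, 100]

total = sum(scores)     # add all the data values
count = len(scores)     # the number of values
mean = total / count    # divide the sum by the count

print(total, count, mean)  # 869 11 79.0
```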

The Median

The median of a set of data is calculated by listing the data in ascending order, then finding the point that is exactly in the middle. In the test score data 75 has five scores above it and five scores below it. Since 75 is the center value of the data, it is the median.
An important exception to this process occurs if there is an even number of data values. In this situation there are two values in the center rather than one. To find the median in this case, calculate the mean of the two center values.
The following are heights in inches of the boys in a first grade class: 45, 47, 47, 48, 50, 51, 52, 53. Since there are eight heights listed, there are two values, 48 and 50, in the center of the data. The mean of 48 and 50 is (48 + 50)/2 = 49. Thus 49 is the median, even though it is not in the list of values.
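Both cases, an odd and an even number of values, can be handled in one short Python function. This is a sketch of the procedure described above, not the only way to compute it; Python's built-in statistics.median gives the same results.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)   # list the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd count: a single center value
    # even count: take the mean of the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [60, 64, 70, 70, 70, 75, 80, 90, 95, 95, 100]
heights = [45, 47, 47, 48, 50, 51, 52, 53]

print(median(scores))   # 75
print(median(heights))  # 49.0
```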

The Midrange

Another measure of center, not as popular as the mode, mean and median, is called the midrange. The midrange is the mean of the maximum and minimum values of the data set. The data set 3, 4, 10, 11 has minimum 3 and maximum 11, so the midrange is (3 + 11)/2 = 7.
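In Python the midrange takes a single line once the smallest and largest values are found. The function name midrange is an illustrative choice.

```python
def midrange(values):
    """Mean of the smallest and largest values in the data."""
    return (min(values) + max(values)) / 2

data = [3, 4, 10, 11]
print(midrange(data))  # 7.0
```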

Which Average to Use?

The data being studied influences which of the four measures of average is most meaningful. The presence of outliers can dramatically skew the mean and midrange. Consider this list of hourly wages (in dollars) for employees at a small company: 10, 10, 12, 50.
What does the average employee earn? It all depends on what we mean by "average." The mode is $10/hour, the median is $11/hour, the midrange is $30/hour and the mean is $20.50/hour. Although all four are measurements of average, the presence of the one high value has skewed the mean and the midrange. As a result, these two are not as helpful as the mode and median in determining the center of this data.
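To see the effect of the outlier directly, the four measures can be computed side by side, with and without the $50 wage. This sketch uses Python's standard statistics module; the helper name summarize is illustrative only.

```python
from statistics import mean, median, mode

def summarize(values):
    """Return the four measures of center for a list of numbers."""
    return {
        "mode": mode(values),
        "median": median(values),
        "midrange": (min(values) + max(values)) / 2,
        "mean": mean(values),
    }

wages = [10, 10, 12, 50]
without_outlier = wages[:-1]   # the same data with the $50 wage dropped

for label, data in [("all wages", wages), ("without outlier", without_outlier)]:
    print(label)
    for name, value in summarize(data).items():
        print(f"  {name}: ${value:.2f}/hour")
```

Dropping the single high wage leaves the mode unchanged and moves the median by only a dollar, while the mean and midrange fall sharply.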

Other Remarks

The mode is easy to calculate, but sometimes is not all that helpful. One instance of this is when there is a data set in which all of the values are different. Since none of the values occur more often than any of the others, there is no mode.
The midrange is easy to calculate, but is not used very often because it is dependent on only two values.
When trying to find the center of a set of data, pay careful attention to the presence of outliers. Always remember that there are multiple ways to talk about the center of the data.
