The standard deviation can be calculated for either a population or a sample.  A population data set includes every piece of data that meets the criteria for the set.  A sample data set includes only a random selection of the pieces of data that meet those criteria.
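As a rough illustration of the distinction, the sketch below treats a made-up list of 100 values as the population and draws a random sample of 10 from it.  The data, the sample size and the variable names are assumptions for illustration only; `statistics.pstdev` and `statistics.stdev` are the Python standard library's population and sample standard deviation functions.

```python
import random
import statistics

# Hypothetical population: every value that meets the criteria (here, 1..100).
population = list(range(1, 101))

# Hypothetical sample: a random selection of 10 values from the population.
random.seed(0)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 10)

population_sd = statistics.pstdev(population)  # population standard deviation
sample_sd = statistics.stdev(sample)           # sample standard deviation

print(f"population standard deviation: {population_sd:.2f}")
print(f"sample standard deviation:     {sample_sd:.2f}")
```

The two results will generally differ, since the sample sees only part of the population.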

 

The standard deviation is a statistic that describes the spread, or dispersion, of a set of data.  It is measured relative to the mean of the data set and can be thought of as a distance above and below the mean within which a certain share of the data falls.  Since the data in any set can vary greatly, only a certain percentage of the data is likely to lie within one standard deviation of the mean.  Each additional standard deviation added above and below the mean widens the range, so more of the data is included.

 

When we have a set of data, we can estimate the percentage of the data that falls within one, two or three standard deviations of the mean.  For example, Chebyshev’s theorem applies to any data set.  It states that at least 75 % of the data lies within two standard deviations above and below the mean, and at least 8/9 (about 89 %) lies within three standard deviations.  There is also an empirical rule that applies to a data set whose histogram resembles a bell-shaped curve.  Under this rule, we can expect about 68 % of the data to lie within one standard deviation above and below the mean, about 95 % within two standard deviations and about 99.7 % within three standard deviations.
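These percentages can be reproduced with a short sketch.  It assumes the usual statement of Chebyshev’s theorem, that at least 1 - 1/k^2 of the data lies within k standard deviations of the mean (for k greater than 1), and it uses the standard normal distribution as the idealized bell-shaped curve behind the empirical rule.  The function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_minimum(k):
    """Minimum fraction of data within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

def normal_fraction(k):
    """Fraction of a normal distribution within k standard deviations."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (2, 3):
    print(f"Chebyshev, k={k}: at least {chebyshev_minimum(k):.1%}")
for k in (1, 2, 3):
    print(f"Empirical rule, k={k}: about {normal_fraction(k):.1%}")
```

Chebyshev’s figures are lower because they are guaranteed minimums for any data set, while the empirical rule gives approximate values for bell-shaped data only.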

 

The sample standard deviation for a data set can be calculated as follows.

 

$$S = \left[ \frac{\sum (x - X)^2}{N - 1} \right]^{1/2}$$

 

S is the standard deviation of the sample, x is an individual piece of data, X is the mean of all pieces of data in the sample, Σ denotes summation over all pieces of data and N is the number of pieces of data.
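The formula translates directly into a few lines of code.  The sketch below is one possible implementation; the function name `sample_standard_deviation` is an assumption, and the variable names mirror the symbols S, x, X and N.

```python
import math

def sample_standard_deviation(data):
    """Return S = [ sum((x - X)^2) / (N - 1) ]^(1/2) for a list of numbers."""
    N = len(data)
    if N < 2:
        raise ValueError("at least two pieces of data are required")
    X = sum(data) / N                                  # mean of all pieces of data
    sum_of_squares = sum((x - X) ** 2 for x in data)   # sum of (x - X)^2
    return math.sqrt(sum_of_squares / (N - 1))
```

Python's standard library offers the same calculation as `statistics.stdev`, which is a convenient cross-check.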

 

An example calculation will illustrate this concept.

 

We have a sample set of 10 Calculus students out of a class of 105 and we want to determine the standard deviation of their midterm test scores.  The students’ scores are as follows: 87, 69, 83, 84, 58, 78, 95, 89, 74, 76.

 

We will calculate the sample standard deviation using $S = \left[ \sum (x - X)^2 / (N - 1) \right]^{1/2}$.

 

The mean, X = (87 + 69 + 83 + 84 + 58 + 78 + 95 + 89 + 74 + 76) / 10 = 79.3

 

The total number of pieces of data, N = 10

 

$$\sum (x - X)^2 = (87 - 79.3)^2 + (69 - 79.3)^2 + (83 - 79.3)^2 + (84 - 79.3)^2 + (58 - 79.3)^2 + (78 - 79.3)^2 + (95 - 79.3)^2 + (89 - 79.3)^2 + (74 - 79.3)^2 + (76 - 79.3)^2$$

 

$$\sum (x - X)^2 = 59.29 + 106.09 + 13.69 + 22.09 + 453.69 + 1.69 + 246.49 + 94.09 + 28.09 + 10.89 = 1036.10$$

 

$$S = \left[ \frac{\sum (x - X)^2}{N - 1} \right]^{1/2}$$

$$S = \left[ \frac{1036.10}{10 - 1} \right]^{1/2}$$

$$S = (115.12)^{1/2}$$

$$S = 10.73$$
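The arithmetic can be checked step by step.  The sketch below repeats the calculation with the ten scores from the example; the variable names are illustrative.

```python
import math

scores = [87, 69, 83, 84, 58, 78, 95, 89, 74, 76]

N = len(scores)
X = sum(scores) / N                                   # mean
squared_deviations = [(x - X) ** 2 for x in scores]   # each (x - X)^2
sum_of_squares = sum(squared_deviations)              # sum of (x - X)^2
S = math.sqrt(sum_of_squares / (N - 1))               # sample standard deviation

print(f"X = {X:.1f}")
print(f"sum of squared deviations = {sum_of_squares:.2f}")
print(f"S = {S:.2f}")
```

Rounded to the same number of decimal places, the printed values agree with the hand calculation.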

 

If we apply Chebyshev’s theorem, we can expect at least 75 % of the test scores to fall within two standard deviations above and below the mean.  The upper limit of this range is 79.3 + (2)(10.73) = 100.76 (in practice, scores will not exceed 100 unless extra credit is offered) and the lower limit is 79.3 – (2)(10.73) = 57.84.
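As a quick check on this range, the sketch below counts how many of the ten scores fall within two standard deviations of the mean.  It uses `statistics.mean` and `statistics.stdev` from the Python standard library; the variable names are illustrative.

```python
import statistics

scores = [87, 69, 83, 84, 58, 78, 95, 89, 74, 76]

mean = statistics.mean(scores)
s = statistics.stdev(scores)    # sample standard deviation (divides by N - 1)

lower = mean - 2 * s
upper = mean + 2 * s
within = [x for x in scores if lower <= x <= upper]

print(f"range: {lower:.2f} to {upper:.2f}")
print(f"{len(within)} of {len(scores)} scores fall in the range")
```

For these scores all ten lie inside the range, which is consistent with the 75 % minimum.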

 

 

REFERENCE:

Triola, Mario F. (1992).  Elementary Statistics (5th ed.).  USA: Addison-Wesley Publishing Company, Inc.