
statistics

Measures of Variability

Many courses require that students in undergraduate degree programs have a basic understanding of descriptive statistics. Descriptive statistics are statistics that collect, summarize, classify and present data. This guide gives you an overview of one type of descriptive statistics, measures of variability.

Measures of variability are measures that allow you to determine the degree of variation within a population or sample, determine how representative a particular score is of a data set, and determine the scope and validity of any generalizations you wish to make based on your research observations. The measures of variability discussed in this handout are:

  • Range
  • Variance
  • Standard Deviation


Range

The range is the difference between the highest and lowest scores in a distribution. It is calculated by subtracting the lowest score from the highest score.

When is the range useful?
The range gives you a rough guide to the variability in a data set, as it tells you how a particular score compares to the highest and lowest scores.

Example:

14 40 42 47 49 51 71 81

(81 - 14) = 67

The range for this distribution would be (81 – 14) = 67. However, as you can see, 14 is an outlier and skews this distribution because, if it were not in the distribution, the range would be (81 – 40) = 41.
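The effect of the outlier on the range can be checked with a few lines of Python. This is a minimal sketch; the function name `score_range` is only an illustration and does not come from the guide.

```python
def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

scores = [14, 40, 42, 47, 49, 51, 71, 81]

full_range = score_range(scores)         # 81 - 14
trimmed_range = score_range(scores[1:])  # drop the lowest score (14): 81 - 40

print(full_range)     # 67
print(trimmed_range)  # 41
```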

How is the range limited?
The range gives you only a limited amount of information, because data sets that are skewed towards a low score can have the same range as data sets that are skewed towards a high score or that cluster around some central score.


Sum of Squares

The sum of squares is a measure of variance or deviation from the mean.

It is calculated by summing the squares of each score’s difference from the mean. Squaring each difference removes the negative signs, so deviations below the mean do not cancel out deviations above it. The total sum of squares is a related quantity: it includes not only the sum of squares from the factors, but also the sum of squares from randomness or error. When you calculate the variance, the fourth step requires you to find the sum of squares (SS).

EXAMPLE:

$$SS = \sum X^2 - \frac{(\sum X)^2}{N}$$

  1. Set of data: 2, 4, 6, 8
  2. Square each data point and add the squares together: 2² + 4² + 6² + 8² = 4 + 16 + 36 + 64 = 120
  3. Add together all of the data and square this sum: (2 + 4 + 6 + 8)² = 20² = 400
  4. Divide this by the number of data points: 400/4 = 100
  5. Subtract this number from 120: 120 - 100 = 20
  6. The sum of the squared deviations is therefore 20.
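The shortcut used in the steps above should give the same answer as squaring each score's deviation from the mean directly. The sketch below compares the two routes for the same data; the function names are illustrative only.

```python
def ss_definitional(scores):
    """Sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_computational(scores):
    """Shortcut: sum of squared scores minus (squared sum / number of scores)."""
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / len(scores)

data = [2, 4, 6, 8]

print(ss_definitional(data))   # 20.0  (9 + 1 + 1 + 9)
print(ss_computational(data))  # 20.0  (120 - 400/4)
```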


Standard Deviation (SD)

The standard deviation is the square root of the variance.

$$SD = \sqrt{\text{variance}}$$

Unlike the variance, the standard deviation is measured in the same units as the raw scores themselves, which makes it easier to interpret than the variance. For example, it makes more sense to discuss the variability of a set of IQ scores in IQ points than in squared IQ points, because squared IQ points do not correspond to anything on the original scale.

Variables

a chart of variables

EXAMPLE:

Data Set: 2, 4, 4, 4, 5, 5, 7, 9

Find the mean (n = 8):

(2+4+4+4+5+5+7+9)/n = 40/8 = 5

Calculate the deviations of each data point from the mean, and square the result of each:

(2-5)² = (-3)² = 9

(4-5)² = (-1)² = 1

(4-5)² = (-1)² = 1

(4-5)² = (-1)² = 1

(5-5)² = 0² = 0

(5-5)² = 0² = 0

(7-5)² = 2² = 4

(9-5)² = 4² = 16

The variance is the mean of these values:

(9+1+1+1+0+0+4+16)/n=(9+1+1+1+0+0+4+16)/8=4

Standard deviation is equal to the square root of the variance:

√4 = 2
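The same result can be reproduced with Python's standard library. This is a small sketch that treats the data set as a whole population, as the example does; `statistics.pvariance` and `statistics.pstdev` are the population versions of the variance and standard deviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean of the squared deviations: 32 / 8
sd = math.sqrt(variance)               # square root of the variance

# statistics.pstdev performs both steps in one call
sd_direct = statistics.pstdev(data)

print(variance, sd)
```

Here the variance comes out as 4 and the standard deviation as 2, matching the hand calculation.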


Variance

Variance is the degree to which scores vary from their mean. The variance uses every score in the data set. The variance is calculated by getting the average of the squared deviations from the mean.

Variance Equation

For a population:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

or, for a sample:

$$s^2 = \frac{\sum (X - M)^2}{n - 1}$$
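The population and sample versions of the variance differ only in the divisor: the population version divides the sum of squares by the number of scores, while the sample version conventionally divides by one less than the number of scores. The sketch below shows the difference using Python's `statistics` module; the data set is borrowed from the standard deviation example and is used here only for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # sum of squared deviations = 32

population_variance = statistics.pvariance(data)  # 32 / 8
sample_variance = statistics.variance(data)       # 32 / (8 - 1)

print(population_variance)
print(round(sample_variance, 2))  # 4.57
```

The sample variance is slightly larger, and the gap shrinks as the number of scores grows.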

To calculate the variance for a set of quiz scores:

  • Find the mean (M).
  • Find the deviation of each raw score from the mean (D). To do this, subtract the mean from each raw score. To check your calculations, sum the deviation scores; this sum should equal zero. Note that the deviations of scores below the mean will be negative.
  • Square each deviation score. Squaring makes negative deviations positive and gives extreme scores relatively more weight.
  • Find the sum of the squared deviation scores (SS).
  • Divide the sum by the number of scores. This yields the average of the squared deviations from the mean, or the variance.
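The five steps above can be sketched as a short Python function. The quiz scores below are hypothetical and were chosen only to keep the arithmetic easy to follow; the function name is likewise an illustration.

```python
def variance_by_steps(scores):
    """Follow the five steps: mean, deviations, squares, sum, divide."""
    n = len(scores)
    mean = sum(scores) / n                         # step 1: find the mean (M)
    deviations = [x - mean for x in scores]        # step 2: deviations (D)
    assert abs(sum(deviations)) < 1e-9             # check: deviations sum to zero
    squared = [d ** 2 for d in deviations]         # step 3: square each deviation
    sum_of_squares = sum(squared)                  # step 4: sum of squares (SS)
    return sum_of_squares / n                      # step 5: divide by the number of scores

quiz_scores = [6, 7, 8, 9, 10]  # hypothetical quiz scores
quiz_variance = variance_by_steps(quiz_scores)

print(quiz_variance)  # 2.0  (mean 8; squared deviations 4, 1, 0, 1, 4)
```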

See example


Other Exercises and Resources

Skewed Distribution: Examples & Definition

Descriptive Statistics

Outlier in Statistics

Skewed Distribution

Variance Example Problem

Example Population Size:

Data Set: 5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50

Step 1: Find the mean

n = 18

mean = (5+6+7+11+12+12+13+14+18+19+21+21+22+24+35+35+35+50)/18 = 360/18 = 20

Steps 2 and 3: Find the deviation (D) of each score from the mean and square each deviation.

D = x - μ

SS = Σ(x - μ)²

Deviation (D)        Squared Deviation (D²)
5 - 20 = -15         (-15)² = 225
6 - 20 = -14         (-14)² = 196
7 - 20 = -13         (-13)² = 169
11 - 20 = -9         (-9)² = 81
12 - 20 = -8         (-8)² = 64
12 - 20 = -8         (-8)² = 64
13 - 20 = -7         (-7)² = 49
14 - 20 = -6         (-6)² = 36
18 - 20 = -2         (-2)² = 4
19 - 20 = -1         (-1)² = 1
21 - 20 = 1          1² = 1
21 - 20 = 1          1² = 1
22 - 20 = 2          2² = 4
24 - 20 = 4          4² = 16
35 - 20 = 15         15² = 225
35 - 20 = 15         15² = 225
35 - 20 = 15         15² = 225
50 - 20 = 30         30² = 900

Step 4: Find the sum of the squared deviations

sum = 225 + 196 + 169 + 81 + 64 + 64 + 49 + 36 + 4 + 1 + 1 + 1 + 4 + 16 + 225 + 225 + 225 + 900 = 2486

Step 5: Divide the sum by the number of scores

variance = 2486/n = 2486/18 = 138.11

Answer: The variance is 138.11
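The worked example can be double-checked by rebuilding the deviation table in Python. This sketch uses the 18 scores that appear in the table (including all three scores of 35) and treats them as a population; the variable names are illustrative.

```python
scores = [5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50]

n = len(scores)
mean = sum(scores) / n

# One row per score: (score, deviation, squared deviation)
rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
for x, d, d2 in rows:
    print(f"{x:>3} {d:>6.0f} {d2:>6.0f}")

sum_of_squares = sum(d2 for _, _, d2 in rows)
variance = sum_of_squares / n

print(n, mean, sum_of_squares, round(variance, 2))  # 18 20.0 2486.0 138.11
```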

Try the Exercise

Using the steps in the Variance Example Problem above, find the variance for the following data set.

8 11 12 14 17 17 18 19 22 29 35 38

Measures of Variability - Answer Key

Data Set:  8 11 12 14 17 17 18 19 22 29 35 38

n = 12

μ = mean

Step 1: Find the mean

μ = sum/n = (8+11+12+14+17+17+18+19+22+29+35+38)/12 = 240/12 = 20

Steps 2 and 3: Find the deviations and the squared deviations

Deviation (D)        Squared Deviation (D²)
(8 – 20) = -12       (-12)² = 144
(11 – 20) = -9       (-9)² = 81
(12 – 20) = -8       (-8)² = 64
(14 – 20) = -6       (-6)² = 36
(17 – 20) = -3       (-3)² = 9
(17 – 20) = -3       (-3)² = 9
(18 – 20) = -2       (-2)² = 4
(19 – 20) = -1       (-1)² = 1
(22 – 20) = 2        2² = 4
(29 – 20) = 9        9² = 81
(35 – 20) = 15       15² = 225
(38 – 20) = 18       18² = 324


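Step 4: Find the sum

As a check, the deviations sum to zero:

(-12) + (-9) + (-8) + (-6) + (-3) + (-3) + (-2) + (-1) + 2 + 9 + 15 + 18 = 0

The sum of the squared deviations is:

sum = 144 + 81 + 64 + 36 + 9 + 9 + 4 + 1 + 4 + 81 + 225 + 324 = 982

Step 5: Find the variance

variance = sum/n = 982/12 = 81.83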

Measures of Central Tendency

Measures of central tendency are the methods of determining central values in a population. The following are the three main measures of central tendency.

  • Mean: the average score
  • Median: the middle score in a sequence of scores in ranked order
  • Mode: the most frequent score

Depending on the shape of a distribution, one of these measures may be more accurate than the others. In symmetrical, unimodal datasets, the mean is the most accurate measure of central tendency. For asymmetrical (skewed), unimodal datasets, the median is likely to be more accurate. For bimodal distributions, the only measure that can capture central tendency accurately is the mode.


Mode

The mode is the most frequently occurring number within a data set.

If two scores are tied as the most frequent in a data set, the set is called bimodal because it has two modes. Any data set that has two or more modes is multimodal.

There is no equation for finding the mode; you simply count the number of times each score occurs. If the data set is multimodal, report all of the modes.

EXAMPLES:

  1.   12 12 14 15 16 19 22 25 29 33 16 17 18 16 19 16    Mode = 16

  2.   15 16 19 17 14 17 19 17 21 23 25 19 28 26 17 19    Mode = 17 and 19
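Counting by hand gets tedious for longer data sets. As a sketch, Python's `statistics.multimode` (available from Python 3.8) returns every score that is tied for most frequent, so it handles bimodal data as well; the results are sorted here only to make them easier to read.

```python
import statistics

example_1 = [12, 12, 14, 15, 16, 19, 22, 25, 29, 33, 16, 17, 18, 16, 19, 16]
example_2 = [15, 16, 19, 17, 14, 17, 19, 17, 21, 23, 25, 19, 28, 26, 17, 19]

modes_1 = sorted(statistics.multimode(example_1))
modes_2 = sorted(statistics.multimode(example_2))

print(modes_1)  # [16]
print(modes_2)  # [17, 19]
```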


Median

The median is the middle score in a set of scores that have been ranked in numerical order.

In sequences that have an even number of scores, the median lies between the two middle scores and is calculated as their mean. If the two middle scores have the same value, the median is that value.

EXERCISE: Order these sequences from smallest to largest to find the median

  1. 12 15 48 23 56 22 21 41 57 52 22 46 41 62 34

    12 15 21 22 22 23 34 41 41 46 48 52 56 57 62

    Median = 41

  2. 11 5 7 32 56 41 23 22 17 18 42 6 27 31 42 8 7 11

    5 6 7 7 8 11 11 17 18 22 23 27 31 32 41 42 42 56
    Middle two scores: 18 and 22
    Median = (18 + 22)/2 = 20
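Both exercise answers can be checked with `statistics.median`, which sorts the scores and, for an even number of scores, averages the two middle values. The variable names in this sketch are illustrative.

```python
import statistics

sequence_1 = [12, 15, 48, 23, 56, 22, 21, 41, 57, 52, 22, 46, 41, 62, 34]
sequence_2 = [11, 5, 7, 32, 56, 41, 23, 22, 17, 18, 42, 6, 27, 31, 42, 8, 7, 11]

median_1 = statistics.median(sequence_1)  # 15 scores: the 8th score in ranked order
median_2 = statistics.median(sequence_2)  # 18 scores: mean of the 9th and 10th scores

print(median_1)  # 41
print(median_2)  # 20.0
```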


When should you use the median to describe your statistics?

The median is a measure of central tendency that should be used for distributions that are heavily skewed, because the median is resistant to outliers.

EXAMPLE: the following sequence of scores has been ranked to illustrate the skew within the distribution:

1 4 5 6 7 17 21 21 22 23 24 26 27 31 32 44 109

As you can see, the distribution is skewed, as indicated by the abnormally large score (109) at the end of the sequence.
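To see how resistant the median is, the sketch below compares the mean and the median of this sequence with and without the extreme score of 109. Dropping the score is done purely for comparison; it is not a recommendation to delete outliers from real data.

```python
import statistics

scores = [1, 4, 5, 6, 7, 17, 21, 21, 22, 23, 24, 26, 27, 31, 32, 44, 109]
without_outlier = scores[:-1]  # the same scores without 109

mean_with = statistics.mean(scores)               # 420 / 17
mean_without = statistics.mean(without_outlier)   # 311 / 16
median_with = statistics.median(scores)
median_without = statistics.median(without_outlier)

print(round(mean_with, 2), mean_without)  # 24.71 19.4375
print(median_with, median_without)        # 22 21.5
```

The single extreme score moves the mean by more than five points but moves the median by only half a point.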


Sample Mean vs. Population Mean

When the scores come from a sample, the mean is denoted by M. When the scores come from a population, the mean is denoted by μ (pronounced “mew”). Both are arithmetic means, and their respective equations are as follows:

$$M = \frac{\sum X}{N}$$

$$\mu = \frac{\sum X}{N}$$

Notice: the equations are the same; the only difference is the symbol used to represent what kind of mean you are looking at.

EXERCISE: find the mean of the following data sets

  1. 8 11 12 14 17 17 18 19 22 23 25 28

    8+11+12+14+17+17+18+19+22+23+25+28=214
    214/N = 214/12 = 17.83


  2. 41 42 44 46 47 48 48 49 51 61 66 67

    41+42+44+46+47+48+48+49+51+61+66+67=610
    610/N = 610/12 = 50.83
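The two answers can be verified with a one-line mean function. This is a minimal sketch; the function is defined by hand here only to mirror the "sum, then divide by N" wording.

```python
def mean(scores):
    """Sum the scores and divide by the number of scores."""
    return sum(scores) / len(scores)

set_1 = [8, 11, 12, 14, 17, 17, 18, 19, 22, 23, 25, 28]
set_2 = [41, 42, 44, 46, 47, 48, 48, 49, 51, 61, 66, 67]

print(sum(set_1), round(mean(set_1), 2))  # 214 17.83
print(sum(set_2), round(mean(set_2), 2))  # 610 50.83
```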


Mean

The mean is the average of a sequence of scores. The mean is calculated by summing the scores and dividing that sum by the total number of scores. Σ, or “sigma”, is the Greek symbol for summing.

$$M = \frac{\sum X}{N}$$

EXAMPLE: this answer was found by summing the scores within a data set and then dividing by the number of scores.

4 + 5 + 5 + 5 + 5 + 8 + 8 + 9 + 11 + 11 + 11 + 12 + 12 + 14 + 15 = 135

135/15 = 9

When should you use the mean?

The mean of a data set is most helpful when the distribution is approximately normal. However, the mean can be misleading if the distribution of scores is heavily skewed.


Interactions of the Mean, Median, and Mode

As you can see in Pearson’s diagram below, the mean is equal to the median and the mode in a normal or symmetrical distribution. In a negatively skewed distribution, the mean is to the left of the median and the mode; in a positively skewed distribution, the mean is to the right of the median and the mode.

graph demonstrating skewness


The Mean and Distribution Shape

The frequency distributions below show a normal distribution, a positively skewed distribution, and a negatively skewed distribution.

Symmetrical or Normal Distributions

In a normal (or symmetrical) distribution, the mean is in the center of a distribution.

chart demonstrating normal distribution

Skewness

Skewness is a measure of the lack of symmetry in a distribution. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. A skew occurs when a distribution’s mean is shifted to the left or right of the median and the mode. Skew can be negative or positive. If there are outliers in the data, the distribution is likely to be skewed and the mean may not be representative of the group. An outlier is a data point that is distinctly separate from the rest of the data.

EXAMPLE: In a distribution with an outlier or in a heavily skewed distribution (the data is not normally distributed), the mean is pulled in the direction of the outlier or skew, and is thus not the most accurate measure of central tendency. Under these circumstances, the median will better describe the dataset.
 

(2+2+2+3+3)/5 = 2.4 (without outlier)

(2+2+2+3+3+12)/6 = 4 (with outlier)


Negatively Skewed Distributions (tail to the left):

In negatively skewed distributions, the mean is less than the median and the median is less than the mode. The mean is the lowest measure of central tendency in negatively skewed distributions.
 
Why is the mean the lowest measure of central tendency in negatively skewed distributions?
Extreme low scores in the tail pull the mean to the left in negatively skewed distributions.

chart demonstrating negatively skewed distribution


Positively Skewed Distributions (tail to the right):

In positively skewed distributions, the mode is less than the median and the median is less than the mean. Therefore, the mean is the highest measure of central tendency in positively skewed distributions.
 
Why is the mean the highest measure of central tendency in positively skewed distributions?
The mean is pulled to the right by extreme high scores in the tail.

chart demonstrating positively skewed distribution
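The ordering of the mode, median, and mean in skewed distributions can be illustrated with a small made-up data set. The scores below are hypothetical; the negatively skewed set is simply the mirror image of the positively skewed one.

```python
import statistics

def central_tendency(scores):
    """Return (mean, median, mode) for a data set with a single mode."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))

# Hypothetical scores with a long tail to the right
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image: a long tail to the left
negative_skew = [15 - x for x in positive_skew]

pos_mean, pos_median, pos_mode = central_tendency(positive_skew)
neg_mean, neg_median, neg_mode = central_tendency(negative_skew)

print(pos_mode, pos_median, pos_mean)  # mode < median < mean
print(neg_mean, neg_median, neg_mode)  # mean < median < mode
```

For these scores the positively skewed set has a mode of 2, a median of 3, and a mean of 4.5, while the mirrored set has a mean of 10.5, a median of 12, and a mode of 13.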


Kurtosis

Kurtosis is a measure of how peaked or flat a distribution is relative to a normal distribution. Leptokurtic data sets (high kurtosis) have a distinct peak near the mean, decline rapidly, and have heavy tails. Platykurtic data sets (low kurtosis) tend to have a flat top near the mean rather than a sharp peak. Finally, mesokurtic data sets have a moderate peak, similar to that of a normal distribution.

chart of leptokurtic mesokurtic platykurtic curves
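This guide does not give a formula for kurtosis. One common convention, assumed here, is excess kurtosis: the fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores about 0, leptokurtic data score above 0, and platykurtic data score below 0. The two data sets in this sketch are hypothetical.

```python
def excess_kurtosis(scores):
    """Fourth central moment / (variance squared) - 3, using population moments."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data: evenly spread scores (flat top)
flat = [1, 2, 3, 4, 5]
# Hypothetical data: most scores at the mean, plus two extreme scores (heavy tails)
peaked = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]

flat_kurtosis = excess_kurtosis(flat)      # about -1.3 (platykurtic)
peaked_kurtosis = excess_kurtosis(peaked)  # 2.0 (leptokurtic)

print(round(flat_kurtosis, 2), peaked_kurtosis)
```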

Definitions

Bimodal: a frequency distribution that has two modes.

Descriptive Statistics: statistics that collect, summarize, classify and present data.

Mean: the average of a sample or a population of scores.

Measures of Central Tendency: the methods of determining central values in a population.

Measures of Variability: Measures that allow you to determine the degree of variation within a population or sample, determine how representative a particular score is of a data set, and determine the scope and validity of any generalizations you wish to make based on your research observations.

Median: the middle score in a set of scores that have been ranked in numerical order.

Mode: the most frequently occurring number within a data set.

Multimodal: a frequency distribution that has two or more modes.

Outlier: A data point that is distinctly separate from the rest of the data.

Range: The range is the difference between the highest and lowest scores in a distribution.

Skew: A skew occurs when a distribution’s mean is shifted to the left or right of the median and the mode. Skew can be negative or positive. The mean is less than the median in a negatively skewed population because some low scores pull the mean to the left. In a positively skewed population, the mode is typically less than the median and the mean.

Standard Deviation: Square root of the variance.

Sum of Squares: The sum of squares is a measure of variance or deviation from the mean. It is calculated by summing the squares of each score’s difference from the mean. It is the sum of squared deviations. 

Variability/Variance: Degree to which the scores vary from their mean.