Measures of Variability
Many courses require that students in undergraduate degree programs have a basic understanding of descriptive statistics. Descriptive statistics are statistics that collect, summarize, classify and present data. This guide gives you an overview of one type of descriptive statistics, measures of variability.
Measures of variability are measures that allow you to determine the degree of variation within a population or sample, determine how representative a particular score is of a data set, and determine the scope and validity of any generalizations you wish to make based on your research observations. The measures of variability discussed in this handout are:
- Range
- Variance
- Standard Deviation
Range
The range is the difference between the highest and lowest scores in a distribution. It is calculated by subtracting the lowest score from the highest score.
When is the range useful?
The range gives you a rough guide to the variability in a data set because it shows how far apart the highest and lowest scores are.
Example:
14 40 42 47 49 51 71 81
Range = 81 - 14 = 67
However, 14 is an outlier that inflates the range: without it, the range would be 81 - 40 = 41.
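A minimal Python sketch of the range calculation using the example scores above. The function name `score_range` is our own hypothetical choice (it avoids shadowing Python's built-in `range`).

```python
def score_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

scores = [14, 40, 42, 47, 49, 51, 71, 81]
print(score_range(scores))      # 67
print(score_range(scores[1:]))  # 41 (the outlier 14 removed)
```

Because the range depends only on the two most extreme scores, removing a single outlier can change it substantially.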
What are the limitations of the range?
The range gives you only a limited amount of information: data sets that are skewed towards a low score can have the same range as data sets that are skewed towards a high score, or as data sets that cluster around some central score.
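To illustrate this limitation, the sketch below uses three small hypothetical data sets (invented for illustration, not taken from this guide). Each one spans 1 to 10, so all three have the same range even though their scores cluster in very different places.

```python
def score_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical data sets; all span 1 to 10.
clustered_low = [1, 2, 2, 2, 10]      # most scores near the low end
clustered_high = [1, 9, 9, 9, 10]     # most scores near the high end
clustered_center = [1, 5, 5, 6, 10]   # most scores near the middle

for name, data in [("low", clustered_low),
                   ("high", clustered_high),
                   ("center", clustered_center)]:
    print(name, score_range(data))    # each range is 9
```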
Sum of Squares
The sum of squares is a measure of variance or deviation from the mean.
It is calculated by summing the squares of each score’s difference from the mean. Squaring each difference removes negative values, so deviations above and below the mean do not cancel each other out. The total sum of squares is a related quantity: it includes variation due to the factors being studied as well as variation due to randomness or error. Finding the sum of squares (SS) is also one of the steps in calculating the variance.
EXAMPLE:
- Set of data: 2, 4, 6, 8
- Square each data point and add the squares together: 2² + 4² + 6² + 8² = 4 + 16 + 36 + 64 = 120
- Add together all of the data and square this sum: (2 + 4 + 6 + 8)² = 20² = 400
- Divide this by the number of data points: 400/4 = 100
- We now subtract this number from 120: 120 - 100 = 20
- This gives a sum of squared deviations (SS) of 20.
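The example above uses a computational shortcut, SS = ∑X² - (∑X)² / N. The Python sketch below (function names are hypothetical) checks that the shortcut agrees with the definition, the sum of each score's squared deviation from the mean, for the same data.

```python
def ss_definitional(scores):
    """Sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_computational(scores):
    """Shortcut: sum of squares minus (sum of scores) squared over N."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

data = [2, 4, 6, 8]
print(ss_definitional(data))   # 20.0
print(ss_computational(data))  # 20.0  (120 - 400/4)
```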
Standard Deviation (SD)
The standard deviation is the square root of the variance.
Unlike the variance, which is expressed in squared units, the standard deviation is measured in the same units as the raw scores themselves. This is what makes the standard deviation easier to interpret. For example, it makes more sense to describe the variability of a set of IQ scores in IQ points than in squared IQ points.
EXAMPLE:
Data Set: 2, 4, 4, 4, 5, 5, 7, 9
Find the mean: (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
Calculate the deviations of each data point from the mean, and square the result of each:
(2 - 5)² = (-3)² = 9
(4 - 5)² = (-1)² = 1
(4 - 5)² = (-1)² = 1
(4 - 5)² = (-1)² = 1
(5 - 5)² = 0² = 0
(5 - 5)² = 0² = 0
(7 - 5)² = 2² = 4
(9 - 5)² = 4² = 16
The variance is the mean of these values: (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4
Standard deviation is equal to the square root of the variance:
√4 = 2
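The same steps can be sketched in Python. Like the worked example, this sketch divides by the number of scores (N), which gives the population variance and standard deviation.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                            # 40 / 8 = 5.0
squared_devs = [(x - mean) ** 2 for x in data]  # 9, 1, 1, 1, 0, 0, 4, 16
variance = sum(squared_devs) / n                # 32 / 8 = 4.0
sd = math.sqrt(variance)                        # square root of the variance

print(mean, variance, sd)  # 5.0 4.0 2.0
```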
Variance
Variance is the degree to which scores vary from their mean. The variance uses every score in the data set. It is calculated by taking the average of the squared deviations from the mean.
Variance Equation
Population | Sample |
---|---|
σ² = ∑(X - μ)² / N | s² = ∑(X - M)² / (N - 1) |
or σ² = SS / N | or s² = SS / (N - 1) |
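The population formula divides the sum of squares by N, while the sample formula divides by N - 1. A minimal Python sketch of the difference, reusing the data 2, 4, 6, 8 from the sum of squares example; it also compares the results with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N - 1.

```python
import statistics

data = [2, 4, 6, 8]
n = len(data)
mean = sum(data) / n                     # 5.0
ss = sum((x - mean) ** 2 for x in data)  # SS = 20.0

population_variance = ss / n             # divide by N: 20 / 4 = 5.0
sample_variance = ss / (n - 1)           # divide by N - 1: 20 / 3, about 6.67

# Cross-check against the standard library.
print(population_variance, statistics.pvariance(data))
print(round(sample_variance, 2), round(statistics.variance(data), 2))
```

Dividing by N - 1 makes the sample variance slightly larger, which compensates for a sample usually varying less around its own mean than the population does around μ.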
To calculate the variance for a set of quiz scores:
- Find the mean (M).
- Find the deviation of each raw score from the mean (D) by subtracting the mean from each raw score. Note that deviation scores below the mean will be negative. To check your calculations, sum the deviation scores; this sum should equal zero.
- Square each deviation score (D²). We do this because squaring makes negative deviations positive and gives extreme scores relatively more weight.
- Find the sum of the squared deviation scores (SS).
- Divide the sum by the number of scores (N). This yields the average of the squared deviations from the mean, or the variance. (For a sample, divide by N - 1 instead.)
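The steps above can be sketched in Python as one function (the name `population_variance` is hypothetical). It follows the steps in order, including the check that the deviations sum to zero, and divides by N. The data are the 18 scores from the variance example problem.

```python
def population_variance(scores):
    """Compute the variance by following the steps above (divides by N)."""
    n = len(scores)
    mean = sum(scores) / n                   # Step 1: find the mean
    deviations = [x - mean for x in scores]  # Step 2: D = X - mean
    # Check: the deviations should sum to zero (allowing for rounding).
    if abs(sum(deviations)) > 1e-9:
        raise ValueError("deviations do not sum to zero")
    squared = [d ** 2 for d in deviations]   # Step 3: square each deviation
    ss = sum(squared)                        # Step 4: sum of squares (SS)
    return ss / n                            # Step 5: divide by N

scores = [5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50]
print(round(population_variance(scores), 2))  # 138.11
```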
Other Exercises and Resources
Variance Example Problem
Example (population variance):
Data Set: 5, 6, 7, 11, 12, 12, 13, 14, 18, 19, 21, 21, 22, 24, 35, 35, 35, 50
Step 1: Find the mean
n = 18
μ = ∑X / n = 360 / 18 = 20
Steps 2 and 3: Find the deviation (D) of each score from the mean and square each deviation (D²).
D = X - μ
D² = (X - μ)²
Deviation (D = X - μ) | Squared deviation (D²) |
---|---|
5 - 20 = -15 | (-15)² = 225 |
6 - 20 = -14 | (-14)² = 196 |
7 - 20 = -13 | (-13)² = 169 |
11 - 20 = -9 | (-9)² = 81 |
12 - 20 = -8 | (-8)² = 64 |
12 - 20 = -8 | (-8)² = 64 |
13 - 20 = -7 | (-7)² = 49 |
14 - 20 = -6 | (-6)² = 36 |
18 - 20 = -2 | (-2)² = 4 |
19 - 20 = -1 | (-1)² = 1 |
21 - 20 = 1 | 1² = 1 |
21 - 20 = 1 | 1² = 1 |
22 - 20 = 2 | 2² = 4 |
24 - 20 = 4 | 4² = 16 |
35 - 20 = 15 | 15² = 225 |
35 - 20 = 15 | 15² = 225 |
35 - 20 = 15 | 15² = 225 |
50 - 20 = 30 | 30² = 900 |
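Step 4: Find the sum of the squared deviations (SS)
SS = 225 + 196 + 169 + 81 + 64 + 64 + 49 + 36 + 4 + 1 + 1 + 1 + 4 + 16 + 225 + 225 + 225 + 900 = 2486
Step 5: Divide the sum by the number of scores: σ² = SS / n = 2486 / 18 = 138.11
Answer: The variance is 138.11.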
Try this exercise
Using the steps in the variance example problem above, find the variance for the following data set.
8 11 12 14 17 17 18 19 22 29 35 38
Measures of Variability - Answer Key
Data Set: 8 11 12 14 17 17 18 19 22 29 35 38
n = 12
Step 1: Find the mean (μ)
μ = ∑X / n = 240 / 12 = 20
Steps 2 and 3: Find the deviations (D) and squared deviations (D²)
Deviation (D = X - μ) | Squared deviation (D²) |
---|---|
8 - 20 = -12 | (-12)² = 144 |
11 - 20 = -9 | (-9)² = 81 |
12 - 20 = -8 | (-8)² = 64 |
14 - 20 = -6 | (-6)² = 36 |
17 - 20 = -3 | (-3)² = 9 |
17 - 20 = -3 | (-3)² = 9 |
18 - 20 = -2 | (-2)² = 4 |
19 - 20 = -1 | (-1)² = 1 |
22 - 20 = 2 | 2² = 4 |
29 - 20 = 9 | 9² = 81 |
35 - 20 = 15 | 15² = 225 |
38 - 20 = 18 | 18² = 324 |
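Step 4: Find the sum of the squared deviations (SS)
SS = 144 + 81 + 64 + 36 + 9 + 9 + 4 + 1 + 4 + 81 + 225 + 324 = 982
(As a check, the deviations themselves sum to zero: (-12) + (-9) + (-8) + (-6) + (-3) + (-3) + (-2) + (-1) + 2 + 9 + 15 + 18 = 0.)
Step 5: Find the variance
σ² = SS / n = 982 / 12 = 81.83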
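To check your answer, a short Python sketch using the standard `statistics` module; `statistics.pvariance` divides by N, matching the steps used in this exercise.

```python
import statistics

scores = [8, 11, 12, 14, 17, 17, 18, 19, 22, 29, 35, 38]

mean = statistics.mean(scores)             # mean = 20
ss = sum((x - mean) ** 2 for x in scores)  # SS = 982
variance = statistics.pvariance(scores)    # 982 / 12, about 81.83

print(mean, ss, round(variance, 2))
```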
Measures of Central Tendency
Measures of central tendency are the methods of determining central values in a population. The following are the three main measures of central tendency.
- Mean: the average score
- Median: the middle score in a sequence of scores in ranked order
- Mode: the most frequent score
Depending on the shape of a distribution, one of these measures may be more accurate than the others. In symmetrical, unimodal datasets, the mean is the most accurate measure of central tendency. For asymmetrical (skewed), unimodal datasets, the median is likely to be more accurate. For bimodal distributions, the only measure that can capture central tendency accurately is the mode.
Mode
The mode is the most frequently occurring number within a data set.
If two scores are tied for the highest frequency within a data set, the set is called bimodal because it has two modes. Any data set that has two or more modes is multimodal.
There is no equation for finding the mode; simply count the number of times each score occurs. If the data set is multimodal, report all of the modes.
EXAMPLES:
- 12 12 14 15 16 19 22 25 29 33 16 17 18 16 19 16 (Mode = 16)
- 15 16 19 17 14 17 19 17 21 23 25 19 28 26 17 19 (Mode = 17 and 19)
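Counting scores is easy to automate. A minimal Python sketch (the function name `modes` is hypothetical) that uses `collections.Counter` to tally each score and returns every score tied for the highest count, so bimodal and multimodal sets report all of their modes. Python 3.8+ also provides `statistics.multimode`, which returns the same modes in order of first appearance rather than sorted.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency, in ascending order."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

print(modes([12, 12, 14, 15, 16, 19, 22, 25, 29, 33, 16, 17, 18, 16, 19, 16]))  # [16]
print(modes([15, 16, 19, 17, 14, 17, 19, 17, 21, 23, 25, 19, 28, 26, 17, 19]))  # [17, 19]
```

Note that if every score occurs equally often, this sketch returns all of them.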
Median
The median is the middle score in a set of scores that have been ranked in numerical order.
In sequences that have an even number of scores, the median is the average of the two middle scores. If the two middle scores have the same value, the median is simply that value.
EXERCISE: Order these sequences from smallest to largest to find the median
- 12 15 48 23 56 22 21 41 57 52 22 46 41 62 34
  Ordered: 12 15 21 22 22 23 34 41 41 46 48 52 56 57 62
  Median = 41 (the 8th of 15 scores)
- 11 5 7 32 56 41 23 22 17 18 42 6 27 31 42 8 7 11
  Ordered: 5 6 7 7 8 11 11 17 18 22 23 27 31 32 41 42 42 56
  Median = (18 + 22)/2 = 20 (the average of the 9th and 10th of 18 scores)
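The procedure of ordering the scores and then taking the middle score (or the average of the two middle scores) can be sketched in Python. The function name `median` is our own; Python's `statistics.median` does the same job.

```python
def median(scores):
    """Order the scores, then take the middle one (or average the two middle ones)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd number of scores: single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle scores

print(median([12, 15, 48, 23, 56, 22, 21, 41, 57, 52, 22, 46, 41, 62, 34]))    # 41
print(median([11, 5, 7, 32, 56, 41, 23, 22, 17, 18, 42, 6, 27, 31, 42, 8, 7, 11]))  # 20.0
```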
When should you use the median to describe your statistics?
The median should be used to describe distributions whose scores are heavily skewed, because the median is resistant to outliers.
EXAMPLE: the following sequence of scores has been ranked to illustrate the skew within the distribution:
1 4 5 6 7 17 21 21 22 23 24 26 27 31 32 44 109
As you can see, the distribution is skewed, as indicated by the unusually large score at the end of the sequence.
Sample Mean vs. Population Mean
When the scores come from a sample, the mean is called the sample mean and is denoted by M. When the scores come from an entire population, the mean is called the population mean and is denoted by μ (pronounced “mew”). Both are arithmetic means, and their equations are as follows:
M = ∑X / N
μ = ∑X / N
Notice: the equations are the same; the only difference is the symbol used to represent what kind of mean you are looking at.
EXERCISE: find the mean of the following data sets
- 8 11 12 14 17 17 18 19 22 23 25 28
  8 + 11 + 12 + 14 + 17 + 17 + 18 + 19 + 22 + 23 + 25 + 28 = 214
  214/N = 214/12 = 17.83
- 41 42 44 46 47 48 48 49 51 61 66 67
  41 + 42 + 44 + 46 + 47 + 48 + 48 + 49 + 51 + 61 + 66 + 67 = 610
  610/N = 610/12 = 50.83
Mean
The mean is the average of a sequence of scores. The mean is calculated by summing scores and dividing that sum by the total number of scores. ∑, or “sigma”, is the Greek symbol for summing.
M = ∑X / N
EXAMPLE: this answer was found by summing the scores within a data set and then dividing by the number of scores.
4 + 5 + 5 + 5 + 5 + 8 + 8 + 9 + 11 + 11 + 11 + 12 + 12 + 14 + 15 = 135
135/15 = 9
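The formula M = ∑X / N translates directly into Python. A minimal sketch (the function name `mean` is hypothetical; `statistics.mean` is the standard-library equivalent), applied to the data sets in the exercise and example above:

```python
def mean(scores):
    """Sum the scores and divide by the number of scores (M = ∑X / N)."""
    return sum(scores) / len(scores)

print(round(mean([8, 11, 12, 14, 17, 17, 18, 19, 22, 23, 25, 28]), 2))  # 17.83
print(round(mean([41, 42, 44, 46, 47, 48, 48, 49, 51, 61, 66, 67]), 2))  # 50.83
print(mean([4, 5, 5, 5, 5, 8, 8, 9, 11, 11, 11, 12, 12, 14, 15]))        # 9.0
```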
When should you use the mean?
The mean of a data set can be helpful when the scores are approximately normally distributed. However, the mean can be misleading if the distribution of scores is heavily skewed.
Interactions of the Mean, Median, and Mode
As Pearson’s diagram illustrates, in a normal or symmetrical distribution the mean, median, and mode are equal. In a negatively skewed distribution, the mean lies to the left of the median and the mode; in a positively skewed distribution, the mean lies to the right of the median and the mode.
The Mean and Distribution Shape
The frequency distributions below show a normal distribution, a positively skewed distribution, and a negatively skewed distribution.
Symmetrical or Normal Distributions
In a normal (or symmetrical) distribution, the mean is in the center of a distribution.
Skewness
Skewness is a measure of the lack of symmetry in a distribution. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. In a skewed distribution, the mean is shifted to the left or right of the median and the mode. Skew can be negative or positive. Outliers can skew a distribution, and when they do, the mean may not be representative of the group. An outlier is a data point that is distinctly separate from the rest of the data.
EXAMPLE: In a distribution with an outlier or in a heavily skewed distribution (the data is not normally distributed), the mean is pulled in the direction of the outlier or skew, and is thus not the most accurate measure of central tendency. Under these circumstances, the median will better describe the dataset.
(2+2+2+3+3)/5 = 2.4 (without outlier)
(2+2+2+3+3+12)/6 = 4 (with outlier)
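A short Python sketch of the example above, using the standard `statistics` module, that also reports the median. Adding the outlier 12 moves the mean from 2.4 to 4, while the median moves only from 2 to 2.5.

```python
import statistics

# Scores from the example above, without and with the outlier 12.
without_outlier = [2, 2, 2, 3, 3]
with_outlier = [2, 2, 2, 3, 3, 12]

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```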
Negatively Skewed Distributions (tail to the left):
In negatively skewed distributions, the mean is less than the median and the median is less than the mode. The mean is the lowest measure of central tendency in negatively skewed distributions.
Why is the mean the lowest measure of central tendency in negatively skewed distributions?
Extreme low scores in the tail pull the mean to the left in negatively skewed distributions.
Positively Skewed Distributions (tail to the right):
In positively skewed distributions, the mode is less than the median and the median is less than the mean. Therefore, the mean is the highest measure of central tendency in positively skewed distributions.
Why is the mean the highest measure of central tendency in positively skewed distributions?
The mean is pulled to the right by extreme high scores in the tail.
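To see the ordering of the three measures, the Python sketch below uses two small hypothetical data sets invented for illustration: one with a tail to the right and its mirror image with a tail to the left. It uses the standard `statistics` module.

```python
import statistics

# Hypothetical data sets, invented for illustration.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]   # tail to the right
negative_skew = [10, 9, 9, 9, 8, 8, 7, 6, 3, 1]   # mirror image: tail to the left

for label, data in [("positive", positive_skew), ("negative", negative_skew)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    # Positive skew: mode < median < mean (2, 3, 4).
    # Negative skew: mean < median < mode (7, 8, 9).
    print(label, "mode:", mode, "median:", median, "mean:", mean)
```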
Kurtosis
Kurtosis describes the shape of a distribution’s peak and tails. A distribution can be more peaked or flatter than a normal distribution. Leptokurtic data sets (high kurtosis) have a distinct peak near the mean, decline rapidly, and have heavy tails. Platykurtic data sets (low kurtosis) tend to have a flat top near the mean rather than a sharp peak. Finally, mesokurtic data sets, like the normal distribution, are symmetrical and have a moderate peak.
Definitions
Bimodal: a frequency distribution that has two modes.
Descriptive Statistics: statistics that collect, summarize, classify and present data.
Mean: the average of a sample or a population of scores.
Measures of Central Tendency: the methods of determining central values in a population.
Measures of Variability: Measures that allow you to determine the degree of variation within a population or sample, determine how representative a particular score is of a data set, and determine the scope and validity of any generalizations you wish to make based on your research observations.
Median: the middle score in a set of scores that have been ranked in numerical order.
Mode: the most frequently occurring number within a data set.
Multimodal: a frequency distribution that has two or more modes.
Outlier: A data point that is distinctly separate from the rest of the data.
Range: The range is the difference between the highest and lowest scores in a distribution.
Skew: A lack of symmetry in a distribution, in which the mean is shifted to the left or right of the median and the mode. Skew can be negative or positive. The mean is less than the median in a negatively skewed population because a few low scores pull the mean to the left. The mode is always less than the mean and median in a positively skewed population.
Standard Deviation: Square root of the variance.
Sum of Squares: The sum of squares is a measure of variance or deviation from the mean. It is calculated by summing the squares of each score’s difference from the mean. It is the sum of squared deviations.
Variability/Variance: Degree to which the scores vary from their mean.