Misunderstandings of averages
Many colloquial words are used for the idea of an average, such as 'typical' or 'common' or even the word 'average' itself. The mean, median and mode are the three averages, or measures of centre, that the curriculum focuses on.
For data sets that are symmetric, the median and the mean will be the same (or nearly so), but for skewed distributions they differ because the mean is pulled towards the longer tail. Activities where students collect their own data and compare the mean and median (and sometimes the mode) can be useful in building intuitions for distinguishing these measures of centre.
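The comparison between symmetric and skewed data can be sketched in a few lines of Python. The two data sets below are invented for illustration and do not come from any classroom activity; the sketch relies only on the standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets, invented for illustration only.
symmetric = [2, 3, 4, 5, 6, 7, 8]   # evenly spread about the value 5
skewed = [2, 3, 3, 4, 4, 5, 21]     # one long tail to the right

symmetric_mean, symmetric_median = mean(symmetric), median(symmetric)
skewed_mean, skewed_median = mean(skewed), median(skewed)

# Symmetric data: mean and median agree (both 5).
print("symmetric:", symmetric_mean, symmetric_median)
# Skewed data: the mean (6) is pulled above the median (4) by the tail.
print("skewed:   ", skewed_mean, skewed_median)
```

Students could replace the two lists with data they have collected themselves and check whether the gap between mean and median matches the shape of their graph.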
Using the median meaningfully is perhaps the most difficult of the three. The two main difficulties are:
- the median applies only to numerical data that can be ordered
- the data must actually be put in order before the middle value is located.
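The second difficulty can be illustrated with a short sketch that contrasts the middle item of an unsorted list with the true median. The heights are hypothetical values chosen for the example.

```python
from statistics import median

# Hypothetical heights in cm, listed in the order they were measured.
heights = [152, 140, 165, 149, 158]

# Common error: taking the middle item without ordering the data first.
middle_of_unsorted = heights[len(heights) // 2]

# Correct approach: order the data, then take the middle item.
ordered = sorted(heights)
middle_of_sorted = ordered[len(ordered) // 2]

print(middle_of_unsorted)  # 165, which is actually the largest value
print(middle_of_sorted)    # 152
print(median(heights))     # 152, the library function sorts internally
```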
Students can believe that it is possible to find the median of categorical data sets. Activities that challenge students' thinking about mean, median, mode, and the difference between numerical and categorical data are needed.
When focussing on the mean, students sometimes overlook outliers, which can distort the summary statistic or any claim based on it, especially if there is no accompanying graph to draw attention to them. Activities where outliers are included can be useful.
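A small numerical sketch shows how much a single outlier can move the mean while leaving the median unchanged. The pocket-money amounts are hypothetical.

```python
from statistics import mean, median

# Hypothetical weekly pocket money in dollars for five students.
pocket_money = [5, 8, 10, 10, 12]

# The same class with one extra, unusually large value.
with_outlier = pocket_money + [75]

mean_before, median_before = mean(pocket_money), median(pocket_money)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # mean 9, median 10
print(mean_after, median_after)    # mean 20, median still 10
```

With the outlier included, a claim such as "students typically get $20 a week" would describe none of the students well.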
Sometimes the typical part of a data set (or its middle) is better represented by a range of values than by a single number. Students need to be exposed to this idea.
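One possible way to describe the middle as a range, offered here as an assumption rather than the approach this page prescribes, is to report the values between the lower and upper quartiles. The travel times are hypothetical, and `statistics.quantiles` (Python 3.8 or later) is used with its default method; other quartile conventions can give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical travel times to school in minutes.
travel_times = [10, 12, 15, 15, 18, 20, 22, 25, 30, 45, 60]

# Cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(travel_times, n=4)

# Values lying in the middle range, from the lower to the upper quartile.
typical = [t for t in travel_times if q1 <= t <= q3]

print(f"Typical travel time: between {q1} and {q3} minutes")
print(f"{len(typical)} of {len(travel_times)} students fall in this range")
```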
Problems with categorical data
Even though the median may be carefully defined as the middle value in an ordered data set, students sometimes try to find the median of categorical data sets.
Medians and categorical data
To help students move away from trying to find the median of categorical data sets, activities that challenge students’ thinking or misconceptions are needed.
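One way to challenge the idea, sketched below with a hypothetical survey of favourite colours, is to show that the "middle" of categorical data depends entirely on an arbitrary choice of order, whereas the mode does not.

```python
from statistics import mode

# Hypothetical survey responses (categorical data).
colours = ["red", "blue", "green", "blue", "yellow", "blue", "red"]

# The mode only counts occurrences, so it is well defined.
most_common = mode(colours)

# A "median" needs an order, but categories have no natural one.
# Two equally arbitrary orders give two different "middle" values.
alphabetical = sorted(colours)
by_word_length = sorted(colours, key=len)

middle_alphabetical = alphabetical[len(alphabetical) // 2]
middle_by_length = by_word_length[len(by_word_length) // 2]

print(most_common)          # blue
print(middle_alphabetical)  # green
print(middle_by_length)     # blue
```

Because the two orderings disagree, neither "middle" says anything meaningful about the data, which is the point students need to reach.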
Mean, median and mode
This activity distinguishes among mean, median and mode through modelling with paper strips.
Outliers
In considering measures of centre, it is important to be aware of the effect of outliers, particularly on the mean and the line of best fit.
Identifying outliers
Outliers can be identified through observation or the application of more formal guidelines.
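The page does not say which formal guideline is meant. One widely taught guideline, used here as an assumption, flags any value more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. The function name `iqr_outliers` and the test scores are hypothetical, and quartiles are computed with the default method of `statistics.quantiles`.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values lying beyond 1.5 x IQR from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical test scores with one unusually high result.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 68, 70, 98]

flagged = iqr_outliers(scores)
print(flagged)  # [98]
```

A flagged value is a prompt to look more closely at the data, not an instruction to delete it.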
Plots and outliers
An outlier in a single-variable data set can be identified by drawing a box plot, where it usually appears as a separate point beyond the whiskers. In two-variable data, a point can be identified as an outlier if its removal from the data set noticeably strengthens the correlation between the two variables.
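The two-variable check can be sketched by computing the correlation with and without a suspect point. The helper `pearson_r` is written out by hand so that the sketch does not depend on a particular Python version, and the study-hours data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and test score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 40]   # the last student breaks the pattern

r_with = pearson_r(hours, scores)
r_without = pearson_r(hours[:-1], scores[:-1])  # suspect point removed

print(f"with suspect point:    r = {r_with:.2f}")
print(f"without suspect point: r = {r_without:.2f}")
```

Here the correlation goes from close to zero to strongly positive once the last point is removed, which marks that point as a candidate outlier worth investigating.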