Outliers

An outlier is an observation that appears to be significantly different from the other values in the data set. Outliers may be noticed in a table of values or in a graph such as a box plot or a dot plot.

Box plot extending from 12 to 18 with outlier at 27.

Box plot with outlier.

Outliers should be removed if they are errors. Common causes of error are inaccurate measurement, incorrect recording, and a subject that does not qualify for the sample. Outliers can also be identified by a formula. Outliers identified in this way are sometimes not errors but legitimate members of the data set.
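
The text above does not say which formula is meant. One widely taught choice, assumed here purely for illustration, is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. The sketch below applies that rule to invented data chosen to resemble the box plot described earlier; the function name iqr_outliers and the data values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Assumption: the "formula" is the 1.5 x IQR rule; the source does not
    name one. Quartiles use Python's "inclusive" method, and other quartile
    conventions can give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical data: most values between 12 and 18, plus one value at 27.
data = [12, 13, 14, 14, 15, 15, 16, 17, 18, 27]
print(iqr_outliers(data))
```

A value flagged this way is only a candidate for closer inspection. Whether it is an error or a legitimate value still has to be decided from the context of the data.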

Dot plot with 29 values less than 1500 and one value at 6500.

Graph of salt content of 30 food products.

The decision of whether to keep or delete outliers depends on the question being asked about the data set. In the above plot, the outlier of 6500 was found to be a legitimate value (soy sauce). In cases such as this, the decision may be made that the value is so extreme as to be irrelevant for further comparison with the other products.

There is further information about this in the teaching advice about identifying outliers. The effect of outliers on the median and mean is discussed in measures of central tendency.
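
As a rough illustration of that effect, the sketch below compares the mean and median of a small data set with and without one extreme value. Only the value 6500 comes from the salt content plot above; the other nine values are invented for the example and are not the actual data.

```python
from statistics import mean, median

# Hypothetical salt contents. Only 6500 is taken from the plot described
# in the text; the remaining values are made up for illustration.
salt = [120, 300, 450, 600, 800, 950, 1100, 1300, 1450, 6500]
without_outlier = [v for v in salt if v != 6500]

print("With outlier:    mean =", mean(salt), " median =", median(salt))
print("Without outlier: mean =", round(mean(without_outlier), 1),
      " median =", median(without_outlier))
```

In this made-up data set the single extreme value shifts the mean by several hundred, while the median moves only slightly. This is why the median is often preferred as a summary when outliers are present.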

The Beware of Outliers: Student Worksheet provides an activity based on outliers and the mean. You can also download the Beware of Outliers: Teacher Notes.

Curriculum links

Year 8: Investigate the effect of individual data values, including outliers, on the mean and median
