
# Data reduction

A data set can be summarised with a measure of centre: the mean, median or mode.

The article *What is 'Typical' for Different Kinds of Data?* on the AAMT website uses data from Melbourne Cup winners to explore and explain what 'typical' might mean with regard to:

• the mode
• the median
• the mean
• both the median and mean
• using numerical attributes within categorical data.
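The three measures of centre can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the article's method: the goals data and the colour list are invented (they are not the Melbourne Cup data), and `centre_summary` is a hypothetical helper name. `multimode` is used so that a data set with more than one mode reports every mode.

```python
from statistics import mean, median, multimode

def centre_summary(values):
    """Return the mode(s), median and mean of a list of numbers."""
    return {
        "mode": multimode(values),  # every value that occurs most often
        "median": median(values),   # middle value once the data are sorted
        "mean": mean(values),       # sum of the values divided by how many there are
    }

# Hypothetical data (invented for illustration): goals scored by a team in nine games.
goals = [3, 2, 6, 2, 11, 1, 4, 2, 5]
print(centre_summary(goals))  # mode [2], median 3, mean 4

# Hypothetical categorical data: only the mode makes sense here.
favourite_colours = ["red", "blue", "green", "blue", "yellow"]
print(multimode(favourite_colours))  # ['blue']
```

The three measures give three different 'typical' values for the same goals data, and only the mode can be found for the colour data, since there is no way to add or order colours.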

You can read about the use of measures of centre in context and in the media in *What's Average?*

A data set can also be summarised by a measure of spread and by describing any clusters in the data. In the middle years, spread is usually described with the range of the entire data set. The range of the middle half (50%) of the data, known as the interquartile range, is also useful.
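A measure of centre alone can hide how spread out the data are. The sketch below uses two invented sets of class test scores (hypothetical data) that share the same mean but have very different ranges; `data_range` is a hypothetical helper name.

```python
from statistics import mean

def data_range(values):
    """Range: the distance from the smallest to the largest value."""
    return max(values) - min(values)

# Hypothetical test scores for two classes (invented for illustration).
class_a = [12, 13, 14, 15, 16]
class_b = [4, 9, 14, 19, 24]

for name, scores in [("A", class_a), ("B", class_b)]:
    print(name, "mean:", mean(scores), "range:", data_range(scores))
# Both classes have mean 14, but the ranges are 4 and 20.
```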

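The box plot combines the median with measures of spread. It is drawn from the five-number summary: the minimum, lower quartile, median, upper quartile and maximum. The range is the distance from the minimum to the maximum, and the interquartile range is the distance from the lower to the upper quartile, which covers the middle 50% of the data.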
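The five-number summary can be computed as follows. This sketch assumes one common classroom convention for quartiles: each quartile is the median of the lower or upper half of the sorted data, leaving out the middle value when the count is odd. Other conventions (including the default of Python's `statistics.quantiles`) can give slightly different quartiles. The helper name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return min, lower quartile, median, upper quartile and max.

    Quartiles are the medians of the lower and upper halves of the sorted
    data, excluding the middle value when the count is odd. Needs at least
    two values so that each half is non-empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }

# Hypothetical data (invented for illustration).
summary = five_number_summary([3, 2, 6, 2, 11, 1, 4, 2, 5])
full_range = summary["max"] - summary["min"]
iqr = summary["Q3"] - summary["Q1"]  # spread of the middle 50% of the data
print(summary, "range:", full_range, "IQR:", iqr)
```

For these nine values the summary is 1, 2, 3, 5.5 and 11, so the range is 10 and the interquartile range is 3.5.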

Outliers in a data set may influence the measures used to summarise it.

## Central tendency

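The mean and the median are two different measures of central tendency for numerical (measurable) data. The mean is the sum of the values divided by the number of values; the median is the middle value when the values are placed in order, or the average of the two middle values when there is an even number of values.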

## Box plots

Box plots are a useful graphical representation of a five-number summary for describing, illustrating and comparing the distribution of large data sets.

## Influence of outliers

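Outliers can influence some summary measures dramatically: the mean and the range are pulled towards an extreme value, while the median and interquartile range usually change much less. The decision to include or exclude an outlier should be based on the context of the data and the value of the outlier.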
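The effect of a single outlier on the mean and median can be checked directly. In this minimal sketch the weekly pocket money amounts (in dollars) are invented for illustration, and one unusually large value is appended to them.

```python
from statistics import mean, median

# Hypothetical weekly pocket money in dollars (invented for illustration).
amounts = [5, 6, 6, 7, 8, 8, 9]
with_outlier = amounts + [60]  # add one unusually large value

for label, data in [("without outlier", amounts), ("with outlier", with_outlier)]:
    print(label, "mean:", round(mean(data), 3), "median:", median(data))
```

Adding the outlier moves the mean from 7 to 13.625, while the median only moves from 7 to 7.5.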

## Mean, median and mode

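The relationship between the mean, median and mode depends on the shape of the distribution. In a symmetric distribution the three measures are equal or close together. In a distribution skewed to the right (a long tail of large values), the mean is usually pulled above the median; in one skewed to the left, it is usually pulled below. These are tendencies rather than strict rules.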
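The sketch below compares the three measures for three small invented data sets (hypothetical, chosen to have the shapes named) using Python's `statistics` module. For these particular sets the mode, median and mean are in increasing order for the right-skewed data, equal for the symmetric data and in decreasing order for the left-skewed data; real data do not always follow this pattern.

```python
from statistics import mean, median, mode

# Hypothetical data sets (invented for illustration) with different shapes.
data_sets = {
    "right-skewed": [1, 1, 1, 2, 2, 3, 4, 6, 9],
    "symmetric": [2, 3, 3, 4, 4, 4, 5, 5, 6],
    "left-skewed": [9, 9, 9, 8, 8, 7, 6, 4, 1],
}

for shape, values in data_sets.items():
    print(f"{shape:>12}: mode={mode(values)}, median={median(values)}, "
          f"mean={mean(values):.2f}")
```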