Distributions
The distribution of a variable describes how frequently each of its possible values occurs. A useful way to show a distribution graphically is the histogram. The x-axis shows the values ‘binned’, or aggregated into ranges, and the y-axis shows how frequently values in each range appear in the data. For example, Figure 2 shows how many respondents said they received between zero and £10, between £10 and £20, and so on. Figure 2 shows a ‘reverse J’ distribution, i.e. one in which the values are concentrated at the left-hand (low) end with a long tail to the right, because more respondents reported low values than high values. This shape is usually described as positively skewed, or skewed to the right.
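As a rough illustration of how binning works, the sketch below counts values in £10-wide ranges and prints a simple text histogram. The amounts are invented for the example and are not the survey data behind Figure 2; the function name `bin_counts` and the bin width are likewise assumptions.

```python
from collections import Counter

# Hypothetical amounts in pounds, invented for illustration (not survey data).
amounts = [0, 2, 3, 5, 5, 8, 9, 12, 14, 15, 18, 22, 27, 35, 48]
BIN_WIDTH = 10  # each bin covers a £10 range

def bin_counts(values, width):
    """Count how many non-negative values fall in each [start, start + width) range."""
    counts = Counter((int(v) // width) * width for v in values)
    # Include empty bins so that gaps in the data stay visible.
    last = max(counts)
    return {start: counts.get(start, 0) for start in range(0, last + width, width)}

histogram = bin_counts(amounts, BIN_WIDTH)

# One row per bin: the range, a bar of '#' marks, and the count.
for start, count in histogram.items():
    print(f"£{start:>2}-£{start + BIN_WIDTH:<3} {'#' * count} ({count})")
```

In this sketch a value of exactly £10 is counted in the £10-£20 bin rather than the £0-£10 bin. Any histogram has to make a choice of this kind at the bin boundaries, and it is worth stating which one was made.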
If we had also measured the heights of our respondents, we might get a histogram whose approximate shape is known as the normal distribution.
The normal distribution, recognisable by its characteristic bell-shape, has some important features. It is symmetrical around the mean, with the highest point at the mean, and its tails extend infinitely in both directions. If data are normally distributed, we know that:
- 68.3% of the values are within one standard deviation of the mean.
- 95.4% of the values are within two standard deviations of the mean.
- 99.7% of the values are within three standard deviations of the mean.
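These proportions can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) to compute the share of a normal distribution lying within one, two and three standard deviations of the mean. The helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Because the proportions depend only on distance from the mean measured in standard deviations, the same figures apply to any normal distribution, whatever its mean and spread.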
We can use this information to estimate how likely particular values are to occur in the data, and to compare this sample with others.
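For instance, if heights were normally distributed, the expected share of respondents in any height range could be worked out from the mean and standard deviation alone. The sketch below assumes a mean of 170 cm and a standard deviation of 10 cm. Both figures are hypothetical, chosen only to keep the arithmetic easy to follow, and the helper `z_score` is an illustrative name.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only: mean height 170 cm, SD 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Proportion of respondents expected to be taller than 190 cm (two SDs above the mean).
taller_than_190 = 1 - heights.cdf(190)

# Proportion expected between 160 cm and 180 cm (one SD either side of the mean).
between_160_and_180 = heights.cdf(180) - heights.cdf(160)

def z_score(value, dist):
    """Express a value as the number of standard deviations it lies from the mean."""
    return (value - dist.mean) / dist.stdev

print(f"taller than 190 cm: {taller_than_190:.1%}")
print(f"between 160 and 180 cm: {between_160_and_180:.1%}")
print(f"z-score of 185 cm: {z_score(185, heights):.1f}")
```

Converting values to z-scores is one common way to compare samples: a height of 185 cm here and a value from another sample with a different mean and spread can be placed on the same scale of standard deviations from their respective means.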