Often we collect sample data and want to know how representative of the whole population that sample is. If we took many samples, we would find that the distribution of the sample means tended towards normality even if the population as a whole were not normal. This is the central limit theorem. In Measuring Dispersion we saw that the standard deviation of a sample tells us how well the mean describes the sample as a whole. Similarly, the standard deviation of the sample means is called the standard error. The standard error therefore tells us how representative sample means are of the population mean.
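As a sketch of this idea in Python (using NumPy and an invented, clearly non-normal population), we can draw many samples and check that the standard deviation of their means – the standard error – is close to the theoretical value, the population standard deviation divided by the square root of the sample size:

```python
import numpy as np

rng = np.random.default_rng(42)

# An invented, clearly non-normal population: exponential with mean 1.
population = rng.exponential(scale=1.0, size=100_000)

# Draw many samples of size n and record each sample's mean.
n = 50
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

# The standard deviation of the sample means is the standard error,
# which theory predicts to be roughly sigma / sqrt(n).
observed_se = np.std(sample_means)
predicted_se = population.std() / np.sqrt(n)
print(round(observed_se, 3), round(predicted_se, 3))
```

Even though the population is skewed, a histogram of `sample_means` would look close to normal – the central limit theorem in action.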

We can use this to work out confidence intervals: These are boundaries within which the population mean is likely to fall. We need to calculate these boundaries because if we collect sample data in an effort to judge the mean of a population, we won’t know how close to the true mean the sample means are. The most common confidence interval to be calculated is the 95% interval: This means that if we took 100 samples and calculated a confidence interval from each, about 95 of those intervals would contain the true mean for the population. You might also come across a 99% confidence interval.

The 95% confidence interval is calculated using what we know about the probabilities of particular values. 95% of all z-scores fall between -1.96 and +1.96, so if our sample had a mean of 0 and a standard deviation of 1, 95% of the values in the sample would fall between -1.96 and +1.96. In real life, we are unlikely to have a perfectly standard normal distribution, so we need to rescale: The lower boundary of the confidence interval will be the standard error multiplied by 1.96 then subtracted from the mean; and the upper boundary will be the standard error multiplied by 1.96 then added to the mean.
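A minimal Python sketch of this calculation, using invented sample figures (a mean of 52.3, a standard deviation of 14.1 and a sample of 100) and the usual estimate of the standard error – the sample standard deviation divided by the square root of the sample size:

```python
import math

# Invented example figures for illustration.
n = 100
sample_mean = 52.3
sample_sd = 14.1

# Standard error of the mean, estimated as s / sqrt(n).
se = sample_sd / math.sqrt(n)   # 1.41

# 95% confidence interval: mean plus/minus 1.96 standard errors.
lower = sample_mean - 1.96 * se
upper = sample_mean + 1.96 * se
print(round(lower, 2), round(upper, 2))
```

For a 99% interval you would swap 1.96 for the corresponding z-value, 2.58.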

You might also want to report a margin of error associated with your findings. Let’s say we have asked 1,000 people if they like carrots, and 610 said yes. Our sample size is therefore 1,000 and our sample proportion is 0.61. To work out the margin of error with a 95% confidence interval, first calculate the standard error: Multiply the sample proportion by one minus the sample proportion, i.e:

0.61 * (1.0 - 0.61)

Divide the result by the sample size, 1000, and take the square root. This is the standard error: 0.01542. Then multiply the result by the appropriate z-value for the confidence interval, i.e. in this case 1.96. This gives 0.0302. This is the margin of error for a 95% confidence interval. In other words, you can say that based on the sample, 61% of people in the population like carrots and you can have 95% confidence that the margin of error is no more than ±3.0%.
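The carrot calculation above can be reproduced in Python – a sketch of the steps in the text, nothing more:

```python
import math

# The carrot survey: 610 of 1,000 respondents said yes.
n = 1000
p = 610 / n            # sample proportion, 0.61

# Standard error of a proportion: sqrt(p * (1 - p) / n).
se = math.sqrt(p * (1 - p) / n)
print(round(se, 5))     # 0.01542

# Margin of error at 95% confidence: z-value 1.96 times the standard error.
moe = 1.96 * se
print(round(moe, 4))    # 0.0302
```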

Note that the sample size and margin of error have an inverse relationship: The larger the sample size, the smaller the margin of error. However, as the sample size increases, the diminution in the margin of error becomes smaller. In other words, increasing the sample size beyond a certain point will make very little difference to your margin of error. Increasing a sample size from 1,500 to 2,000, for instance, will decrease the margin of error by only about 0.3 percentage points for a 95% confidence interval. You will have to weigh this trade-off when deciding on a sample size. Don’t forget that your sample can also be biased for any number of other reasons, and you should design your research to minimise biased samples.
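A quick Python sketch makes the diminishing returns visible (using the worst-case proportion of 0.5, which gives the largest margin of error for any sample size):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error (as a percentage) for a range of sample sizes.
for n in (500, 1000, 1500, 2000, 5000):
    print(n, round(100 * margin_of_error(n), 1))
```

Going from 500 to 1,000 respondents shaves off over a percentage point; going from 1,500 to 2,000 shaves off only about 0.3, matching the figure in the text.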

Statistical tests often start with hypotheses. A hypothesis test is one where we want to see if a statement about a population is justified from the sample data. For example, a drug developer might want to know if a drug performs better than a placebo at increasing white blood cell count. The developer would collect data by testing the drug on one group of individuals and giving a placebo to another similar group. There are two possible statements of outcomes: (1) There is no difference in white blood cell count between populations the groups represent; (2) There is a difference. The first statement is known as the null hypothesis and the second is the alternative hypothesis. In general, a null hypothesis is one which states that there is no relationship or effect to be discerned, and the alternative hypothesis is the logical alternative, i.e. that the null hypothesis is not the case. We use statistical testing to determine the likelihood of a result as extreme as the one we observe in our data if the null hypothesis is true. A highly unlikely result does not mean that the alternative hypothesis is true, simply that we have rejected the null hypothesis.

Type I and type II errors

There are two potential errors which can occur in interpreting results. A type I error is also known as a ‘false positive’, i.e. the incorrect rejection of a true null hypothesis. In other words, you think there is a relationship, but there is not. A type II error is the failure to reject a false null hypothesis. This is also known as a ‘false negative’: You think there is no relationship, but there is.

Statisticians use probabilities to consider the likelihood of results, and to decide whether these results are ‘significant’. Statistical significance does not simply mean that a result is important. It means that a probability level – or p-value – has been decided upon in advance, and that the null hypothesis has been accepted or rejected at that level. The p-value is the probability, if the null hypothesis is true, of getting by chance a result as extreme as – or more extreme than – the one in our data. For example, in our white blood cell count experiment, if the drug is ineffective and we select very many random samples, we would see as big a change in blood count as we observe in the data a certain proportion of times. That proportion is the p-value. Before conducting the experiment, we would decide what proportion would be acceptable. The common convention is that a p-value of 0.05 (i.e. 5%) or less is statistically significant, and at that point the null hypothesis is rejected. We can never be sure that either hypothesis is correct, so instead we calculate how likely a result like ours would be in a world where there is no effect in the population.

When you conduct a statistical test, the result will often come with a p-value (marked Sig. in SPSS) telling you the chance of the result occurring under null hypothesis conditions. If the value is less than 0.05 – or whatever value you decide is acceptable – then reject the null hypothesis. But don’t forget that this is only a convention: There is no objective reason for 0.05 to be the cut-off point rather than 0.03, 0.06 or any other value.
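As a sketch of this decision rule in Python, using SciPy and an invented test statistic (a z-score of 2.3, chosen purely for illustration):

```python
from scipy.stats import norm

# Invented test statistic (a z-score) from some two-sided test.
z = 2.3

# Two-sided p-value: the probability of a result at least this extreme
# in either direction, if the null hypothesis is true.
p_value = 2 * norm.sf(abs(z))
print(round(p_value, 4))

alpha = 0.05                      # the conventional cut-off
print("reject null" if p_value < alpha else "fail to reject null")
```

Changing `alpha` to 0.01 would flip the decision for this particular z-score – a reminder that the cut-off is a convention, not a law.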

T-tests are a group of statistical tests which compare the means of variables for samples. The mathematics behind a t-test essentially compares the number of standard errors the sample means are from each other, and checks how likely the difference is according to the t-table – a table of probabilities adjusted from those for the standard normal distribution to reflect uncertainty in samples.
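This adjustment can be seen directly with SciPy: the t-distribution’s 95% two-sided critical value is wider than the normal distribution’s 1.96 for small samples, and converges towards it as the degrees of freedom grow (the values below are arbitrary examples):

```python
from scipy.stats import norm, t

# 95% two-sided critical value for the standard normal is 1.96...
print(round(norm.ppf(0.975), 2))

# ...but the t-distribution is wider for small samples, reflecting the
# extra uncertainty, and approaches the normal as df grows.
for df in (5, 30, 1000):
    print(df, round(t.ppf(0.975, df), 2))
```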

If you have a group of cases and want to compare them before and after an intervention or event, use the Paired t-test. For example, you might have a sample of individuals on whom you have tested a drug and you want to see whether their white blood cell count is significantly different after taking the drug. The sample contains the same individuals, so their before and after values are considered pairs for this test.
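A sketch of a paired t-test in Python using SciPy’s `ttest_rel`, with invented before-and-after white blood cell counts (thousands per microlitre) for the same ten individuals:

```python
from scipy.stats import ttest_rel

# Invented counts for the same ten individuals, before and after the drug.
before = [5.2, 6.1, 4.8, 5.5, 6.0, 5.9, 4.7, 5.3, 5.8, 5.1]
after  = [5.9, 6.5, 5.4, 5.6, 6.8, 6.2, 5.1, 6.0, 6.3, 5.5]

# Paired t-test: each person's 'before' value is paired with their 'after'.
result = ttest_rel(before, after)
print(round(result.statistic, 2), round(result.pvalue, 4))
```

With these invented figures the p-value comes out well below 0.05, so we would reject the null hypothesis of no difference.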

Alternatively, you might have two samples containing different cases and you want to see if they display similar means for a particular variable. For example, you might have given the drug to two different age groups, where none of the people in group A are also in group B. The appropriate test is the Independent Samples T-test. The independent samples t-test tells you how likely a difference in means as large as the one observed would be under null hypothesis conditions, i.e. if the populations the samples represent had the same mean. It is best to use this test on data which are normally distributed, but with large enough samples this consideration is less important. If the assumption of normality is violated (i.e. the distribution is too non-normal for the test to work), you can use the Mann-Whitney U-test (the non-parametric equivalent of the independent samples t-test) or, in place of the paired t-test, the Wilcoxon matched pairs signed ranks test.
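Both tests are available in SciPy. The sketch below uses invented counts for two hypothetical age groups:

```python
from scipy.stats import ttest_ind, mannwhitneyu

# Invented white blood cell counts for two independent groups.
group_a = [5.1, 5.8, 6.2, 5.5, 6.0, 5.3, 5.9, 6.1, 5.6, 5.4]
group_b = [4.6, 5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0, 4.5]

# Independent samples t-test (assumes roughly normal data).
t_result = ttest_ind(group_a, group_b)
print(round(t_result.pvalue, 4))

# Non-parametric alternative if normality looks doubtful.
u_result = mannwhitneyu(group_a, group_b)
print(round(u_result.pvalue, 4))
```

With these invented figures both tests return a small p-value, so both would lead you to reject the null hypothesis that the groups’ populations share the same white blood cell count.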