Collecting data
Once you have identified a potential source of quantitative data, take some time to familiarise yourself with its content, especially if you have to digitise it yourself (i.e. manually enter the data onto a computer). For example, you might have a dataset giving the populations of different counties at different times. How many cases are there? What would be the most appropriate way of storing the dataset digitally? How long will it take to enter? Considering these questions will help you to plan your work effectively.
An important question when entering data is what should be rows and what should be columns. In general, each row of your spreadsheet or database should be a case and each column should be a variable relating to the cases. A case is an individual entity in your dataset (for example, each place), and variables are the things that describe each case, such as population, area and population density. Each field in the spreadsheet contains a datum (plural: data). Each case should have a unique identifying number (UID).
                  Variables
Column headers    UID    COUNTY     POPULATION    AREA_KMsq    DENSITY
Cases             001    Norfolk    859,400       5,371        160.0
                  002    Suffolk    730,100       3,801        192.1
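The row-and-column layout in the example table can also be sketched in code. The following is a minimal illustration in Python, assuming the data are held as a list of dictionaries and written out as CSV text; the variable names (cases, csv_text) are chosen for this sketch only. UIDs are stored as text so that the leading zeros are kept, and DENSITY is calculated from POPULATION and AREA_KMsq rather than typed in by hand.

```python
import csv
import io

# Each dictionary is one case (row); each key is a variable (column).
# The values are taken from the example table.
cases = [
    {"UID": "001", "COUNTY": "Norfolk", "POPULATION": 859400, "AREA_KMsq": 5371},
    {"UID": "002", "COUNTY": "Suffolk", "POPULATION": 730100, "AREA_KMsq": 3801},
]

# DENSITY is derived from two other variables, so compute it
# instead of entering it manually.
for case in cases:
    case["DENSITY"] = round(case["POPULATION"] / case["AREA_KMsq"], 1)

# Check that every case has a unique identifier.
uids = [case["UID"] for case in cases]
assert len(uids) == len(set(uids)), "duplicate UID found"

# Write one header row, then one row per case, as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["UID", "COUNTY", "POPULATION", "AREA_KMsq", "DENSITY"],
)
writer.writeheader()
writer.writerows(cases)
csv_text = buffer.getvalue()
print(csv_text)
```

Storing numbers without thousands separators (859400 rather than 859,400) avoids them being read as text or split across columns when the file is opened in other software.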
It is also worth saving metadata associated with your dataset. Metadata is ‘data about data’ and might include the source citation, authorship, date of creation, date and nature of amendments, and licensing information or terms of use. If you save your data in Excel or OpenOffice, consider adding a sheet to your file containing metadata.
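If your data are stored in a plain file rather than a spreadsheet, one possible alternative to a metadata sheet is a small text file kept alongside the dataset. The sketch below, in Python, assumes a JSON layout; the field names are hypothetical and every value is a placeholder to be replaced with the details of your own source.

```python
import json

# Hypothetical field names; all values are placeholders, not real details.
metadata = {
    "source_citation": "PLACEHOLDER: full citation of the original source",
    "author": "PLACEHOLDER: who created or digitised the dataset",
    "date_created": "PLACEHOLDER: YYYY-MM-DD",
    "amendments": [
        {"date": "PLACEHOLDER: YYYY-MM-DD", "nature": "PLACEHOLDER: what changed"},
    ],
    "licence": "PLACEHOLDER: licence or terms of use",
}

# Convert the record to indented JSON text, ready to save next to the data.
metadata_text = json.dumps(metadata, indent=2)
print(metadata_text)
```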
Don’t have all the data? Don’t worry.
If you have data for all the units you want to study, you have the population (not necessarily people; it is the term for the whole collection of any units). If you have data for just a few of those units, you have a sample. You can use statistical tests to estimate how representative the results obtained from your sample are likely to be of the whole population. The process of collecting data about every unit in a population is a census. A parameter is a measurement that relates to a whole population. A statistic, strictly speaking, is a number relating just to a sample.
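The difference between a parameter and a statistic can be shown with a short sketch in Python. The numbers are invented purely for illustration and do not come from any real source; the choice of the mean as the measurement, and of a random sample of four units, are also assumptions made for this example.

```python
import random
import statistics

# Invented values for illustration only: one number for every unit
# in a small hypothetical population.
population = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]

# Parameter: a measurement calculated from the whole population.
population_mean = statistics.mean(population)

# Sample: data for just a few of the units, chosen at random.
random.seed(1)  # fixed seed so the same sample is drawn on every run
sample = random.sample(population, 4)

# Statistic: the same measurement calculated from the sample only.
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ from the population mean, and a different random sample would give a different value. Statistical tests are a way of judging how far such a statistic can be trusted as an estimate of the parameter.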