Organising and Designing Quantitative Data
2. Modelling Data Structures
Most historians do not have the luxury of working only with pre-existing, carefully prepared and documented statistical data.
Sources that are irregular in 'shape', such as long narrative texts written in paragraphs and chapters, or databases of image, sound or video collections, are particularly problematic when it comes to converting their information into data. The problem also arises with more structured sources (such as census listings or taxation assessments), which are never quite as simple as they might appear.
Analysing or 'modelling' data structures, designing databases, and making decisions about categorisation, normalisation and so on are as important for effective management of quantitative data as file naming and organising folders. However, these tasks are in most cases considerably more complex than the simple tree structure of files and directories, and historians undertaking quantitative analysis will often need to learn to use specialist tools and techniques. The discussion here is therefore intended only to provide some basic guidance and resources.
Every project has to be tailored to the challenges of the historical sources it uses and the research questions being asked. Unfortunately, many books and online resources for quantitative data management are aimed at social scientists or scientists. These can be useful for learning about the concepts, techniques and general issues involved in quantitative analysis, but they do not address the particular issues historians face in attempting to transform typically variable and messy historical sources into regularly structured data that can be used for quantitative and statistical analysis.
Even if you don't plan to use databases (but see further on for reasons why you might!), discussions of database design usually have wider application for thinking about how to model data, i.e. how to analyse your historical sources and 'unpack' their underlying structures, categories and relationships.
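As a rough illustration of what 'unpacking' structures, categories and relationships can mean in practice, here is a minimal Python sketch based on an imagined census-style listing. Everything in it is an assumption made for the example: the class names (Household, Person), the field names and the sample rows are invented, and a real source would almost certainly need more fields and more careful handling of gaps and inconsistencies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Household:
    """One structural unit of the (hypothetical) listing."""
    household_id: int
    parish: str


@dataclass(frozen=True)
class Person:
    """One individual; household_id records the relationship to a household."""
    person_id: int
    household_id: int
    name: str
    occupation: str  # a category, kept as transcribed from the source


# Invented rows for illustration only; not taken from any real source.
households = [Household(1, "St Mary"), Household(2, "St Giles")]
persons = [
    Person(1, 1, "Ann Carter", "weaver"),
    Person(2, 1, "John Carter", "labourer"),
    Person(3, 2, "Mary Hill", "weaver"),
]


def household_sizes(people):
    """Count persons per household by following the household_id link."""
    return Counter(p.household_id for p in people)


def occupation_counts(people):
    """Count how often each occupation category occurs."""
    return Counter(p.occupation for p in people)


sizes = household_sizes(persons)
occupations = occupation_counts(persons)
```

The point of the sketch is the separation of households from persons: once the relationship between them is explicit, questions such as household size or occupational distribution become simple counts, whether you end up working in a spreadsheet, a database or code.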
A further consideration at the planning stage is whether you will need to be able to return easily to the full original source. If you are collecting aggregate data (or data for aggregation) for statistical analysis this may not be an issue, but it will be important if your methodology uses quantification primarily as an entry-point for deeper, qualitative work, or if you will move frequently between the two modes of analysis.
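One possible way to keep that route back to the source open is to store a reference, and perhaps a transcription, alongside every quantified value. The sketch below assumes an imagined taxation assessment; the class name Assessment, its fields, the manuscript references and the sample entries are all hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Assessment:
    """One entry from a hypothetical taxation assessment."""
    source_ref: str      # invented reference: manuscript and folio
    taxpayer: str
    amount_pence: int    # the value used for quantitative analysis
    transcription: str   # the entry as it might read in the source


# Invented entries for illustration only; not taken from any real source.
records = [
    Assessment("MS 12, f. 3r", "Thos. Baker", 24, "Thos. Baker ij s."),
    Assessment("MS 12, f. 3v", "Wid. Lane", 6, "Widow Lane vj d., poor"),
    Assessment("MS 12, f. 4r", "Jn. Reeve", 120, "John Reeve x s."),
]


def above(entries, threshold):
    """Return the entries assessed at more than the given amount."""
    return [e for e in entries if e.amount_pence > threshold]


# Quantitative step: an aggregate figure across all entries.
average = mean(r.amount_pence for r in records)

# Qualitative step: follow unusual entries back to the original source.
unusual = above(records, average)
refs = [r.source_ref for r in unusual]
```

Here the aggregate figure identifies which entries deserve a closer look, and the stored reference tells you where in the original to look. If you only ever need the aggregate, the extra fields may be unnecessary overhead.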