Quantitative Data: Examples of Data Structures

Site: Postgraduate online research training
Course: Module 1: Introduction

1. Introduction


These examples are provided to show different kinds of data, of varying complexity, that historians use for quantification. They indicate that often the ideal tool for managing data for quantitative history depends both on its structure and on what you want to do with it.

Hence, it is important to model your own data - and work out your research needs - before you decide on software tools. Sometimes the best way to do this will be old-fashioned pen and paper!


Key issues to consider are:

  • How complex is the structure of an individual source?
  • Do you need to link together different sources of different kinds and structures?
  • How big are your sources, individually and as a whole? (You may need to use samples to estimate this.)

2. Simple Data Structures

In essence, quantitative data can be considered either simple or complex in terms of its structure. "Simple", as we will show over the next few pages, is not as limited as it might sound: it refers to how easy it is to extract the statistical data from the original source.

Thus, simple data can include very straightforward statistical information that involves only one or two types of statistics. It can also be more complex than this, drawing multiple numbers from a small range of documents. Generally this type of data is easily identifiable (i.e. the original document presents the statistics in a form that is easy to extract), and it is usually easy to calculate and categorise.

2.1 Seventeenth-century Bill of Mortality

Page from Bill of Mortality CC licensed, Wellcome

Weekly "bills of mortality" were published in London (and some other cities) during the outbreak of the plague in 1665-6, recording not simply deaths from plague but other causes of mortality as well. This page summarises deaths by their causes; other pages list deaths by parish. This source would be well suited to recording in a spreadsheet as it has already been aggregated and is in consistent tabular formats.

While the categories used by the compilers are standardised to some degree, the main challenges for a modern historian are deciding how to define and classify some of the one-off and accidental deaths, and how to interpret unfamiliar categories such as "Rising of the Lights".

The causes of death data could be presented in a simple two-column table:

Stated cause of death                                 Total
Abortive                                              5
Aged                                                  43
Ague                                                  2
Apoplexie                                             1
Bleeding                                              2
Burnt in his Bed by a Candle at St Giles Cripplegate  1
Canker                                                1
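As a rough illustration of how little machinery such an aggregated source needs, the rows listed above can be held as two-column CSV text and summarised in a few lines of Python. This is only a sketch: the column names `cause` and `total` are our own labels rather than anything in the bill, and the figures cover only the handful of rows quoted here, not the whole page.

```python
import csv
import io

# Rows transcribed from the table above. The header names "cause" and
# "total" are assumed labels chosen for this sketch.
BILL_CSV = """cause,total
Abortive,5
Aged,43
Ague,2
Apoplexie,1
Bleeding,2
Burnt in his Bed by a Candle at St Giles Cripplegate,1
Canker,1
"""


def load_causes(text):
    """Read cause/total pairs into a dict, converting totals to integers."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["cause"]: int(row["total"]) for row in reader}


causes = load_causes(BILL_CSV)

# Total deaths across the quoted rows only (not the full bill).
total_deaths = sum(causes.values())

# Share of the quoted deaths attributed to each cause, largest first.
shares = sorted(
    ((cause, count / total_deaths) for cause, count in causes.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

print("Deaths in quoted rows:", total_deaths)
for cause, share in shares[:3]:
    print(f"{cause}: {share:.1%}")
```

The stated causes are kept exactly as transcribed. Any modern classification (for example, grouping one-off accidents together) would be better added as a separate column so that the original wording is not lost.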

2.2 Eighteenth-century Workhouse Admissions Register

St Botolph's Workhouse Register London Lives

This is a register of paupers admitted to a London workhouse in the mid-18th century. For documentation for this source (explaining the table in depth), see: London Lives Workhouse Admission Registers.

This is a more complex source than the first example, for which the choice of software could be more dependent on the nature of the research project.

The register lists in total around 2000 people so is certainly not too large for a spreadsheet. It is mainly text rather than numerical, but it has a regular tabular structure, and several columns consist of easily structured data such as gender, age and dates. The "admitted" and "discharged" columns would probably require some post-collection standardisation to enable analysis.

The first few rows of a table might look like this:

surname    given      sex  age  admitted                  adm_date    disch       disch_date
Dixon      Herbert    m    7    P Committe                11/10/1741  died        21/06/????
Middleton  James      m    37   passd fro: White Chappel  12/10/1741  discharged  28/10/1741
Middleton  Elizabeth  f    31   passd fro: White Chappel  12/10/1741  died        25/10/1741
Middleton  James      m    1.5  passd fro: White Chappel  12/10/1741  died        25/10/1741

In terms of data structure, it could be suitable for a spreadsheet which could be used, for example, to explore patterns of gender, age, seasonality, etc in admissions to workhouses.
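A minimal sketch of the kind of post-collection standardisation and counting described above, using only the four rows shown in the example table. The function names and the decision to treat a date with an unknown year as missing are assumptions made for illustration, not part of the London Lives documentation.

```python
from collections import Counter
from datetime import date

# The four rows from the example table, keyed by its column labels.
ROWS = [
    {"surname": "Dixon", "given": "Herbert", "sex": "m", "age": "7",
     "admitted": "P Committe", "adm_date": "11/10/1741",
     "disch": "died", "disch_date": "21/06/????"},
    {"surname": "Middleton", "given": "James", "sex": "m", "age": "37",
     "admitted": "passd fro: White Chappel", "adm_date": "12/10/1741",
     "disch": "discharged", "disch_date": "28/10/1741"},
    {"surname": "Middleton", "given": "Elizabeth", "sex": "f", "age": "31",
     "admitted": "passd fro: White Chappel", "adm_date": "12/10/1741",
     "disch": "died", "disch_date": "25/10/1741"},
    {"surname": "Middleton", "given": "James", "sex": "m", "age": "1.5",
     "admitted": "passd fro: White Chappel", "adm_date": "12/10/1741",
     "disch": "died", "disch_date": "25/10/1741"},
]


def parse_date(text):
    """Return a date for DD/MM/YYYY, or None if any part is not numeric."""
    day, month, year = text.split("/")
    if not (day.isdigit() and month.isdigit() and year.isdigit()):
        return None  # e.g. "21/06/????": the year was not recorded
    return date(int(year), int(month), int(day))


def standardise(row):
    """Add typed values while leaving the transcribed text fields intact."""
    return {
        **row,
        "age": float(row["age"]),  # ages such as 1.5 need a decimal type
        "adm_date": parse_date(row["adm_date"]),
        "disch_date": parse_date(row["disch_date"]),
    }


records = [standardise(row) for row in ROWS]

# Simple counts of the kind a spreadsheet pivot table would also give.
by_sex = Counter(r["sex"] for r in records)
by_outcome = Counter(r["disch"] for r in records)

# Length of stay in days, only where both dates are fully known.
days_in_house = [
    (r["disch_date"] - r["adm_date"]).days
    for r in records
    if r["adm_date"] and r["disch_date"]
]

print(by_sex, by_outcome, days_in_house)
```

Note that the row with the unknown discharge year is silently left out of the length-of-stay calculation, which is exactly the kind of gap that needs to be decided on and documented before analysis.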

However, a problem with the source complicates matters: the year of discharge is not consistently recorded, even though it is sometimes later than the year of admission. Adding this information from other sources would both create extra labour and require some thought about how to document the insertions. If the sources used for the additions were being recorded in their own right (rather than just being used as a supplement for this one purpose), it might be more efficient to create two separate, linked tables in a relational database.

Moreover, if the research project involved collecting data from a variety of related records of which workhouse registers were just one (for example, to carry out nominal record linkage), a database would probably be more suitable as it would facilitate collection and linkage of more diverse records.

3. Complex Data

What do we mean by complex data? This is statistical data that is not necessarily easy to locate in, and extract from, the original documents. In other words, we are talking about complex statistics with overlapping purposes and meanings, which require thought as to how they should be extracted, categorised, and analysed.

3.1 Criminal Trials


Dennes Brannam and William Purcel, were indicted, for that they, on the King's highway, on Thomas Whiffin, did make an assault, putting him in corporal fear and danger of his life, one hat, val. 8 s. one peruke, val. 10 s. from his person did steal, take, and carry away, Dec. 13.

...Both Guilty. Death.

- Old Bailey Online

Eighteenth- and nineteenth-century English criminal trials are ideal sources for quantification. Offences, verdicts and punishments are categories of information that can be standardised and counted in various ways to investigate patterns over time, and often gender and age. They were carefully recorded by officials and many of the records have survived.

However, they are also highly complex sources. Trials can contain highly variable numbers of defendants, victims, charges, verdicts and punishments - many trials only have one defendant, but some may have dozens.

The above trial has one offence (highway robbery) but two defendants (a "one to many" relationship). In this trial, both are given the same verdict and sentence, but that is not always the case. Meanwhile, other trials may contain multiple offences, and where this is the case there can be multiple verdicts and, when there are guilty verdicts, multiple punishments ("many to many" relationships) for a single trial.

Spreadsheets cannot store complex information like this (or at least, not without a great deal of data duplication and inefficiency). The solution to recording and storing this kind of information for analysis is to use a relational database with multiple linked tables.
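To make the idea of linked tables more concrete, here is a minimal sketch using Python's built-in sqlite3 module and the trial quoted above. The table and column names (`trial`, `defendant`, `offence`, `outcome`) are hypothetical choices for this illustration and are not the structure used by Old Bailey Online.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one trial has many defendants and many offences,
# and each defendant/offence pair has its own verdict and punishment.
conn.executescript("""
CREATE TABLE trial (
    id INTEGER PRIMARY KEY,
    source TEXT
);
CREATE TABLE defendant (
    id INTEGER PRIMARY KEY,
    trial_id INTEGER NOT NULL REFERENCES trial(id),
    name TEXT NOT NULL
);
CREATE TABLE offence (
    id INTEGER PRIMARY KEY,
    trial_id INTEGER NOT NULL REFERENCES trial(id),
    category TEXT NOT NULL
);
-- Link table: this is what carries the "many to many" relationship.
CREATE TABLE outcome (
    defendant_id INTEGER NOT NULL REFERENCES defendant(id),
    offence_id INTEGER NOT NULL REFERENCES offence(id),
    verdict TEXT NOT NULL,
    punishment TEXT,
    PRIMARY KEY (defendant_id, offence_id)
);
""")

# The trial quoted above: two defendants, one offence, same outcome for both.
conn.execute("INSERT INTO trial (id, source) VALUES (1, 'Old Bailey Online')")
conn.executemany(
    "INSERT INTO defendant (id, trial_id, name) VALUES (?, 1, ?)",
    [(1, "Dennes Brannam"), (2, "William Purcel")],
)
conn.execute(
    "INSERT INTO offence (id, trial_id, category) VALUES (1, 1, 'highway robbery')"
)
conn.executemany(
    "INSERT INTO outcome (defendant_id, offence_id, verdict, punishment) "
    "VALUES (?, 1, 'guilty', 'death')",
    [(1,), (2,)],
)

# Count defendants per offence category, verdict and punishment.
summary = conn.execute("""
    SELECT offence.category, outcome.verdict, outcome.punishment, COUNT(*)
    FROM outcome
    JOIN offence ON offence.id = outcome.offence_id
    GROUP BY offence.category, outcome.verdict, outcome.punishment
""").fetchall()

print(summary)
```

Because verdicts and punishments live in their own table, a trial with dozens of defendants or several offences simply adds more rows, with no need to repeat the trial details on every line as a flat spreadsheet would.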

3.2 Prosopographical Data

Prosopography is the study of groups through collective study of their members. This requires biographical data for large numbers of individuals, but the information available for many individuals may be quite limited, and consist of scattered references in different documents that need to be connected together and recorded in databases that can facilitate investigation of larger patterns.

One example of this type of work is briefly discussed below by Dr Richard Gorski, who developed a prosopographical database for his PhD thesis in 1999. The thesis examined the fourteenth-century sheriff and English local administration in the late Middle Ages.

The thesis can be accessed at the University of Hull Hydra data repository.

For a second example, the diagram below shows the data model of the Individuals module of the Early American Foreign Service Database (EAFSD). This module (or section) of the relational database "models how people relate to each other and their various occupations".

There are several related tables in this module (linked using the "id" primary key): basic information about individuals, occupations, occupation titles, occupation types, relationships (between individuals) and relationship types. The diagram also shows how this module links to others in the database - residences, locations, assignments, correspondence, organizations and references.

It is worth noting the use of easily interpreted descriptive labels (eg name, birth_date) for the fields in the tables, and the incorporation of basic administrative metadata (eg created_at) within the tables - it is likely that the timestamps are automatically updated when entries are created or edited.
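The following is a loose sketch, in Python with sqlite3, of how a small part of such a module might be laid out. It is not the EAFSD schema: only `id`, `name`, `birth_date` and `created_at` are mentioned in the text, and every other table name, column name (including `updated_at`), the trigger, and the placeholder people "Person A" and "Person B" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Assumed layout: individuals, a lookup table of relationship types, and
# a relationships table linking two individuals through the "id" keys.
conn.executescript("""
CREATE TABLE individuals (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    birth_date TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE relationship_types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE relationships (
    id INTEGER PRIMARY KEY,
    individual_id INTEGER NOT NULL REFERENCES individuals(id),
    related_individual_id INTEGER NOT NULL REFERENCES individuals(id),
    relationship_type_id INTEGER NOT NULL REFERENCES relationship_types(id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Refresh updated_at whenever descriptive fields are edited.
CREATE TRIGGER individuals_touch
AFTER UPDATE OF name, birth_date ON individuals
BEGIN
    UPDATE individuals SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

# Placeholder entries only; no real individuals are represented here.
conn.executemany(
    "INSERT INTO individuals (id, name) VALUES (?, ?)",
    [(1, "Person A"), (2, "Person B")],
)
conn.execute("INSERT INTO relationship_types (id, name) VALUES (1, 'sibling')")
conn.execute(
    "INSERT INTO relationships "
    "(individual_id, related_individual_id, relationship_type_id) "
    "VALUES (1, 2, 1)"
)

# Editing a name fires the trigger and refreshes updated_at.
conn.execute("UPDATE individuals SET name = 'Person B (corrected)' WHERE id = 2")

# Resolve the ids back into readable labels.
links = conn.execute("""
    SELECT a.name, t.name, b.name
    FROM relationships AS r
    JOIN individuals AS a ON a.id = r.individual_id
    JOIN individuals AS b ON b.id = r.related_individual_id
    JOIN relationship_types AS t ON t.id = r.relationship_type_id
""").fetchall()

stamps = conn.execute(
    "SELECT created_at, updated_at FROM individuals WHERE id = 2"
).fetchone()

print(links, stamps)
```

Keeping relationship types in their own lookup table means a label can be renamed or refined in one place, and the timestamps are filled in by the database rather than typed by hand.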