Designing Databases for Historical Research

E. Entity relationship modelling

E1. Introduction

Throughout this Handbook so far, reference has been made to the translation and conversion processes involved in taking information from sources and turning it into data within the database. This section describes precisely the tasks involved in performing these processes, which are collectively known as Entity Relationship Modelling (ERM). The mechanics of ERM are in fact a lot less intimidating than the name implies, but it is nevertheless a complex activity, and one that is likely to prove challenging at the first few attempts. Fortunately, the various stages of ERM draw very heavily upon the skills and experience that historians use as a matter of course in their research, which, unlike most aspects of database use, places the historical researcher at something of an advantage. The difficulty of the ERM process grows with the complexity of the source(s) being used in the research, with some types of source being (relatively) simpler to model than others. Highly structured sources such as census returns, lists of inhabitants and poll books will be easier to model than ‘semi-structured’ sources such as probate inventories, which in turn will present fewer problems than completely unstructured material such as narrative texts and interviews. However, all will have their own particular features and problems to complicate the modelling.

The process of ERM serves a number of purposes. Firstly, it makes the historian decide what the database is meant to achieve in terms of its functions. Secondly, it identifies the types of information that can be obtained from the sources and, in conjunction with the database’s chosen aims, helps the historian decide which information from the sources should be entered into the database and which can be excluded. Thirdly, ERM makes the historian think in detail about the components of the database (its tables, fields, relationships, datatypes, and so on), decisions on all of which are crucial to a successful database design. Finally, it encourages consideration of the layers of the database: what information needs to be entered into both the Source layer and the Standardisation layer, what can be entered only into the latter, and how extensive the latter needs to be. Once these tasks have been completed, the historian is left with a very precise idea of what the database will look like and, on a more practical note, with the design of the database on paper (an Entity Relationship Diagram [ERD]).
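To make the components mentioned above more concrete, the following is a minimal sketch of what a very small model might look like once it is turned into tables. It assumes a census-like source and uses Python's built-in sqlite3 module; the two entities (household and person), every field name, and the sample values are hypothetical illustrations rather than a design recommended by this Handbook. The fields ending in `_source` stand for the Source layer and the field ending in `_std` for the Standardisation layer.

```python
import sqlite3

# Hypothetical two-entity model for a census-like source.
# All table names, field names and sample values are illustrative assumptions.
SCHEMA = """
CREATE TABLE household (
    household_id   INTEGER PRIMARY KEY,
    address_source TEXT NOT NULL          -- Source layer: as written in the source
);
CREATE TABLE person (
    person_id         INTEGER PRIMARY KEY,
    household_id      INTEGER NOT NULL REFERENCES household(household_id),
    occupation_source TEXT,               -- Source layer: as written in the source
    occupation_std    TEXT                -- Standardisation layer: normalised form
);
"""


def build_example():
    """Create an in-memory database holding the two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationship
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO household VALUES (1, 'High Strete')")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        [(1, 1, "husbandman", "farmer"), (2, 1, "servt", "servant")],
    )
    return conn


def people_per_household(conn):
    """Follow the one-to-many relationship: count persons in each household."""
    rows = conn.execute(
        "SELECT h.household_id, COUNT(p.person_id) "
        "FROM household h "
        "LEFT JOIN person p ON p.household_id = h.household_id "
        "GROUP BY h.household_id"
    )
    return dict(rows.fetchall())
```

Calling `people_per_household(build_example())` follows the relationship between the two tables. The one-to-many link between household and person is the kind of relationship that an ERD would record on paper before any table is built.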