Designing Databases for Historical Research
B. Sources, information and data
In this section we address some of the issues that historians face when thinking about building and using a database for their research. Quite what ‘using a database for their research’ actually means is a question that we will return to in Section C of this Handbook, as it encompasses a range of issues which are likely to affect the design of a historical database. This section focuses on the difference between ‘information’ and ‘data’ – the former being what sources provide, the latter being what databases need – and begins the process of considering how to move from one to the other.
Unfortunately, the historian is faced with particular kinds of problem when converting sources into a useful database resource, problems which are not shared by most other database users. These (as we shall see) boil down to two inescapable realities of historical research:
- The historian often does not know precisely what kinds of analyses they want to conduct when starting out on their research
- The extent and scope of the information contained within the historian’s particular sources cannot usually be anticipated fully
In other words, the sheer unpredictability of many historical research projects, the various tangents and new lines of inquiry that open up as soon as you get to grips with the sources, and the constant promise of unearthing a type of information that you were not expecting, all make designing databases a difficult proposition for historians. Indeed, in many ways these two factors create conditions which are entirely contrary to the environment required by the structures and functions of a database. The difficulty for the historian is that information which is informal and unstructured must be translated and made to fit into a rigidly formal and structured medium. Reconciling the two – the milieu of the historian and the rules of databases – is the principal aim of this Handbook.
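To make the distinction between ‘information’ and ‘data’ a little more concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. Everything in it is an assumption made for illustration: the source sentence is invented, and the table name `burial` and its column names are hypothetical choices rather than a recommended design. It simply shows one sentence of unstructured information being broken into discrete, named pieces of data that a database can store and retrieve.

```python
import sqlite3

# Invented source entry, for illustration only (not from a real register).
source_text = "John Smith, weaver, of Leeds, was buried 3 June 1742."

# The same information expressed as data: one value per named field.
# The field names are hypothetical choices made for this sketch.
record = {
    "forename": "John",
    "surname": "Smith",
    "occupation": "weaver",
    "residence": "Leeds",
    "burial_date": "1742-06-03",
}

# An in-memory database whose table fixes, in advance, what can be recorded.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE burial (
        id INTEGER PRIMARY KEY,
        forename TEXT,
        surname TEXT,
        occupation TEXT,
        residence TEXT,
        burial_date TEXT
    )
    """
)
conn.execute(
    "INSERT INTO burial (forename, surname, occupation, residence, burial_date) "
    "VALUES (:forename, :surname, :occupation, :residence, :burial_date)",
    record,
)

# Once structured, the data can be queried in a way the raw sentence cannot.
weavers = conn.execute(
    "SELECT forename, surname FROM burial WHERE occupation = ?", ("weaver",)
).fetchall()
print(weavers)
```

Note that the table has a place only for the kinds of information anticipated when it was created. If a later entry in the source also recorded, say, a cause of death, there would be nowhere to put it without changing the structure, which is precisely the tension described above.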
Much of what is discussed here is about good practice and ensuring that the most common and critical mistakes are avoided at the design stage, the most important stage of database creation. Errors at this juncture will affect how useful the database will be: they will make data entry more laborious and more difficult and, more seriously, they will significantly impair the database’s ability to retrieve data for analysis. It is therefore very important to design the database as ‘correctly’ as possible from the outset, to minimise the need for retrospective restructuring further down the line (although some of this will inevitably be necessary).