Designing Databases for Historical Research

C. Fundamentals of database design

C2. The purpose of the database

As we shall see in Section E, the very first step in the formal process of designing a database is to decide what purpose or purposes it is to serve. This is perhaps less obvious and less straightforward than one might expect, since databases in the abstract can serve any of a number of different kinds of function. In essence, however, there are three types of function that the historian is likely to be interested in:

  • Data management
  • Record linkage
  • Pattern elucidation/aggregate analysis

Each of these functions is a goal that can be achieved by shaping the database during the design process, and each requires certain elements of the design to be carried out in specific ways. The three are by no means mutually exclusive, however, and this is an important point: most historians will want access to the full range of functionality a database offers, and their research will probably involve all three types of activity. Moreover, many historians are unlikely to know precisely what they want to do with their database at the very beginning of the design process, which is exactly when these decisions have to be taken. This is why, as we shall see later in this section, many historians are inclined to design databases that maximise flexibility in how they can be used later in the project (a goal that comes at the price of design simplicity).

The data management aspect of the database is in many cases almost a by-product of how the database works, and yet it is also one of its most powerful and useful functions. Simply being able to hold vast quantities of information from different sources as data in one place, in a form that makes it possible to find any given piece of information and see it in relation to other pieces of information, gives the historian a very important tool. Many historians use a database for bibliographical organisation, connecting notes from secondary reading to information taken from primary sources while remaining able to trace either back to its source. Even the simpler tools of database software can be used to find information quickly and easily, making the database a robust mechanism for storing information and retrieving it.
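To make the retrieval side of data management concrete, the sketch below holds notes from secondary reading alongside notes taken from a primary source, finds every note mentioning a keyword, and traces each one back to the source it came from. It is a minimal illustration using Python's built-in `sqlite3` module with an in-memory database; the table and column names (`source`, `note`, `source_id`, `kind`) and all of the sample rows, including the placeholder citations, are hypothetical and not drawn from any real project.

```python
import sqlite3


def build_notes_db() -> sqlite3.Connection:
    """Create a small in-memory database of sources and notes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            citation  TEXT NOT NULL,
            kind      TEXT NOT NULL CHECK (kind IN ('primary', 'secondary'))
        );
        CREATE TABLE note (
            note_id   INTEGER PRIMARY KEY,
            source_id INTEGER NOT NULL REFERENCES source(source_id),
            body      TEXT NOT NULL
        );
    """)
    # Placeholder citations for illustration only, not real works.
    conn.executemany("INSERT INTO source VALUES (?, ?, ?)", [
        (1, "Primary source A (parish register)", "primary"),
        (2, "Secondary source B (monograph)", "secondary"),
    ])
    conn.executemany("INSERT INTO note (source_id, body) VALUES (?, ?)", [
        (1, "Baptism of John Carter, son of a weaver"),
        (2, "Discusses apprenticeship in the textile trades"),
        (2, "Mentions weaver apprentices in the parish"),
    ])
    return conn


def find_notes(conn: sqlite3.Connection, keyword: str) -> list:
    """Return (body, citation, kind) for every note containing keyword.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    return conn.execute(
        """SELECT n.body, s.citation, s.kind
           FROM note AS n
           JOIN source AS s ON s.source_id = n.source_id
           WHERE n.body LIKE ?
           ORDER BY n.note_id""",
        (f"%{keyword}%",),
    ).fetchall()


if __name__ == "__main__":
    conn = build_notes_db()
    for body, citation, kind in find_notes(conn, "weaver"):
        print(f"{kind:<9} | {citation} | {body}")
```

Because every note carries the identifier of its source, any search result can be followed back to the document it was taken from, whether that is a primary or a secondary source.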

Record linkage is where the database, and particularly the relational database (see Sections D and E), comes into its own. Connecting people, places, dates, events and themes across sources, periods and geographical or administrative boundaries is clearly an extremely useful task, and whilst the database can do this, the efficiency and accuracy of the linkages will be determined both by the design of the database structure and by the nature of the data model (see Section E).
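As a minimal sketch of how a relational structure can support record linkage, the example below keeps one table of individuals and a second table of their appearances in different sources, each spelled as written in that source. A shared key (`person_id`) records the researcher's judgement that several mentions refer to the same person, so a single query can follow that person across sources, decades and parishes. The schema, the function name `life_course` and all of the sample data are hypothetical, assumed purely for illustration; real linkage decisions would rest on the historian's own evidence and criteria.

```python
import sqlite3


def build_linkage_db() -> sqlite3.Connection:
    """In-memory database linking mentions in several sources to individuals (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per individual, as identified by the researcher.
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            label     TEXT NOT NULL
        );
        -- One row per mention of a person in a source, spelled as written there.
        CREATE TABLE appearance (
            appearance_id   INTEGER PRIMARY KEY,
            person_id       INTEGER NOT NULL REFERENCES person(person_id),
            source          TEXT NOT NULL,
            year            INTEGER,
            name_as_written TEXT NOT NULL,
            place           TEXT
        );
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)", [
        (1, "John Carter, weaver"),
        (2, "Mary Carter"),
    ])
    # The linkage decisions are recorded in person_id: three differently
    # spelled mentions have been judged to be the same John Carter.
    conn.executemany(
        """INSERT INTO appearance (person_id, source, year, name_as_written, place)
           VALUES (?, ?, ?, ?, ?)""",
        [
            (1, "baptism register", 1721, "Jn. Cartar", "Parish A"),
            (1, "tax list", 1751, "John Carter", "Parish A"),
            (1, "burial register", 1780, "Johannes Carter", "Parish B"),
            (2, "tax list", 1751, "Mary Carter", "Parish A"),
        ],
    )
    return conn


def life_course(conn: sqlite3.Connection, person_id: int) -> list:
    """All recorded appearances of one individual, across sources, in date order."""
    return conn.execute(
        """SELECT p.label, a.year, a.source, a.name_as_written, a.place
           FROM person AS p
           JOIN appearance AS a ON a.person_id = p.person_id
           WHERE p.person_id = ?
           ORDER BY a.year""",
        (person_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_linkage_db()
    for row in life_course(conn, 1):
        print(row)
```

Keeping the spelling as written in each source, rather than overwriting it, leaves the evidence for each linkage visible and allows a doubtful link to be revised later by changing a single `person_id`.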

Finally, once the information from your sources has been converted into data, the database software can be used to group information together. Once records can be aggregated, it becomes possible to count them, which means that statistical analyses can be performed and structural patterns identified within the information. Again, however, the efficiency and accuracy of this kind of function will depend both on the design of the database and on the manner in which the information has been converted. In particular, it will depend a great deal on how the information was converted: if the historian aims to perform this kind of analysis extensively, considerable effort will need to go into applying a ‘standardisation layer’ to the data (see Section C4).
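The sketch below suggests why aggregate analysis depends so heavily on standardisation. It counts occupations twice: once grouped by the spellings transcribed from the source, and once grouped through a lookup table that maps each original form to a standard term. This is only one possible way of holding such a layer, assumed here for illustration (the table names `entry` and `occupation_standard` and all of the data are hypothetical); Section C4 treats standardisation in its own terms.

```python
import sqlite3


def build_occupation_db() -> sqlite3.Connection:
    """In-memory database of transcribed occupations plus a standardisation lookup (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Occupations transcribed exactly as written in the source.
        CREATE TABLE entry (
            entry_id              INTEGER PRIMARY KEY,
            occupation_as_written TEXT NOT NULL
        );
        -- Standardisation layer: maps each original form to a standard term,
        -- leaving the original transcription untouched.
        CREATE TABLE occupation_standard (
            as_written TEXT PRIMARY KEY,
            standard   TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO entry (occupation_as_written) VALUES (?)",
        [("Weaver",), ("weavr",), ("Wever",), ("Tailor",), ("taylor",), ("Weaver",)],
    )
    conn.executemany("INSERT INTO occupation_standard VALUES (?, ?)", [
        ("Weaver", "weaver"), ("weavr", "weaver"), ("Wever", "weaver"),
        ("Tailor", "tailor"), ("taylor", "tailor"),
    ])
    return conn


def raw_counts(conn: sqlite3.Connection) -> list:
    """Count entries by spelling as written: variants fragment into separate groups."""
    return conn.execute(
        """SELECT occupation_as_written, COUNT(*)
           FROM entry
           GROUP BY occupation_as_written
           ORDER BY occupation_as_written"""
    ).fetchall()


def standard_counts(conn: sqlite3.Connection) -> list:
    """Count entries by standardised term; unmapped forms are flagged, not silently dropped."""
    return conn.execute(
        """SELECT COALESCE(s.standard, '(unstandardised)') AS occupation, COUNT(*)
           FROM entry AS e
           LEFT JOIN occupation_standard AS s
                  ON s.as_written = e.occupation_as_written
           GROUP BY COALESCE(s.standard, '(unstandardised)')
           ORDER BY occupation"""
    ).fetchall()


if __name__ == "__main__":
    conn = build_occupation_db()
    print(raw_counts(conn))
    print(standard_counts(conn))
```

Grouped by original spelling, the six entries scatter across five groups; grouped through the standardisation layer, they resolve into two occupations that can be counted and compared meaningfully. The left join is a design choice: any spelling not yet added to the lookup appears as '(unstandardised)' rather than vanishing from the totals.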