Designing Databases for Historical Research

C. Fundamentals of database design

C3. Conceptual models of database design

Whilst it is true that every database ever built has been designed specifically for a particular conjunction of purpose and data, and is therefore to a greater or lesser extent distinctive, it is also true that there are two principal overarching approaches to designing databases. The two conceptual models are known as:

The Source-oriented approach (sometimes called the Object-oriented approach)

and

The Method-oriented approach (also known as the Model-oriented approach)

These two models should be viewed as polar opposites at the ends of a sliding scale, where the design of a database is based on an approach somewhere between the two extremes. Every database design will be something of a compromise, and no database will ever constitute the ‘perfect source-oriented database’, nor will there ever be the ‘perfect method-oriented database’.

 

C3i – The two conceptual approaches to database design

The Source-oriented model of database design dictates that everything about the design of the historical database is geared towards recording every last piece of information from the sources, omitting nothing, and in effect becoming a digital surrogate for the original. The information contained within the sources, and the shape of that information, completely ordains how the database must be built.

The lifecycle of an ideal source-oriented database can be represented thus:

C3ii – Lifecycle of the Source-oriented database

This approach to database design is very attractive to the historian as it places the sources at the centre of the database project. Entering data into a database is a very time-consuming activity, however, and it becomes much more so if you are taking pains to record all of the information that exists in your sources. Ultimately you will need to make choices about which information to exclude from the database, contrary to the principles of the Source-oriented model. Doing so will undermine the database’s role as a digital surrogate for your sources, but it will at least allow you to perform your research within a reasonable period.

The Source-oriented approach, if rigidly applied, can lead to a design that quickly becomes unwieldy as you try to accommodate every last piece of information from your source, some of which may only occur once. It does, however, allow for wider analytical approaches to be taken later: potential queries are not reliant on the initial research agenda, so the database does not restrict the directions your research might take. It also spares you from having to anticipate all of your research questions in advance, which the Method-oriented model requires. The Source-oriented model transfers the source (with all its peculiarities and irregularities) in a reasonably reliable way into the database with little loss of information – ‘everything’ is recorded (or at least whatever is excluded is excluded by your conscious choice), and if something becomes interesting later, you will not have to go back to the source to enter information that you did not deem interesting enough to begin with. The Source-oriented model also enables you to record information from the source ‘as is’, and lets you take decisions about meaning later – so ‘merc.’ can be recorded as ‘merc.’, and not expanded to ‘merchant’ or ‘mercer’ at the point of entry into the database. [1]
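One way to picture recording ‘merc.’ as ‘merc.’ is to keep the transcription and the interpretation in separate tables. The following is only a minimal sketch using Python’s built-in sqlite3 module; the table names, column names and the people in it are invented for illustration and are not taken from any real source or project.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: what the source says, transcribed 'as is'.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name_as_written TEXT,
        occupation_as_written TEXT
    );
    -- Hypothetical table: your later reading of each abbreviation.
    CREATE TABLE occupation_reading (
        as_written TEXT PRIMARY KEY,
        standardised TEXT
    );
""")

# Data entry: nothing is expanded or corrected at this point.
conn.executemany(
    "INSERT INTO person (name_as_written, occupation_as_written) VALUES (?, ?)",
    [("Jn. Smyth", "merc."), ("Wm. Cooke", "merchant"), ("Thos. Hale", "merc.")],
)

# Much later, once the whole body of data gives you context,
# you record a decision about what 'merc.' means.
conn.execute("INSERT INTO occupation_reading VALUES ('merc.', 'mercer')")

# The query shows the transcription and the reading side by side.
rows = conn.execute("""
    SELECT p.name_as_written, p.occupation_as_written, r.standardised
    FROM person AS p
    LEFT JOIN occupation_reading AS r
        ON r.as_written = p.occupation_as_written
    ORDER BY p.person_id
""").fetchall()

for row in rows:
    print(row)
```

Because the reading is stored apart from the transcription, changing your mind about ‘merc.’ means editing one row in the second table, and the original wording is never overwritten. Note that this sketch assumes every ‘merc.’ means the same thing; if the same abbreviation can mean different things in different entries, the reading would need to be attached to individual records instead.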

At the other end of the scale, the lifecycle of the Method-oriented model database could be represented in a different way:

C3iii – Lifecycle of the Method-oriented database

This approach to database design is based on what the database is intended to do, rather than the nature of the information it is intended to contain. Consequently, if adopting this model for designing your database, it is absolutely vital that you know before you begin precisely what you will want to be able to do with the database – including what queries you will want to run. The level of precision needed here should not be underestimated either, given that the database requires a high degree of granularity to perform analysis – the database will not be able to ‘analyse the demographic characteristics of the population’, for example, whereas it will be able to ‘aggregate, count and link the variables of age, gender, marital status, occupation, taxation assessment, place of residence’ and so on. When designing any database it will be necessary to think at this latter level of detail, but if you are designing a Method-oriented database then this becomes much more important.
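To illustrate the level of granularity meant here, the sketch below turns part of ‘analyse the demographic characteristics of the population’ into something a database can actually do: count people and average their ages, grouped by place of residence and gender. It uses Python’s built-in sqlite3 module; the table, its columns and the five records are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding a few of the variables named in the text.
conn.execute("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        age INTEGER,
        gender TEXT,
        marital_status TEXT,
        occupation TEXT,
        place_of_residence TEXT
    )
""")

# Invented sample records.
conn.executemany(
    "INSERT INTO person (age, gender, marital_status, occupation, place_of_residence) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (34, "F", "widowed", "brewer", "Southwark"),
        (41, "M", "married", "mercer", "Southwark"),
        (29, "M", "single", "mercer", "Cheapside"),
        (52, "F", "married", "brewer", "Cheapside"),
        (47, "M", "married", "mercer", "Cheapside"),
    ],
)

# A concrete, answerable question: how many people of each gender live
# in each place, and what is their mean age?
counts = conn.execute("""
    SELECT place_of_residence, gender, COUNT(*) AS people, AVG(age) AS mean_age
    FROM person
    GROUP BY place_of_residence, gender
    ORDER BY place_of_residence, gender
""").fetchall()

for row in counts:
    print(row)
```

The query can only group by variables that exist as separate columns, which is why a Method-oriented design has to settle on them before data entry begins.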

Method-oriented databases are quicker to design, build and enter data into, but it is very hard to deviate from the designed function of the database in order to (for example) pursue newly discovered lines of enquiry.

Ultimately, historians will need to steer a middle course between the two extreme models, perhaps with a slight tendency to lean towards the Source-oriented approach. When making decisions about what information you need from your sources to go into the database, it is important to take into account that your needs may change over the course of a project that might take a number of years. If you want to be able to maintain the maximum flexibility in your research agenda, then you will need to accommodate more information in the database design than if you are very clear on what it is you need to do (and what that is will never change). If you do not know whether your research needs will change, err on the side of accommodating more information – do not exclude information about servants unless you are absolutely sure that you will never want to treat ‘households with servants’ as a unit of analysis, because if you have not entered that information, then it will not be there to query later on.
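The point about servants can be made concrete with a small sketch. Assuming a hypothetical design in which every member of a household, servants included, is entered with a role, ‘households with servants’ becomes a simple query; had servants been left out at data entry, the same query would return nothing. The tables, columns and names below are invented for illustration, using Python’s built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one row per household, one row per member.
    CREATE TABLE household (
        household_id INTEGER PRIMARY KEY,
        head_name TEXT
    );
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        household_id INTEGER REFERENCES household(household_id),
        name TEXT,
        role TEXT
    );
""")

# Invented sample data.
conn.executemany(
    "INSERT INTO household VALUES (?, ?)",
    [(1, "Agnes Rowe"), (2, "Peter Marsh"), (3, "Joan Pyke")],
)
conn.executemany(
    "INSERT INTO member (household_id, name, role) VALUES (?, ?, ?)",
    [
        (1, "Agnes Rowe", "head"),
        (1, "Alice", "servant"),
        (2, "Peter Marsh", "head"),
        (2, "Mary Marsh", "wife"),
        (3, "Joan Pyke", "head"),
        (3, "Tom", "servant"),
        (3, "Ann", "servant"),
    ],
)

# 'Households with servants' as a unit of analysis: only possible
# because the servant rows were entered in the first place.
with_servants = conn.execute("""
    SELECT h.head_name, COUNT(*) AS servants
    FROM household AS h
    JOIN member AS m ON m.household_id = h.household_id
    WHERE m.role = 'servant'
    GROUP BY h.household_id, h.head_name
    ORDER BY h.household_id
""").fetchall()

for row in with_servants:
    print(row)
```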

However, you should not dismiss the Method-oriented model out of hand when considering the approach to your database design. If you know your source(s) very well in advance, you have definite pre-determined research needs, you know you will not be attempting to recover all the information from the source, and you know in advance exactly how you will treat your data and what questions you will ask of it, then you can use the Method-oriented approach. Alternatively, if you are creating a database which is not actually for historical research, but is designed to be a resource with pre-defined functionality and a limited set of tools that a user can use,[2] then a Method-oriented design is also appropriate.



[1] Leaving this kind of ‘normalisation’ until later in the project is beneficial, as it allows you to defer decisions about the meaning of data until you have the full body of data to act as context.

[2] Such as an online database with fixed search and retrieval functionality, for example Old Bailey Online (http://www.oldbaileyonline.org/, accessed 23/30/2011).