Designing Databases for Historical Research
Historical Research Handbook: Designing Databases for Historical Research
If you apply the principles and techniques discussed here to the design of your database, you may well find that you spend a considerable amount of time on the process. Unfortunately there is no getting around this: designing databases is simply a time-consuming business, especially if you have adopted a Source-oriented approach and you are working with a range of different, rich and complex sources. However, the time you spend working on the design will be more than repaid when it comes to the data entry and data analysis stages of the database project – and this cannot be overstated. Historical sources will give rise to all manner of complications and problems, both intellectual and in terms of the mechanics of databases, and the more of these you can anticipate and accommodate in the design of the database, the more efficient and less frustrating the subsequent use of the database will be.
Before you begin the process of designing your database, and producing your Entity Relationship Diagram, it is worth spending a little time seeing how other historians have designed their databases (see the resources listed in the Further Reading section). You should also read through the other HRH Handbooks on Databases by Mark Merry, as these describe in detail the processes of building databases and performing analysis respectively. It is essential to understand what these processes require in order to work smoothly, so that the design can take these requirements into account from the very beginning.
Finally, it is worth reiterating that designing databases is difficult, and there is no substitute for practice. No database is ever perfect, and the only indicator of quality, or success, when it comes to database design is whether or not it serves the various functions that you intended. If you can manage the information from your sources in the way that you need, perform the analysis that you require, and be as flexible as you need in both of these areas, then your design is successful. But you do not have to wait until the later stages of your database use to find out how successful you have been in the design – you can and should test the design of the database very early on. After producing your Entity Relationship Diagram, build a structural prototype of your database (that is, with only the tables and relationships, without worrying too much about the other tools that go into creating the database application) and spend a week entering data. If you are using multiple sources, enter material from each of the sources. As soon as you start entering data you will very quickly begin to see where any deficiencies in the design might be – look out for:
- Information that appears repeatedly and that you would like to analyse, but for which you have nowhere specific to put it (i.e. for which you will need to add new fields)
- Information that you find yourself repeating from record to record, which means you will need to think about re-ordering your relationships to prevent this (see Section D)
- Datatypes that turn out to be unhelpful, which you should change
- Data that could be standardised or classified
- Information that you had not anticipated when designing the database
It is likely that you will find examples of all of these in a very short space of time. Once you have spent some time entering data, design and run some queries to test whether or not the research questions you know you will want answers to can actually be answered by the current design. Running queries is the ultimate test of whether the database design works or not, and it is likely that you will find yourself rearranging fields in the light of what you learn. The queries will also highlight (often starkly) how much standardising of information you will need to engage in.
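To make this kind of early test concrete, the following is a minimal sketch of a structural prototype, written in Python with the standard-library `sqlite3` module. It is only an illustration under stated assumptions: the `source` and `person` tables, their fields, and the handful of sample rows are all invented for the example and do not come from any particular project or from the other Handbooks. Your own Entity Relationship Diagram will dictate the real tables and relationships.

```python
import sqlite3


def build_prototype(conn):
    """Create only the tables and relationships (no forms, reports or other tools).

    The table and field names are hypothetical, chosen for illustration.
    """
    # SQLite only enforces relationships when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            source_id  INTEGER NOT NULL REFERENCES source(source_id),
            forename   TEXT,
            surname    TEXT,
            occupation TEXT  -- entered exactly as it appears in the source
        );
    """)


def enter_sample_data(conn):
    """Enter a few invented records from each (hypothetical) source."""
    conn.executemany(
        "INSERT INTO source (source_id, title) VALUES (?, ?)",
        [(1, "Hypothetical parish register"), (2, "Hypothetical tax list")],
    )
    conn.executemany(
        "INSERT INTO person (source_id, forename, surname, occupation) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "John", "Smith", "weaver"),
            (1, "Thomas", "Carter", "Weaver"),
            (2, "Jno.", "Smyth", "wever"),
            (2, "Agnes", "Hall", "brewer"),
        ],
    )
    conn.commit()


def occupation_counts(conn):
    """A test query: how many people are recorded under each occupation?"""
    return conn.execute(
        "SELECT occupation, COUNT(*) FROM person "
        "GROUP BY occupation ORDER BY occupation"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is saved
    build_prototype(conn)
    enter_sample_data(conn)
    for occupation, n in occupation_counts(conn):
        print(occupation, n)
```

In this invented sample, the query reports "weaver", "Weaver" and "wever" as three separate occupations, each with a single person, rather than one occupation with three people. That is exactly the sort of result that suggests a need for standardisation, for instance a separate field holding a standardised form of the occupation alongside the original wording.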
Once you have finished this testing, and moved on to design and rebuild ‘Version 2’ of the database, you will be well on the way to creating one of the most powerful research tools available to the historian. It will be a struggle to begin with, but it will be worth it in the end!
Databases for Historians (HRH) by Mark Merry is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.