Historical Research Handbook: Designing Databases for Historical Research

E. Entity relationship modelling

E3. Conclusion

Entity Relationship Modelling (ERM) is difficult, and it rapidly becomes more difficult if you are blessed with several different kinds of source, each containing rich information about a variety of subjects. If you are using multiple sources, it is a good idea to avoid creating entities that are source-specific. For example, if you are using census returns and taxation lists, both of which contain information about people, do not create two tables for people (one holding the information from the first source, the other the information from the second). Stick to the abstract logic of the information: what is important to your research is people, so accommodate all of the information about people in the same place. Not only is this logically sound, it will also make it much easier to find data about specific individuals later on (either manually or via queries): looking for a person is easier if everyone is located in one table rather than several.[1]
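As a rough illustration of the point about people, the sketch below uses Python's built-in sqlite3 module to build a single person table in which each row records the source it came from. The table names, column names and sample individuals are invented for the example and are not taken from any real project; a real design would carry many more fields.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one table for people from all sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema: one row per source, one row per person record.
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            forename  TEXT,
            surname   TEXT,
            source_id INTEGER NOT NULL REFERENCES source(source_id)
        );
        """
    )
    # Illustrative sample data only.
    conn.executemany(
        "INSERT INTO source (source_id, title) VALUES (?, ?)",
        [(1, "Census return"), (2, "Taxation list")],
    )
    conn.executemany(
        "INSERT INTO person (forename, surname, source_id) VALUES (?, ?, ?)",
        [("John", "Smith", 1), ("Mary", "Brown", 1), ("John", "Smith", 2)],
    )
    return conn


def find_person(conn, surname):
    """Look up a surname across every source with a single query."""
    return conn.execute(
        """
        SELECT p.forename, p.surname, s.title
        FROM person AS p
        JOIN source AS s ON s.source_id = p.source_id
        WHERE p.surname = ?
        ORDER BY s.title
        """,
        (surname,),
    ).fetchall()


if __name__ == "__main__":
    for row in find_person(build_demo(), "Smith"):
        print(row)
```

Because everyone sits in the one table, a single query finds an individual whichever source the record came from. With separate census and taxation tables, the same search would have to be run once per table and the results combined by hand.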

No Entity Relationship Diagram (ERD) will ever be perfect: as with so much else in database design, it will be a matter of compromise. The success of an ERD can only be determined in one way, namely by whether the database performs the tasks it was designed to do, and this will not become evident until after you have begun entering data and using the database for analysis. This is why the creation of the ERD is (or should be) swiftly followed by a period of intense testing of the database ‘in action’, in order to identify quickly where the design is impeding the database’s purpose (see Section G).

[1] Ultimately, of course, this is a matter of personal judgement: you may decide that your entity is not ‘people’ but in fact two separate entities, ‘census return’ and ‘tax payer’, in which case you could argue for two separate tables. You would still face the problem of having to look for individuals in more than one table, however, should the need ever arise.