Organising and Designing Quantitative Data
3. Choosing the right tools
3.3 Spreadsheet or Database?
If you are creating only a small amount of quantitative data, or if its structure is very simple (especially if it is primarily numeric) and it will fit comfortably into a single table, then a spreadsheet will probably serve your needs. It will also take less time to learn and set up than a database.
When to Use a Spreadsheet
Characteristics of your sources or research that might make a spreadsheet a more appropriate choice:
- your sources already resemble spreadsheets, i.e. they are in a regular tabular or list format
- your sources consist of mainly numerical information
- your sources are already aggregated information (even if not yet digital), suitable for statistical analysis without significant intermediate processing
- you do not need to link together different sources
- you are not creating large amounts of data
European State Finance Database - "an international collaborative research project for the collection, archiving and dissemination of data on European fiscal history across the medieval, early modern and modern periods." The database contains a range of aggregated tabular data deposited by researchers, which can be downloaded in CSV text format as well as viewed in graphical forms on the website.
1831 Census Data - downloadable datasets with accompanying documentation, made available by the Staffordshire University Victorian Censuses project. Again, these are aggregate data ideal for a spreadsheet.
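As a rough illustration of how little processing already-aggregated tabular data needs, the sketch below reads a small CSV table and summarises one column using only Python's standard library. The column names and figures are invented for the example and do not come from either dataset above; a real downloaded file would be opened with open(path, newline="") instead of the inline sample text.

```python
import csv
import io
import statistics

# Hypothetical aggregate table (invented values, for illustration only).
SAMPLE = """year,revenue
1500,100
1501,110
1502,90
"""

# Read the rows exactly as a spreadsheet would display them.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Summarise a single numeric column.
revenues = [float(r["revenue"]) for r in rows]
mean_revenue = statistics.mean(revenues)

# Find the row with the highest revenue.
peak = max(rows, key=lambda r: float(r["revenue"]))

print("mean revenue:", mean_revenue)
print("peak year:", peak["year"])
```

Everything here could equally be done with a spreadsheet's built-in average and maximum functions, which is the point: data of this shape does not need a database.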
When You Need a Database
On the other hand, if several of the following apply, you probably need a database:
- you are compiling data from varied sources that you will have to aggregate for statistical analysis yourself
- you are collecting data from related sources that you will want to be able to cross-reference and link together
- your sources are mainly text rather than numbers
- your sources are too complex to fit into a simple flat table
- you will be creating a lot of data
In any case, the choice is not irreversible. If you think your research is outgrowing the spreadsheet format, spreadsheet software normally provides convenient ways to export your data at any time. Conversely, once you have data in a database, you can compile it into subsets of aggregate data for analysis in a spreadsheet.
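The round trip between the two tools can be sketched as follows, assuming the spreadsheet has been exported as CSV and using SQLite through Python's standard sqlite3 module as a stand-in for whichever database you choose. The table name payments, its columns and its values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export: one row per recorded payment.
CSV_EXPORT = """year,region,amount
1700,North,120
1700,South,80
1701,North,150
1701,South,95
"""

def load_csv_into_db(csv_text, conn):
    """Copy rows from exported CSV text into a 'payments' table."""
    conn.execute("CREATE TABLE payments (year INTEGER, region TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["year"]), r["region"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    return len(rows)

def totals_by_year(conn):
    """Compile an aggregate subset: total amount per year."""
    cur = conn.execute(
        "SELECT year, SUM(amount) FROM payments GROUP BY year ORDER BY year"
    )
    return cur.fetchall()

def to_csv(rows, header):
    """Write aggregate rows back out as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
n_loaded = load_csv_into_db(CSV_EXPORT, conn)
yearly = totals_by_year(conn)
print(to_csv(yearly, ["year", "total_amount"]))
```

In practice the import step is often done through the database software's own import tool rather than a script; the sketch only shows that the data moves in both directions without loss.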
The Old Bailey Online - a database of reports of nearly 200,000 trials held at the Old Bailey in London between 1674 and 1913. Not only is this a very large dataset, but trials are complex sources for quantification. A single trial may involve multiple defendants, charges, verdicts and punishments, and these "many to many" relationships do not fit into a single flat table. Subsets of the data, however, can be generated in tabular format for spreadsheet analysis using the site's Statistical Search.
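To make the idea of "many to many" relationships concrete, here is a minimal sketch of how trial records of this kind might be modelled in a relational database, again using SQLite from Python. It is not the Old Bailey Online's actual schema; the table names, column names and records are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trials     (trial_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE defendants (defendant_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE offences   (offence_id INTEGER PRIMARY KEY, label TEXT);
-- Link table: one row per charge, tying a trial, a defendant and an offence
-- together with the verdict reached on that charge.
CREATE TABLE charges (
    trial_id     INTEGER REFERENCES trials(trial_id),
    defendant_id INTEGER REFERENCES defendants(defendant_id),
    offence_id   INTEGER REFERENCES offences(offence_id),
    verdict      TEXT
);
""")

# Invented records: trial 1 has two defendants, one of them facing two charges.
conn.executemany("INSERT INTO trials VALUES (?, ?)", [(1, 1750), (2, 1751)])
conn.executemany("INSERT INTO defendants VALUES (?, ?)",
                 [(1, "Defendant A"), (2, "Defendant B")])
conn.executemany("INSERT INTO offences VALUES (?, ?)",
                 [(1, "theft"), (2, "assault")])
conn.executemany("INSERT INTO charges VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "guilty"),
    (1, 1, 2, "not guilty"),
    (1, 2, 1, "guilty"),
    (2, 2, 2, "guilty"),
])

# Flatten the linked tables into a small aggregate table of the kind that
# could then be analysed in a spreadsheet.
verdicts_by_offence = conn.execute("""
    SELECT o.label, c.verdict, COUNT(*)
    FROM charges AS c
    JOIN offences AS o ON o.offence_id = c.offence_id
    GROUP BY o.label, c.verdict
    ORDER BY o.label, c.verdict
""").fetchall()

for label, verdict, n in verdicts_by_offence:
    print(label, verdict, n)
```

Trying to hold the same information in one spreadsheet row per trial would mean either repeating the trial details on several rows or adding an unpredictable number of columns for extra defendants and charges.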
Family Reconstitution Data, from Cheapside parish registers, c.1540-1710 - a relational database created by the People in Place project. Family reconstitution is a technique that demographic historians apply to parish register data from the 16th to the 19th centuries. It involves "linking series of births, marriages and burials in the same family and comparing the results across thousands of families" to generate data on long-term demographic trends.
Already using a spreadsheet?
If you answer yes to several of these questions, you should probably consider switching to a database.
- Are you duplicating a lot of data in spreadsheets?
- Are you having to make changes across multiple spreadsheets when you change one of them?
- Are your spreadsheets becoming unwieldy from trying to manage too much information?
- Are you finding it difficult to locate specific data because of the size of your spreadsheets?
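The duplication problem behind the first two questions can be illustrated with a short sketch in plain Python. It takes flat spreadsheet-style rows in which the same parish details are repeated on every line, and splits them into two linked collections so that each fact is stored once, which is essentially what moving to a relational database does. The field names and values are hypothetical.

```python
# Hypothetical flat spreadsheet rows: parish details repeated on every line.
flat_rows = [
    {"person": "Person A", "parish": "St Example", "county": "Middlesex"},
    {"person": "Person B", "parish": "St Example", "county": "Middlesex"},
    {"person": "Person C", "parish": "St Sample", "county": "Surrey"},
]

def split_parishes(rows):
    """Return (parishes, people): parish details are stored once and
    each person refers to a parish by its id."""
    parish_ids = {}   # parish name -> id
    parishes = {}     # id -> {"name": ..., "county": ...}
    people = []
    for row in rows:
        name = row["parish"]
        if name not in parish_ids:
            new_id = len(parish_ids) + 1
            parish_ids[name] = new_id
            parishes[new_id] = {"name": name, "county": row["county"]}
        people.append({"person": row["person"], "parish_id": parish_ids[name]})
    return parishes, people

parishes, people = split_parishes(flat_rows)

# A correction is now made in one place rather than on every matching row.
parishes[1]["county"] = "London"
counties = [parishes[p["parish_id"]]["county"] for p in people]
print(counties)
```

In the flat version the same correction would have to be repeated on every row that mentions the parish, across every spreadsheet that contains it, which is where inconsistencies tend to creep in.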