Scenario A: all information in one table

In this scenario, you might design a table with the following fields to capture all of the people- and hat-related information available from your sources: Note that database design principles say that you should not combine different entities into a single table (in this case ‘people’ and ‘hats’ being the entities), as this confuses the underlying ‘meaning’ of your data. Good practice would say: hat owners are one entity, hats are another, so create different tables for them. However, it is possible to combine entities into single tables, and for our purposes here it is useful to see the consequences of doing so. For more on entities see Section E2. 

PersonID

Forename

Surname

Occupation

Residence

Age

Gender

HairColour

Income

HatID

HatType

HatColour

HatMaterial

 

 

 

 

 

 

 

 

 

 

 

 

 

Entering a number of records into this table would result in data that resembled:

PersonID

Forename

Surname

Occupation

Residence

Age

Gender

HairColour

Income

HatID

HatType

HatColour

HatMaterial

1

Samuel

Spade

Builder

Peckham

23

M

Blonde

£25,000

1

Baseball cap

Black/red

Polyester

2

Philip

Marlowe

Accountant

Dulwich

35

M

Dappled

£34,533

2

Bowler

Pink

Satin/fur

3

Hercule

Poirot

Plumber

Chelsea

26

M

Bald

£250,000

3

Captain’s hat

White/blue

Silk/braid

4

Miss

Marple

Doctor

Surbiton

58

F

Grey

£1.50

4

Nightcap

Grubby yellow

Rags

With a table like this we would be able to perform some sophisticated analysis on types of people, types of hat, materials used, the correlation between income, age and hat type, occupational and gender distribution of hat types and so on, which would obviously be of enormous benefit to historians interested in this kind of ressearch.

As we continue to examine our sources, we find that some people own more than one hat. This is a one-to-many relationship: one person may have more than one hat (for one-to-many and other types of relationship see Section D3). So, in this database design scenario we might find records appearing in the table in the following manner:

PersonID

Forename

Surname

Occupation

Residence

Age

Gender

HairColour

Income

HatID

HatType

HatColour

HatMaterial

1

Samuel

Spade

Builder

Peckham

23

M

Blonde

£25,000

1

Baseball cap

Black/red

Polyester

2

Philip

Marlowe

Accountant

Dulwich

35

M

Dappled

£34,533

2

Bowler

Pink

Satin/fur

2

Philip

Marlowe

Accoumtant

Dulwich

35

M

Dappled

£34,533

3

Crash helmet

Green

Kevlar/fur

3

Hercule

Poirot

Plumber

Chelsea

26

M

Bald

£250,000

4

Stetson

Brown

Leather

3

Hercule

Poirot

Plumber

Chelsea

26

M

Bald

£250,000

5

Captain’s hat

White/blue

Silk/braid

4

Miss

Marple

Doctor

Surbiton

58

F

Grey

£1.50

6

Nightcap

Grubby yellow

Rags

 

 

As you can see the table now has six records rather than four, and we have entered two of our people twice (Marlowe and Poirot). The reason we have entered these two twice is because each of them owns two hats, as you can see from the hat-related fields. As a result of our lucky multiple-hat owning individuals, nine of our fields have duplicated values in them, and this is both a problem, and a clue to the fact that this table might be better designed as part of a relational data model. Duplicating information across records in this way should be avoided for a number of reasons. Firstly, data entry is time consuming enough without having to enter the same information on more than one occasion. Secondly, the more times you enter the same piece of information into the database the more scope there is for entering something incorrectly, as we have done here with Marlowe’s occupation value.

This particular error could have been avoided through the use of a variety of tools within the database that are designed to mitigate data entry errors.

This may see a trivial point, but given the way that databases connect information, the different spelling of ‘accountant’ will mean that the database will recognise the two records as belonging to entirely different individuals.

 For more on this see Section F1.

 But the most serious problem that this duplication of information raises is a third problem – which is that this will adversely affect some types of analysis by providing false results to queries. If we wanted to we could design a query to answer the question: ‘how many fur hats were owned by accountants?’ The query would look at the data, identify which records were about fur hats and which were about accountants, and then count the number of records which were about both, and this would be the answer to the question. In this case the query would tell us that there were two fur hats owned by accountants, and this would be correct. If the question we wanted an answer to was: ‘how many accountants owned fur hats?’, then the query would operate in exactly the same way – count how many records of fur hats and accountants, and tell us how many records had both. Thus the answer would again be ‘two’, but in this instance, however, that would be inaccurate, as only one individual who was an accountant owned fur hats: he appears twice in the records and is therefore counted twice, erroneously, when he should only have been counted once.

 

Return to Handbook

Last modified: Thursday, 25 July 2013, 1:39 PM