Linked Data Handbook
1. Linked Data. What is it?
The two essential ideas of linked data are:
- use a universal format
- publish your data openly
By openly I mean for anyone to use (i.e. available without paying a fee) and in a format that does not require proprietary software.
If datasets are published for all to use, and they all use the same format, then it becomes possible for someone to interrogate all of the data at once. This is clearly much more powerful than individual datasets dotted around in what are known as silos. For this to work, the data must be structured the same way and use the same means of referring to the same thing. For example, let's say our data refers to each person by a number.
If everybody who creates a dataset that mentions that person uses exactly the same number, in exactly the same format, to refer to that person, then we can reliably find them in all of those datasets. Let's make up an example using Anne Hathaway.
Let's make Anne Hathaway, the film star, person 15601.
And let's give Anne Hathaway, Shakespeare's wife, a different number of her own.
We can now search for person 15601 and, providing the data is correctly marked up, know that we are getting the same person - in this case, the film star - every time.
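The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two datasets, their field names, and the number 99999 for Shakespeare's wife are all invented here, and real linked data would not be stored as Python dictionaries.

```python
# Two hypothetical datasets published by different people.
# Both agree to use the number 15601 for Anne Hathaway, the film star.
film_dataset = [
    {"person": "15601", "name": "Anne Hathaway", "note": "acted in One Day"},
]
biography_dataset = [
    {"person": "15601", "name": "Anne Hathaway", "note": "film star"},
    # 99999 is an invented number for Shakespeare's wife (an assumption).
    {"person": "99999", "name": "Anne Hathaway", "note": "Shakespeare's wife"},
]


def find_person(person_id, *datasets):
    """Return every record, from any dataset, that uses the given number."""
    return [
        record
        for dataset in datasets
        for record in dataset
        if record["person"] == person_id
    ]


def find_name(name, *datasets):
    """Return every record with the given name, whoever it refers to."""
    return [
        record
        for dataset in datasets
        for record in dataset
        if record["name"] == name
    ]


by_id = find_person("15601", film_dataset, biography_dataset)
by_name = find_name("Anne Hathaway", film_dataset, biography_dataset)
```

Searching by name mixes the two Anne Hathaways together, while searching by number returns only records about the film star, whichever dataset they come from.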
At this point you might be thinking, "that's what a library catalogue does". It's true that the key idea here is that of the authority file, which is central in library science (an authority file is a definitive list of terms which can be used in a particular context, for example when cataloguing a book). A library catalogue could be linked data but an authority file alone does not contain the next, key step in creating linked data:
The next step is to have a way of describing the relationship of Anne Hathaway, our person 15601, to something else. Suppose we have a film that Anne Hathaway acted in. How do we indicate that? In linked data this is done using a 'triple'. Let's make one up now:
person:"15601", role:actedIn, film:"One Day"
The triple, not surprisingly, has three parts. These are conventionally referred to as subject, predicate and object:
1. person 15601 (the subject)
2. actedIn (the predicate)
3. "One Day" (the object)
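To make the three parts concrete, here is a minimal Python sketch that stores the made-up triple as a named tuple. The class name `Triple`, the field names, and the `person:15601` spelling are assumptions chosen for this illustration; they are not part of any linked data standard.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid confusion with Python's built-in "object"


# The made-up triple from the text.
triple = Triple(subject="person:15601", predicate="actedIn", obj="One Day")

# A triple always unpacks into exactly three parts.
subject, predicate, obj = triple
```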
Hold on a minute! There's more than one film called One Day. Let's fix that with another arbitrary number:
person:"15601", role:actedIn, film:"873823"
As you can see, this triple is horrible to read. That's because triples aren't meant for humans to read but for computers. Linked data consists of lots and lots of these triples; machines do the work of reading them so that we don't have to. We'll come back to triples later. For now there are three key points to remember from this part of the course:
- Linked data must be open and available to anyone on the internet
- Linked data tries to standardise ways of referring to things
- Linked data consists of triples which describe relationships
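As a rough illustration of how a machine might read a collection of triples, the Python sketch below stores a few of them and answers a simple question by pattern matching. It assumes a tiny in-memory list; the `name` and `title` predicates, the `person:` and `film:` prefixes, and the `match` helper are invented for this example and are not how real linked data tools work internally.

```python
# A tiny, made-up collection of triples: (subject, predicate, object).
triples = [
    ("person:15601", "actedIn", "film:873823"),
    ("person:15601", "name", "Anne Hathaway"),  # "name" is an invented predicate
    ("film:873823", "title", "One Day"),        # "title" is an invented predicate
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None means 'anything'."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Which films did person 15601 act in?
films = [
    o for (_, _, o) in match(triples, subject="person:15601", predicate="actedIn")
]

# What are those films called?
titles = [
    o
    for film in films
    for (_, _, o) in match(triples, subject=film, predicate="title")
]
```

Because both the person and the film are referred to by number, the lookup cannot confuse this film with another one that happens to share the title.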