Case Study: LIPARM
Project Title: LIPARM
Type: Linked Data
Introduction and Definition
Linked data is data exposed in a form that machines can read and combine with other data in the same format. A distinction is sometimes made between linked data and the semantic web, in which the latter applies to machine reasoning over sets of data; however, the two terms are sometimes used interchangeably.
A standard format for linked data is RDF (Resource Description Framework), which is a W3C recommendation. RDF data is stored in triples having the structure subject – predicate – object and makes extensive use of URIs (Uniform Resource Identifiers). The use of publicly shared URIs (for describing things) and vocabularies (for describing the relationships between them) is what allows linked data sets to interconnect.
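The triple structure can be illustrated with a short Python sketch. It uses only the standard library and represents each triple as a plain tuple; the `http://example.org/` URIs, the `represents` predicate and the constituency label are hypothetical and are not taken from the LIPARM data.

```python
# Hypothetical namespace used only for illustration; not the project's real URIs.
EX = "http://example.org/parliament/"
# The rdfs:label property from the W3C RDF Schema vocabulary.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each triple is (subject, predicate, object). Subjects and predicates are URIs;
# an object is either a URI or a plain literal string.
triples = [
    (EX + "member/1", EX + "vocab/represents", EX + "constituency/1"),
    (EX + "constituency/1", RDFS_LABEL, "Example Constituency"),
]


def is_uri(value: str) -> bool:
    """Treat anything that starts with an http(s) scheme as a URI."""
    return value.startswith(("http://", "https://"))


def to_ntriples(triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    s, p, o = triple
    if is_uri(o):
        obj = f"<{o}>"
    else:
        # Escape backslashes and double quotes inside literals.
        obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"<{s}> <{p}> {obj} ."


ntriples = "\n".join(to_ntriples(t) for t in triples)
print(ntriples)
```

Because both triples mention the same constituency URI, a machine can join them without any knowledge of what the URI denotes; this is the sense in which shared URIs make data sets interconnect.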
RDF can be queried using SPARQL, which can both query and update datasets, much as SQL does for relational databases. See the SPARQL endpoint at http://data.gov.uk/sparql for details and tutorials. For example, the structured data within Wikipedia articles has been made available as RDF on DBpedia (http://dbpedia.org), and this RDF can also be queried via DBpedia’s SPARQL interface: http://dbpedia.org/snorql/.
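As a minimal sketch of what such a query might look like, the Python below builds a SPARQL SELECT query and encodes it as a GET request URL, following the SPARQL Protocol convention of passing the query text in a `query` parameter. The endpoint URL `http://dbpedia.org/sparql` and the resource URI are assumptions made for illustration; the code only constructs the URL and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint URL; check the service's own documentation before relying on it.
ENDPOINT = "http://dbpedia.org/sparql"

# Ask for up to five predicate/object pairs describing one resource.
# The resource URI is an assumption based on DBpedia's naming pattern.
QUERY = """
SELECT ?predicate ?object
WHERE {
  <http://dbpedia.org/resource/Parliament_of_Northern_Ireland> ?predicate ?object .
}
LIMIT 5
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The `WHERE` clause is itself a triple pattern: the subject is fixed, and the variables `?predicate` and `?object` are bound to every matching triple in the dataset.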
For more information on the potential value of linked data, see the JISC report ‘Review of the evidence of the value of linked data’: http://www.jisc.ac.uk/whatwedo/programmes/inf11/jiscexpo/RevLinkedDataApproach.aspx.
A book-length introduction to both SPARQL and RDF is Learning SPARQL by Bob DuCharme (O’Reilly, 2011).
LIPARM – Linking Parliamentary Records Through Metadata – is a JISC-funded project to link parliamentary records via a common metadata scheme. The project is a collaboration between King’s College, London; the History of Parliament Trust; the Institute of Historical Research; Queen’s University, Belfast; the UK Parliament Web and Internet Services; and the Northern Ireland Assembly.
The LIPARM project will make it possible for the first time to search parliamentary records simultaneously, via a unified metadata scheme. This will make it possible to search for all mentions of a bill, an act, a member, or a constituency across multiple parliamentary sources.
The initial LIPARM project will link the records from the Parliament of Northern Ireland (Stormont), from 1921 to 1972, and the Westminster Parliament in the same period.
The project has four phases, each with an output:
- The creation of the metadata scheme itself (called Parliamentary Metadata Language (PML))
- The creation of authority lists, including unique identifiers for each item in the lists and MADS and RDF outputs of the authority lists
- The generation of PML records for the two collections
- The creation of a union catalogue and front end for querying the data
As of September 2012, LIPARM is an ongoing project, and the union catalogue, which is a central part of the project outputs, will be released in late 2012.
Use of tool
- Metadata scheme.
A key component of the project is the construction of the metadata scheme, PML, itself. The scheme is published as an XML schema document which defines core components of parliamentary records and key linkages between them. The full scheme is freely available at http://sas-space.sas.ac.uk/4315/4/pml.xsd.
- Controlled vocabularies.
These are essential for cross-linking because they ensure that the same entity is referred to in the same way in each resource that uses LIPARM now or in the future. A Uniform Resource Identifier (URI) is constructed for each entity, and that functions as a unique identifier in every case.
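The idea of constructing one URI per entity can be sketched as follows. This is not the project's actual URI protocol, which is not reproduced in this case study; the base URI, the path layout and the `slugify` and `mint_uri` helpers are all hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI and path layout; not the project's real URI protocol.
BASE = "http://example.org/liparm/"


def slugify(label: str) -> str:
    """Reduce a label to lowercase ASCII words joined by hyphens."""
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uri(entity_type: str, label: str) -> str:
    """Build a URI of the form BASE/<entity type>/<entity label>."""
    return f"{BASE}{slugify(entity_type)}/{slugify(label)}"


# Two resources that spell the same entity differently still arrive at one URI.
uri_a = mint_uri("constituency", "Example Borough")
uri_b = mint_uri("Constituency", " example  borough ")
print(uri_a)
print(uri_a == uri_b)
```

A deterministic rule of this kind is one way to let later contributors add entries to an authority list without coordinating with the original compilers, provided that labels are themselves agreed.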
There were two types of entities that needed to be identified. The first type consisted of simple and short lists that could be constructed by the parliamentary experts from their own knowledge, for example legislation types (bill, act, private member’s bill). The second type required more extensive data gathering, often by collating and transforming available sources, to construct lists of the members of both parliaments, bills, acts, constituencies, parliaments and sessions.
The authorities are published in MADS (Metadata Authority Description Schema): http://www.loc.gov/standards/mads/. MADS is a relatively new standard, so the project is a good test of its capabilities and will provide an exemplar for future projects using it. The controlled vocabularies were first constructed in a simple XML format and then converted to MADS using XSLT (Extensible Stylesheet Language Transformations). For example, a snippet of two of the parliamentary constituencies for the Parliament of Northern Ireland looks like this in MADS markup (note the use of a URI as a unique identifier):
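The conversion step from a simple XML format to MADS can be approximated in Python as a rough illustration. The project itself used XSLT; Python is substituted here only for the sake of a self-contained example. The input element names, the constituency labels and the `example.org` URIs are invented, and the choice of the MADS `geographic` element for constituencies is an assumption rather than the project's documented encoding.

```python
import xml.etree.ElementTree as ET

# Namespace of MADS version 2 as published by the Library of Congress.
MADS_NS = "http://www.loc.gov/mads/v2"
ET.register_namespace("mads", MADS_NS)

# Hypothetical "simple XML" input; element names, labels and URIs are invented.
SIMPLE_XML = """
<constituencies>
  <constituency uri="http://example.org/liparm/constituency/example-one">Example One</constituency>
  <constituency uri="http://example.org/liparm/constituency/example-two">Example Two</constituency>
</constituencies>
""".strip()


def q(tag: str) -> str:
    """Qualify a tag name with the MADS namespace."""
    return f"{{{MADS_NS}}}{tag}"


def to_mads(simple_xml: str) -> ET.Element:
    """Turn each simple <constituency> entry into one MADS authority record."""
    collection = ET.Element(q("madsCollection"))
    for item in ET.fromstring(simple_xml).iter("constituency"):
        record = ET.SubElement(collection, q("mads"))
        authority = ET.SubElement(record, q("authority"))
        # Assumption: a constituency is encoded as a geographic authority.
        ET.SubElement(authority, q("geographic")).text = item.text.strip()
        # The URI is carried as the record's unique identifier.
        identifier = ET.SubElement(record, q("identifier"), {"type": "uri"})
        identifier.text = item.get("uri")
    return collection


mads_xml = ET.tostring(to_mads(SIMPLE_XML), encoding="unicode")
print(mads_xml)
```

Each output record pairs a human-readable authority heading with a URI identifier, which is the feature the passage above draws attention to.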
- Union catalogue as front end
A web-based interface for the PML records generated by the project is being constructed at the National Library of Wales. This will allow browsing and searching by key components of PML files, including by people, legislation and proceedings. This interface will go live in November 2012.
All of the data produced by the project – the metadata scheme and the controlled vocabularies in MADS – will be freely available as open data under a Creative Commons licence. It can be accessed here: http://sas-space.sas.ac.uk/4315/. At the time of writing, the RDF was due to be published on the same institutional repository as the other material. This open publishing is essential to linked data, since it means that the vocabularies and URIs published by the project can be used by others.
The project has been explicitly designed to be extensible to any parliaments and legislatures, either internationally or historically. As far as possible it is agnostic as to particular local procedures: local features such as royal assent are recorded as generic "proceedings objects" and defined more closely by using XML attributes to tie them to controlled terms. The controlled vocabularies published by the project can be used in any applicable legislature, or the semantic range of the schema can be extended to other legislatures by adding further terms to the controlled vocabularies.
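The pattern of a generic object narrowed by a controlled term can be sketched in Python. Everything named here is a placeholder: the element name `proceeding`, the attributes `typeURI` and `date`, the vocabulary URIs and the sample date are invented for illustration and are not taken from the PML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary; URIs and labels are invented.
PROCEEDING_TYPES = {
    "http://example.org/liparm/proceeding-type/royal-assent": "Royal assent",
    "http://example.org/liparm/proceeding-type/second-reading": "Second reading",
}


def add_term(uri: str, label: str) -> None:
    """Extend the vocabulary, for instance to cover another legislature."""
    PROCEEDING_TYPES[uri] = label


def proceeding(type_uri: str, date: str) -> ET.Element:
    """Build a generic proceeding element tied to a controlled term by attribute.

    'proceeding', 'typeURI' and 'date' are placeholder names, not PML names.
    """
    if type_uri not in PROCEEDING_TYPES:
        raise ValueError(f"unknown controlled term: {type_uri}")
    return ET.Element("proceeding", {"typeURI": type_uri, "date": date})


# The same generic element serves for a locally specific procedure.
assent = proceeding("http://example.org/liparm/proceeding-type/royal-assent", "1950-01-01")

# Supporting a new procedure needs a new vocabulary term, not a schema change.
add_term("http://example.org/liparm/proceeding-type/presidential-signature", "Presidential signature")
signature = proceeding("http://example.org/liparm/proceeding-type/presidential-signature", "1950-01-01")
print(ET.tostring(assent, encoding="unicode"))
```

The design choice illustrated is that local meaning lives in the vocabulary rather than in the element names, so the structure of a record stays the same from one legislature to another.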
Although this case study was written at an intermediate stage of the life of LIPARM, its progress to date has been smooth and the project is likely to produce a robust and replicable proof of concept for linking data sets through a shared metadata standard.
The project would actively encourage any interested party to apply the LIPARM scheme to further legislatures, whether national, local or historical. Indeed, it should be possible to apply LIPARM to the records of any voting body. How widely LIPARM will be used in this way cannot yet be determined. The authority lists are explicitly intended to be added to as time goes by, and anyone contributing further items will be able to follow the simple URI protocol developed by the project.
More broadly, the project’s openness with its data and its methodology should make the approach applicable to other linked data projects. The metadata schema is highly flexible and may be adopted without change in the context of different types of legislature by adopting new controlled vocabularies. The use of URIs to provide semantic identifiers should make PML-encoded records readily interoperable with any other linked data corpus using the same methodology.