RDF triples make web connections

23 September 2010

At the beginning of 2010, when the UK government opened its one-stop portal for government data (data.gov.uk), it did not just link to data sets in popular document formats such as Excel; it also republished much of the data as Linked Data. Linked Data enables related data to be connected across the web by combining three concepts. First, there are URIs (Uniform Resource Identifiers), which are strings of characters used to identify a name or resource on the internet. Second, there is HTTP, the request-response protocol of the web. The third concept is RDF (Resource Description Framework).

RDF is a set of specifications used to describe objects and the relationships between them as expressions with three parts: subject, predicate and object. These so-called RDF triples allow for the encoding of a semantic web, one that can be read by computers as well as humans.

In its simplest form, the concept could be applied to a sentence like ‘John drinks tea’ or ‘David likes apples’. Here, ‘John’ and ‘David’ are the subjects, ‘drinks’ and ‘likes’ are the predicates, and ‘tea’ and ‘apples’ are the objects. RDF triples build on this idea: they tie related resources and data together by identifying a thing (the subject), a property or relationship it has (the predicate), and the value or other thing that the property points to (the object). A scientific paper can use RDF triples to express associated bibliographic information, whilst the relationships between individual tests, experiments and their results can also be linked. Linking things in this way enables computers to pull up relevant data and results from all over the internet.
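The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustration using plain strings, not an RDF library; the `Triple` type and the `match` helper are names invented for this example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str


# The two example sentences from the text, stored as triples.
triples = [
    Triple("John", "drinks", "tea"),
    Triple("David", "likes", "apples"),
]


def match(
    data: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every part that was supplied."""
    return [
        t
        for t in data
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Ask "who likes what?" by fixing only the predicate.
likes = match(triples, predicate="likes")
```

Leaving a part unspecified acts as a wildcard, which is roughly how a query over a collection of triples works: fix the parts that are known and retrieve whatever fills the rest.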

However, the power of RDF triples goes beyond linking specific words or phrases. Any of the parts of an RDF triple can be replaced with a URI that is unique to a particular thing or concept. In the simple triple example ‘David likes apples’, it would not be clear to a machine whether ‘apples’ refers to the fruit or the computer company, leading to ambiguity and irrelevant results appearing in a semantic search. The distinction can be made by replacing the literal ‘apples’ with a URI for either the fruit (dbpedia.org/page/Apple) or the computer company (dbpedia.org/page/Apple_Inc.).
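A minimal sketch of this disambiguation, written out in the line-based N-Triples notation (URIs in angle brackets, literals in double quotes, a full stop ending each statement). The example.org URIs for ‘David’ and ‘likes’ are hypothetical placeholders, and the `http://` prefix on the DBpedia addresses is an assumption; the helper functions are invented for this example.

```python
def term(value: str, is_uri: bool) -> str:
    """Format one part of a triple: <uri> or "literal"."""
    if is_uri:
        return f"<{value}>"
    # Escape backslashes and double quotes inside a literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriple(subject: str, predicate: str, obj: str, obj_is_uri: bool = True) -> str:
    """Write a single statement as one N-Triples line."""
    return f"{term(subject, True)} {term(predicate, True)} {term(obj, obj_is_uri)} ."


# Hypothetical URIs for the subject and predicate (not real vocabularies).
DAVID = "http://example.org/people/David"
LIKES = "http://example.org/vocab/likes"

# The two DBpedia addresses quoted in the text; the scheme is assumed.
APPLE_FRUIT = "http://dbpedia.org/page/Apple"
APPLE_COMPANY = "http://dbpedia.org/page/Apple_Inc."

ambiguous = to_ntriple(DAVID, LIKES, "apples", obj_is_uri=False)
fruit = to_ntriple(DAVID, LIKES, APPLE_FRUIT)
company = to_ntriple(DAVID, LIKES, APPLE_COMPANY)

for line in (ambiguous, fruit, company):
    print(line)
```

The first statement carries only the string ‘apples’, so a machine cannot tell which meaning is intended; the second and third differ in their object URI and are therefore distinct statements.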

The use of URIs also allows established ontologies and vocabularies to be reused. For instance, when describing a website, it is possible to make use of both Dublin Core metadata elements for describing resources and Library of Congress Subject Headings. Using established ontologies (or devising and publishing your own if none yet exists in your field) helps computers find all the related information. See the box (below) for examples of how RDF triples are written.
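As a rough sketch of reusing a shared vocabulary, the snippet below describes a web page with predicates taken from the Dublin Core element set, whose namespace is http://purl.org/dc/elements/1.1/. The page URI is a hypothetical placeholder, the metadata values are taken from this article purely for illustration, and the `values` helper is invented for the example.

```python
# Namespace of the Dublin Core Metadata Element Set.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical URI standing in for the page being described.
PAGE = "http://example.org/articles/rdf-triples"

# Each statement is a (subject, predicate, object) tuple.
description = [
    (PAGE, DC + "title", "RDF triples make web connections"),
    (PAGE, DC + "creator", "David Stuart"),
    (PAGE, DC + "date", "2010-09-23"),
]


def values(triples, subject, predicate):
    """Collect every object recorded for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


titles = values(description, PAGE, DC + "title")
creators = values(description, PAGE, DC + "creator")
```

Because the predicates are published, widely recognised URIs rather than locally invented labels, any program that understands Dublin Core can interpret these statements without prior knowledge of the site that produced them.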

RDF is important to librarians as both publishers and consumers of information. Libraries are responsible for publishing a wide variety of information on web pages, in catalogues, and increasingly in institutional repositories of both journal articles and data sets. The value of this information can be increased by making it machine-readable and connecting it to otherwise unconnected data sources. At the simplest level this may mean including RDFa markup in staff contact pages; at a more advanced level it may mean making the whole catalogue available in RDF and connecting it to other sources such as the Library of Congress. As consumers of information, librarians need to be aware of the range of data that is being published online and the tools available to make use of it, from browser toolbars that extract event information and send it straight to a calendar, to interfaces for querying the data stored in triplestores.

As has already been seen in the area of Web 2.0 mashups, when data is made publicly available, it is used in a multitude of ways unthought-of by the original organisation. The same will be true as an increasing amount of information is made available using RDF.

David Stuart is an independent web analyst and consultant, and an honorary member of the Statistical Cybermetrics Research Group at the University of Wolverhampton, UK.