David Stuart reports back from a meeting in London to explore the opportunities of linking data on the web
Most people think of the web as a collection of documents interconnected by hyperlinks. Increasingly, however, organisations from across the different sectors of society are making large amounts of raw data available online. The web is changing from a web of documents to a web of data; from a web that can be read by humans to one that can be read by machines.
The way organisations choose to make this data available has important implications for how others can make use of it. Users might be required to learn the idiosyncrasies of isolated data on each publisher’s website, or they could, through the use of dereferenceable URIs on the web, add value to the data by linking to it. The latter, the Linked Data approach, was the subject of the 2nd Linked Data Meetup in London at the end of February, where 200 people came together for a day of talks and workshops on the topic.
Almost a decade after Tim Berners-Lee and James Hendler first described their vision of a semantic web in the journal Nature, a general pessimism regarding the imminent approach of a semantic web would be understandable. However, the list of organisations sponsoring and attending the Linked Data Meetup in London demonstrates broad support and interest, including from some big names. The research community, content providers, commercial organisations, and government were all represented, with sponsors including JISC, the BBC, Talis, Garlik, and the UK’s Cabinet Office.
The day was very much one of two halves, a morning of talks and an afternoon of workshops, and began with Tom Heath introducing the topic of Linked Data. Heath, a researcher at Talis, emphasised the shift in thinking as we move away from a web of documents and start assigning identifiers to the things we want to talk about: a subtle shift that is ‘either trivial or revolutionary’. With few exceptions, those within the Linked Data community consider it revolutionary.
When objects have identifiers it is easier to explore both the attributes of those objects and the relationships between them. These features were reiterated in the talks that followed, both in Tom Scott’s presentation about the BBC’s Wildlife Finder and in ‘How the web of data will be won’ by John Sheridan and Jeni Tennison of the Cabinet Office. Coming after the BBC Wildlife Finder talk, the Cabinet Office presentation highlighted two important issues for Linked Data: trust and permanence. The BBC Wildlife Finder automatically draws data from external sources, including the crowd-sourced Wikipedia, meaning that if someone changes data on Wikipedia it will automatically be changed on the BBC site. While the inclusion of the appropriate attribution may be good enough for the BBC, it is unlikely to be acceptable on government websites.
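To make the idea of identifiers, attributes and relationships concrete, the following is a minimal sketch in Python. It is not how the BBC or anyone else at the meeting builds their systems: the `example.org` URIs, the property names and the wildlife facts are all invented for illustration. Each statement is held as a subject, predicate, object triple, and a value counts as a relationship when it is itself an identifier rather than a plain literal.

```python
# Hypothetical identifiers: example.org is reserved for illustration,
# and none of these URIs or property names come from a real data set.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "species/lion", EX + "prop/commonName", "Lion"),
    (EX + "species/lion", EX + "prop/livesIn", EX + "habitat/savanna"),
    (EX + "species/lion", EX + "prop/preysOn", EX + "species/zebra"),
    (EX + "species/zebra", EX + "prop/commonName", "Zebra"),
    (EX + "species/zebra", EX + "prop/livesIn", EX + "habitat/savanna"),
]


def is_identifier(value):
    """Treat anything that looks like an HTTP(S) URI as an identifier."""
    return value.startswith(("http://", "https://"))


def attributes(subject):
    """Return the literal-valued statements about a thing."""
    return {p: o for s, p, o in triples if s == subject and not is_identifier(o)}


def relationships(subject):
    """Return the statements that link a thing to other identified things."""
    return [(p, o) for s, p, o in triples if s == subject and is_identifier(o)]


lion = EX + "species/lion"
print(attributes(lion))
print(relationships(lion))
```

Because the zebra and the savanna have identifiers of their own, a program can follow the lion’s relationships to them and ask the same questions again, which is the sense in which the data becomes linked.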
The semantic web is not only about governments and international organisations putting data online. As Lin Clark of DERI Galway showed in her talk, free and open-source content management systems such as Drupal are available for companies of all sizes to embrace semantic data. Linked Data will work best when it is adopted in a distributed way by different organisations. As Georgi Kobilarov of Uberblic Labs put it, each data set will be taken care of by those who understand it: the BBC for news, EU Stats for stats.
The practical nature of the Meetup was emphasised in both the panel discussion, ‘Putting information to work – when it moves beyond the niche’, and the afternoon’s workshops, which included one on Drupal RDFa (RDFa being a way of embedding structured information about things in web pages), one on building a linked data application, and an introduction to SPARQL (the language that can be used to query linked data). Linked Data is not some vague vision of a semantic web on the horizon, but a practical approach to the semantic web today.
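For readers who have not met SPARQL, the core idea is matching patterns that contain variables against a set of statements. The sketch below imitates that idea in plain Python; it is a toy matcher, not a SPARQL engine, and the `ex:` names and the data are invented for illustration. The comment shows roughly how the same question might be written in SPARQL, assuming `ex:` is declared as a prefix.

```python
# Toy data: (subject, predicate, object) statements with made-up ex: names.
data = [
    ("ex:lion", "ex:livesIn", "ex:savanna"),
    ("ex:zebra", "ex:livesIn", "ex:savanna"),
    ("ex:lion", "ex:preysOn", "ex:zebra"),
    ("ex:penguin", "ex:livesIn", "ex:antarctica"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, statements):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in statements:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly equivalent SPARQL:
#   SELECT ?pred ?prey ?place WHERE {
#     ?pred ex:preysOn ?prey .
#     ?pred ex:livesIn ?place .
#     ?prey ex:livesIn ?place .
#   }
results = query(
    [
        ("?pred", "ex:preysOn", "?prey"),
        ("?pred", "ex:livesIn", "?place"),
        ("?prey", "ex:livesIn", "?place"),
    ],
    data,
)
print(results)
```

The question asked here is which predator shares a habitat with its prey. A real SPARQL endpoint answers the same kind of question over data published by many different organisations, which a toy list of tuples cannot show.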
As more data is made available online, the need for organisations and individuals to bridge the gap will only increase. This is a role ideally suited to the library and information scientist, yet the attendees at the meeting were primarily from a computer science background. In part this is attributable to the event being an add-on to a larger programme of JISC programmer developer days. Nonetheless, the community of library and information scientists needs to embrace the new opportunities provided by the semantic web.