Project aims to resolve rights management issues
Last month representatives of publishers and search engines gathered in London to try to break the stalemate over permissions for using different types of content. All too often recently, such disputes between the two industries have ended up being settled in court.
The problem until now, according to Mark Bide of Rightscom, who co-ordinates the new ACAP (Automated Content Access Protocol) project, has been that the existing method of telling search engines which content they can include, the robots.txt protocol, is ‘a very blunt instrument’. As Bide told www.researchinformation.info, ‘It basically just says that content is in or out.’
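Under the conventional protocol, a site’s robots.txt file can only declare each path crawlable or off limits, along the lines of the following (the paths here are purely illustrative):

```
# A typical robots.txt: each path is simply in or out
User-agent: *
Disallow: /fulltext/
Disallow: /subscribers/
```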
Such a binary approach falls down where, for example, the abstract of a paper is freely available but the body text requires payment, or where a story in a newspaper includes an image from an external source that carries different copyright conditions from the text of the story. Problems can also arise where eBooks are aggregated from different publishers, each with different restrictions on their content. ‘We need a more sophisticated way of communicating,’ he said.
The new 12-month pilot project, ACAP, is taking on this task by developing a system that lets the owners of content published on the internet express information about access to and use of their content in a form that a search engine ‘spider’ can recognise and interpret. This should enable the search engine operator to comply systematically with the content owner’s policy or licence.
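The exact vocabulary was still being worked out at the time of writing, but the intent can be sketched as finer-grained statements layered onto robots.txt. The ACAP-style field names below are illustrative assumptions, not the project’s published syntax:

```
User-agent: *
# Conventional, all-or-nothing rule
Disallow: /journal/fulltext/

# Hypothetical finer-grained terms of the kind ACAP aims to express
ACAP-allow-index: /journal/abstracts/
ACAP-allow-preview: /journal/abstracts/
ACAP-disallow-display: /journal/fulltext/
```

A spider reading such a file could then index and preview the freely available abstracts while leaving the paid-for body text alone.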
The project, an initiative of the World Association of Newspapers, the European Publishers Council, and the International Publishers Association, began in January and is already making good progress, according to Bide. At the recent two-day meeting, representatives from publishers of journals, books, newspapers and magazines sat down with major search engines, the British Library and independent experts. Together they identified some 12 to 15 distinct publisher-contributed Use Cases – including how to deal with fragments of content and eBook content from multiple publishers – for the project to address.
The next step, Bide explained, is to do the technical work to ensure that search engines can read the more detailed permissions. ‘ACAP is about the semantics, defining the words and what they mean. The syntax will then be implemented within robots.txt but also in other ways,’ said Bide. The result will be a new metadata standard, which should be quite straightforward for publishers to adopt, as they already have large amounts of metadata attached to their content, he added.
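As a rough illustration of the crawler’s side of that exchange, the sketch below shows how a spider might pull such finer-grained permissions out of a robots.txt file; the acap- field names and their semantics are assumptions for illustration only:

```python
# A minimal sketch of how a spider might read finer-grained,
# ACAP-style permission lines out of robots.txt. The field names
# (ACAP-allow-index, ACAP-disallow-display) are hypothetical,
# not the project's published vocabulary.

def parse_permissions(robots_txt: str) -> dict:
    """Map each path prefix to the usages allowed or disallowed for it."""
    permissions = {}
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blanks
        if ':' not in line:
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        parts = field.split('-', 2)
        # Only handle extended fields such as 'acap-allow-index'
        if field.startswith('acap-') and len(parts) == 3:
            _, verdict, usage = parts
            permissions.setdefault(value, {})[usage] = (verdict == 'allow')
    return permissions

example = """
User-agent: *
Disallow: /journal/fulltext/
ACAP-allow-index: /journal/abstracts/
ACAP-disallow-display: /journal/fulltext/
"""

print(parse_permissions(example))
# {'/journal/abstracts/': {'index': True},
#  '/journal/fulltext/': {'display': False}}
```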
And this work is not limited to relationships with search engines; there are other relationships that need to be managed too. This is one of the reasons that the British Library is involved in the project – it ‘has a similar but by no means identical set of challenges in developing policies around web archiving for the maintenance of the cultural heritage,’ said Bide.
This pilot project is due to end in December 2007 but that is unlikely to be the end of the story. As well as the technical work to include these permissions with publishers’ content, ACAP also wants to see this work become part of the standardisation process. ‘We have streams within the project to look beyond 2007 and how we propose to manage the process beyond the pilot,’ said Bide. ‘Everybody has a pretty open mind on how this will work out.’