New life for old resources

23 September 2010

What does your job involve?

I am part of the New Directions Group, which was set up two and a half years ago in recognition of the fact that technology developments are happening rapidly and that we need to be agile and respond quickly to changes.

Whilst we have long-term publishing plans, we also need to be able to prototype new ideas outside this framework. Cambridge University Press has many divisions. I work across the Press prototyping new concepts for all the divisions so that people can demonstrate their ideas to colleagues.

What interesting developments are you seeing?

There is so much information out there that there is a need to help researchers find what they are looking for.

One Web 2.0 way to address this is sharing and recommending through the social web. There are no international standards for folksonomic tagging. However, if a colleague you trust tags in a certain way, you are more likely to think those tagged items are interesting. There is a risk that certain resources get amplified this way, but this already happens with citations. What’s more, people will still turn to what they see as reliable journals and reliable publishers. The same is true for books; researchers will go to publishers and university presses that they trust.

There is a lot of interest in the semantic web as another potential way to help researchers find what they need. This makes the material machine readable and connects related information. It enables researchers to focus more accurately on what they are looking for.

The Utopia reader being developed by a group of academics at the University of Manchester is an interesting project in this area. It works with PDFs but brings in the semantic web, taking advantage of the various data sources available. It also allows users to look up the definitions of words in an open data repository. Users could perhaps click on a compound mentioned in a chemistry paper to get a 3D diagram of the molecule. All this can be done in real time, provided the user has an internet connection. With all the open data out there, much of it from very reliable sources, it makes sense to use it.

The semantic web is going to develop much more, initially in areas where there is standardised vocabulary such as in chemistry and medicine. In the arts, it’s going to be a slower pace of development.

What are the challenges?

Much of the information going online now is not textual. Although projects such as TinEye are making great progress in image search, you can’t do full text searches of video, for example, in the way you can with a web page. The explosion in video content means that this is something that has to be addressed. If something won’t turn up in a Google search because it is inadequately tagged, it may as well not exist. To be able to search these resources, we need to provide full metadata tagging for non-text information and it will be vital to establish taxonomies for this that we can share.

How are you helping users find content?

We’ve also been looking at legacy content. This is important for us because Cambridge University Press has been publishing for over 400 years. Vast amounts of research went on in the past and if that research is not online, it is essentially lost.

The Press is digitising old books and also bringing them back into print as print-on-demand titles in our Lazarus project. This is helping people to find resources they would not have easily found before. For example, we published a book in 1928 called ‘The Bibliography of Sponges 1551-1913’, which has been out of print for many years. This is not Dan Brown territory – there are probably copies of it in only about five major libraries now. However, since we digitised it, we’ve sold 35 copies to people whose research it is relevant to and who probably would not have had access before. By bringing back such titles, we are adding to the volume of knowledge.

In our Cambridge Library Collection, we are also bringing things back that weren’t originally published by Cambridge University Press, but that are no longer under copyright and were identified by experts as important to particular fields.

Another thing that we are doing is going through and digitising our journal archive. When we launched this in 2009, we included 450,000 papers from more than 180 journals and we will be adding to that.

What other trends are significant?

Mobile information is another key development on the web at the moment. It’s a challenge though; it’s one thing to update your Facebook status on the move, but another thing to read a scholarly article on a mobile device.

Another trend is open data: bringing it together with our own data and creating mashups. The challenges with this are establishing reliable data sources and making sure that what is being done is of value, not just done because it can be done.

Interview by Siân Harris