Connecting digital heritage

30 May 2013

How are things going?

Things are going tremendously well. We’ve had a very positive response and a lot of traffic. It is one of the most exciting projects around, bringing together so much US heritage under one roof.

We are just getting started. Right now we have 2.4 million items in the collection but we are hoping to make it an order of magnitude, or even two orders of magnitude, bigger.

The approach is to work through partnerships so the DPLA is a collection of collections. Big institutions such as the National Archives and Records Administration and the Smithsonian Institution have given us hundreds of thousands of items. We have about half a dozen of these content hubs.

We also have service hubs, bringing together smaller collections. These places are helping us to go out and pull together things at a regional level.

I haven’t looked at each of the 2.4 million records but, with those I’ve seen, I’ve been really impressed by the quality of the digitisation.

It’s up to content partners what they share with DPLA. For some, it is just a small fraction of what they have – but, as people get comfortable with the DPLA, we will see more. Any resources that might be unique will be very interesting.

What are the challenges?

One of the huge challenges is getting the metadata right. Metadata quality and completeness can vary enormously. A large part of the behind-the-scenes work is on metadata and, for example, getting things geo-coded. This means that people can browse DPLA content on a map. It can also be viewed on a timeline. For all these serendipity modes of discovery we need to work on the metadata.

Going forward, we are also interested in adding more linked data so that content can be linked by subject or theme, for example, and we are using JSON-LD [a lightweight linked data format].
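[As a rough illustration of linking content by subject in JSON-LD, here is a minimal Python sketch. The record, its identifiers and the `items_sharing_subject` helper are hypothetical and are not DPLA's actual schema; only the `@context` and `@id` keywords and the Dublin Core terms namespace are standard.]

```python
import json

# Hypothetical record: the item, its URIs and its values are invented for illustration.
record = {
    "@context": {
        "dcterms": "http://purl.org/dc/terms/",
        "title": "dcterms:title",
        # Declare that subject values are links (URIs), not plain strings.
        "subject": {"@id": "dcterms:subject", "@type": "@id"},
    },
    "@id": "http://example.org/item/123",
    "title": "Map of Boston Harbor",
    "subject": ["http://example.org/subject/harbors"],
}


def items_sharing_subject(records, subject_uri):
    """Return the @id of every record linked to the given subject URI."""
    return [r["@id"] for r in records if subject_uri in r.get("subject", [])]


# JSON-LD is ordinary JSON, so the standard json module can serialise it.
doc = json.dumps(record, indent=2)
```

[Because subjects are expressed as URIs rather than free text, two records from different partners that point at the same subject URI can be connected without any string matching.]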

One issue with content partners is that items can move to new locations, so we are constantly updating links.

There could also be the issue of duplication. I’ve had experience of this with the Zotero project [a research tool that helps users build personal libraries]. It’s a very hard challenge. There are things that you can do algorithmically, but it also needs a human touch. There will be a way for DPLA users to provide feedback and help with this.

We could collapse items together but with rarer and older collections this could be harder to do computationally. Researchers may also want to see different versions of an item side by side, with different notes in the margins, so we don’t want to just delete one record.
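[A minimal sketch of the algorithmic side of this, assuming each record has an `id` and a `title` field (both hypothetical names). It groups likely duplicates by normalised title and flags them for human review rather than deleting anything. It is one simple heuristic, not the method DPLA or Zotero uses.]

```python
import re
from collections import defaultdict


def normalise(title):
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def candidate_duplicates(records):
    """Group record IDs by normalised title; return groups with more than one ID.

    Nothing is merged or deleted: each group is only a suggestion for a person
    to review, so variant copies of an item stay available side by side.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalise(rec["title"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records, invented for illustration.
records = [
    {"id": "a1", "title": "Leaves of Grass"},
    {"id": "b7", "title": "Leaves of grass."},
    {"id": "c3", "title": "Moby-Dick"},
]
print(candidate_duplicates(records))  # [['a1', 'b7']]
```

[Exact matching on a normalised title will miss variant spellings and transcription errors, which are common in rarer and older collections. That is where user feedback comes in.]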

We have unique IDs in our system and certain aspects of linked data can link ID systems. It’s a complex problem that we need to look into.

How does the DPLA work with partner libraries?

There are so many university libraries and even smaller libraries. Even public libraries often have unique, special collections for their towns. I view part of the DPLA’s role as raising money not just for us, but for our partners too.

All of our metadata in the system is licensed under the CC0 licence, but we link out to content under a range of licences. We don’t mandate that content is made available under a Creative Commons licence. Although we would love to see things being more open, we are being pragmatic.

The majority of the materials will be in the public domain and we want to enhance open access to US heritage. However, the copyright landscape is very complex. I’d prefer to have solutions that work globally, hence the use of Creative Commons.

I don’t think many people are happy with the licensing landscape with e-books. We have some creative ideas brewing with partners about e-book sharing. One test case I’ve thought of is: if a publisher gave us for free the first book in, say, a nine-book series, then the publisher would probably do really well.

We’re also making collections available through our API. We see the DPLA as a platform and already have developers creating apps. We will see more apps emerging, such as educational apps, smartphone walking tours, and augmented reality.
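[For developers, a minimal Python sketch of building a search request. The base URL and the `q`, `page_size` and `api_key` parameter names are assumptions to be checked against DPLA's API documentation, and `build_search_url` is a hypothetical helper. The sketch only constructs the URL and makes no network call.]

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the DPLA API documentation before use.
BASE_URL = "https://api.dp.la/v2/items"


def build_search_url(query, api_key, page_size=10):
    """Build a keyword-search URL; parameter names are assumptions."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    # urlencode escapes the values, e.g. spaces become '+'.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("walking tour", "YOUR_API_KEY")
```

[An app would fetch this URL with any HTTP client and parse the JSON response.]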

What about international partnerships?

Europeana has been critical for us, as they have been doing a similar thing for five years. We have taken so much inspiration from them and borrowed their data model, which gives us the opportunity to cross-link with them. The Europeana team has really helped us think about licensing and technical issues too. I think that’s one of the reasons we’ve been able to launch the DPLA so quickly.

One of the nice things is that we’ve been able to do a combined exhibition with Europeana about immigration and emigration. The USA is, in large part, a nation of immigrants and many people came from Europe.

We are seeing more digital heritage projects around the world too. One I’m particularly interested in is the National Library of Australia’s Trove project. This is pulling together a lot of materials from Australia, and is a great aggregation project.

This is happening on a global scale so we can think about ways to bootstrap our way to a global digital library.

Interview by Siân Harris