Archiving without borders
For most librarians the daily challenge is to get the most up-to-date information to the people who need it. But certain libraries are also repositories that will be used by future generations. As information is increasingly being published electronically, the repositories can no longer just fill underground tunnels with dusty racks of books and bound journals. Websites come and go and electronic formats constantly change, so new ways have to be found to save the intellectual heritage of the world.
Since the early 1990s the Dutch National Library, Koninklijke Bibliotheek (KB), has been creating systems to make sure that information in electronic formats is saved in a form that ensures it is available to future generations. Its first concern was to preserve all material published in the Netherlands, but it rapidly worked out that, in the modern age, the place where something is published is a rather nebulous concept – is it where the server is based, where the publishing company is based, or somewhere else? With this dilemma in mind, the library set about creating a global archive.
The KB was established from the original collection of William of Orange. For the last 200 years it has benefited from a voluntary arrangement with all Dutch publishers to get a copy of everything that was published in the country. In fact, it has better coverage than the national libraries of many countries that have legislation requiring deposit of all published material. Traditionally, it kept these publications in 70km of underground tunnels.
In the early 1990s it started receiving publications on CD-ROMs and, for the first few years, simply stacked them on shelves like books. As CD-ROMs became ubiquitous it decided it had to do something a little more adventurous and make the information on them available online. At the same time publishers were starting to move towards electronic publishing. The library still received bound copies of journals but some publications were emerging that were available only through the web.
Johan Steenbakkers joined the KB as deputy librarian in 1987. He was born on the Caribbean island of Aruba but, after working as a teacher in Curaçao, decided to study biology and biochemistry at the University of Utrecht, the Netherlands. After graduating, he spent four years as a researcher at the university before being dragged into the job of consolidating the university's libraries. When he started, every professor had his or her own library and there were about 140 of them throughout the university. He started by consolidating the 50 libraries of the faculties of biology and chemistry and eventually consolidated all the university libraries and introduced automatic cataloguing and other new technologies. This experience served him well in his brief at the KB: bringing the national library into the modern age.
'When I came here there was very little automation, so I spent the first few years bringing in new systems. We then started thinking about what we were going to do about the increasing amount of material that was being published electronically, particularly the problem of making it accessible in the future,' explained Steenbakkers. 'About 10 years ago we started negotiating with publishers to start receiving publications electronically. We developed a Gopher system with access from our reading rooms. We wanted the information to be as accessible from outside as it would be in our reading room, while respecting the charging mechanisms of the publishers,' he added.
'We started experimenting with some of the small-scale systems that were available at the time, but they were really intended for researchers to keep reprints for their own use. We also started cooperating with AT&T on building a system that would be comparable to our central stack. Our priority was to create an archive, which meant that firstly we had to maintain the integrity of the digital objects and, secondly, we had to make sure that the information could be read permanently. Formats, software and hardware all change and can become obsolete. We needed both to store the information and to register what was needed to get access to that information for future generations.'
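The two requirements Steenbakkers describes, keeping digital objects intact and registering what is needed to read them later, can be illustrated with a minimal sketch. It is not the KB's system: the record layout and the names `ArchiveRecord`, `ingest` and `is_intact` are hypothetical, and it simply assumes that a checksum is stored at ingest alongside a note of the format and the software needed to render the object.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical archive record; the field names are illustrative only."""
    identifier: str
    sha256: str             # checksum captured when the object was stored
    file_format: str        # e.g. a MIME type such as "application/pdf"
    rendering_needs: tuple  # software or hardware needed to read the object


def ingest(identifier, data, file_format, rendering_needs):
    """Store a checksum and the access requirements alongside the object."""
    digest = hashlib.sha256(data).hexdigest()
    return ArchiveRecord(identifier, digest, file_format, tuple(rendering_needs))


def is_intact(record, data):
    """Return True if the bytes still match the checksum taken at ingest."""
    return hashlib.sha256(data).hexdigest() == record.sha256


# Example: register an object, then check it again at a later date.
record = ingest(
    "article-001",
    b"example journal article",
    "application/pdf",
    ["PDF 1.4 viewer"],
)
print(is_intact(record, b"example journal article"))
```

Recomputing the checksum at intervals covers the first requirement; the format and rendering fields are a stand-in for the second, telling a future reader what would be needed to open the object once today's software is obsolete.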
In 1999, the KB started working with IBM Netherlands on developing its core system and since then has been loading its collections into its database. Work is continuing on making sure that the data will always be accessible.
As soon as this work started it became clear that the collections policy needed looking at. 'If you look at what is available through the Net, it no longer makes sense to only store journals that have the Amsterdam imprint. Elsevier, for example, has about 350 journals with the Amsterdam imprint. But we noticed that the imprint was actually quite arbitrary and it could change,' pointed out Steenbakkers.
'We wanted to create an archive that was credible with librarians. They were concerned that, as publications became electronic, they were no longer receiving anything that could be archived like a printed journal. What would happen if that electronic publisher went out of business? We wanted to provide something that libraries could turn to.
'We had already been collecting English-language publications from Dutch publishers. We started working with Elsevier, archiving all its publications, and gradually other publishers have come to us and we have added them to our collections.
'We had to explain to the ministry, which is responsible for the library, that it did not make any sense to collect only Dutch imprints. Also, it did not involve any real extra cost or effort to add other publications. So we became an international depository, especially for those publications which do not have any real homeland,' he said. Fortunately, international publishers were also thinking along the same lines. As Steenbakkers explained, 'they actually wanted a few trusted places where their material could be archived'.
The result became known as the e-Depot system. Several other national libraries have decided to follow suit, but the KB is several years ahead. The British Library is currently developing its own system, which is expected to be available within a few years, but the German National Library decided just to buy a copy of the KB system. The two libraries are now sharing the work on future developments. National libraries do not compete, of course, and there has been a great deal of cooperation throughout Europe. Steenbakkers said that one of the reasons for its lead has been having one of the world's largest publishers on its doorstep. Elsevier allowed it to work with real data in the development phase, rather than having to create a theoretical system and hope to interest publishers in cooperating after the event. Elsevier was also investing heavily in electronic publishing and was driving the market, so it was also one of the first to realise the importance of an independent formal archive.
Steenbakkers said: 'They were looking for a partner and they were lucky enough to find one on the doorstep that was ready for them.'
Research is continuing on the long-term preservation of digital material. The problem for all national libraries is that there has been little focus on these issues in fundamental computer science research. People have mostly concentrated on the here-and-now issues of getting access to their current information, without thinking about future generations. Steenbakkers has been instrumental in trying to get a research framework going across the whole of Europe, so that resources can be put into this field. He believes that there are many commercial organisations that will also soon have to start dealing with these issues. He cites the example of banks, which have huge archives of information that they need to keep. At the moment they are doing it by keeping legacy systems running and, when they finally give in, doing bulk conversions to new formats, which one day will themselves become obsolete. Similarly, insurance and pension companies need to keep records for very long periods, as do courts and criminal justice systems in every country. He is hoping that governments across Europe will see the need for serious research efforts now, rather than waiting another 20 years, when the volume of data that needs to be accessible will be many times what it is now.
Steenbakkers said: 'It may look like just a problem for libraries but if you go to hospital today and have an X-ray taken it will be digital. So, the problem of information getting lost or not being readable after just a few years is becoming a more general problem in society. The first organisations that have thought about these issues are the national libraries. We need to interest the major IT companies which have a serious research capability to start thinking about durability and permanent access.'
The Dutch government has held the presidency of the European Union for the last six months and Steenbakkers has used this as an opportunity to give the problem some attention. 'We have been discussing what we want to see on the research agenda for the next few years, so that we can promote the development of new technology to deal with durability. The problem is partly political because we have to make sure resources are made available,' he explained.
'There is a great tradition of co-operation between national libraries, particularly amongst those that are most innovative. We have received support from the European Commission for research. But it takes a lot of effort to get funding for research like this. We have to persuade our ministry that we are doing something important and have to learn the right language to talk to them. The real problem is getting money for fundamental technology research because we are seen as being in the cultural sector. If you compare us to "big science" we are not seen as being so important. The challenge that we have is to get them to see us in the same terms. But national libraries are an icon of the country and they are also taking a leading role in issues around the archiving and durability of information, so hopefully they can be persuaded to put a bit more money than they are used to into this area.'
CURRICULUM VITAE
Education
1969
MSc Biology and Biochemistry, University of Utrecht, the Netherlands
Employment
1969-1973
Researcher in bio-membranes, University of Utrecht
1973-1987
Librarian, Faculties of Biology and Chemistry, University of Utrecht
1987-1993
Deputy librarian, Koninklijke Bibliotheek, the Netherlands
1993-1998
Director of management and information technology, Koninklijke Bibliotheek
1998-1999
Director of information technology and facility management, Koninklijke Bibliotheek
1999-present
Director of e-strategy and property management, Koninklijke Bibliotheek