Archive programmes gain momentum
Electronic archives of published and unpublished results are becoming popular with academic institutions but they are not without controversy. Nadya Anscombe investigates
The internet has dramatically changed the way that academic institutions around the world safeguard their research results and make them accessible to a wider audience. Many universities now have an institutional repository - a web-based, electronic, open archive of papers, theses and many other kinds of data.
In some countries, such as the Netherlands, every university has an institutional repository. In others, such as the UK, around 60 per cent have one and the rest are at least planning one. Italy has fewer - research has shown that there are around 77 universities but only 11 institutional repositories - while in Germany there seem to be more institutional repositories than universities because many individual research institutes have their own.
While institutional repositories are very valuable tools for academics and the institutions they work for, publishers of academic journals could have cause for concern. 'Until now, most publishers have not been worried about letting authors post an early version of papers somewhere on the internet,' says Sally Morris, chief executive of the Association of Learned and Professional Society Publishers (ALPSP). 'But imagine if all these individual articles (albeit not necessarily final versions) were linked up through networked institutional repositories. It could happen that the majority of papers from a particular journal become available for any researcher to find. This could lead cash-strapped libraries to stop buying that journal, which would make it no longer viable.'
Until now, institutional repositories have not obviously harmed publishers' businesses. For example, the physics community has one of the longest running and most comprehensive subject-based repositories (ARXIV), but some leading physics publishers say that it has not affected their subscriptions so far. However, downloads from publishers' sites do seem to be declining and this could cause problems. As Morris explains, 'librarians can now take advantage of comparable usage data between journals in order to make decisions about continuing subscriptions. If readers access journals through repositories this usage will not show up in the data. The librarian might therefore decide to cancel a subscription even though the same numbers of people still access the journal.'
And it is not just the publishers who are affected. Institutional repositories could also be seen as a threat to traditional libraries. However, libraries do not just provide access to research information. There is plenty of information that is not available digitally and most libraries offer a host of other services so their roles still look fairly secure.
Despite these concerns, many organisations are pushing hard to encourage the development and use of institutional repositories. For example, in the US, the National Institutes of Health (NIH) has recently called on scientists to make manuscripts from NIH-funded research publicly available in its online archive within 12 months of final publication. It has also urged publishers to help authors implement this policy.
The UK's Wellcome Trust has a similar policy. From October this year, all papers from new research projects funded by this trust must be deposited in PubMed Central within six months of publication. And from October 2006, all existing grant holders must deposit any future papers produced from this funding into PubMed Central or UK PubMed Central. These announcements are also supported by the eight UK Research Councils, under the umbrella of Research Councils UK (RCUK). RCUK has proposed that it be mandatory for research papers arising from council-funded work to be deposited in openly available repositories at the earliest opportunity.
What the researchers think
Research suggests that authors will comply with these requests. A report commissioned by the UK's Joint Information Systems Committee (JISC) found that the vast majority of authors (81 per cent) would willingly comply with a mandate from their employer or research funder to deposit copies of their articles in an institutional or subject-based repository. A further 13 per cent would comply reluctantly while only five per cent say that they would not comply with such a mandate.
At the time of the survey only 30 per cent of respondents were using specialised search engines to navigate open-access repositories, while 72 per cent of authors were using Google to search the web for scholarly articles. The subsequent arrival of Google Scholar, which indexes the content of open-access repositories as well as general websites, will probably increase the extent to which institutional repositories are searched and therefore the impact of the articles deposited in them.
The same report, carried out by Key Perspectives, found that use of institutional repositories had doubled in 2004 compared with the previous year and that usage of subject-based repositories had grown by 60 per cent over the same period. The report also found that authors have frequently expressed reluctance to self-archive because of the time they think it requires and the technical difficulties they expect. Authors also worry about the danger of infringing copyright agreements with publishers.
Author apathy and reluctance to change seem to be the biggest challenges, even for the most successful of repositories. Leo Waaijers is manager of the SURF-DARE programme in the Netherlands - the first national network to link up institutional repositories from all universities in one country. He told Research Information: 'Our biggest problem is convincing academics that they have something worth preserving and that an institutional repository is the place to do it.'
The result of the SURF-DARE programme, DAREnet, was set up in record time. Although only a third of Dutch universities had a repository at the start of the project in 2003, DAREnet took only a year to set up. Waaijers says the main challenge during this time was software. 'There are 15 institutional repositories and six different software programs were used to set them up,' he says. 'This causes problems the more sophisticated you become and it took a lot of work to get them to all work together properly. If we could start again, we would advise universities that they should all use the same software.'
The main problem now, however, is encouraging people to use the repository system. To do this, SURF created the Cream of Science project. 'We asked 200 of the Netherlands' top scientists to deposit their entire portfolio into DAREnet,' says Waaijers. 'This was a lot of work. Anything published before 1998 we had to convert to digital format; we had to locate many papers and sort out copyright problems but now we have more than 40,000 records in the Cream of Science project. Now we have demonstrated how successful it can be, other academics have seen this and are keen to use institutional repositories.'
Libraries could be in control
Waaijers believes that academic publishers and libraries could use institutional repositories as a tool rather than seeing them as a threat. 'For example, they could offer services that use institutional repositories such as making virtual research environments or publishing overlay journals where the journals are based on peer-reviewed content in various institutional repositories,' he says. 'I believe there is still a lot of work for libraries but I think it will change in the future. Libraries could, for example, take on the managing and maintaining of institutional repositories.'
This is already the case at the University of Cambridge, UK. Its institutional repository, DSpace@Cambridge, is managed jointly by the computing service and the university library. This repository was one of the first users of the now widely-used open-source DSpace software which was developed at the Massachusetts Institute of Technology, US and released in 2002.
DSpace@Cambridge is still officially a project and not a full service yet, but Peter Morgan, project director for DSpace@Cambridge, is hopeful that he will find funding from the university to develop the service. 'The university hierarchy is sympathetic to the idea of a repository and believes a repository should be part of the university's infrastructure, just like the library,' he says. 'Many people underestimate the cost of an institutional repository. It can be set up very cheaply with open-source software, so no-one should be able to say they can't get started, but as soon as you want to develop the system, provide support and store different kinds of data, hardware and personnel costs start to rise. Here, we need at least two people on the technical side and one librarian. Any institution that wants to start a repository should seriously consider the long-term costs.'
He agrees that the main challenge for institutional repositories is getting people to use them to their full potential. 'Most academics are uneasy about copyright issues and believe that in order to succeed in the annual Research Assessment Exercise they must publish in the best journals,' says Morgan. He hopes that the recent RCUK position statement will help to change this but believes institutional repositories have a larger role to play than just publishing academic work. 'Some academics believe that if their work is stored on the department computer, there is no need to worry about it. There is widespread ignorance about data loss,' says Morgan. 'Data has to be managed properly. We believe strongly in long-term data preservation and we migrate files into new file formats regularly, preserving them for future use.'
It is in this area of data preservation that institutional repositories really show their worth. By storing files in a managed repository researchers will be ensuring that their work can be read by future generations, for free. Just like in a library.
The right tools for the job
There is now a range of adequate, easily-available software for creating and maintaining institutional repositories. Some are commercial packages but many others are available free under open-source licences. The two leading software packages are DSpace (MIT, US) and EPrints (Southampton, UK) but there are plenty of others to choose from.
For a detailed market report on web-based repositories, published by Mark Ware Consulting, see www.palsgroup.org.uk and click on 'Pathfinder research on institutional repositories'
For general information about institutional repositories from the point of view of the Scholarly Publishing and Academic Resource Coalition (SPARC) see www.arl.org/sparc/repos