Building an information infrastructure in the UK
Collecting electronic theses, sharing image collections between universities and automatically transferring content from institutional repositories to national libraries are just some of the projects going on in the UK, as Julie Allinson and Roddy MacLeod report
Since 2002, the UK's JISC (Joint Information Systems Committee) has shown a strong commitment to an emerging trend in research institutions and digital repositories. This commitment has been reflected in its Digital Repositories Programme as well as a £14m investment in the Repositories and Preservation strand of its Capital Programme.
Repositories are digital stores that manage and provide access to resources and metadata. They come in many shapes and sizes: subject repositories, both national and international; data archives and data centres; learning object repositories; digital libraries; and institutional eprint repositories. According to the OpenDOAR directory, around 50 of the UK's 200 higher education institutions already have institutional and/or departmental repositories.
This growing role of repositories in UK research and education is reflected in the 25 projects that make up JISC's Digital Repositories Programme. The programme aims to enable institutions to make better use of repositories across research, teaching, information and administration.
Projects going on
One early success story from this programme is the Repository Bridge project, which developed a way to automatically deposit electronic theses from the University of Wales to the National Library of Wales. This project, and similar work done by the EThOS project led by the University of Glasgow, has shown that this mechanism can be achieved using open-source software and open standards. The resulting software tools enable repositories based on DSpace or EPrints software to fully interact with Fedora open-source systems, to migrate and store items in a persistent manner. 
And this development is not the only result from the project. On the road to that outcome, the project has documented its understanding of processes (theses deposit), technologies (DSpace and Fedora) and standards (METS and OAI-PMH). The collaboration with the EThOS project has also helped ensure interoperable development.
Collaboration, advocacy and awareness-raising are vital when projects are investigating different aspects of the same topic. They help to ensure that duplicate activity or divergent practices do not arise. JISC is keen to support projects in working together, through the work of its support team. This team's remit includes dissemination and synthesis, as well as facilitating a set of cross-programme cluster groups to help projects working on similar themes share knowledge and experience.
Clusters help projects to collaborate
One example is the data cluster, which includes a number of projects looking at the deposit and curation of primary research data. These projects include workflows for the deposit of experimental data (R4L, SPECTRa), linking research data with research publications (R4L, StORe), mechanisms for citing research data (CLADDIER) and the reuse of geospatial data (GRADE). These projects have also developed links with related activities such as the Digital Curation Centre, the e-Bank project and DART, an Australian initiative.
Beyond the cluster groups, many project outputs relate to the work of other projects. The GRADE project, for example, has produced a detailed report of geospatial use cases as a basis for examining copyright issues relating to selected data sets. Other projects looking at digital rights issues are TrustDR, and Rights and Rewards. The latter is also examining barriers and potential reward mechanisms to motivate researchers to deposit into repositories.
Projects in the programme are also working with the JISC (UK) and DEST (Australia) e-Framework, a service-oriented framework for education and research. One project, ASK (Accessing and Storing Knowledge), is utilising the e-Framework to document a reference model and design for a repository software system.
This kind of service-oriented approach to establishing interoperable services is of interest to a swathe of other projects falling within the remit of the ‘Integrating infrastructure’ cluster. For example, the EThOS project is working with the British Library towards a fully electronic e-theses service; SPIRE is investigating the use of peer-to-peer technology for sharing resources; MIDESS will enable three universities to collaboratively manage their image collections; and IRIScotland is scoping a national repository-search service for Scotland. The PerX project offers a subject perspective on cross-searching of repositories (see box: PerX project tackles engineering resources).
Looking forward, the Repositories Roadmap, produced within the Digital Repositories Programme, offers a vision for 2010 of ‘a technical infrastructure that supports the deposit, discovery, access and use of objects in repositories by software applications.’ This roadmap goes on to say that this infrastructure must work across both open-access and closed repositories. Underpinning this will be widespread agreement about the machine-to-machine interfaces (the services) that open-access repositories should support in order to ingest and make available content and metadata.
The Linking UK Repositories report, produced by Alma Swan and Chris Awre, also has many recommendations for the programme. It concludes that ‘the creation of a system of open-access repositories across the UK with user-oriented services built across them will not happen properly unless it is led by an organisation with vision and focus. The essential issues in the process are planning, communication and coordination.’
Delivering on these visions for an interoperating infrastructure of repositories and services is no easy task, but the work that has been and is being done by the wide range of JISC-funded projects is already having an impact, and this is set to continue. The UK repositories search service, for example, will offer a single access point to search repositories across the UK. And the EPrints metadata application profile, backed by the Digital Repositories support team, will enable the service to offer a much richer set of search features. What's more, many projects will begin over the next three years, offering new tools and mechanisms to support widespread open access to resources.
Julie Allinson is JISC digital repositories support officer, UKOLN, University of Bath, UK. Roddy MacLeod is senior subject librarian at Heriot-Watt University, UK.
PerX project tackles engineering resources
PerX (Pilot Engineering Repository Xsearch) is based at Heriot-Watt University, with partners at Cranfield University, the Institution of Civil Engineers/Thomas Telford Ltd, University of Arizona and RSC East Midlands. It has been funded for two years, from June 2005, to explore the provision of subject-based resource discovery services.
An early task was to produce a list of significant repositories relevant to a particular subject (engineering) and to provide examples via type and coverage. This listing (www.icbl.hw.ac.uk/perx/sourceslisting.htm) revealed that there is a wide variety of digital repositories of interest. This includes repositories where actual content is deposited, and metadata repositories that contain only metadata about content.
Following on from this listing, an analysis of the engineering digital repositories landscape (www.icbl.hw.ac.uk/perx/analysis.htm) revealed several interesting things. Firstly, despite the overall number of repositories, there are some significant gaps in the provision of engineering resources. These gaps include research data, subject-based access, technical reports, journals, and assessment materials. Secondly, the means and levels of interoperability of the identified repositories vary widely. Some are un-interoperable, some have non-standard interoperability (i.e. proprietary APIs), while others have fully-functional interoperability based on established standards such as Z39.50, SRW and OAI-PMH. Thirdly, the information landscape of engineering is quite complex. It includes resource-types such as technical reports, standards, patents and trade literature, alongside more obvious types such as peer-reviewed scholarly articles. The final discovery was that the differences between disciplines in information needs and information retrieval habits need to be carefully considered when developing subject-based resource discovery services.
This analysis raised questions about what is meant by ‘interoperability’ and ‘metadata’ and their importance for data providers and service providers. It also asked why a standardised approach to interoperability is important and how standards can facilitate content syndication.
To answer these questions, PerX published a document: ‘Marketing’ with Metadata – How Metadata Can Increase Exposure and Visibility of Online Content (www.icbl.hw.ac.uk/perx/advocacy/exposingmetadata.htm). This explains, in non-technical language, all of the above and outlines how content providers can share, or embed, their descriptive data (metadata) with other websites, in standard and reusable ways. This document has received favourable feedback and attracts a considerable number of downloads.
A major landmark in the PerX Project was the creation of a pilot service (www.engineering.ac.uk) allowing numerous digital repositories to be cross-searched from one interface. The repositories vary considerably in size, content and type. They range from a large subset of the arXiv.org repository to the much smaller Geotechnical, Rock and Water Resources Library (GROW) Digital Library.
Feedback showed considerable agreement about the need for a subject-focused service that cross-searches numerous collections in engineering. Various suggestions for improvements to the pilot interface have been made, but what is most obvious is the need to increase the number of digital repositories of various kinds being cross-searched. This mirrors the findings of our analysis: a service that focused only on materials in repositories, and ignored materials found in other sources for which metadata repositories may be available, would be unlikely to be regarded as an essential information retrieval tool.
PerX has also found that metadata harvested from OAI-compliant repositories too often arrives as invalid or ill-formed XML, which needs to be corrected before further use. Another limitation, especially important in the context of subject-based services, is the lack of uptake of OAI ‘sets’ by many data providers. A very basic subject-type standard for sets would make it much easier for aggregators and subject-based services to identify relevant records in multi-disciplinary repositories. Quality of metadata is an issue that needs further attention.
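For readers who want to see what these two issues look like in practice, the following is a minimal sketch rather than PerX's own tooling. It builds an OAI-PMH ListRecords request restricted to one set, and checks whether a harvested response is at least well-formed XML. The repository address and the set name "engineering" are hypothetical; the `verb`, `metadataPrefix` and `set` parameters come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Selective harvesting only works if the data provider defines sets.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def is_well_formed(xml_text):
    """Return True if the text parses as XML.

    This checks well-formedness only; it does not validate against a schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical repository address and set name, for illustration only.
url = list_records_url("https://repository.example.org/oai", set_spec="engineering")
print(url)
print(is_well_formed("<record><title>Soil mechanics</title></record>"))
print(is_well_formed("<record><title>Soil mechanics</record>"))
```

Without a usable set, a subject service would have to harvest every record from a multi-disciplinary repository and filter locally, which is the extra work the paragraph above describes.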
Information about all of the projects mentioned here, along with links to their websites, can be found on the Digital Repositories Programme wiki: www.ukoln.ac.uk/repositories/digirep
 As noted in the Digital Repositories Review (www.jisc.ac.uk/uploaded_documents/digital-repositories-review-2005.pdf), work is ongoing on a typology and ecology by the Digital Repositories Programme Support team (www.ukoln.ac.uk/repositories/digirep/index/Typology_and_ecology) and the CD-LOR project report on Learning Communities and Repositories (www.ic-learning.dundee.ac.uk/projects/CD-LOR/CDLORdeliverable1_learning
 Repository Bridge. Final Report, v1a, 28/06/2006 www.jisc.ac.uk/uploaded_documents/Repository_Bridge_Final_Report.pdf
 Heery, Rachel and Powell, Andy. Digital Repositories Roadmap: looking forward, April 2006 www.ukoln.ac.uk/repositories/publications/roadmap-200604
 Swan, Alma and Awre, Chris. Linking UK Repositories, 2006 www.jisc.ac.uk/uploaded_documents/Linking_UK_repositories_report.pdf