Collecting research output


Catherine Jones gives a case study of how one research council has dealt with the challenge of collecting together all the research output from the facilities it funds

The positive stance that many research councils take on author and institution self-archiving is a familiar topic. But this goes beyond passive support of the concept. Some research councils are also providing tools that help other organisations to build and maintain their repositories. One example of this is Scitate. This institutional-repository software support package has recently been launched by the UK's Council for the Central Laboratory of the Research Councils (CCLRC) and is the platform used by CCLRC's own repository.

The research output of the Chilbolton Observatory, as well as that from other facilities, is stored in CCLRC's repository ePubs

CCLRC is one of the eight UK research councils and provides large-scale scientific facilities, rather than giving grants to individual researchers. These facilities are primarily for the UK academic sector, but the complete user community is worldwide. The facilities include ISIS, a neutron spallation source, and the Synchrotron Radiation Source. These are both used to examine the structure of matter. The research subjects covered include particle physics, eScience enabling technology, space science and technology and lasers. CCLRC has three sites: Chilbolton Observatory in Hampshire, Daresbury Laboratory in Cheshire and the Rutherford Appleton Laboratory in Oxfordshire.

The results of all this research go into CCLRC's repository, ePubs, which has been externally accessible since May 2004. It has over 22,000 metadata records dating back to the mid-1960s, and more than 500 of these have full text attached. It is designed to record and disseminate the research done both by CCLRC staff and by the users of the facilities we provide, giving a single place to find out about the science done at CCLRC.

Creating the repository

The project to create CCLRC's institutional repository started in 2002, following a requirement from our library committee for a centralised record of publications for the organisation. Our feasibility study showed strong support within the organisation for a publicly accessible, centrally funded repository. There were strong indications that it should be linked into existing business processes and infrastructure, and that it should have a light-touch input mechanism that busy scientists could use. An unexpected finding from the study was that those depositing into ePubs should not be limited to CCLRC staff but should include all those who use the facilities, in order to give a complete picture of the science done at CCLRC.

CCLRC funds a wide range of research that needs to be collated and accessed

The feasibility study included a technology review to look at the well-established DSpace and ePrints tools. However, these software packages had been written with universities in mind, and were not easy to modify to the rather different requirements of a research organisation such as CCLRC. For this reason we decided, in late 2002/early 2003, that it would be more effective to write our own software to meet our needs. The resulting software, which underpins our ePubs repository, is known as Scitate.

The conceptual model for Scitate is based on IFLA's Functional Requirements for Bibliographic Records. One of the major features of using this model is that it groups related works in the same metadata record. Examples of this are the ability to link the preprint with the final published version or having the conference slides and paper together. This reduces the number of near-duplicates in the collection.
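As a rough illustration of this grouping idea, the sketch below keeps every form of a work inside one record. It is not Scitate's actual data model: the class names, field names and sample entries are all hypothetical, and it assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    """One concrete form of a work, such as a preprint or published article."""
    kind: str       # hypothetical labels: "preprint", "journal article", "slides"
    citation: str   # free-text citation, made up for this example


@dataclass
class Work:
    """A single metadata record that groups every related form of a work."""
    title: str
    manifestations: list[Manifestation] = field(default_factory=list)

    def add(self, kind: str, citation: str) -> None:
        """Attach another form of the same work to this record."""
        self.manifestations.append(Manifestation(kind, citation))

    def kinds(self) -> list[str]:
        """List the forms held in this record, in the order they were added."""
        return [m.kind for m in self.manifestations]


# The preprint and the published version share one record, not two.
record = Work("An illustrative study")
record.add("preprint", "Internal preprint (illustrative)")
record.add("journal article", "Illustrative Journal, vol. 1 (illustrative)")
print(record.kinds())
```

Because both forms hang off the same record, a search returns one entry for the work rather than two near-duplicates.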

The functionality offered to all users of the system includes two search options: a basic one, which word-searches all the fields in the record, and an advanced one that allows the user to define which field the data should be in.
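The difference between the two search options can be sketched as follows. This is a minimal illustration under assumed field names and made-up records, not the search code used in Scitate.

```python
def basic_search(records, term):
    """Return records in which the term appears as a word in any field."""
    term = term.lower()
    return [
        r for r in records
        if any(term in str(value).lower().split() for value in r.values())
    ]


def advanced_search(records, field_name, term):
    """Return records in which the term appears as a word in one named field."""
    term = term.lower()
    return [
        r for r in records
        if term in str(r.get(field_name, "")).lower().split()
    ]


# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"title": "Laser plasma interactions", "author": "A. Smith"},
    {"title": "Neutron diffraction of ice", "author": "B. Laser"},
]

print(len(basic_search(records, "laser")))              # matches title or author
print(len(advanced_search(records, "title", "laser")))  # matches title only
```

The basic search finds "laser" in the title of the first record and in the author name of the second, whereas the advanced search restricted to the title field finds only the first.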

There are nine browse options that allow the user to navigate through the content. Some of these are standard to any organisation but there are two specialist ones. The first of these is report series because as a research organisation we publish our own technical reports. The other option is collaboration, which is mainly designed for the particle physics community. Most particle physics papers have over two hundred authors so we have allowed users to put in the collaboration name rather than each individual author to reduce the barrier to input.

Adding and editing content

The sign-in feature uses the organisation's standard authentication system, removing the need for additional user IDs. Logging on enables the user to add and edit records and to see additional information about a record, including any annotations that other registered users have added and a full audit trail detailing the changes made to it. The input process is divided into four stages. The first describes the work, giving the title, subject, language and abstract. The second describes the author(s) and organisational context, with details such as affiliations, department or group. The third stage captures the bibliographic details of the item in question; the details required vary according to the format of the work. The final stage shows the completed record for checking and submission.

Although there is the potential to capture a high level of metadata, there are few required fields. This allows the person adding the record to decide on the level of detail required. At any stage the work in progress can be saved as a draft, which enables the user to come back to the record at a later date.

There are three levels of authorisation within Scitate. Standard users can add records but these are put in a submitted queue for checking before they go live. The departmental administrator is responsible for the records of a particular department and can check these records. Finally, the system administrator can manage the records from any department.
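The three levels described above could be modelled along the following lines. This is a sketch under stated assumptions: the role names follow the text, but the status values ("draft", "submitted", "live"), the class names and the function names are hypothetical and are not taken from Scitate.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STANDARD = "standard user"
    DEPARTMENT_ADMIN = "departmental administrator"
    SYSTEM_ADMIN = "system administrator"


@dataclass
class User:
    name: str
    role: Role
    department: str


@dataclass
class Record:
    title: str
    department: str
    status: str = "draft"   # assumed states: draft -> submitted -> live


def submit(record: Record) -> None:
    """Any signed-in user can place a record in the submitted queue."""
    record.status = "submitted"


def can_approve(user: User, record: Record) -> bool:
    """System administrators approve anything; departmental ones only their own."""
    if user.role is Role.SYSTEM_ADMIN:
        return True
    if user.role is Role.DEPARTMENT_ADMIN:
        return user.department == record.department
    return False


def approve(user: User, record: Record) -> None:
    """Move a submitted record live, if the user is allowed to check it."""
    if record.status != "submitted":
        raise ValueError("only submitted records can be approved")
    if not can_approve(user, record):
        raise PermissionError(f"{user.name} cannot approve this record")
    record.status = "live"


# Hypothetical users and departments, for illustration only.
scientist = User("scientist", Role.STANDARD, "Lasers")
coordinator = User("coordinator", Role.DEPARTMENT_ADMIN, "Lasers")
paper = Record("An illustrative paper", "Lasers")

submit(paper)
approve(coordinator, paper)
print(paper.status)
```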

Reducing confusion

To ensure consistency in journal titles, there is the facility to produce an authorised list of serials which the user can choose from. Being able to identify journal titles effectively can be a complicated process, especially if the author is only using the abbreviation, so this pick list of known titles is provided to assist the user.

The other feature to ensure data consistency is known as disambiguation. CCLRC's ePubs implementation of the software uses the CCLRC source of staff information to link a particular author to their unique staff number. This means that if the author is inconsistent about how they name themselves on published works, all the works will be linked together to the approved form used within CCLRC. Authors who have been disambiguated are shown in italics. Non-CCLRC authors can be disambiguated to a single entry within the ePubs system, even if they are not registered as visitors within the CCLRC information systems.
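A simplified view of disambiguation is a lookup from the different ways an author writes their name to one staff number and one approved form. The sketch below assumes a small in-memory table; the staff number, the names and the normalisation rule are invented for illustration and do not reflect how ePubs queries the CCLRC staff systems.

```python
import re

# Hypothetical staff table: staff number -> approved form of the name.
APPROVED_NAMES = {1001: "Smith, A.B."}

# Hypothetical known variants, stored in normalised form.
KNOWN_VARIANTS = {
    "a smith": 1001,
    "alice smith": 1001,
    "smith a b": 1001,
}


def normalise(name: str) -> str:
    """Lower-case the name, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z\s]", " ", name.lower()).split())


def resolve(name: str):
    """Return (staff number, approved form), or (None, name) if unknown."""
    staff_number = KNOWN_VARIANTS.get(normalise(name))
    if staff_number is None:
        return None, name
    return staff_number, APPROVED_NAMES[staff_number]


for variant in ("A. Smith", "Alice Smith", "Smith, A.B."):
    print(variant, "->", resolve(variant))
```

All three spellings resolve to the same staff number, so the works they appear on can be listed together under the approved form. A name that is not in the table is returned unchanged.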

To ensure data quality there is a suite of reports looking for metadata records that are sparsely completed or have not used the authorised lists. This ensures that the system administrators, in CCLRC's case professional librarians, can check that the data within the system is of acceptable quality.
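One such report could be sketched as a function that flags sparse records and journal titles that are not on the authorised list. The field names, the threshold and the sample journal title below are assumptions made for this example and are not the checks that ePubs actually runs.

```python
# Hypothetical authorised list of serial titles.
AUTHORISED_JOURNALS = {"Illustrative Journal of Physics"}

# Optional fields counted when judging how complete a record is (assumed).
OPTIONAL_FIELDS = ("abstract", "subject", "language", "journal")


def quality_issues(record: dict, min_filled: int = 2) -> list:
    """Return a list of data-quality problems found in one metadata record."""
    issues = []
    filled = sum(1 for name in OPTIONAL_FIELDS if record.get(name))
    if filled < min_filled:
        issues.append("sparsely completed")
    journal = record.get("journal")
    if journal and journal not in AUTHORISED_JOURNALS:
        issues.append("journal title not on authorised list")
    return issues


# Made-up records for illustration.
sparse = {"title": "A short record", "journal": "Illus. J. Phys."}
complete = {
    "title": "A fuller record",
    "abstract": "An illustrative abstract.",
    "subject": "lasers",
    "language": "English",
    "journal": "Illustrative Journal of Physics",
}

print(quality_issues(sparse))
print(quality_issues(complete))
```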

What it holds

CCLRC's implementation of Scitate, ePubs, includes journal articles, preprints, laboratory technical reports, conference presentations and papers, theses and final project reports.

We have allowed each department to choose whether they would like to have a departmental co-ordinator who manages their publications or whether the library and information services staff check and approve publications. This has worked very successfully.

Metadata records are available dating back to the mid-1960s as a result of importing data from other systems into ePubs. This was not originally part of the development plan. However, we first used this data to test the system and found that having a body of information within the system proved to be a critical tipping point, encouraging people to add more information to ensure the picture was complete.

As our remit includes authors who are not members of staff, we do not aim to have the full text of every article deposited within ePubs.

Beyond CCLRC

One of the remits of CCLRC is to exploit scientific and technical developments for the benefit of the UK, so we are now exploring the potential market for Scitate beyond our ePubs implementation.

Following the model developed by other suppliers, the Scitate software will be available for free download as an open-source product. However, CCLRC is also offering support services to set the system up, to share our expertise in populating the repository, and to provide ongoing support and upgrades to the software.

Scitate has been in use at the CCLRC for over two years and is used by both scientists and administrators. It has provided a robust infrastructure to enhance the visibility of the work of CCLRC staff and facilities. We are looking forward to building partnerships with other organisations that would like to share the benefits that using Scitate brings.

Catherine Jones is library systems development manager for CCLRC Library and Information Service. For further information, contact Nick Trigg or Catherine Jones.