The never-ending story
From deposits to discovery, Rebecca Pool looks at the long and winding journey of the institutional repository
When metadata librarian Nina Watts started working with the UK-based University of Westminster’s institutional repository in 2005, the wonderful world of online scholarly collections was a very different place. ‘WestminsterResearch’ was about to open with 2,692 records dating back to 2001, and of these, less than 10 per cent had an attachment such as an accepted author manuscript.
‘We launched in 2006 and this was quite early on in the landscape of repositories – Westminster was considered to be an early adopter here,’ she says. ‘In our project initiation document, you can see this mistaken impression that most works would be self-deposited with academics simply uploading publisher PDF... the repository deposits turned out to be entirely mediated.’
‘Different schools also had different responses to the whole idea of open access,’ she adds. ‘One head of school actually said they didn’t see the point of open access, although they have since totally changed their mind.’
Fast-forward 15 years, and the repository is now home to some 22,705 records, of which 27 per cent have an attachment. The percentage of full-text attachments for actual articles published since 2016 is 88 per cent; this figure came in at only 20 per cent for articles from 2006. And, where possible, content is made openly available with types of entries including journal articles, chapters, books and conference papers as well as a rising number of practice-based works such as exhibitions, digital and visual media, artefacts and designs.
According to Watts, 2015 was a watershed for WestminsterResearch in terms of the types of works as well as sheer numbers published. The university had just started working with UK-based Haplo, using the repository provider for its ingest workflow, while still relying on its original free and open source Eprints package for the public repository interface. As a result, the repository could much more easily support practice-based research and non-traditional datasets.
At the same time, the REF open access mandate had just been announced, stating that journal articles and some conference proceedings had to be publicly accessible within three months of acceptance for publication in order to be eligible for submission for the post-2014 Research Excellence Framework. Given the double-whammy of easier depositing and REF urgency, WestminsterResearch saw self-deposits rocket from less than one per cent to more than 99 per cent while practice-based/non-text-based entries mushroomed by 246 per cent.
‘The Haplo repository and REF open access mandate came at a similar time and the combined power of both led to this massive increase in self-deposits,’ highlights Watts.
‘The mandates really helped people to comply with open access,’ she adds. ‘And we believe that factors contributing to more practice-based research included vastly improved templates and fields for these outputs... in the past, the repository just couldn’t take this content.’
Following these results, and with REF2021 looming, WestminsterResearch switched to a full Haplo open-source set-up in 2018, and entries have continued to rise. As Watts puts it: ‘I don’t think we’d have been able to support the increase in open access deposits without this rise in self-depositing.’
Watts’ words echo the sentiments of many institutional professionals and end-users who have been working with repositories over the last decade or so. And given the steady development of repository platforms and tools, these words clearly haven’t fallen on deaf ears.
Jean-Gabriel Bankier, managing director of Bepress as well as Digital Commons for Elsevier, has spent at least a decade developing institutional repository cloud-based services. During this time, he has grown the Digital Commons community to more than 600 academic institutions, which have deposited some 3.5 million articles, leading to more than one billion downloads. Along the way, he has watched open access evolve – as he says, ‘in my mind, there’s never been any kind of flashpoint event with open access’ – and is very conscious of the trials and tribulations that have ensued.
‘The work involved in finding, posting, sharing research – the building of open access repositories – is a significant amount of work, but progress is being made,’ he says. ‘Tools have been getting better and we’ve been really learning how to cut down on some of the more arduous steps of the workflow.’
‘This is where a lot of effort has taken place over the last several years,’ he adds. ‘For example Sherpa-Romeo [a Jisc tool to aggregate open access policies] was a big step in that direction and we’ve seen various efforts to make collecting metadata easier and faster.’
For its part, Bepress has spent 16 years honing Digital Commons, and is now launching a new harvesting tool that allows users to swiftly harvest Scopus data into Digital Commons and promises to cut the time institutional repository managers take to add faculty publication records to repositories by up to 50 per cent. This, in combination with a new Outbound API, means the library can reuse the metadata from its institutional repository to manage information about its research contribution too.
‘Digital Commons has always been the showcase repository; we’ve been the Swiss Army Knife of discovery – no matter what your content is, we will help you showcase it beautifully,’ highlights Bankier. ‘However, we also envisage [the repository] being a hub where accurate and comprehensive researcher data is gathered, organised and then shared.’
‘Today’s institutional repository must support the essential reputational needs of the university, serving as a research showcase and as a database of researcher outputs, describing the research being done,’ he adds.
Discovery matters
Workflows aside, Bepress has also focused on making open access repository content discoverable as well as demonstrating impact. To this end, the company incorporated Digital Commons Dashboards that allow administrators to browse real-time download activity and generate usage reports on demand. And impact analytics have expanded beyond downloads of the full text to include metrics such as citations, captures, mentions, and social media likes and tweets with the integration of PlumX Metrics. ‘This is about encouraging additional contributions from authors and is so important,’ says Bankier. ‘It’s like the tree falling in the forest; if no-one hears that, did it really fall? The more authors are alerted to the impact that their work is having, the more they will contribute in the future.’
Other key industry players are developing methods to drive both discovery and impact forward. Earlier this year, Figshare launched a faceted search page across its platform which, according to chief executive Mark Hahnel, is designed to make it easier for users to ask questions and find content.
The move was, in part, prompted by Figshare’s rapidly rising volumes of text-based content. While the platform was originally designed with visual searches in mind, it has been increasingly used by institutions as a paper and thesis repository while high-volume preprint services, including ChemRxiv and SAGE Advance, also rely on its infrastructure.
‘We’re also adding quality filters to the search, such as filter by citation count, which provides an indicator of what we can do,’ highlights Hahnel.
‘Importantly, the faceted search also provides a way to ask questions, such as “how many downloads of chemistry content have there been in June?” on a human level, and without using a [search] API.’
Liz Bal, director of open research services at Jisc, is pleased to see a rising number of tools designed to make life easier for repository users. ‘We have more repositories than ever, more research than ever and as an end-user navigating this, it can be challenging.’
In a similar vein to Bankier and Hahnel, she believes discovery is a crucial part of the growing repository landscape. ‘At Jisc, we are interested in discovery as we believe this is an area that is ripe for improvement, will help people access the right information and help us realise the benefits of open research,’ she says.
Indeed, Jisc has been outlining many mechanisms and key tools to drive discovery forward. For example, the organisation recommends that any repository registers with the global directory of open access repositories – OpenDOAR – so users can find the most relevant repository when searching for content. Importantly, registration also ensures a repository is picked up by services such as CORE, a global aggregator of open access content, delivered by Jisc and The Open University.
‘In response to the recent UKRI open access consultation, one of our key recommendations is that registration with OpenDOAR becomes mandatory,’ says Bal.
‘This will help with content searches, quality assurance and provide information around the status of the national and global repository infrastructure,’ she adds. ‘We’re keen to link up such services to add depth and resilience to the repository infrastructure.’
As Bal also highlights, Jisc’s aggregator, CORE, includes a web-based search engine, programmable access to metadata and full-text for text mining to promote content discovery. What’s more, customised search and analysis tools can be created using the CORE index and its API.
In line with other key players’ sentiments on impact, Jisc has also launched national aggregation services, IRUS-UK, IRUS-ANZ and most recently IRUS-US, which provide download statistics for content within a repository within the UK, Australia, New Zealand and the US. ‘We’re excited to grow our portfolio of IRUS services because institutional repositories can now look at their usage at the national level as well as benchmark at the international level,’ says Bal.
Indeed, Jisc has been quick to put together a table showing the use of coronavirus-related content in IRUS, featuring items from all three services. ‘We are already seeing some useful insights from comparing usage in this way,’ highlights Bal. ‘In working towards a global picture, we’ll have a better understanding of content usage, and the important role of institutional repositories in providing open access to content, across the current fragmented landscape.’ (See ‘Repositories far and wide’, below)
Coronavirus catch-all
But while the Covid-19 pandemic has highlighted how repositories provide an indispensable platform to collate and disseminate research swiftly, it has also flagged up an underlying need for screening.
Indeed, within a few weeks of the coronavirus outbreak, bioRxiv and medRxiv had enhanced screening procedures on manuscripts to weed out dubious research results.
As Figshare’s Hahnel highlights: ‘We’ve seen this with our ChemRxiv clients and have said they need to add a banner to such content, pointing out that it is not peer-reviewed. I believe that data as well as pre-prints fall into this category of research that needs to be published quickly but needs some level of checking.’
Indeed, Hahnel has spent the past year working with the US-based National Institutes of Health on a generalist data repository to store and re-use NIH-funded research data. Crucially, the repository has been curated by trained data librarians who check details such as licensing and metadata to ensure data aligns with the FAIR principles. And Hahnel thinks this low-level screening has made a huge difference.
‘Our State of Open Data reports indicate that many people like assistance while publishing data... and this NIH repository is providing a safe way to do this,’ he says.
‘Also, initial data [from the NIH] on content use and downloads shows that the impact of checked files is significantly higher [than files that haven’t been checked].’
So it would seem that even simple screening benefits repository content and users. Still, as Hahnel points out, going forward, who checks and who pays? ‘We’re in a transition period here, and the business model for this is still up in the air,’ he says. ‘But I think it’s going to have to involve human curation.’
So what now for repositories? Without a doubt, the community can expect to see more advances in discovery and impact as well as interoperability, data ingestion and re-use. As Bankier says: ‘We have the momentum and the change that is coming is good.’
At the same time, Jisc’s Bal firmly believes that the increasing number of repositories is raising the awareness of open access and open research, and this will continue.
‘Open access is widely seen as the norm, and this cultural shift is very important,’ she says.
‘To that end, we welcome the focus on open research in the UK Government’s Research and Development Roadmap, and remain committed to providing the research community with the systems, repositories and intelligence to truly reap the benefits of open research.’
And for repository user Watts, from WestminsterResearch, the rise in tools and open access has already delivered much-needed results. ‘At first [depositing content in] an institutional repository was a nice thing to do, but with open access this has become a compliance issue,’ she says.
‘This has definitely led to better repositories.’
Repositories far and wide
As worldwide demand for repositories rises, France-based MyScienceWork has just signed a deal with Mexico-based distributor Vozbits to provide its open-source Polaris OS repository technology across Mexico, Chile and Colombia.
Polaris OS serves as an institutional and research data repository, as well as a multimedia archive and library management system. According to MyScienceWork, the system allows users to create high-quality, robust and scalable repositories that support complex functions with little to no programming skills.
Jisc recently listed MyScienceWork as an official supplier of Research Outputs Repository Systems with Polaris OS. And the company is also a Phase 1 winner of OpenAIRE’s Open Innovation challenge to develop products linked to scholarly works, repositories and data management.
‘Latin America is a momentous focus for MyScienceWork,’ says Yann Mahé, managing director of MyScienceWork. ‘Our presence in Mexico shows our deep commitment to strengthen global innovation cooperation... [and provide] solutions to address open science matters and more specifically open access.’