Making the most of metadata

13 June 2014

What have you been proudest of with CrossRef?
I’ve probably been proudest of CrossRef’s impact on scholarly communication generally. We have built up a strong reputation for credibility and innovation in providing this collaborative infrastructure.

I have been amazed at how far the basic CrossRef service of providing persistent identification and reference links has extended. We started with journals but CrossRef now includes half a million book titles, as well as conference proceedings, technical reports, standards and data. Today we have 67 million content items.

We’ve also seen huge growth in membership, to around 5,000 organisations in 100 different countries, with particularly strong growth in Brazil, Turkey, Asia and elsewhere. Two thirds of our members are in our lowest fee category, and a majority are non-profit.

CrossRef has consciously been neutral in the battles about, for example, business models. We are an association of scholarly publishers but that is a very broad church, including universities, libraries, academics and pure open-access publishers as well as commercial subscription publishers.

What have been the challenges?
Along the way there have been lots of challenges. One is the collaborative aspect of CrossRef – getting people to work together and agree common rules. There are many different interests and a huge range of publishers involved. It takes a lot of time, patience and discussion.

It is sometimes frustrating that we haven’t been able to move as fast as we could have. For example, in 2002 we trialled a full-text search tool called CrossRef Search. The pilot worked really well, but we couldn’t get consensus to launch a full service, and a few months later Google Scholar launched. However, this helped us to focus. We’ve never really seen ourselves as a discovery tool, and we could have been badly distracted had we moved into end-user discovery. Our role is to act as a hub for metadata, validation and other services that are valuable to scholarly publishers.

What changes have you seen?
Publishers’ attitudes to metadata have changed. When we started it was difficult to get publishers to even give us article titles because they saw this as a threat. The amount of information we collect has really grown. Publishers today are expected to be more open and explicit about things like peer review.

Funders are becoming important stakeholders and are driving change through mandates. In the past there was no way to track funding information, but FundRef now enables publishers to capture it.

We are also now capturing licensing and text and data mining (TDM) information, as well as the information to power CHORUS and SHARE [two alternative approaches to meeting the requirements of US open-access funding mandates]. However, we still have a long way to go in getting publishers to submit funding and licensing metadata.

Our TDM solution extends what we have done with funders to licensing information. Our focus has been on making it easier and more efficient for researchers to do TDM and helping them to get access to the information they subscribe to. The big thing over the next year or so will be rolling out the TDM solution.

What else are you working on?
We are doing a lot more international outreach and we are, for example, running workshops in India and other parts of Asia.

We are also looking at the persistence of less formal scholarly communication. Many URLs are already cited in papers, and those links can break. If something is cited, we need to make sure that it persists. As part of this initiative we are looking at better integration with Wikipedia.

URL shorteners such as bit.ly are very useful, but they are not great when it comes to persistent linking and tracking. There is a security concern: will they last? Are they archived? There is a short DOI service, but we haven’t really pushed it much.

Where publishers have supplementary data we can assign CrossRef DOIs. We also work very closely with DataCite and we have some joint services to make those connections. It is very important to have published literature connect to the data.

Versioning of articles and other content is a big issue that we are grappling with at the moment. Because of the new public access mandate in the USA, people need to know whether something is an accepted manuscript or the version of record. CrossMark, which is being rolled out with publishers now, helps with this by making the status of a piece of content transparent.

We have an interesting project around connecting journal articles and clinical trials. We are looking at using CrossMark to link to clinical trials at different stages and at ways to deal with different versions. Tools such as EndNote and Mendeley will also be able to provide content status alerts via CrossMark.

The issue of versions is also important in dealing with corrections and retractions, which can be a challenge when articles are held in a repository. Our goal is to get repositories to link using CrossRef DOIs, which institutional repositories can do free of charge.

We have done some surveys with researchers and found that they have pretty good awareness of DOIs but much less awareness of CrossRef. DOIs can apply to anything, but some people mistakenly believe a DOI is a mark of quality; it isn’t. We are looking to apply the CrossRef branding more prominently across our services.