Text mining of subject archives will enable new facts to be discovered

The Wellcome Trust, which funds research in the life sciences and medicine, was the first research funder to make open access a condition of its grants. Robert Terry is the organisation's senior policy adviser.

What is your view of open access?

RT: We want the digital versions of papers to be available to all in an unrestricted way, and to remain available forever by being placed in an archive or institutional repository. Anyone who receives one of our grants has to put the digital versions of their published articles in PubMed Central (or in UK PubMed Central once it has been developed) on the day of publication or no later than six months after publication.

We are very strong on supporting subject-based archives. Science is conducted at a subject level. Archives such as PubMed Central introduce standardisation and ensure articles are in XML, which is an archive format. Once in the archive we can start tagging and making associations between, for example, PubMed Central and PubChem, another subject archive.

What is in PubMed Central today is a tiny proportion of the research that is done but it will become the norm to publish open access on the day of publication. Once a critical mass is reached, text mining will enable new facts to be discovered that humans could not find unaided, such as information about gene associations. Data meshing will also start to happen where, for example, you could look at associations between supermarket loyalty cards (to find out what people eat), their health records and genetic make-up. This will have a huge impact on public health.

Could institutional repositories do this?

RT: Institutional repositories will not deliver the same things as subject-based archives. They won't be standardised – even simple things like choosing a different way of writing dates will make it difficult to find things in the same journals. And in institutional repositories it is not always obvious what you will get back whereas searches in PubMed Central will only return peer-reviewed literature.

Having said that, I still think there is a need for institutional repositories. They showcase the work of the university; help it in research assessment exercises; help researchers to compile their own work for applying for promotions or grants; and provide a way of storing grey literature and teaching material.

What are the benefits of open access?

RT: Open access is better for research. Publishing research in journals worked very well in a paper-based format but people do not work like that now. Often now the first port of call for finding out about other research is a search engine. The subscription model fragments research information behind lots of different deals, different copyright rules and different formats.

Open access is also a more transparent system financially. We make funding available for researchers to publish in open-access journals. We are associated with around 4000 research papers in any one year and we give out around £400 million in grants in any one year. We have estimated that if all papers are open access and charged at £1000 then this would be about 1 per cent of our budget. If papers are charged at £2000 then it would be 2 per cent of our budget. However the reality would be less than that as every paper that mentions the Wellcome Trust generally mentions at least one other funding body so the costs could be shared. In addition, as open access builds up, institutions should start to see savings in their subscription costs.
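The arithmetic behind those percentages can be checked with a short sketch. The figures are the round numbers quoted in the interview (around 4000 papers and around £400 million in grants a year); the function name and constants are illustrative and are not part of any Wellcome Trust calculation.

```python
def publication_cost_share(papers_per_year, fee_per_paper, annual_budget):
    """Return total open-access fees as a fraction of the annual grant budget."""
    return papers_per_year * fee_per_paper / annual_budget


# Round figures quoted in the interview (approximate, in pounds sterling).
PAPERS_PER_YEAR = 4000
ANNUAL_BUDGET_GBP = 400_000_000

for fee in (1000, 2000):
    share = publication_cost_share(PAPERS_PER_YEAR, fee, ANNUAL_BUDGET_GBP)
    print(f"Fee of GBP {fee} per paper: {share:.1%} of the grant budget")
```

This is an upper bound under the stated assumptions: it charges every paper in full to one funder, whereas costs shared with co-funders would lower the share.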

In the past few years we have seen some increases in subscriptions of around 200 per cent. Authors give their research to publishers for free and do the peer reviewing but then might not be able to access the research. No single library can supply every journal that the researchers need so you get a Googleisation of science where researchers only know about the other research that they can access. Open-access publishing will drive prices down. Journals that want to charge a high fee will have to demonstrate that they add value.

Do other funding bodies agree with you?

RT: Over 120 funding bodies signed the Berlin declaration on open access. Few have announced a mandate or provided funding for open access so far but I think other funders will follow. The Arthritis Research Fund will have a similar policy to ours from January 2007 and we expect Research Councils UK to have a similar approach. We also work closely with the National Institutes of Health in the USA and it seems that they are also moving towards a mandate.

An additional benefit of open access is that it gives funding agencies access to the research they fund. Without this they have to pay additional fees to see these papers. This is a problem because the first step of evaluation is clearly identifying what has been funded.

How have publishers responded to your mandate?

RT: About 130 publishers are associated with research that the Wellcome Trust funds so it is quite a confusing picture. With the Joint Information Systems Committee (JISC) we are supporting an extension to the Romeo directory at Nottingham University so that researchers can find out which journals meet our open-access requirements.

Publishers' reactions have been variable. It is a huge change and all change creates a range of reactions. The ones that have been the most positive are those with open-access journals. There has been a mixed response from some of the commercial publishers but those that we have spoken to have been constructive. The model poses more problems for learned societies that might have just one journal as their only source of income. If they remain in the denial phase then their own predictions about going out of business are likely to come true. They need to look at other sources of income such as conferences and training courses and to start to offer open-access options.

What are the big challenges?

RT: The main challenge is the researchers themselves. There is a lot of passive inertia. They either don't know or don't care about open access. They publish work in journals but they don't even know how much the subscriptions cost. There is also a strong link between where researchers publish and their career prospects. We have lost sight of the value of research and become more interested in journal titles.

We have made this easier for the researchers by phasing the introduction of our mandate. From October 2005 the mandate only applied to new awards but it will apply to all grants from October 2006. Once that comes into force it will start to identify whether there are any tensions in the system.