Text mining promises huge benefits but copyright law can limit its use

A new JISC report argues that current copyright law limits the possibilities of text mining for unlocking science

When 1.5 million academic articles are published every year and two new pieces of research are uploaded to UK PubMed Central every single minute of the day, how can researchers possibly sift through, understand and make new discoveries from the torrent of data in their field? The simple answer is that they can't. But computers can.

This is the notion at the heart of text, or data, mining. Computers have an almost limitless capacity to “read” and can be programmed to analyse enormous datasets, identifying links, trends or patterns in data which reveal new scientific discoveries, create new products or services, develop new medicines more quickly and cost effectively, or provide huge cost savings around managing data.

Unfortunately, copyright law has not kept up with the pace of technological change. The extraction of facts or individual words is not in itself subject to copyright law: a human being copying a word or a fact from an article with a pen or pencil is perfectly free to do so. But because a computer must make a copy of an entire in-copyright work in order to perform the same activity, the process of data mining becomes subject to copyright law. As a result, the availability of material for mining is limited – most text mining in the UK is based on open-access publications – and researchers face legal uncertainty as they negotiate a maze of licensing agreements. A case study by the Wellcome Trust found that a researcher could easily spend 62 per cent of their research time purely asking publishers for permission to text mine.

A new report from JISC offers a solution to this problem by advocating a copyright exemption for non-commercial research to support text mining and analytics, as proposed in the recent Hargreaves report.

According to Mark Walport, director of the Wellcome Trust, this proposal ‘is a complete no-brainer.’

He explained this view at a recent event about the topic: ‘This is scholarly research funded from the public purse, largely from taxpayer and philanthropic organisations. The taxpayer has the right to have maximum benefit extracted and that will only happen if there is maximum access to it.’

Others, within the commercial sector, agree. ‘There are about 7,000 diseases out there and we can cure about one per cent as an industry at the moment. We're all patients at the end of the day and we need to discover medicines. That's the priority,’ commented Philip Ditchfield, manager of contracts and licensing at GlaxoSmithKline. ‘We're a very compliant industry and we want to work with publishers, not undermine their intellectual property. Publishers often say you can mine our content - you just have to ask us. That's very easy to say and very hard to achieve. It is like in the early days of motor cars when you were allowed to drive down the road but you had to have a man with a red flag running in front of you.’

While the move towards a more relaxed copyright environment for text mining would be helpful for the academic world, it also has implications for economies as a whole.

Walport observed about the UK, ‘Research is a global enterprise and UK research is known to be exceptionally good. Benefiting UK researchers is likely to have benefits for UK PLC. But the challenge is how effectively the research is translated into economic activity.’

Text mining is a much more straightforward activity in some countries. It is permitted in Japan, for example, while academics and technology companies in the USA can assert fair use. Diane McDonald, one of the authors of the JISC report, believes that there is a risk of new and innovative companies moving to countries where text mining is more straightforward if it is too hard to do in their home countries. This has implications for jobs and revenue.

The answer, says McDonald, is not only to implement a copyright exception but for policymakers to consider the evidence for market failure and issues of equity relating to text mining and current copyright law. She added that the higher education sector needs to collaborate with content publishers and service providers to explore potential new business models.