The development of electronic publishing, and the insistent demands from funders, are fuelling an interest in new metrics for research evaluation. But measuring the quality of science is still a difficult task, as Tom Wilkie heard at the STM Innovations seminar in London last month.
The whole of science has become ‘Big’ in the half century since the publication, in 1963, of Derek de Solla Price’s prescient book Little Science, Big Science. A consequence is that scientists no longer have untrammelled academic freedom to manage their research teams but have to justify their expenditure to politicians and funders.
De Solla Price’s second contribution to the field – scientometrics – has become a central management tool in that effort to monitor scientific quality. Yet the main measure – bibliometrics – may not be up to the task. New metrics are needed – especially with the rise in electronic publishing – but most candidate altmetrics are not yet fully developed. ‘Innovation Impacts in STM: New Metrics, Big Data, Cool Apps’, STM’s Innovations Seminar held in London on 7 December 2012, spent most of the day exploring the potential of new metrics in science and technology.
In his keynote address, Paul Wouters, professor of scientometrics and director of the Centre for Science and Technology Studies at Leiden University, argued that scientometrics occupied a position ‘between narcissism and control’. Speaking by video link from the Netherlands (because the weather had made it impossible for him to get to the UK and present his paper in person), he explained that scientometrics acts as a mirror to researchers, reflecting back to them what they have been doing. In this sense, it is a technology of narcissism, he said. Bibliometrics also has forward-looking functions, as it can discern new developments in science, such as the emergence of new topics and the growth (and decline) of patterns of collaboration.
But bibliometrics is also a technology of control, he warned. Used retrospectively, it lets funders and others try to evaluate the impact of research that has already been done. Increasingly, he pointed out, those with their hands on the purse strings are interested not just in the scientific impact of a research publication but also in its wider social impact.
In some countries, the linkage between quality and external control is explicit, according to Gianluca Setti from the University of Ferrara, Italy, who is also IEEE vice president for publication services and products. The Chinese Government pays scientists for publishing in journals that have a high impact factor, he said. But this carries the danger, he continued, that the measure – high impact factor – becomes the target and so ceases to be a measure. Measures of scientific quality should be immune to external manipulation, but impact factor is liable to manipulation – for example by excessive self-citation. The main problem, he said, is that true quality cannot be measured and we only have indirect observables of quality. The point, then, is not to use a single indicator but several, and to correlate them.
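One simple way to correlate two indicators is a rank correlation, which asks whether journals that rank highly on one measure also rank highly on another. The Python sketch below is purely illustrative: the function names and the journal figures are hypothetical, invented for this example rather than taken from the seminar, and Spearman's rank correlation is only one of several statistics that could be used.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical figures for five journals (not real data).
citations = [120, 45, 300, 80, 15]
usage = [900, 1500, 1100, 400, 700]

rho = spearman(citations, usage)
print(round(rho, 2))  # 0.2, a weak positive correlation
```

A value near 1 or -1 would suggest the two indicators carry much the same information; a value near zero suggests they capture different things, which is the argument for reporting both.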
This point was taken up by Peter Shepherd, director of COUNTER, the not-for-profit international organisation whose mission is to improve the quality and reliability of online usage statistics. He presented a new metric for journal impact: usage factor. ‘Any metric can be gamed, whether it’s citations or usage,’ he said. There is safety in using a range of measures, he argued, because ‘it is difficult to game all of them across the board.’
From the results so far, there is a very weak correlation between citation and usage, Shepherd noted, as one measures author and the other reader behaviour. COUNTER started its usage factor project in 2007 and it is now in its final stages. It initially covered 326 journals in five areas, with more than 150,000 articles. The project uses the median number of views rather than the arithmetic mean because some articles tend to be very frequently viewed and this would distort the picture if the arithmetic mean were used; the counting period goes back two years.
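The preference for the median is easy to see with a small example. The Python sketch below uses invented view counts for ten articles, one of which is viewed far more often than the rest; the helper name `usage_factor` is hypothetical and is not COUNTER's own definition, which involves further rules not covered here.

```python
from statistics import mean, median


def usage_factor(view_counts):
    """Illustrative stand-in: the median number of views per article."""
    return median(view_counts)


# Hypothetical view counts for ten articles over the counting period.
# One article is viewed far more often than the others.
views = [12, 15, 18, 20, 22, 25, 27, 30, 35, 4000]

mean_views = mean(views)            # 420.4, pulled up by the single outlier
median_views = usage_factor(views)  # 23.5, close to what a typical article receives

print(mean_views, median_views)
```

Nine of the ten articles here have 35 views or fewer, yet the arithmetic mean exceeds 400. The median is unaffected by how large the outlier is, which is why it gives a fairer picture of a typical article.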
Publishers report their own usage factors. To audit these statistics, COUNTER has been working with the ABC, the UK-based circulation audit body, which has extensive experience in auditing traffic to websites of newspapers, magazines, and other commercial publications. One issue is how to exclude from the statistics automated access by entities such as web bots. There will be additional costs to publishers, Shepherd conceded, but they have to weigh these against the benefit of having additional measures of impact.
Institutional repositories are excluded from the COUNTER scheme because only peer-reviewed publications that have been accepted for publication are considered countable. ‘If institutional repositories want to be treated as publishers then they have to start acting as publishers,’ he said. One particular issue is that institutional repositories would have to spend money setting up systems to filter out the automated web spiders.
Wouters told the meeting that his team had evaluated 16 altmetrics tools but that most of them are not yet useful. In his view, altmetrics are not yet suitable for most research assessments. But it would be a mistake, he said, to do nothing at present, as there is an urgent need to promote additional metrics. He said that citation data, for example, did not yield a fair assessment in the social sciences and humanities.
The best metrics should fulfil three criteria that are important in measuring impact, he continued. The measure should be scalable – that is, it should be possible to aggregate to the level of research groups. It should be transparent in data management – the sort of data that is being used should be transparent to researchers, who should be able to argue back to the evaluators – and the measure should allow for normalisation and context. If someone is interested in the social impact of a piece of research, he pointed out, citation data is useless.