How do we improve peer review?
Peer review has evolved over the past 300 years to become the bedrock of today’s scholarly publishing system. When a researcher downloads an article from a reputable peer-reviewed journal, they typically approach it with a degree of trust: the reader believes it has been validated by an expert in the same field as the author.
Peer review is not perfect, and most scholars will have some personal experience of its limitations. Nevertheless, it is generally considered the best way we currently have to ensure the validity of research.
As Chris Graf, Research Integrity Director at Springer Nature, explained, it is also an essential part of putting the research community at the heart of research publishing: “The way we do research publishing is built around peer review. The fact that it’s done by members of the scientific community who are researchers, for the authors of the papers who are researchers, and governed by editors who are also researchers, puts the research community right at the heart of everything we do as research publishers.”
The challenges of peer review
The problem, however, is that peer review was designed for small scholarly networks in an analogue age, and it has struggled to meet the demands of a global, increasingly digital research system.
The two main challenges, according to Coromoto Power Febres, Research Integrity Lead at Emerald Publishing, are sourcing enough peer reviewers and ensuring the integrity of published work: “The scale at which we produce knowledge is far greater than the system was designed for. As the number of articles that are produced increases, the number of humans that are able to review these articles is not increasing at the same rate. There is a lag between producing a person and producing an article, so there are pressures on a limited resource, which is what the reviewers are, who already have inordinate pressures upon their time. So it’s increasingly difficult to actually source reviewers and find the right number of people to look at the manuscripts.”
Graf said this problem is compounded by the ever-increasing specialisation and hyper-specialisation of research, which limits the pool of peer reviewers and contributes to reviewer fatigue.
The challenge of growth in the system has also been accentuated by those acting in bad faith, most notably with the emergence of ‘paper mills’: organisations that create fake scientific papers to order. While there have always been occasional rogue researchers, such fraud now appears more systematised and more difficult to detect. As Power Febres explained: “You will have seen the considerable number of retractions that publishers across the board have done, in which paper mills have succeeded in manuscripts reaching publication through peer review manipulation.
“Peer review has been manipulated more than we had ever anticipated, and has become something that can be used in misconduct. So peer review manipulation and how to identify it, how to verify the identity of people who review a manuscript – all of those concerns are ever-growing. That is connected to the issue of scale, because the bigger the number of people that you are dealing with, the more difficult that is to do.”
In an age when scholarly networks are global, ties are weak, and promotions may depend on researchers publishing a certain number of academic papers, the system is extremely vulnerable to manipulation. As Graf explained: “We start from a position of trust, but paper mills exploit that trust. They turn a profit by stealing identities, by selling authorship credit on papers to authors, and manipulate our publishing processes so that they can ensure that those papers – fake science – are published. We can’t tolerate that. There have certainly been thousands of probable paper mill papers retracted in the last few years, but we’ve published many millions of papers over the last few years, so it’s an important problem, but it’s important to recognise the scale of it as well.”
Is the solution technological or human?
With many of the problems of fake science enabled by technological innovation, it is reasonable to ask whether new technologies may also play a role in solving them. According to Graf, there is potential for technical innovation throughout the peer review process: “We have technology that is getting quite good at detecting artificially generated text in papers. It’s not foolproof, currently anyway, but it means that we are able to withdraw papers before they even get presented to an editor, and then presented onwards to peer reviewers, thus reducing the burden on the whole system.
“One area that’s commonly discussed is image integrity. Western blots are easy to manipulate in an unsophisticated way, and researchers or paper mills might do that to beautify the data that is contained within a picture of a western blot for presentation in a journal, or to fake it in the first place. There’s the opportunity to use technology to interrogate those images and to identify the overlapping regions within the image, the scrub marks that have been left by the use of an eraser tool in Photoshop, for example, and we’re able to spot the fact an image has been manipulated.”
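To make that concrete, the sketch below shows one common image-forensics idea in miniature: flagging cloned (‘copy-move’) regions by hashing coarsely quantised pixel blocks. It is an illustrative toy, not any publisher’s actual tooling; the block size, quantisation step and flatness threshold are arbitrary choices.

```python
# Toy copy-move detector: hash coarsely quantised pixel blocks and
# report any block that appears twice in the same image.
# Requires Pillow and NumPy. Illustrative only, not production forensics.
import hashlib
import numpy as np
from PIL import Image

def find_cloned_blocks(path, block=8, quant=16, min_std=4.0):
    """Return pairs of (row, col) offsets whose blocks match after coarse
    quantisation - a crude signal of duplicated regions in an image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.int16)
    img = (img // quant) * quant  # quantise so near-identical blocks collide
    seen, matches = {}, []
    rows, cols = img.shape
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = img[r:r + block, c:c + block]
            if tile.std() < min_std:
                continue  # skip flat tiles: blank backgrounds match trivially
            key = hashlib.sha1(tile.tobytes()).hexdigest()
            if key in seen:
                matches.append((seen[key], (r, c)))
            else:
                seen[key] = (r, c)
    return matches

# Usage (hypothetical file): any returned pair is worth a human look.
# print(find_cloned_blocks("figure_2_western_blot.png"))
```

Production forensic tools go much further, examining compression artefacts, noise patterns and the eraser ‘scrub marks’ Graf mentions, but the underlying idea is the same: make duplicated or retouched regions computationally conspicuous.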
The potential for even greater technical augmentation is also being opened up by powerful new artificial intelligence tools such as ChatGPT.
Graf said: “Right now I’m particularly excited with how we might use large language models with the right carefully decided prompt engineering to support peer review and editorial decision-making, for technical checks and fact-checking. I’m also excited about how translation engines used in the middle of the peer review publishing process might enable peer reviewers to peer review in their own language.
“What happens if I’m able to extend an invitation to peer reviewers in their own language and to provide, via a translation engine, a good enough translation of the submitted paper? They will also be allowed to submit their comments in their mother tongue and we will repackage that back to the editorial office to help the editor make a final decision in the language that the editors are comfortable with. It opens up the ability to diversify who we ask to do peer review. The reviewer fatigue is countered by that because we’re tapping into new researchers who we don’t often ask to peer review, and we’re doing it in such a way that they are likely to say yes.”
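As a purely hypothetical sketch of that loop, the code below wires the steps together; translate() is a placeholder standing in for any machine-translation service, not a real API.

```python
# Hypothetical sketch of a translation-mediated review loop.
# translate() is a stand-in, not a real service.
from dataclasses import dataclass
from typing import Callable

def translate(text: str, source: str, target: str) -> str:
    """Placeholder: swap in a real machine-translation service here."""
    return text  # identity stand-in so the sketch runs end-to-end

@dataclass
class Submission:
    manuscript: str    # the paper, in the journal's working language
    journal_lang: str  # e.g. "en"

def review_in_own_language(sub: Submission, reviewer_lang: str,
                           review_fn: Callable[[str], str]) -> str:
    # 1. Present the paper to the reviewer in their own language.
    localised = translate(sub.manuscript, sub.journal_lang, reviewer_lang)
    # 2. The reviewer writes their report in their mother tongue.
    report = review_fn(localised)
    # 3. Repackage the report into the journal's language for the
    #    editor's final decision.
    return translate(report, reviewer_lang, sub.journal_lang)
```

The design point is that translation sits in the middle of the workflow, so neither the reviewer nor the editorial office has to change how they work.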
Graf said publishers are investing heavily in augmenting peer review, both to understand which papers should reach peer review at all and to enable editors to make decisions with augmented peer review reports.
“Yes, the peer reviewer reports, but also large language model reports might be useful to give the editor a fuller picture of the piece of research and enable the editor to convey that feedback to the author and give them more value and more feedback on their piece of work.”
As tools for identifying manipulation improve, similar technologies are likely to evolve to evade them, potentially leading to an arms race between paper mills and publishers. People are therefore likely to remain at the heart of the review process, and it is not surprising that the one technology identified by both Graf and Power Febres was reviewer identification: tools to verify who reviewers are, expand reviewer networks and match reviewers with papers they actually want to review.
However, once reviewers have been found, there is still much to be done to improve the process itself: raising the quality of reviews and ensuring that appropriate credit is given.
As Power Febres explained: “The purpose of reviewing is to improve the work; it’s to help the authors make their work into something that serves academia, society and knowledge production as much as possible. There is a need to learn how to do that constructively. Universities need to teach people how to review. There’s so much institutional bias, from an academic perspective, in how you approach this because it’s just how you’ve been schooled within your institution, and that really changes from one place to another, both as far as institutions within a country and different approaches in different countries.
“In an ideal world, it would be wonderful if credit was given to people who conduct peer review. If academic institutions started to take on board, and somehow quantify, the contribution that is made through that, as opposed to just focusing on published outputs. That will incentivise and create a more valid reciprocity, which was there perhaps in the past when the system was much smaller. Then it would be much easier for people to review and to have time to do it. The fact that you can’t validate these things because you can’t really quantify how many reviews someone has done if their names are not attached to it will mean there are greater moves to open reviewing.”
Greater transparency is one of many suggested improvements to the system that are not necessarily new but, as Graf suggested, may not have been fully exploited yet: “Many innovations maybe don’t feel like innovations anymore, but I don’t believe their use has been fully explored yet, including the use of double-blind peer review and transparent peer review. I’d like to see some of the older innovations become more established in the future, so we can really figure out whether they do provide a benefit or actually whether they are a distraction and are taking away from other things we need to innovate in peer review. For example, peer review as a discussion with a group of authors instead of being a linear process where a paper is sent to a peer reviewer and a peer reviewer offers a report and the editor makes a decision and then that decision is communicated to the author.”
Conclusion
Peer review will have to adapt to meet the challenges of the 21st century, and while that adaptation seems likely to combine technological augmentation with changes to how peer reviewers are treated, it may also require a change in our expectations of peer review. As Graf put it: “Is it really realistic to ask peer reviewers to detect fraudulent or problematic research? And is it really realistic to expect them to address concerns about reproducibility alone?”
It seems likely that peer reviewers will continue to be, as Power Febres put it, “the unsung heroes of academia”, but as she went on to say, “we’ve reached the point whereby we’ve exhausted human hours and just the sheer number of people and what they have to do that we need to start considering that process, and not just how we got to the end result.”
Having researchers at the heart of research publishing is an essential part of the trust in the publishing system, and if we lose that, the system that emerges would not only look very different to what we have today but may also be far less reliable. The good news, however, is that plenty of ways have been identified for peer review to adapt to meet the challenges on the horizon.