Silent, yet indispensable

Andrea Chiarelli explains how identifiers, metadata and shared infrastructures bolster research integrity

With the Retraction Watch database hitting almost 50,000 items and an ever-growing number of high-profile research misconduct cases, questions around the veracity of published research findings are becoming uncomfortably common. Verifying the accuracy of published research is a meticulous and rigorous process. Today, this requires painstaking and mostly manual work, although technology is rapidly coming to the rescue: following decades of digital research infrastructure development, we can start to imagine a future where humans and machines can work together to validate the public record and help corroborate its trustworthiness.

Away from the spotlight of groundbreaking discoveries and intellectual discourse lie some of the unsung heroes of research: persistent identifiers, metadata and shared infrastructures. These silent forces can bolster research integrity, ensuring traceability and attribution, discouraging plagiarism and promoting responsible research and innovation practices.

In this article, we highlight how the tools we have grown accustomed to are quietly underpinning research integrity and helping to foster a culture of transparency and trust.

Enhancing accessibility, credibility and transparency in scholarly communication

Persistent identifiers are indispensable in scholarly communication. Examples include Digital Object Identifiers (DOIs) for research objects, ORCID iDs for individuals, International Standard Name Identifiers (ISNIs) for contributors to creative works and those active in their distribution, and Research Organization Registry (ROR) or Ringgold IDs for organisations. Today, DOIs are the most widespread form of persistent identifier in the academic publishing landscape and serve as permanent markers for digital resources: they connect users and machines to content, no matter where it is physically hosted, and they ensure its availability regardless of location changes due to platform shifts or system architecture updates. This supports not only the long-term preservation of research objects but also their future accessibility.
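
To make the idea of machine-checkable identifiers concrete, the following is a minimal sketch in Python rather than a definitive implementation. It turns a DOI into a resolver link on doi.org and checks the final character of an ORCID iD, which is a check digit computed with ISO 7064 MOD 11-2. The function names are illustrative, the DOI pattern is a deliberately loose assumption, and 10.1234/example.001 is a placeholder rather than a real record. The check only confirms that an identifier is well formed, not that it belongs to the person or object claimed.

```python
import re

# Loose syntactic pattern for a DOI (an assumption for this sketch, not a full grammar).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, stripping common prefixes."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + cleaned


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and the check digit of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD of the fictitious researcher Josiah Carberry.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: check digit does not match
print(doi_to_url("doi:10.1234/example.001"))  # https://doi.org/10.1234/example.001
```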

Meanwhile, identifiers for individuals and organisations help to augment the transparency and credibility of published works, for example by flagging that an output is linked with authors from one or more research organisations or has been supported by a specific funder. The use of identifiers for peer reviewers has historically been less common but is starting to rise up the agenda in light of growing numbers of misconduct cases involving peer review manipulation. As Kim Eggleton, Head of Peer Review & Research Integrity at IOP Publishing, puts it: “Our industry has been built on trust, but with paper mills now a sad reality for all of us, we can't continue to rely on trust alone. We need a better record of what was created, how and by whom, plus how it has changed over time.”

Metadata complements the use of persistent identifiers and provides the necessary context and understanding for the reuse of digital research objects over time. It contains insights into how a given research object was obtained or generated and facilitates its correct interpretation and meaningful application. In some cases, scholarly metadata is also used to describe each author’s specific contribution to the output, for example through CRediT (Contributor Roles Taxonomy), which can further characterize the provenance of a research object.

Independent consultant Haseeb Irfanullah said: “As an editor, it is very difficult to identify whether all the authors have contributed to an article. Even if I asked them directly, how could I check?”

Shared scholarly infrastructures can take a variety of shapes, but their main role is to act as a bridge between persistent identifiers, metadata, and research objects. Metadata flowing between digital infrastructures can also communicate and surface information on retractions, expressions of concern or correction notices, which are essential to understand when and why the scholarly record has been modified.
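
As an illustration of how update notices might travel with a research object, here is a small Python sketch under stated assumptions. The class names, field names and status labels are hypothetical and do not reproduce any particular registry's schema, and the DOIs are placeholders. The sketch reports the most serious notice attached to an object, which is one possible design choice; a real system might equally report the most recent one.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical ranking of notice types, from least to most serious.
SEVERITY = {"correction": 1, "expression_of_concern": 2, "retraction": 3}


@dataclass
class UpdateNotice:
    kind: str        # one of the keys in SEVERITY
    notice_doi: str  # identifier of the notice itself
    issued: date


@dataclass
class ResearchObject:
    doi: str
    title: str
    updates: List[UpdateNotice] = field(default_factory=list)


def current_status(obj: ResearchObject) -> str:
    """Return the most serious notice type attached to the object, if any."""
    if not obj.updates:
        return "no_updates"
    return max(obj.updates, key=lambda u: SEVERITY[u.kind]).kind


# Placeholder record: a paper that was first corrected and later retracted.
paper = ResearchObject(
    doi="10.1234/example.001",
    title="An example article",
    updates=[
        UpdateNotice("correction", "10.1234/example.001.c1", date(2022, 3, 1)),
        UpdateNotice("retraction", "10.1234/example.001.r1", date(2023, 6, 15)),
    ],
)
print(current_status(paper))  # retraction
```

Because each notice carries its own identifier and date, a downstream system could show readers not only that the record changed but also when, and where the explanation can be found.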

Additionally, enhancing the connectivity between research objects and metadata across different systems can also have a positive impact on reproducibility, meaning the ability to duplicate the results of a study using the same methods and data as were used in the original investigation. If a researcher can not only access research objects – a matter of discoverability and openness – but also understand them – which stems from complete and accurate metadata – they will be far better placed to try and reproduce the results in question.

Hylke Koers, Chief Information Officer, STM Solutions, said: “Well-executed infrastructure has the potential to help researchers find, access, quote and reuse information that is scattered across the web and different content sources, in a way that is easy to navigate and interoperable.”

From mitigation to prevention: the importance of trust and interoperability

Interoperability is the ability of different systems, devices, applications and products to connect and communicate in a coordinated way, without effort from the end user. In the case of scholarly communication infrastructures, this can be enabled by adopting identifiers that describe individuals, organisations and research objects. With different infrastructures working in unison, regardless of data provenance, we can promote a more integrated digital ecosystem where suspicious behaviors or information can be surfaced and brought to the attention of research stakeholders.

Laura Cox, Senior Director, Publishing Industry Data, at Copyright Clearance Center, said: “Interoperable metadata attached to research objects can be used to inform the detection of suspicious patterns and to develop early warning signs of potential misconduct.”

In this context, the reliability and trustworthiness of (meta)data should not be overlooked. It is not enough for information to be complete and interoperable; it must also be accurate and reliable. The challenge for research integrity professionals lies in discerning which information is reliable and which is not. In practice, poorly collected or incorrectly linked data can lead to misleading assumptions and conclusions, causing more harm than good.

Sabina Alam, Director of Publishing Ethics and Integrity, Taylor and Francis Group, explained: “As part of research integrity checks, we may find the same ethical approval or funding reference number in absolutely unconnected papers. And this is difficult to address because verifying whether these are even real, in a quick and scalable way, is not always possible.”
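
The pattern described in the quote above lends itself to a simple automated screen, sketched here in Python under stated assumptions. The record layout, the function name and every identifier in the sample data are hypothetical. The sketch treats two papers as "unconnected" when they share neither an author identifier nor an organisation identifier, which is a simplification. It cannot tell whether a reference number is genuine; it only surfaces pairs that a person may wish to review.

```python
from collections import defaultdict
from itertools import combinations


def find_suspicious_reuse(papers):
    """Flag pairs of papers that cite the same reference number but share
    no author identifier and no organisation identifier.

    Each paper is a dict with the keys 'doi', 'grant_id', 'orcids' (a set)
    and 'rors' (a set). These keys are assumptions made for this sketch.
    """
    by_grant = defaultdict(list)
    for paper in papers:
        # Normalise case and whitespace so trivial variants group together.
        by_grant[paper["grant_id"].strip().upper()].append(paper)

    flagged = []
    for grant_id, group in by_grant.items():
        for a, b in combinations(group, 2):
            shares_author = bool(a["orcids"] & b["orcids"])
            shares_org = bool(a["rors"] & b["rors"])
            if not shares_author and not shares_org:
                flagged.append((grant_id, a["doi"], b["doi"]))
    return flagged


# Placeholder data: the identifiers below are invented for illustration.
sample = [
    {"doi": "10.1234/a", "grant_id": "ABC-123", "orcids": {"author-1"}, "rors": {"org-1"}},
    {"doi": "10.1234/b", "grant_id": "abc-123 ", "orcids": {"author-1", "author-2"}, "rors": {"org-2"}},
    {"doi": "10.1234/c", "grant_id": "ABC-123", "orcids": {"author-9"}, "rors": {"org-9"}},
]
for hit in find_suspicious_reuse(sample):
    print(hit)
```

In this sample the first two papers share an author and are therefore not flagged against each other, while the third shares nothing with either and is paired with both. The quality of such a screen depends entirely on how complete and accurate the underlying identifiers are.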

Tools based on artificial intelligence can leverage high-quality and trusted (meta)data to facilitate human decision-making, for example in emerging use cases such as the automated detection of research outputs associated with paper mills. Thanks to advancements in machine learning and knowledge graphs, exemplified by initiatives like the Open Research Knowledge Graph, and increased information sharing within the sector through platforms like the STM Integrity Hub, we are witnessing fast-improving capabilities to automatically identify and scrutinise potential misconduct on a scale that was previously unattainable.

Recognising the power of trusted persistent identifiers, metadata and shared infrastructures

Metadata, persistent identifiers and robust digital infrastructures, though not always visible on the surface, form the bedrock of research integrity. As discussed above, however, taking action around research integrity also requires metadata and infrastructures that can be trusted, plus a degree of collaboration between scholarly actors.

To maximise the potential of these silent enablers of research integrity, all research stakeholders can take several steps:

  • Invest in robust and trusted systems which use data quality frameworks and metrics, supported by validation to manage and standardise metadata and PIDs, ensuring consistency and ease of data exchange;
  • Prioritise training and education around the importance of these elements, including highlighting their impact on research integrity; and
  • Exchange knowledge and best practices with peer organisations, championing the creation and use of complete and interoperable information and tearing down barriers to sharing and reuse.

The rising complexity of the scholarly ecosystem, coupled with the transformative potential of emerging technologies such as artificial intelligence, demands a clear commitment to robust metadata, persistent identifiers and digital infrastructures from all individuals and organisations involved in research. The steps we take today to invest in and enhance these foundational elements will shape the future integrity and trustworthiness of our collective knowledge and pave the way for advancements in understanding and improving our world.

Dr Andrea Chiarelli is a Principal Consultant at Research Consulting, a mission-driven business working to improve the effectiveness and impact of research and scholarly communication.