The importance of research information citizenship

Simon Porter, VP of Research Futures at Digital Science

As VP of Research Futures at Digital Science, Simon Porter’s role is to help the company – and the wider ecosystem – anticipate how research is changing and what infrastructures will be required to support it. He focuses on the intersection of metadata, persistent identifiers, analytics, and emerging technologies like AI, and how these can create a more trustworthy and connected research environment.

“I began my career at the University of Melbourne, working for 15 years across the library, research office, and IT services. That experience showed me how research information actually moves – and where it breaks. 

In 2015, I joined Digital Science, where I’ve worked on research knowledge graphs, PID benchmarking, institutional analytics, and data visualisation. Across all of it, my goal is the same: to improve the flow, quality, and utility of research information across the ecosystem.

Driving interest and changing perspectives

What motivates me is the recognition that even small improvements in research information can have system-wide impact. 

Early in my career I focused on how information within institutions could be optimised – breaking data out of silos (for instance HR, finance, research, and student systems). Over time, while the silos have become bigger (research communities, institutions, funders, and publishers), the deepest challenges have remained cultural, not technical.

Interconnected information systems mean that we don’t just create information for ourselves and our immediate purposes – we also need to consider how the information that we create will be used by others.

Research information citizenship is the idea that organisations share a collective responsibility to steward accurate, persistent, open metadata that the whole ecosystem relies on. 

Good information doesn’t happen by accident; it requires intentional behaviour. What is fascinating is that research information citizenship is measurable in the metadata that we create in sources like DataCite and Crossref.
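As a concrete illustration of that measurability, the sketch below uses the public Crossref REST API to estimate, for a single Crossref member, what share of its registered works carry ORCID iDs, funder information, abstracts, references, and licence URLs. The member ID and the choice of filters are placeholders for illustration, not a prescribed methodology.

```python
"""Illustrative sketch: estimating metadata completeness for a Crossref member.

The member ID below is a placeholder, and the chosen filters are just one
possible proxy for "citizenship". Only documented Crossref filters are used.
"""
import requests

CROSSREF_API = "https://api.crossref.org/works"
MEMBER_ID = "1234"  # placeholder Crossref member ID -- replace with your own

# Each filter flags a metadata element the wider ecosystem relies on.
COMPLETENESS_FILTERS = {
    "has ORCID iDs": "has-orcid:true",
    "has funder info": "has-funder:true",
    "has abstracts": "has-abstract:true",
    "has references": "has-references:true",
    "has licence URLs": "has-license:true",
}

def count_works(extra_filter: str | None = None) -> int:
    """Return total-results for the member, optionally with an extra filter."""
    filters = f"member:{MEMBER_ID}"
    if extra_filter:
        filters += f",{extra_filter}"
    resp = requests.get(CROSSREF_API, params={"filter": filters, "rows": 0}, timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["total-results"]

total = count_works()
print(f"Total works registered: {total}")
for label, flt in COMPLETENESS_FILTERS.items():
    share = count_works(flt) / total if total else 0.0
    print(f"{label:>18}: {share:.1%}")
```

The same pattern applies to DataCite’s REST API for datasets and other non-publication outputs.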

Pressing challenges in a complex landscape

There are two challenges that stand out. Firstly, trust: as AI becomes embedded in research evaluation, decision-making and content production, organisations need clarity about data quality, provenance, and the assumptions behind analytical tools.

Secondly, capacity for stewardship: many organisations support PIDs and open metadata but lack the resourcing to sustain consistent, high-quality information practices.

These challenges are interconnected; without shared standards and shared responsibility, the research system risks becoming increasingly opaque at the very moment it needs greater transparency. 

Open data is essential, but it’s important to distinguish open from FAIR (Findable, Accessible, Interoperable, and Reusable). In research information infrastructure, we’re fortunate to have four key FAIR sources – Crossref, DataCite, ROR, and ORCID – which provide authoritative, community-governed metadata.

Algorithms, affiliations and authority

Whenever we look beyond these, we inevitably introduce algorithmically generated metadata such as inferred researcher identities or affiliation statements. This data can be extremely useful, but it should never be treated with the same authority as the original FAIR sources, regardless of whether it is openly licensed. 
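One way to keep that distinction visible in downstream analysis is to carry provenance with every assertion. The sketch below is a hypothetical data structure – not any particular product’s schema – that records whether an affiliation link was asserted in a FAIR source or inferred by an algorithm, along with a confidence value, so that analyses can filter or weight accordingly. The identifiers used are illustrative only.

```python
"""Hypothetical sketch: carrying provenance with affiliation assertions.

Not a real product schema -- just one way to keep "asserted by a FAIR source"
and "inferred by an algorithm" distinguishable downstream.
"""
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    ASSERTED = "asserted"   # supplied via Crossref/DataCite/ORCID/ROR records
    INFERRED = "inferred"   # produced by a matching or extraction algorithm

@dataclass(frozen=True)
class AffiliationLink:
    person_orcid: str | None    # ORCID iD if known, else None
    org_ror: str                # ROR identifier of the organisation
    provenance: Provenance
    confidence: float = 1.0     # algorithmic confidence; 1.0 for asserted links

def authoritative_only(links):
    """Keep only links asserted in the open FAIR sources."""
    return [link for link in links if link.provenance is Provenance.ASSERTED]

# Illustrative records: one asserted link, one algorithmically inferred link.
links = [
    AffiliationLink("0000-0002-1825-0097", "https://ror.org/01ej9dk98",
                    Provenance.ASSERTED),
    AffiliationLink(None, "https://ror.org/03yrm5c26",
                    Provenance.INFERRED, confidence=0.72),
]
print(len(authoritative_only(links)))  # -> 1
```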

Open data is not automatically FAIR data, and we must be mindful of how algorithms introduce bias, error, and structural blind spots into our analyses.

When choosing a platform, organisations should also consider where their effort is being directed. Are they being asked to clean or correct data for a particular tool, or could that time be more usefully spent improving the quality of their own authoritative metadata in Crossref, DataCite, ORCID, or ROR?

The less time we spend “working for algorithms,” and the more time we spend strengthening the shared research information commons, the better off the entire ecosystem becomes. The question isn’t open versus commercial, but how to use both in a sustainable, trustworthy way.

A connected and contextual approach

As open infrastructures like Crossref, DataCite, ORCID and ROR increasingly meet the community’s needs for high-quality foundational metadata, the innovation horizon naturally shifts. 

The 2010s were about building persistent identifier infrastructure; the 2020s have been about implementing it at scale. If current trends continue, it’s entirely plausible that by 2030 we’ll reach saturation for ORCID adoption among researchers and ROR adoption among institutions – even though ROR adoption is only just beginning.

With this improved foundation, the question is no longer “How do we get basic publication or affiliation metadata?” – we will have that, and it’s getting better every year. Instead, the question becomes: How does my organisation connect to the wider research world of funding, datasets, patents, policy, and collaboration?

Dimensions’ strength lies in providing this contextual layer: linking publications to grants, researchers, institutions, datasets, patents, and policy contributions. This enables organisations to see how their work participates in – and is shaped by – broader scientific, social, and economic systems.

In a world where core metadata becomes increasingly standardised and open, strategic insight comes from understanding relationships, pathways, and context. Dimensions provides that connective tissue, helping organisations see not just what they produce, but how it fits into the larger research landscape.

A question of trust

A great deal of work behind the scenes goes into ensuring that Dimensions provides a trustworthy representation of the research landscape. 

Although the future of researcher identification is clearly ORCID (and the sooner the better!), today we still need to rely on researcher disambiguation algorithms to join publications and outputs correctly. The design of these algorithms matters. Dimensions’ approach is built to support use cases in research integrity and research security, where the consequences of incorrectly joining records can be more damaging than keeping them separate.

While no algorithm can identify all researchers perfectly, reducing the likelihood that an individual is misidentified or incorrectly associated with inappropriate or undesirable activity is essential in today’s research environment. Dimensions therefore prioritises conservative, precision-focused matching to minimise false joins.
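As a toy illustration of that precision-first principle – and emphatically not Dimensions’ actual algorithm – the sketch below merges two author records only when strong independent signals agree: a shared ORCID iD, or an exact name match combined with a shared ROR-identified affiliation and overlapping co-authors. Otherwise it keeps the records separate.

```python
"""Toy sketch of precision-first author matching -- not Dimensions' algorithm.

The rule: merge two author records only when strong independent signals agree;
when in doubt, keep them separate (a false join is worse than a missed one).
"""
from dataclasses import dataclass, field

@dataclass
class AuthorRecord:
    name: str                        # normalised "family, given" form
    orcid: str | None = None         # ORCID iD if asserted on the record
    ror_affiliations: set[str] = field(default_factory=set)
    coauthor_names: set[str] = field(default_factory=set)

def should_merge(a: AuthorRecord, b: AuthorRecord) -> bool:
    """Conservative rule: a shared ORCID iD, or exact name match plus a
    shared ROR affiliation plus co-author overlap; otherwise no merge."""
    if a.orcid and b.orcid:
        return a.orcid == b.orcid            # authoritative signal decides
    same_name = a.name == b.name
    shared_org = bool(a.ror_affiliations & b.ror_affiliations)
    shared_coauthors = bool(a.coauthor_names & b.coauthor_names)
    return same_name and shared_org and shared_coauthors

a = AuthorRecord("Porter, S.", ror_affiliations={"https://ror.org/01ej9dk98"},
                 coauthor_names={"Smith, J."})
b = AuthorRecord("Porter, S.", ror_affiliations={"https://ror.org/01ej9dk98"},
                 coauthor_names={"Smith, J.", "Lee, K."})
c = AuthorRecord("Porter, S.")  # same name, no corroborating signals
print(should_merge(a, b))  # True  -- multiple signals agree
print(should_merge(a, c))  # False -- kept separate rather than risk a false join
```

The asymmetry is deliberate: a missed match can be repaired later, whereas a false join can attach the wrong activity to the wrong person.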

Another important aspect is classification. Instead of imposing a proprietary taxonomy, Dimensions uses externally defined research classification schemes – such as Fields of Research, Units of Assessment, and the Sustainable Development Goals. This makes Dimensions’ analytics far more interoperable with broader sector analysis and ensures that insights can be aligned with institutional, national, and international reporting frameworks.

AI enhances both the extraction of information and the speed at which insights can be delivered. In Dimensions, AI helps classify content, infer relationships, detect emerging topics, and improve disambiguation.

The real opportunity for Dimensions lies in explainable, responsible AI built on a strong knowledge graph. Because Dimensions links grants, publications, datasets, patents, and policy, AI can be applied over structured relationships rather than isolated records, producing deeper, more contextual insights.

The power of community

What excites me most about Dimensions’ future is not the technology itself, but the community that is forming around it and the way Digital Science continues to reassess how we can play a constructive role in the broader research information ecosystem. This ties directly back to my earlier point about research information citizenship: the real progress happens when organisations work together to strengthen shared infrastructures and make high-quality metadata more accessible to everyone.

In that spirit, I’m particularly proud of two recent initiatives. The first is the reintroduction of our Scientometric Access to Data Program, which streamlines access to Dimensions data for researchers through Google BigQuery. The second is our work with organisations like ORCID to make their data more readily analysable in the same environment.
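For readers curious what that access looks like in practice, the fragment below is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, table, and field names are assumptions for illustration only and should be checked against the current Dimensions on Google BigQuery documentation; access also requires the appropriate Google Cloud credentials and entitlements.

```python
"""Minimal sketch of querying Dimensions data on Google BigQuery.

The dataset, table, and field names below are illustrative assumptions --
verify them against the current Dimensions on BigQuery schema documentation.
"""
from google.cloud import bigquery

client = bigquery.Client()  # uses your default Google Cloud credentials

# Hypothetical table and field names for a simple publications-per-year count.
query = """
    SELECT year, COUNT(*) AS publications
    FROM `dimensions-ai.data_analytics.publications`
    WHERE year BETWEEN 2015 AND 2024
    GROUP BY year
    ORDER BY year
"""

for row in client.query(query).result():
    print(row.year, row.publications)
```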

These kinds of collaborations – grounded in transparency, interoperability, and shared responsibility – are what will shape the future of Dimensions and, more importantly, strengthen the research information commons as a whole.”
