Why scientific content cannot be overlooked

Adam Churchill is Product Solutions Manager at Copyright Clearance Center

As AI adoption accelerates, content quality, usability, and permissions are now critical constraints, writes Adam Churchill

With the boom of AI, many organisations have invested heavily in tools, pilots, and other systems, but far less attention has been paid to the scientific and technical content on which those systems depend.

That is a mistake. In research contexts, the most pressing constraint on AI is often not adoption. It is whether the underlying content is sufficiently complete, usable, and properly permissioned to support meaningful outcomes.

This is where the conversation needs to become more rigorous. Scientific literature is not simply a large body of text waiting to be mined. It is a highly structured body of knowledge, shaped by methodology, evidence, and context. When that content is reduced to abstracts, distorted through conversion, or used without clear permission, the issue is not just inefficiency. It is a weakening of the informational foundation on which AI systems are expected to perform.

That problem tends to show up in three places.

The first is completeness.

In AI workflows, especially those designed for speed and scale, there is a tendency to rely on whatever content is most readily accessible. Scientific and technical work cannot afford that shortcut. An abstract, for example, is easy to obtain and can inform an AI workflow, but it provides only a high-level overview of the article or research paper. The real substance, and much of its value to AI, often lies deeper, in the methods and nuanced framing that enable findings to be properly used.

That is critically important because AI systems are increasingly expected to do more than retrieve information. They are asked to help analyse evidence, identify patterns, and support decisions, all of which depend on more than surface-level access. When organisations build systems that rely only on a limited view of the underlying research, they are not just reducing the depth of knowledge. They are increasing the risk of shallow or misleading outputs.

The second issue is usability.

A surprising amount of scientific knowledge still moves through formats designed for human reading rather than machine interpretation. PDFs remain essential to distribution, but they are far from ideal as raw material for AI workflows. Extracting and structuring content from PDFs can be messy, inconsistent, and lossy. Tables can break apart. Figures can become disconnected from the text that explains them. Formatting artefacts can distort meaning. Important distinctions can be flattened in the conversion process.

This is often treated as a technical inconvenience. It is more than that. In research-driven environments, formatting problems can quickly become comprehension problems. And comprehension problems can become decision problems. If organisations want AI systems to operate dependably, they need to think more seriously about whether the content entering those systems is machine-usable while preserving its integrity.

The third issue is permissions.

In many organisations, access to scientific content is still conflated with the right to reuse it in AI systems. Those are not the same thing. Content may be publicly available for reading and reference, or accessible via subscriptions that an organisation has obtained for its employees. However, access to view or cite a work does not automatically include the rights needed to use the work in other ways, and depending on the intended use, additional licences or permissions may be required. This includes the rights to input, summarise, or otherwise use works in connection with AI-related activities.

That distinction matters more as AI becomes embedded in routine workflows, particularly within corporate and academic settings. Researchers and analysts are already using AI tools to interact with third-party content in ways that go beyond traditional reading and citation. The permissions to use works in connection with a variety of AI-related activities may need to be acquired from multiple rights-holders and publishers. The result is not just legal uncertainty. It is operational friction. Organisations may want to scale AI use, but they are doing so on top of content environments that vary in licensing terms, fee structures, and administrative overhead. This makes it increasingly important for organisations to educate their researchers on the importance of copyright compliance and to take steps to incorporate it at every stage of their work, rather than treating it as an administrative afterthought.

Together, these three issues point to a larger question about the next phase of enterprise AI and whether organisations are truly prepared to treat scientific content as infrastructure rather than just information.

That means recognising that the value of scientific literature lies not only in the information it provides, but also in how it is structured, interpreted, and governed. It also requires an understanding that content quality is inseparable from context. And it means accepting that permissions are not a downstream detail, but part of what makes AI use feasible in the first place and on an ongoing basis.

For organisations working in research-heavy sectors, this is not a narrow publishing concern. It is a broader question about how knowledge is operationalised. If AI is to play a meaningful role in scientific and technical work, the surrounding content environment must support it. Otherwise, enterprises risk building sophisticated systems on incomplete, unstable, or unclear foundations.

The market has spent the last two years fixated on what AI can do. A more important question is now emerging: what kinds of content systems are organisations giving AI to work with?

In scientific and technical domains, that may be the question that matters most.
