From access to answers: knowledge-as-a-service

Steve Smith explains why publishers must now become “curators of context” in the compute age
For most of the past two decades, the rallying cry in scholarly publishing has been access. We built platforms that unlocked paywalled content, digitised archives, and democratised discovery. “Access to content” became the moral and economic centre of the industry: shorthand for fairness, visibility, and impact.
But we’re entering a world where access alone is no longer enough. Researchers, corporations, and AI systems now query the scientific record in volumes that would have seemed absurd a decade ago. ChatGPT reached 100 million users within two months of launch; Google’s AI Overviews now summarise search results without sending users to publisher sites. The question is shifting from “Can I read it?” to “Can I use it?” And in that shift, the locus of value moves away from content itself and toward what we might call computable knowledge: structured, linked, context-rich information that can be trusted and reused at scale.
This is a shift from “selling access to content” to “selling access to answers.” Publishers have always been in the knowledge business, but only now is the market demanding that knowledge be machine-readable, provably licensed, and contextually intact. The opportunity is not to supply another data dump but to offer a stable, high-trust layer of verified connections between images, captions, and paragraphs, and between datasets, methods, and outcomes.
What’s emerging is a knowledge-as-a-service model: a way of delivering the distilled, structured meaning of research, not just its formatted outputs. If access was the defining achievement of the last publishing era, knowledge may well be the defining opportunity of the next.
The compute layer: publishers already know how to do this
If “access” was the watchword of the open-science era, the emerging layer is “compute”. A growing number of publishers already enable analysis inside their platforms without users ever exporting a file. JSTOR’s Constellate and ProQuest’s TDM Studio let researchers run Python or R notebooks directly against licensed content, Gale’s Digital Scholar Lab offers built-in text-mining tools over its archives, and Springer Nature’s SN SciGraph exposes the scholarly record as linked, queryable data. These environments are, in effect, compute APIs with a user interface: places where the code travels to the content rather than the other way around.
That model matters because it shows how publishers can remain both relevant and compliant in a post-download world. Universities and corporations pay for these platforms precisely because they remove the legal and technical friction of “bulk export”. They provide safe, rights-controlled sandboxes where analysis can happen in situ. Emerging APIs can follow the same logic: allow structured queries, analytics, and model training within a trusted environment, preserving provenance and credit while still enabling discovery at scale.
But here’s the thing: compute, on its own, is still infrastructure. It’s necessary but not sufficient. The real differentiator will be what sits on top of it, the layer we might describe as “access to answers”.
Consider what an engineer building a neural network actually needs. They don’t care which journal a circuit diagram appeared in; they care that the topology is verified, traceable, and linked to the equations that govern it. They don’t need a PDF; they need a knowledge object: image + caption + equations + methods + provenance, all machine-addressable and rights-cleared.
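To make that idea concrete, here is a minimal sketch of how such a knowledge object might be modelled in code. Every field name is a hypothetical assumption for illustration, not any publisher’s actual schema.

# Illustrative only: one way a "knowledge object" might be modelled.
# All field names are hypothetical, not an existing publisher schema.
from dataclasses import dataclass, field

@dataclass
class KnowledgeObject:
    figure_uri: str                  # persistent, resolvable link to the image itself
    caption: str                     # caption kept attached to the figure, never stripped away
    equations: list[str]             # governing equations, e.g. as LaTeX strings
    methods_excerpt: str             # the methods text explaining how the figure was produced
    dataset_dois: list[str] = field(default_factory=list)     # underlying data, cited by DOI
    licence: str = "unspecified"     # machine-readable reuse terms
    provenance: dict[str, str] = field(default_factory=dict)  # journal DOI, ORCID iDs, review status

The point is not the particular fields but the packaging: image, caption, equations, methods, and provenance travel together as one addressable, rights-cleared unit.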
That’s the knowledge-as-a-service opportunity: to deliver distilled, machine-actionable insight rather than static documents. Some publishers are already experimenting with “AI-optimised versions of research”, repackaging content for algorithmic consumption. That’s a step in the right direction, but it still treats knowledge as a better-formatted object. The bigger leap is to treat it as a linked, queryable network, where images remain connected to captions, methods can be interrogated (not just read), and provenance is baked in.
In that sense, compute is not the endgame but the bridge to knowledge. The technology exists. The business appetite exists. (Look at the deals AI companies are striking with publishers for training data.) The question now is whether publishers will stop at hosting analysis environments or move up the stack, turning those same environments into trusted knowledge layers that deliver answers, context, and meaning on demand.
Publishers’ emerging role: curators of context
If the first digital revolution was about digitisation and access, and the second about computation, then the next will be about context. Machines can now find almost anything; what they still can’t do reliably is understand it without help. That’s where publishers come back into the story.
Publishers are often caricatured as intermediaries, like toll collectors on the information highway. But their real strength has always been curation, deciding what’s worth attention and preserving the connective tissue that makes research interpretable. A paper, after all, is more than text. It embeds provenance, peer assessment, and metadata links that tell you who funded the work, which methods were used, what data support the figures, and how the results align with prior findings. That network of relationships is the part most at risk when content is disassembled into tokens for model training or indexed fragments in a vector database.
The value lies in linkage and context: images plus captions plus paragraphs plus methods plus data. That’s the unit of scientific meaning, and it’s precisely the layer that gets lost when documents are stripped down to training data. In an AI-driven research economy, that contextual integrity becomes the differentiator. The models that will matter most are not trained on more data but on better-linked knowledge.
This is where publishers have an opening to reposition themselves, not as providers of PDFs or even APIs, but as custodians of verified relationships. They already manage trusted identifiers (DOIs, ORCIDs, RORs, grant IDs). They already invest in metadata quality, editorial validation, and peer-review governance. What’s missing is the reframing: seeing these not as cost centres but as assets in a knowledge infrastructure that others will soon depend on.
Imagine a future where a publisher’s API doesn’t just deliver an image or a paragraph but also tells you how confident you can be in its provenance. Where machine-readable statements capture which data and methods underlie each result. Where every figure carries a miniature supply chain of trust: CRediT roles, data availability statements, replication code, peer-review history. That’s what knowledge-as-a-service could look like in practice: a shift from selling access to distributing trusted, interpretable knowledge objects that machines and humans alike can reason over.
Imagine a materials scientist training a model to predict crystal structures. Today, they scrape figures from PDFs, extract captions with OCR (often incorrectly), and hope the metadata is accurate. In a knowledge-as-a-service world, they query an API that returns the crystal structure image, its caption, the experimental conditions from the methods section, links to the underlying dataset, and a confidence score based on peer review status and data availability. The publisher isn’t just supplying a document; it’s supplying verified, reusable knowledge.
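To make the shape of that service concrete, the query might look something like the sketch below from the researcher’s side. The endpoint, parameters, and response fields are all hypothetical assumptions, not a real publisher API.

# Hypothetical knowledge-as-a-service query; the URL, parameters, and
# response fields are illustrative assumptions, not a real publisher API.
import requests

response = requests.get(
    "https://api.example-publisher.org/v1/knowledge-objects",
    params={
        "query": "perovskite crystal structure",
        "object_type": "figure",
        "include": "caption,methods,dataset_links,trust",
    },
    headers={"Authorization": "Bearer <licence-token>"},
    timeout=30,
)
response.raise_for_status()

for obj in response.json()["results"]:
    # Each result arrives with its context intact, not as a scraped image.
    print(obj["figure_uri"], obj["caption"][:80])
    print("  conditions:", obj["methods"]["experimental_conditions"])
    print("  data:", obj["dataset_dois"])
    print("  confidence:", obj["trust"]["confidence_score"])

What matters is not the syntax but what comes back: context, links, and a basis for trust, rather than pixels and guesswork.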
The future isn’t about more compute power or more content. It’s about the intelligent connective tissue that makes research findable, usable, and believable. And publishers, almost uniquely, already own that tissue. They just need to recognise it as their next business model.
Risks, requirements, and the road ahead
None of this will happen automatically. If publishers want to inhabit this next layer of value, they’ll need to make deliberate choices, not just technical ones but cultural ones.
The first requirement is interoperability. A knowledge-as-a-service ecosystem can only work if publishers adopt compatible schemas and metadata frameworks. If every “knowledge API” speaks its own dialect, the result will be fragmentation, not value. Standards bodies like Crossref, DataCite, and NISO will have to play a coordinating role, and publishers will need to see metadata quality not as housekeeping but as shared infrastructure. This means investing in structured abstracts, machine-readable data availability statements, standardised figure annotations, and persistent identifiers for everything: not just articles, but figures, datasets, code repositories, and reagents.
The second is trust governance. Once machines become primary users of content, the notion of “trust” shifts from brand reputation to provenance verification. Technologies like C2PA (content authenticity standards), blockchain attestations, and persistent identifiers can record the chain of custody for research objects, but governance still matters. Publishers will need clear policies about what’s in scope for machine consumption, how data are updated or revoked, and how attribution is enforced downstream. In this new market, a publisher’s most valuable asset may be the integrity of its metadata, not the exclusivity of its content.
The third requirement is transparency in pricing and rights. As data licensing evolves, new models will need to be both machine-addressable and legally predictable. That means clearly defined tiers: access for humans, compute for analysis, knowledge for decision support, each with auditable use rights and predictable costs. The goal isn’t to meter curiosity but to make trust and clarity the product that customers pay for. Rights metadata needs to be as structured and queryable as the content itself.
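One way such tiered rights could be expressed in machine-queryable form is sketched below. The tier names follow the access/compute/knowledge framing above, but the record structure and field names are assumptions for illustration, not an existing standard.

# Illustrative sketch of tiered, machine-queryable use rights.
# The structure and field names are assumptions, not an existing standard.
rights_record = {
    "object": "doi:10.9999/example.12345",   # hypothetical identifier
    "tiers": {
        "access":    {"permitted": True,  "scope": "human reading and download"},
        "compute":   {"permitted": True,  "scope": "in-platform analysis and text mining",
                      "conditions": ["no bulk export", "provenance retained"]},
        "knowledge": {"permitted": False, "scope": "model training and decision support",
                      "contact": "licensing@example-publisher.org"},
    },
    "audit": {"licence_id": "LIC-2025-0001", "expires": "2026-12-31"},
}

def is_permitted(record: dict, tier: str) -> bool:
    """Return whether a given use tier is allowed for this object."""
    return record["tiers"].get(tier, {}).get("permitted", False)

assert is_permitted(rights_record, "compute")
assert not is_permitted(rights_record, "knowledge")

A record like this makes the licence itself something a machine can check before it acts, which is precisely the predictability customers would be paying for.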
The existential risk is real. Move too slowly, and external aggregators such as Google, OpenAI, or Elsevier’s Scopus AI will capture the “knowledge layer” first, abstracting away the publishers who created it. We’ve seen this movie before: Google Scholar aggregated citation networks that publishers built; Sci-Hub captured user loyalty by solving an access problem publishers were slow to address. The knowledge-as-a-service opportunity could follow the same arc if publishers don’t act.
Move too quickly, and we risk building proprietary silos that recreate the very access barriers we spent two decades dismantling. The balance will lie in open standards, transparent APIs, and business models that reward stewardship rather than enclosure. This is not an argument for open-washing or giving away the store; it’s an argument for building interoperable infrastructure that creates more value by being connected than by being locked down.
There’s also a strategic tension publishers must navigate: licensing content to AI companies for training (short-term revenue) versus building knowledge services that make publishers indispensable intermediaries (long-term strategic positioning). The former is transactional; the latter is structural. Publishers need both, but the mix matters. Sell too much training data too cheaply, and you’ve commoditised yourself. Invest in the knowledge layer, and you’ve created a moat.
The choice ahead
The last era of publishing was about opening doors to readers, authors, and data. The next will be about connecting rooms: linking people, machines, and meaning across a trusted infrastructure. “Access” brought information to everyone. “Compute” made it analysable. “Knowledge” will make it interpretable.
Publishers can either watch that stack build around them or take the lead in defining it. The choice, as always, comes down to whether we see ourselves as suppliers of content or as custodians of understanding. In an age of infinite data, the most valuable thing left to sell and to safeguard may be trust itself.
The infrastructure is already being built. The question is who controls the metadata, who sets the standards, and who captures the value. Publishers have a brief window to claim this territory before others do. The Access era taught us that openness wins. The Compute era is teaching us that context matters. The Knowledge era will reward whoever makes context computable.
Steven D Smith, DPhil, is the founder of STEM Knowledge Partners and an independent consultant
