Granular stats drill deeper into content usage

28 April 2023

Libraries can get more granularity from COUNTER reports, thanks to a recently adopted analytics tag – while community-agreed standards help protect privacy, says Rob Scaysbrook

As a library or information manager for a large organisation, you know how important it is to understand the different types of usage of the licensed resources you invest in.

That said, understanding the different ways your users are making use of the content they are consuming can be tricky – even when you are using the COUNTER standard to help compare usage statistics in a consistent way. For example, if a group of postgraduate learners in a single faculty are making use of a B2B journal to inspire business case study ideas, that’s a high-value interaction with content – but it may be hard to capture in organisation-wide usage stats.

Conversely, if you have several corporate departments in a business who all rely on a technical news source, you might struggle to know how to allocate it fairly to internal budgets; it might be the R&D team rather than marketing, for example, who use it most. And while no statistic can be a substitute for your own expertise, it’s helpful to have some nuance, or granularity, in your usage data, so you can apply more evidence to your decision-making.

That’s why, at OpenAthens, we see value in a recently adopted data attribute which offers granular usage reporting for customers who use the COUNTER standard.

This attribute, known as eduPersonAnalyticsTag, has been adopted by the REFEDS group – which develops and adopts technical standards for federated single sign-on in the research and education community.

Adding value to federated access

The name of the new attribute might be a bit of a mouthful, but at OpenAthens, we believe it’s a useful addition to the reporting you can get via federated access. It works by building on one of the great strengths of federated access – namely that it’s you, the identity provider, who should remain in control of the data you share.

In other words, as a librarian or knowledge manager, you choose what information you release to publishers for specific purposes. This offers a double advantage: not only can you start to design your own reporting based on the group categories you choose, but importantly, you also have control over privacy.

With federated access, the authentication data of individual users is not shared directly with publishers. Effectively, it’s your organisation that authenticates the user, as part of a network of trust.

If you want publishers to be able to provide you with more granular data about usage for the purposes of reporting, then you would need to share more information with them – but only to the degree that you allow. Indeed, to help preserve privacy, the REFEDS standard for the eduPersonAnalyticsTag attribute advises that granularity should be “coarse enough to prevent unintentional identification of subjects”.
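As an illustration of what “coarse enough” might mean in practice, an identity provider could check how many users would share each tag before releasing it. The sketch below is a minimal example under stated assumptions: the function name, the sample data and the threshold are all hypothetical, and the REFEDS text does not prescribe a specific minimum group size.

```python
from collections import Counter


def tags_below_threshold(user_tags, min_group_size=10):
    """Return the tags that fewer than min_group_size users would share.

    user_tags maps a user id to the tag the identity provider plans to
    release. The default threshold is an assumption for illustration only.
    """
    counts = Counter(user_tags.values())
    return sorted(tag for tag, n in counts.items() if n < min_group_size)


# Hypothetical directory export: user id -> planned tag value.
planned = {
    "u1": "dept-chemistry",
    "u2": "dept-chemistry",
    "u3": "dept-chemistry",
    "u4": "dept-rare-manuscripts",
}

# Tags carried by fewer than three users are flagged for review.
too_small = tags_below_threshold(planned, min_group_size=3)
```

A tag flagged this way could be merged into a broader category, such as a faculty rather than a department, before it is released.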

Data consistency is important too. So it’s also helpful that eduPersonAnalyticsTag is COUNTER-compliant – making it a consistent and comparable way to count the use of online resources.

If you use OpenAthens for reporting, then be aware that COUNTER reporting is different; while OpenAthens reports on activity at the moment users log in, COUNTER covers what happens after they gain access. But importantly, this COUNTER-compliant tag is supported by OpenAthens.

Piloted by Elsevier

In practice, the eduPersonAnalyticsTag has already been in a pilot phase, with “glowingly positive” feedback from customers so far, according to Nicolai Humphreys, senior product manager at Elsevier, the first publisher to deliver the new attribute via federated access. Elsevier began piloting it in April 2022 for its ScienceDirect service – using it with 10 customers, including both corporations and organisations in education and healthcare.

Two of the pilot customers were also customers of OpenAthens, and we partnered with Elsevier for the eduPersonAnalyticsTag project. A significant driver for the pilot, Humphreys says, was demand from users of licensed content: “We were getting customers saying: ‘We like the COUNTER reports, but we want to know more; we have requests about unpacking the data and making it more explicit to our needs.’

“So we thought: what can we do to help them analyse their data better? Let’s enable the customer to set up the categories for their reporting, and we can ingest the information into our system and provide reports which they can then break down.”

COUNTER, he says, became the basis for that. The tag is presented as an extra column in a COUNTER-compliant report, letting customers break down data further – for example, by different departments or geographical locations. One pilot customer, he adds, decided to send both “department” and “country” information – and can now break down content usage using either field or both, while protecting the identity of individuals.

Other possible applications could be a cost code, an overseas campus or partner college, or some other identifier that’s meaningful to your organisation.
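To make the idea of an extra column concrete, here is a minimal sketch of how a library might total usage by one or more tag fields once it has the report as a CSV file. The column layout and sample rows are assumptions for illustration; the actual layout of a publisher's report may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a usage report with two tag columns added.
SAMPLE_REPORT = """Title,Department,Country,Total_Item_Requests
Journal A,R&D,UK,40
Journal A,Marketing,UK,5
Journal B,R&D,DE,12
Journal B,R&D,UK,8
"""


def usage_by(report_csv, *fields):
    """Sum Total_Item_Requests for each combination of the given columns."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_csv)):
        key = tuple(row[field] for field in fields)
        totals[key] += int(row["Total_Item_Requests"])
    return dict(totals)


by_department = usage_by(SAMPLE_REPORT, "Department")
by_department_and_country = usage_by(SAMPLE_REPORT, "Department", "Country")
```

The same function works for a single field or a combination, which mirrors the pilot customer's choice to break usage down by department, by country, or by both.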

From Elsevier’s point of view, a further driver was the fact that the stats are being delivered via the system of federated access, based on the open SAML standard – which can save publishers time and effort on discussions with each organisation. Because federated access data is shared within a network of trust, libraries and publishers don’t have to negotiate individually with each other about the data they provide and receive reporting on; instead, libraries simply use a standard field within a standardised system.

This, says Meshna Koren, product manager II at Elsevier, is attractive to Elsevier as a publisher: “SAML is perfect for this because the identity provider can send any values they want; we just need to capture them and deposit them in usage reports.

“The fantastic thing is that we don’t need to know these values in advance, [or] negotiate anything; we simply agree on the attribute and format – and the customer can send multiple attributes, on any level of granularity relevant to them, as long as it’s privacy-preserving.”
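The capture step Koren describes can be sketched roughly as follows. This is an illustrative example only, not Elsevier's implementation: the sample XML, the placeholder Name value and the tag values are all hypothetical, and a real service would rely on a SAML library that verifies signatures rather than parsing raw XML like this.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hypothetical attribute statement. The Name value is a placeholder,
# not the attribute's registered identifier.
SAMPLE_STATEMENT = """
<saml:AttributeStatement xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Attribute Name="urn:example:placeholder" FriendlyName="eduPersonAnalyticsTag">
    <saml:AttributeValue>department:r-and-d</saml:AttributeValue>
    <saml:AttributeValue>country:uk</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>
"""


def analytics_tags(statement_xml):
    """Collect every value of the analytics tag attribute, uninterpreted."""
    root = ET.fromstring(statement_xml)
    tags = []
    for attribute in root.findall("saml:Attribute", SAML_NS):
        if attribute.get("FriendlyName") == "eduPersonAnalyticsTag":
            for value in attribute.findall("saml:AttributeValue", SAML_NS):
                tags.append((value.text or "").strip())
    return tags


captured = analytics_tags(SAMPLE_STATEMENT)
```

Note that the function stores the values as plain strings and never inspects their meaning, so the publisher does not need to know them in advance.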

Koren stresses that it’s up to identity providers to ensure that no personal data is provided as part of the attribute. “You can describe group properties on any level,” she explains, “but you should not identify individuals with these attributes.” That means, for example, no email addresses or student numbers in this column.
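An identity provider could run a simple screen over candidate tag values before releasing them. The sketch below is a rough heuristic under stated assumptions: the patterns, the six-digit cut-off and the sample values are hypothetical, and a check like this would not catch every form of personal data.

```python
import re

# Heuristic patterns, chosen for illustration only.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_DIGIT_RUN = re.compile(r"\d{6,}")  # assumed shape of a student number


def looks_personal(tag_value):
    """Flag values that resemble an email address or a long ID number."""
    return bool(EMAIL_LIKE.search(tag_value) or LONG_DIGIT_RUN.search(tag_value))


candidates = [
    "faculty-of-business",
    "jane.doe@example.org",
    "student-20231187",
    "cost-code-4410",
]

# Values to hold back for manual review before release.
rejected = [value for value in candidates if looks_personal(value)]
```

Short numeric codes such as a four-digit cost code pass this screen, while anything resembling an individual's identifier is held back.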

The key, Koren adds, is that the attribute “decouples” usage reporting from access authorisation. Meanwhile, the REFEDS standard says that consuming services – in other words, publishers – should not interpret the values in the attribute, other than as “opaque values to be matched for the intended reporting function”.

A community-wide approach

Of course, a key benefit of federated access is that it has the support and trust of the global research and education community – and any changes to data standards should have that, too. So it was important that the new attribute should involve work with key federated access initiatives such as REFEDS, which represents identity federations, and eduGAIN, which connects them. The process involved a working group and a public consultation period.

“Before the pilot started, we were working with REFEDS and the eduGAIN community, to have the attribute standardised and to add it to the eduPerson schema,” says Koren. After this was done, it could be implemented internally, and only then could the pilot start with customers. “Unless you make a standard out of it, it’s not going to catch on, and it’s not going to bring value.”

“The reason this is important is because schemas don’t get altered many times. Adding this into the schema is quite a big thing. It means that this is accepted by the community and can be used; it becomes part of the general documentation.”

Making the case for federated access

As an advocate for federated access, Koren recognises that it’s important that it’s not just Elsevier that offers the eduPersonAnalyticsTag attribute; the hope is that other publishers do too. After all, a technical standard is one thing; but it takes time for a new way of working to become standard throughout a community.

Indeed, the principle of federated access is that it works best when it is adopted by the community as a whole – and that holds true for data attributes, too.

“I wish more publishers would implement this,” Koren says. “The more that usage is decoupled from access control, the fewer access issues users are going to have – and that means a more seamless and better user experience overall, which is great for everybody.”

Humphreys notes that, during the restrictions of the Covid-19 pandemic, soaring demand for remote access helped to make the case for federated access to resources; this, he suggests, builds on the value proposition.

In future, innovations like the eduPersonAnalyticsTag can continue to support the business case for federated access to resources. In other words, it’s a virtuous circle: as a library, if you’re able to see the value in the content you’re subscribing to, then you also gain more value from the technologies in which you invest.

And if you’re a publisher, why not start offering the eduPersonAnalyticsTag to your customers? It’s a way to give libraries the granular usage statistics they want, while getting the benefits of federated access.

Rob Scaysbrook is head of global sales and partnerships at OpenAthens
