Managing the risks of AI in libraries
AI can offer many benefits to research libraries and their users – but it’s also important to be aware of the risks, explains Jon Bentley
Is AI a friend or foe to library users – and how can the benefits outweigh the risks? These and other vital questions were the subject of keynote speeches and panel discussions at the Access Lab 2024 conference, organised by OpenAthens earlier this year. At the conference, we learned that AI has many potential benefits, with use cases varying from front-end research tools to content digitisation.
But speakers also stressed the risks and limitations of AI, which we should understand if we are to mitigate them.
“AI is not yet an actual intellect,” warned Dr Luba Pirgova-Morgan, research fellow at the University of Leeds, in her opening keynote address to Access Lab. Instead, the speaker said, it’s “a low-grade and sometimes broken mirror of those who engage with it. The more we know its limitations – and how to engage, interpret and verify – the more the mirror cleans up.”
With that in mind, here are the main risks to be aware of.
Errors and hallucinations
One key issue with AI, speakers said, is its potential for error. For anyone accustomed to trusting machines in a non-AI context, this can take some getting used to.
In language models, for example, so-called hallucinations – where the AI gets things demonstrably wrong – are still common, meaning that in many cases manual checks are necessary.
Day two’s keynote speaker, Bella Ratmelia of Singapore Management University (SMU) libraries, reminded delegates of the Air Canada case, in which a chatbot’s incorrect information led to legal liability for the airline, even though the chatbot had also provided a link to the correct information.
For the research community, of course, accuracy is of vital importance. And it’s easy to imagine how a researcher using AI could fall victim to a “butterfly effect”, where an apparently minor error cascades into bigger errors further down the line.
Unreasonably high error rates could also undermine the apparent benefits of AI, warned Access Lab panellist Matthew Weldon, of Technology from Sage. If an accessibility tool has a 10% error rate, for example, then the people who depend on that tool are working with lower-quality information than their peers – so the accessibility gap persists.
“I would argue that an error rate like that is not acceptable when it comes to making your teaching more accessible,” Weldon said. “Accessibility is a core, fundamental aspect of education; it should be accessible to all.
“And if AI is getting that wrong, even 10% of the time, it’s fundamentally creating an inaccessible experience of some sort. It may be a useful adjunct – a useful tool when doing things like alt text – but it probably can’t, and probably shouldn’t, replace people in that process.”
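To put Weldon’s 10% figure into perspective, here is a trivial, purely illustrative calculation – the collection size below is invented for the example:

```python
# Purely illustrative: how a seemingly small error rate scales across a collection
images_needing_alt_text = 2_000   # hypothetical course collection
error_rate = 0.10                 # the 10% figure discussed above

incorrect_descriptions = round(images_needing_alt_text * error_rate)
print(f"{incorrect_descriptions} of {images_needing_alt_text} descriptions would be wrong – "
      "each one an inaccessible moment for someone relying on the tool")
```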
What counts as an acceptable error rate, and in which contexts, is therefore an ethical question that libraries should be acutely aware of. For a research discovery tool, meanwhile, the suitability of AI might depend on the AI expertise of the user.
Some users, for example, might benefit from training in AI skills – or so-called “algorithmic literacy”, to borrow a term from Dr Pirgova-Morgan’s report on AI, Looking towards a brighter future. “This type of literacy,” the report says, “involves being aware of one’s interaction with AI, comprehending how AI processes information found online, and knowing how algorithms collect personal data” – and this is something that libraries, perhaps, could help to provide.
Biased data leads to biased results
Another commonly cited risk of AI is bias. An AI system is typically “trained” on existing data sets – but if the underlying data is unbalanced or not sufficiently diverse, biases may be replicated in AI results.
Bias in algorithms, Dr Pirgova-Morgan told Access Lab, could “perpetuate existing inequalities or even exclude certain groups of library users”, while training on biased data could cause it to “inadvertently discriminate against specific demographics”.
To take an example from outside the library, an AI supporting health decisions about melanoma would need to be trained on a diverse range of skin tones – and indeed, Stanford University has created a Diverse Dermatology Images dataset for exactly this purpose.
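The article doesn’t name any particular tooling, but the idea of checking a training set for balance before using it can be sketched in a few lines. Everything below – the field name, the threshold and the toy data – is hypothetical and purely illustrative:

```python
from collections import Counter

def audit_representation(records, attribute, threshold=0.05):
    """Flag groups that make up less than `threshold` of a training set.

    `records` is any iterable of dicts; `attribute` is the demographic
    field to audit. Both names are illustrative, not part of any real system.
    """
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    underrepresented = [g for g, share in shares.items() if share < threshold]
    return shares, underrepresented

# A toy training set with an obvious imbalance
training_set = (
    [{"skin_tone": "I-II"}] * 900
    + [{"skin_tone": "III-IV"}] * 80
    + [{"skin_tone": "V-VI"}] * 20
)
shares, flagged = audit_representation(training_set, "skin_tone")
print(shares)   # {'I-II': 0.9, 'III-IV': 0.08, 'V-VI': 0.02}
print(flagged)  # ['V-VI'] – a cue to source more diverse data before training
```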
Depending on the institution, ensuring access to diverse datasets might well be a function of the library. A situation to avoid is where AI becomes a kind of ‘black box’, making discriminatory decisions based on a set of assumptions which are hidden from the user.
Jisc’s Peter Findlay, for example, noted in a panel discussion that people who program AI algorithms “don’t necessarily know why the machine creates the output that it does” – but that humans, too, are a kind of ‘black box’, with AI revealing pre-existing human biases.
Ultimately, Findlay suggested that working as a “combination of machines with people” is the way forward, if we want to mitigate risks.
Privacy versus personalisation
A third major ethical challenge, for libraries keen to use AI tools, is privacy.
One possible benefit of AI is its potential to make personalisation more useful – for example, by remembering a user’s previous searches and predicting their next step.
But researchers may object to this level of personalisation, especially if they are conducting research on sensitive topics. Both users and publishers may also object to their data being used to train an AI model as part of a for-profit service – the former for privacy reasons, the latter on intellectual-property grounds.
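As a rough illustration of how personalisation and consent might be reconciled, here is a minimal, hypothetical sketch – not a real OpenAthens or library-platform API – in which search history is stored only for users who have explicitly opted in, and is deleted the moment they opt out:

```python
from collections import defaultdict, deque

class SearchPersonaliser:
    """Hypothetical sketch: history-based query suggestions, gated on consent."""

    def __init__(self, history_limit=50):
        self.consented_users = set()
        self.histories = defaultdict(lambda: deque(maxlen=history_limit))

    def set_consent(self, user_id, consented):
        if consented:
            self.consented_users.add(user_id)
        else:
            self.consented_users.discard(user_id)
            self.histories.pop(user_id, None)  # forget history on opt-out

    def record_search(self, user_id, query):
        # Only store history for users who have explicitly opted in
        if user_id in self.consented_users:
            self.histories[user_id].append(query)

    def suggest(self, user_id, prefix):
        # Suggest previous queries that start with what the user is typing
        return [q for q in self.histories.get(user_id, []) if q.startswith(prefix)]

p = SearchPersonaliser()
p.set_consent("researcher-1", True)
p.record_search("researcher-1", "melanoma detection datasets")
print(p.suggest("researcher-1", "melanoma"))  # ['melanoma detection datasets']
```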
In her opening keynote, Dr Pirgova-Morgan set out the benefits of AI, and its potential to be a “library superhero”, but also warned of the risks. “Sometimes it can be a little bit too nosey, raising concerns about privacy and fairness,” she said. “Imagine it as that well-meaning friend who sometimes oversteps the boundaries just a tad too much. So while AI brings a lot of cool features to the library party, we need to keep an eye on how it behaves to make sure it’s a friendly helper – and not a troublemaker.”
A major issue when considering privacy is that policies vary from country to country, as Dr Pirgova-Morgan noted, pointing to differences in data-gathering practices between the UK and the US. A user in the UK, for example, may have higher privacy expectations and stronger legal protections than users in other parts of the world.
The importance of transparency
Given these three major risks – error, bias and loss of privacy – what can be done to mitigate them? The answer, perhaps, lies in an open and transparent approach to AI technology.
Transparent user consent is, of course, a cornerstone of privacy risk mitigation. The risk of bias, too, can be mitigated by transparently acknowledging the data sets that are used to train an AI model, and involving humans in decision-making processes.
And when it comes to the problem of AI-driven research errors, transparent citation of sources is similarly critical. If a language model makes an assertion based on its analysis of research, for example, it’s vital that it provides a source that a researcher can manually check.
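As a purely illustrative sketch of that principle – the names and the placeholder DOI below are invented, not drawn from any real discovery tool – a response object might simply refuse to surface an assertion that carries no checkable source:

```python
from dataclasses import dataclass

@dataclass
class SourcedAnswer:
    """An assertion paired with the sources a researcher can check by hand."""
    text: str
    sources: list  # e.g. DOIs or stable URLs

def present_answer(text, sources):
    # Treat an unverifiable claim as no answer at all
    if not sources:
        raise ValueError("Assertion has no citable source; flag for manual review")
    return SourcedAnswer(text=text, sources=list(sources))

answer = present_answer(
    "Chatbot errors can create legal liability for their operators.",
    ["doi:10.1234/placeholder-example"],  # placeholder identifier, not a real reference
)
print(answer.text)
print(answer.sources)
```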
From the point of view of OpenAthens, it will be vital that researchers can access publishers’ copyrighted content seamlessly, whatever AI-driven discovery tools they use – not least so that they can make careful checks of the underlying research.
Ultimately, AI is a creation of humans – so humans have the power to put in place the guard rails that mitigate AI risks. That will include ethical considerations at the developer level; training for librarians in AI’s benefits and risks; and, naturally, a wider understanding among users, programmers and libraries alike of the errors, biases and privacy risks that AI can create.
Jon Bentley is commercial director of OpenAthens