Language as a barrier to publishing?

Hilde van Zeeland and Juan Castro describe how natural language processing is changing the game

Many researchers, especially those who speak English as a second language, struggle with writing papers. Natural language processing (NLP), a strand of artificial intelligence (AI), can help to tackle this issue. It offers a range of tech initiatives that facilitate the understanding, writing, and proofreading of scientific texts. In this piece, we look at how NLP is on its way to transforming scientific writing and publishing.

NLP is an incredibly broad term, and its use cases are near-endless. In scientific publishing, it is typically used to derive information or patterns from manuscripts, and to classify, summarise, translate, or proofread them. NLP tech can support authors with any research task that involves language: from digesting relevant papers to writing their own.
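
As a rough illustration of what this looks like under the hood, the sketch below uses the open-source Hugging Face transformers library to auto-summarise a piece of scientific text. It is a minimal example, not the technology behind any particular service, and the model named here is simply one publicly available checkpoint rather than one tuned to scientific prose.

```python
# Minimal sketch: auto-summarising scientific text with an off-the-shelf
# sequence-to-sequence model via the Hugging Face transformers pipeline.
# The model name is only an example; production tools use models tuned
# to scientific writing.
from transformers import pipeline

summariser = pipeline("summarization", model="facebook/bart-large-cnn")

abstract = (
    "Natural language processing offers a range of techniques that "
    "facilitate the understanding, writing and proofreading of scientific "
    "texts, including summarisation, citation analysis and automated "
    "proofreading, with implications for both authors and publishers."
)

result = summariser(abstract, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```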

Finding and evaluating literature

There are plenty of NLP-driven tools that help researchers process the literature. Examples are Paper Digest, which auto-summarises articles; sci.AI, which lets researchers find relations between objects across papers; and scite, which shows how often an article has been cited with supporting or contrasting evidence. Initiatives such as these help researchers to find, digest, relate, and position articles, and have the potential not only to alleviate researchers’ workload but to contribute to science overall.

With more data becoming available and NLP making progress at an incredible pace, the semantic analysis of articles may lead to even more impactful applications for literature evaluation.

Josh Nicholson, CEO and founder of scite, says: 'I think articles are going to be increasingly used in new ways that make them more discoverable and easier to understand. Just like with cell phones as new capabilities became available, we started to use our phones for calling less and less. I see this happening with articles now, where new capabilities are being introduced that allow researchers to go beyond simply printing them. For example, we already have AI tools that can help researchers pull out key facts from papers and distill the findings into short synopses, and that can show you how papers have been discussed in the literature or social media.' 

Another NLP-driven area that can help researchers with reviewing the literature is text simplification. While publicly accessible text simplification tools are typically limited to word-level changes (replacing technical terms with higher-frequency ones, etc.), state-of-the-art approaches provide full sentence rewrites. Services built on these approaches may be especially relevant to junior researchers or non-native English speakers who struggle to understand scientific papers. As many such approaches are open source, they might become more visible within the research ecosystem over time, as developers build user interfaces to query them.
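
For readers curious what querying such a model involves, a minimal sketch follows. The model identifier is a placeholder assumption; any open-source sentence-simplification model published on a public model hub could be swapped in.

```python
# Minimal sketch: sentence-level text simplification with a seq2seq model.
# The model identifier below is a hypothetical placeholder; substitute any
# open-source simplification model (e.g. one trained on parallel
# Wikipedia / Simple Wikipedia sentences).
from transformers import pipeline

simplifier = pipeline(
    "text2text-generation",
    model="some-org/sentence-simplification-model",  # hypothetical name
)

sentence = (
    "The intervention yielded a statistically significant attenuation of "
    "inflammatory biomarkers relative to the control cohort."
)

result = simplifier("simplify: " + sentence, max_length=64)
print(result[0]["generated_text"])
```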

NLP for automated proofreading

NLP can also be a powerful asset in the writing process itself. Researchers already have several language apps at their disposal, such as the grammar and spell check within Google Docs and Microsoft Word, and stand-alone services such as Grammarly and Writefull. Writefull, which offers automated proofreading solutions to researchers and publishers, has quickly gained popularity over the last few years thanks to NLP tech tailored to scientific writing. The company recently reported that its language models have achieved 88 per cent of the average human proofreader’s performance, making affordable, high-quality proofreading widely available to researchers.
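
Writefull’s and Grammarly’s models are proprietary, but the general shape of automated proofreading can be sketched with the open-source LanguageTool engine and its Python wrapper. This is purely illustrative and assumes nothing about the technology those services actually use.

```python
# Minimal sketch: grammar and spell checking with the open-source
# LanguageTool engine via its Python wrapper. Purely illustrative; this is
# not the technology behind Writefull or Grammarly.
import language_tool_python

tool = language_tool_python.LanguageTool("en-GB")

text = "The datas was collected over a three weeks period."

for match in tool.check(text):
    # Each match reports the rule triggered, a message, and suggested fixes.
    print(match.ruleId, match.message, match.replacements[:3])

# Apply the top suggestion for each flagged span.
print(tool.correct(text))
```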

AI-driven proofreading solutions are increasingly adopted by publishers, too. They use them to triage manuscripts based on language quality, to speed up the peer-reviewing process, and to alleviate the work of their copy editors. Juan Castro, CEO and co-founder of Writefull, witnesses a changing attitude towards AI-driven proofreading in the publishing industry: 'Publishers and scientific copy editing companies often assess our tech before they integrate it into their systems. Several customers have told us that the language edits given by our models are more consistent and accurate than those of their human copy editors. While publishers were sceptical of our tech a few years ago, we see a rapid shift in this as its capabilities become evident.'

Further uptake of AI-driven proofreading tools can help to reduce the linguistic bias in scientific publishing, where submissions with ‘non-native-like’ language are perceived as having lower scientific quality[1]. Hindawi, the first publisher to integrate Writefull, writes on its website that it offers the service to its authors ‘to help ensure no one is unfairly held back from publishing due to English not being a first language’. As the adoption of NLP-based proofreading progresses, language should become less of a barrier to publishing, diminishing the divide between more and less language-proficient researchers.

Auto-generated manuscripts

Automated proofreading services can also be used to complement other NLP-driven applications, such as those that automatically generate language or translate texts. There have been a number of such initiatives within scientific writing, including an automatically generated book and tools to auto-generate (parts of) manuscripts, like SciGen and SciNote’s Manuscript Writer. Beyond academia, GPT-3, DeepL, and Google Translate are powerful tools to auto-generate or auto-translate language. Big players such as Google, Microsoft, and Amazon regularly publish NLP-based resources that can make an impact on scientific writing. For example, Google Research recently launched a service called Tapas, where users can ask a question about a table and get an auto-generated answer back - in essence, an explanation of the data shown.
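
Because Google has released TAPAS models openly, this kind of table question answering can be tried directly. Below is a minimal sketch, assuming the publicly listed google/tapas-base-finetuned-wtq checkpoint queried through the Hugging Face transformers library; it is an illustration rather than a description of the Tapas service itself.

```python
# Minimal sketch: asking a natural-language question about a table using a
# TAPAS checkpoint released by Google Research, queried via Hugging Face.
import pandas as pd
from transformers import pipeline

tqa = pipeline(
    "table-question-answering",
    model="google/tapas-base-finetuned-wtq",
)

# TAPAS expects every table cell as a string.
table = pd.DataFrame({
    "Group": ["Control", "Treatment"],
    "Sample size": ["48", "52"],
    "Mean score": ["61.2", "74.8"],
})

answer = tqa(table=table, query="Which group had the higher mean score?")
print(answer["answer"])
```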

While many of the above initiatives may go unnoticed by most researchers, authors do seem to be experimenting with auto-generated texts. We have seen numerous SciGen-generated manuscripts being submitted to journals and conferences, and academic proofreaders have reported a growing number of requests to ‘post-edit’ texts that have been auto-translated using DeepL[2].

Auto-generated or auto-translated text should never be taken at face value and used ‘as is’. In addition to their lack of scientific understanding, AI-based tools have linguistic limitations, such as the loss of coherence over long texts. Yet, for the time being, they might serve as a useful resource for researchers who struggle with writing, enabling them to discover relevant vocabulary and sentence structures.

Moving towards an all-round AI-based manuscript check?

While current NLP initiatives focus on separate pieces of the scientific publishing process, in terms of technical capability, AI could almost offer all-round manuscript support. Juan Castro: 'In terms of manuscript revision, the needs of authors and publishers align. Publishers increasingly use AI to automate manuscript checks beyond language. For example, does a paragraph refer to the right figure, does the text explain the data accurately, are all citations in the appropriate sentences? These checks are just as useful to authors, and in the future, authors might have them at their disposal too. As they write, they could be getting auto-completions, auto-edits, auto-descriptions of their tables, and their references might be verified and automatically populated. And all could be tailored to their journal or discipline.'
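
The publisher-side checks Castro mentions are not publicly documented, but a toy version of one of them, verifying that in-text figure references and figure captions match up, gives a flavour of what such ‘beyond language’ checks involve. The function below is a hypothetical illustration only.

```python
# Toy sketch of one 'beyond language' manuscript check: do in-text figure
# references and figure captions match up? Hypothetical and illustrative;
# real publisher-side checks are far more sophisticated.
import re

def check_figure_references(manuscript: str) -> list[str]:
    # Caption lines are assumed to start with e.g. 'Figure 3.' or 'Figure 3:'.
    defined = set(re.findall(r"^Figure\s+(\d+)[.:]", manuscript, re.MULTILINE))
    # Strip caption lines, then collect in-text references such as 'Figure 3'.
    body = re.sub(r"^Figure\s+\d+[.:].*$", "", manuscript, flags=re.MULTILINE)
    referenced = set(re.findall(r"Figure\s+(\d+)", body))

    problems = [f"Figure {n} is cited but has no caption."
                for n in sorted(referenced - defined, key=int)]
    problems += [f"Figure {n} has a caption but is never cited."
                 for n in sorted(defined - referenced, key=int)]
    return problems

sample = "As shown in Figure 2, accuracy improves.\nFigure 1. Model architecture."
print(check_figure_references(sample))
```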

While an all-round manuscript check could make researchers’ lives easier, one question remains once they have finalised their paper: is the science accurate? In the near future, NLP will likely allow us to produce manuscripts that are fully correct in terms of language, complete, and consistent - but evaluating the science will always remain a human task.

Hilde van Zeeland is an applied linguist at Writefull; Juan Castro is CEO and co-founder at Writefull.

References

[1] Politzer-Ahles, S., Girolamo, T., & Ghali, S. (2020). Preliminary evidence of linguistic bias in academic reviewing. Journal of English for Academic Purposes, 47. https://doi.org/10.1016/j.jeap.2020.100895.

[2] Textworks Translations. (2019, 16 July). DeepL for academic translations [Blog post]. Retrieved from https://www.textworks.eu/eng/deepl-for-academic-translations/.