Annotation in the scholar and publisher workflows

Heather Staines investigates data portability and flexible data creation

With the recent notices about Cambridge Analytica’s use of personal information from millions of Facebook accounts, people are paying closer attention to their data – who can use it and how users and platforms can control that use.

The European Union’s General Data Protection Regulation (GDPR) has also forced companies of all sizes around the world to take a closer look at what user information they store, who they share it with, and what mechanisms they use to protect it. Researchers and readers want to know how the tools they use throughout their daily workflow collect data about that use, and they want to know how they can get that data out of such tools if necessary.

Founded as a non-profit with a mission to promote an open standards-based annotation ecosystem, Hypothesis has always put user privacy and protection of user data first. Our goal is to enable researchers, readers, students, teachers, journalists, and others to create and participate in a conversation that extends across the web and to enable private and group discussions that further scholarly communication, education, and fact-checking.

To do so with confidence, users must feel secure about their data. We consider both our non-profit status and our open source origin to be essential in winning user trust. The ability to manage data precisely also extends to our integration partners, who want to define specific use cases for their own content and data. As we edge closer to a world where standards-based annotation clients run by multiple entities can interoperate with each other to provide a seamless end-user experience, transparency around data will be critical.

Access to annotation data

When researchers or students create free annotation accounts, they can then use the tool to annotate content, regardless of its format, anywhere on the web. Annotators have immediate access to all their annotations, which automatically populate to their activity page where they can be searched, filtered by tags, and viewed in context atop web-based content.

This corpus of annotation data, once siloed on individual web pages or dispersed among different tools, now provides enduring value to the user. Researchers can use this same simple tool throughout their workflow, from early reading to preprint collaboration to peer review to post-publication discussions, as well as in the classroom. Utilising the browser extension or bookmarklet, they need not depend upon a publisher integration to take notes and collaborate in groups. The value of this body of annotation data is increased further because users can access their annotations anytime and utilise them for other purposes through a robust API.

Partners also have API access to annotation data across their content, which they can then use to display annotations elsewhere on their website for promotional purposes or further analyse through text and data mining. By embedding a common toolset for their end users, integration partners increase the value of the annotation data further.
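As a rough illustration of what retrieving annotations through an API might look like, the sketch below only builds a search request and does not send it. The endpoint address, the `acct:` account format, the bearer-token header and the account name `example_reader` are assumptions made for this example rather than details given in this article, so check them against the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed address of the public annotation search endpoint.
SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"


def build_search_url(user=None, tag=None, group=None, limit=20):
    """Return a search URL for annotations matching the given filters."""
    params = {"limit": limit}
    if user:
        # Assumption: accounts are addressed as acct:<name>@hypothes.is.
        params["user"] = f"acct:{user}@hypothes.is"
    if tag:
        params["tag"] = tag
    if group:
        params["group"] = group
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def auth_headers(api_token):
    """Headers for reading private or group annotations (token is hypothetical)."""
    return {"Authorization": f"Bearer {api_token}"}


# Example: one reader's annotations tagged "peer-review".
url = build_search_url(user="example_reader", tag="peer-review")
print(url)
```

The resulting URL could be fetched with any HTTP client. Public annotations would need no token, while private and group annotations would need the header returned by `auth_headers`.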

Researchers and readers can view and interact with annotations across a publisher’s content through group pages, which can span a single book or journal, a collection of journals or an entire domain. Editors can decide what content groupings make sense for researchers. Particular communities of users, such as society members, can participate in groups scoped to multiple domains, giving them access to one conversation across society publications and member sites. Societies can use annotation to promote engagement or analyse member activity across these sites.

Who owns annotation data?

Public annotations carry a CC0 license. Private and group annotations are All Rights Reserved to the creators. (Comments rescued by Hypothesis after the discontinuation of PubMedCommons carried a CC-BY license, which is noted on their new annotation cards.) We anticipate that some partners may wish to explore the use of different licenses, for example, for annotations made by invited experts as part of the content or for supplementary information.

Flexibility and simplicity

Keeping things simple for end-users and useful for integration partners is another key goal in promoting the wider adoption of open annotation.

The last thing researchers want to do is create another account. Thus, partners can integrate existing accounts so that researchers can annotate immediately. This implementation also enables continuing access to existing personalisation tools. Partners can also port over any prior comments made using other services to preserve all interactions as page note annotations. This is what we did when PubMedCommons discontinued commenting. Users then have access to a connected searchable body of data. In the PubMed case we augmented the data with DOIs and PMID tags to make them more findable, accessible, interoperable, and reusable.

Since the W3C standards body for the web approved annotation as a web standard in February 2017, annotation tool builders have guidelines for building interoperable tools. In the future, you should be able to tell your browser what annotation client you use in the same way that you indicate your preferred search engine today.
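To make the idea of a standard annotation format more concrete, here is a minimal sketch of an annotation shaped after the W3C Web Annotation Data Model: a comment (the body) attached to a quoted passage on a page (the target). The helper name `make_annotation` and the example URL and text are hypothetical, and the `id` that a server would normally assign is omitted.

```python
import json


def make_annotation(source_url, quoted_text, comment, prefix="", suffix=""):
    """Build a dict shaped like a W3C Web Annotation with a text-quote target."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # The body carries the annotator's comment.
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        # The target points at a passage rather than at the whole page.
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": quoted_text,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


# Hypothetical example values, for illustration only.
annotation = make_annotation(
    source_url="https://example.org/article",
    quoted_text="standards-based annotation",
    comment="A useful definition to cite.",
)
serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Because the structure is plain JSON with agreed field names, any client that follows the standard should be able to read an annotation created by another.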

Annotations made with standards-based interoperable tools should soon be able to interact with each other in the same way that emails sent via different email clients do today.

We often talk about a multi-client annotation world, but what does that mean?

Right now, annotations made using Hypothesis are stored on the Hypothesis server. However, organisations – and even individuals – who wish to run their own annotation servers can do so. Our tool was built with this world in mind, and we’re in the process of completing work to make it even easier.

End-users will be able to connect multiple accounts and move between annotation layers seamlessly, viewing and managing all their annotation data smoothly. It won’t matter if annotations in one publisher group layer are stored on one server and those in a separate layer are on another server. The tool will just work, and the annotator will be able to conduct their research or interact with their students accordingly.

Creating and consuming content through groups

People work in groups, and people look to different collaboration groups for different purposes. Anyone using the tool can easily put together a private group and invite other users to participate. Annotations for the group are then automatically visible on a group dashboard that members can explore. This functionality is commonly used in the education space, either for standalone classes or via a Learning Management System integration.

Researchers also create collaboration groups to track activities for their projects and co-authors. There are currently more than 22,000 private groups on Hypothesis, and more than half of the 3.1 million annotations have been made in private collaboration groups.

Sometimes, however, there are more precise use cases for groups and a need to make group annotations visible to a world readership. Researchers and readers want to know, for example, when annotations are curated by a publisher or journal. Users can then contribute to that publisher- or journal-specific layer to interact with other researchers and readers on specific portions of text.

The first publisher group, developed in association with the life sciences publisher eLife, launched on January 31, 2018. The eLife group is what we consider to be an Open Group. The annotations made on eLife articles are world readable, visible by default to everyone who comes to the article pages. eLife uses their own accounts, so researchers who want to create annotations do so via their eLife profile. eLife moderators take care of any concerns around annotations in their layer. Readers who wish to use the Hypothesis public discussion layer or make private discussion groups can do so using their regular account. Hypothesis closely monitors the public layer and resolves any issues.

In other, even more specific use cases, partners may require what we call a Restricted Group. Anyone can read the annotations, but only those individuals designated by the publisher or journal can create them. In this case, readers have a clear message that the annotation content is authoritative. The first Restricted Group was recently launched for the American Diabetes Association, who wished to use annotations to provide updates and additional links to articles in their Annual Standards of Medical Care issue.
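The differences between private, Open and Restricted Groups come down to who may read and who may write. The sketch below models those rules in a simplified way. It is an illustration of the behaviour described here, not the actual implementation, and the class names, the account names and the assumption that any signed-in account may write to an Open Group are all choices made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class GroupType(Enum):
    PRIVATE = "private"        # only members read and write
    OPEN = "open"              # world readable, any signed-in account writes
    RESTRICTED = "restricted"  # world readable, only designated annotators write


@dataclass
class Group:
    name: str
    type: GroupType
    members: set = field(default_factory=set)

    def can_read(self, user=None):
        """Private groups are visible to members only; others are world readable."""
        if self.type is GroupType.PRIVATE:
            return user in self.members
        return True

    def can_write(self, user=None):
        """Anonymous readers never write; Open Groups accept any signed-in user."""
        if user is None:
            return False
        if self.type is GroupType.OPEN:
            return True
        return user in self.members


# Hypothetical groups and accounts, for illustration only.
lab = Group("Lab notes", GroupType.PRIVATE, {"alice"})
journal = Group("Journal layer", GroupType.OPEN)
updates = Group("Official updates", GroupType.RESTRICTED, {"editor"})

print(lab.can_read("bob"), journal.can_write("bob"), updates.can_write("bob"))
```

In this model an anonymous visitor can read the journal layer and the official updates but cannot write to either, which is the clear signal of authority that a Restricted Group is meant to give.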

Other publishers with pending Restricted Group launches envision them as places where only authors can update content or where controlled conversation between an author and invited experts can happen.

Through multiple annotation layers across the same content, general discussion can take place alongside dedicated read-only channels. Users can move seamlessly between the layers and explore activity pages for each of the groups they encounter. Other examples of group use cases include connecting citations to original articles or data, annotation for open peer review either pre- or post-publication, or entity annotation for reproducibility and other purposes. These latter cases focus on data connection as much as data creation.

Into the future

Open annotation technology offers users – any user, individual or organisation – the ability to create a unique persistent web address for content or entities displayed on a web page.

As a community, we have grown used to thinking of the web as a series of page level addresses, but now annotation technology can connect two precise passages to each other or connect an identifier with information that resides in a database elsewhere. The implications for linked data are endless, and the functionality works on any content viewed in the browser, without the need for the content provider to enrich or retag content in any way.

Users can connect things across the web, regardless of where they live, breaking down content silos and enabling researchers, publishers, students or readers to curate their own collection of annotations that are searchable via group pages or the public stream and repurposable through APIs. Public annotations feed into Crossref Event Data for indexing by Google, making them discoverable by others. With control of annotation data well in hand, both users and partners can move forward with confidence that their annotation efforts will endure, streamlining workflow and providing value to others.

Heather Staines is director of business development at Hypothesis