Data plays growing role

1 November 2004

Part secondary publisher, part primary publisher and part research institution, CAB International (CABI) has its origins in the Commonwealth Agricultural Bureaux. This was established in 1928 because Commonwealth countries wanted effective access to agricultural research information but, over the years, it expanded as more countries saw the value of the services provided. Now 40 member governments, ranging from China to Switzerland, own and make modest financial contributions to CABI, although CABI stresses that it is 97 per cent self-financing through the sale of its products and services.

Although the international not-for-profit organisation publishes journals and books and conducts fundamental research, it is perhaps best known in the STM publishing industry for its CAB Abstracts and Global Health bibliographic databases. We asked Andrea Powell (AP), product development director, and Carol McNamara (CM), sales and marketing director, of CABI Publishing to reveal more about these databases and how they fit into the broader vision of CABI.

How important are bibliographic databases to the research process?
AP: Well-constructed bibliographic databases are as important and valued as they have ever been, particularly as the amount of research literature grows and grows. Research from Los Alamos National Laboratory in the US has concluded that 60 per cent of researchers start their searches in specialist secondary bibliographic databases which then lead them to other material and, specifically, to full-text content. Our own statistics confirm that database use has grown. It is important for researchers not to repeat work, so extensive bibliographic databases are important.

The internet has changed the way that databases are used. The output of a bibliographic database was an end in itself 15 years ago – people would take the list of references and go to the library shelves to find the journal articles – but now people want to click on the references to get to the papers. That is why we focus on linkage to full text. For example, we actively use the DOI and CrossRef systems to embed links in our database to full text content; the user is authenticated by the original publisher of the cited journal.
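As an illustration of the kind of linkage described here (not CABI's own implementation), a record that carries a DOI can be turned into a link by prefixing the DOI with the address of a DOI resolver, which then redirects the user to the publisher's site. In this minimal sketch the function name and the sample DOI are hypothetical, and the resolver address is an assumption.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver is used as the link target.
DOI_RESOLVER = "https://doi.org/"


def doi_to_link(doi: str) -> str:
    """Return a resolver URL for a DOI string (hypothetical helper)."""
    doi = doi.strip()
    # Accept input written as "doi:10.xxxx/yyyy" by dropping the prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # The DOI below is made up for the example.
    print(doi_to_link("doi:10.1234/example.2004.001"))
```

The database only has to store the DOI itself; where the link finally lands, and whether the user may read the full text, is decided by the resolver and the publisher.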

However, the internet can make people lazy. Because linkage is so easy, users tend not to follow links unless the full text is available.

What other trends are you seeing?
CM: There is a continuing trend towards more online use of our content, but it is not as fast as we might like. People still like to browse through print versions and, in many parts of the world, internet access is intermittent and expensive. Many librarians, for the moment at least, still want a print archive.

The other main trend is in the subject field itself. We specialise in life-science publishing but we're increasingly seeing a change as agriculture declines. The subject area is moving from pure agriculture to leisure and amenity uses of the same land.

We have major customers in government agencies and they are also interested in related issues such as food safety and bioterrorism. For example, the US is using our databases to find out what is going on at and outside its borders for possible research into bioterrorism.

What about archiving?
CM: We have recently begun digitising our archives and this is revealing very important historical material relating to topics such as tuberculosis, which are still very relevant today. Another example is foot and mouth disease. Before the most recent UK outbreak, the previous one had been in the 1960s, but CABI's content had only been digitised as far back as the 1970s. Without a digital archive, it was more difficult to find research that took place during that earlier outbreak.

There is also an emerging interest in organic farming today. This is triggering an interest in farming during the times of the world wars, when farming methods were more traditional.

The first backfile to be released, which relates to our Global Health database, has just been launched on CAB Direct and Ovid. This backfile includes 800,000 records. The backfile of CAB Abstracts, with 2.2 million records, will be launched in February 2005. This will extend back to 1908 and include valuable information such as how tuberculosis has been eradicated in the past.

How do researchers find what they are looking for?
AP: We take great pains to index every record in our database using our thesaurus, which is a controlled vocabulary of around 60,000 terms. Researchers can do simple and advanced searches. We also have searchable classification terms. This is important because agriculture covers a broad spectrum of subjects. The CABI codes allow researchers to limit their search to just soil information, for example. We also put considerable effort into providing online information and explanations to the end-user. The database has to be very simple and intuitive.
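To make the idea concrete, here is a minimal sketch of searching records indexed with a controlled vocabulary and then limiting the results by a classification code. It is an illustration only, not CABI's system: the tiny thesaurus, the sample records and the codes "SOIL" and "CROP" are all invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mini-thesaurus: each entry term maps to one preferred term,
# so that synonyms typed by the searcher reach the same indexed records.
THESAURUS = {
    "maize": "maize",
    "corn": "maize",
    "erosion": "erosion",
    "soil erosion": "erosion",
}


@dataclass
class Record:
    title: str
    terms: set = field(default_factory=set)  # preferred index terms
    codes: set = field(default_factory=set)  # classification codes (made up)


def search(records, query, code=None):
    """Return titles of records indexed with the query's preferred term,
    optionally limited to a single classification code."""
    preferred = THESAURUS.get(query.strip().lower())
    if preferred is None:
        return []  # the query is not in the controlled vocabulary
    return [
        r.title
        for r in records
        if preferred in r.terms and (code is None or code in r.codes)
    ]


RECORDS = [
    Record("Maize yields on eroded slopes", {"maize", "erosion"}, {"SOIL", "CROP"}),
    Record("Corn storage pests", {"maize"}, {"CROP"}),
    Record("Gully erosion mapping", {"erosion"}, {"SOIL"}),
]

if __name__ == "__main__":
    print(search(RECORDS, "corn"))               # all records indexed as maize
    print(search(RECORDS, "corn", code="SOIL"))  # limited to the soil code
```

Because every record is indexed with the preferred term, a search for "corn" also finds the record whose title only mentions maize, and the code filter narrows the result to one subject area.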

Have pricing models changed?
CM: We have seen pricing models change, especially in recent years with the merging of print and electronic delivery. Some customers want both until they feel confident that the electronic material will be available permanently. We are continually looking at our pricing models.

The most common pricing model today is the simultaneous user model. We also deal with many consortia and they are often interested in an FTE (full-time equivalent) model where they calculate the number of researchers at the site who will actually be users of the product. Corporate customers also generally opt for this model and their researchers may be distributed worldwide.

Our not-for-profit status and focus also give us an emphasis on developing markets. One objective is to provide publications at affordable prices, and to that end we try to get sponsorship for our products. There is a huge amount of interest in our databases from developing countries because in most of these countries agriculture is a key industry.

What does CABI Publishing do apart from its databases?
CM: Around 62 per cent of our income is derived from our databases but we also publish books and journals. Because of our database we can monitor and track trends and publish journals and books in those areas. For example, we recently spotted an increasing interest in aquaculture, or fish farming.

We also publish 17 primary journals, many on behalf of societies. We have a growing books list with around 500 titles in print and publish 65 new titles per year. We are not very big compared with some other publishers, but having both databases and journals and books gives us opportunities. For example, we can provide linkage between databases and full text, and package these together for members of societies, governments and academia.

How is CABI's database compiled?
AP: We receive material from all over the world. Two sacks of post arrive every day with hard copies of journals, books, conference proceedings, reports and other document types. We have a library management-type tool to predict when certain journals are expected so that we can chase up any that do not arrive. This material currently comes in print form, is given a barcode and is then sorted according to our selection criteria. Material that is selected is then outsourced to our partners in the Philippines who abstract and index the information.

There is a two-week turnaround before the abstracted data returns to CABI's offices in Oxfordshire, where final quality checks are carried out. The time between the post arriving and the records being accepted into the database is typically four to six weeks. In this way, 225,000 records are processed every year, a figure that has grown tremendously. When I first joined CABI in the early 1990s it could take nine months for information to get into the database.

What are your priorities for the future?
CM: Access to digital content is essential to speed up this complicated process. We are currently negotiating with publishers to have electronic access to new research material. Timeliness of our databases is important to our users. We have recently changed to a monthly update frequency on Ovid's platform and our own platform is now updated weekly.

Training is increasingly becoming a big issue for our librarian customers. We have always given face-to-face training, but have also now developed a specific area of our site that is especially for librarians. We also run interactive training sessions globally, which are proving hugely popular.

What about the underlying technology?
AP: Improving the technology of the database is the other big priority. This includes integration with other library platforms and OpenURL compliance. Customers want to do searches in our system and then link to other products. We put considerable emphasis on technology and have an IT department that is well-stocked with people and technology.
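As a rough sketch of what OpenURL compliance involves, the citation details of a record are packed into a query string and sent to the customer library's link resolver, which decides where the user can obtain the item. The example assumes the key/encoded-value journal format of the Z39.88-2004 version of OpenURL. The resolver address and the citation values are hypothetical, and nothing here describes CABI's actual implementation.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, issn, volume, spage, date, atitle):
    """Build an OpenURL for a journal article (illustrative helper)."""
    params = [
        ("url_ver", "Z39.88-2004"),                        # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),   # journal metadata format
        ("rft.genre", "article"),
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", spage),    # first page of the article
        ("rft.date", date),
        ("rft.atitle", atitle),  # article title
    ]
    # urlencode percent-encodes the values and joins the pairs with "&".
    return resolver_base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical resolver address and citation, for illustration only.
    print(build_openurl(
        "https://resolver.example.org/openurl",
        issn="1234-5678",
        volume="12",
        spage="45",
        date="2004",
        atitle="Soil erosion",
    ))
```

The database does not need to know which full-text products a library subscribes to; it only describes the cited item, and the library's resolver supplies the appropriate link.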

We do not host our online delivery systems in-house. Instead they are hosted in London on a cluster of four servers to give load balancing. The site is then backed up here and also in Bangkok, where our development partners are based.

We have to have a very strict policy on archiving and are moving away from using tape backups because they cannot store enough data. The databases currently amount to around 40 GB, and will grow to around 60 GB once we add all the archive data.

What are the main challenges for CABI in the future?
AP: Technology will continue to push us forward. The ongoing migration from print to electronic will continue to be a challenge and we have to manage that process. Customers will not pay twice for the same material.

Because of changes in scholarly communication, a huge challenge is finding the information and giving access to it. The open-access model presents new challenges because we usually have to go out and search for new open-access publications. Today we find out about new journals by regularly looking at the Directory of Open Access Journals website. Open-access publishers should not just assume that because their information is out there it will be found. We try to educate publishers as well as end-users. And we cannot ignore the Google factor. It is not going away. We would rather work with them than compete.

What are the big opportunities?
CM: There are many opportunities in both the short and long term. Many of these are linked to the challenges. One opportunity is working with new partners, which could be the likes of Google. New partnerships open up new groups of customers. We will also continue working with existing partners to enable content to be more user-friendly. We are continuing to educate customers too.

Our archive is one of the major short-term opportunities. It will enable us to service our existing customers with more content as well as to enter new markets that we have never sold into before.