Software sifts through diverse chemical data

John Murphy profiles the CEO of the chemical software firm Elsevier MDL

The annals of the pharmaceutical industry are filled with stories about how a substance made for one reason is found to actually be the miracle new drug that the research team down the corridor have spent years looking for.

If only people knew what everyone else was doing, how much wasted time would be saved, eh? And that is without counting the research papers and patents that contain the information you need but are published in journals you would not think to read. The results you need may already have been obtained and patented, or the idea may already have been tried and eliminated. And, even if you could read everything, how would you know you hadn't missed something because it was described in a way that you were not expecting?

This could be why so many pharmaceutical companies are customers of Elsevier MDL, a California-based company that produces enterprise software and database solutions that sort through both internal and external haystacks looking for the relevant needles.

'We provide the ability to search by chemical structure and sub-structure. We integrate any type of data with any type of workflow when doing research,' explained Lars Barfod, CEO of MDL. 'The data of most pharmaceutical companies today is still very disparate in all kinds of databases and documents - mostly either in Word documents or Excel spreadsheets - and these are traditionally not very structured, so it is very hard to get at the information.'
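
To make the idea of searching by sub-structure concrete, a molecule can be treated as a labelled graph: atoms are nodes, bonds are edges, and a query fragment matches if it can be laid over part of the molecule. The following is a minimal sketch under that assumption, using a plain backtracking search. It is not a description of how MDL's products work; the names `find_substructure`, `search` and `LIBRARY` are hypothetical, hydrogens are left implicit and bond orders are ignored.

```python
from typing import Dict, List, Optional, Set, Tuple

# A molecule as a labelled graph: atom symbols plus bonds between atom indices.
# Hydrogens are implicit and bond orders are ignored to keep the sketch short.
Molecule = Tuple[List[str], List[Tuple[int, int]]]


def neighbours(mol: Molecule) -> Dict[int, Set[int]]:
    """Build an adjacency map from the bond list."""
    atoms, bonds = mol
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_substructure(query: Molecule, target: Molecule) -> Optional[Dict[int, int]]:
    """Return one mapping of query atoms onto target atoms, or None if no match."""
    q_atoms, t_atoms = query[0], target[0]
    q_adj, t_adj = neighbours(query), neighbours(target)

    def extend(mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if len(mapping) == len(q_atoms):
            return mapping
        q = len(mapping)  # next query atom to place
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Every already-placed neighbour of q must be bonded to t in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                result = extend({**mapping, q: t})
                if result is not None:
                    return result
        return None

    return extend({})


# Hypothetical three-compound library (heavy atoms only).
LIBRARY: Dict[str, Molecule] = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
    "acetic acid": (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
}


def search(query: Molecule) -> List[str]:
    """Names of all library compounds that contain the query fragment."""
    return [name for name, mol in LIBRARY.items()
            if find_substructure(query, mol) is not None]


C_C_O: Molecule = (["C", "C", "O"], [(0, 1), (1, 2)])
print(search(C_C_O))
```

Ethanol and dimethyl ether have the same atoms, so a search by formula could not tell them apart; only the structural query separates them. A production system would also have to handle bond orders, rings, stereochemistry and indexes that avoid testing every compound.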

With this challenge in mind, MDL's latest product, Isentris, makes it possible to search document stores by chemical structure even if the structures in those documents were never indexed. 'We try to help companies plan their experiments so that their data is better structured, and then help them to search and browse the data afterwards. But about half our business is providing customers with external information, such as chemistry databases, which can be integrated with their internal data,' said Barfod.

Integrating the past and future
As part of this, MDL's technology is being turned inwards on Elsevier's historical database of papers, turning graphic images into searchable structures. In the future, even clinical data could be integrated, together with toxicology databases, so that a scientist who searches for a structure can get everything that is known and recorded about that substance in one place and one format, wherever it was stored and whatever format it was originally stored in. According to Barfod, what can be added to Isentris is limited only by what data is harvested in the first place.

MDL began in 1978 as Molecular Design Ltd. It was founded by Stuart Marson and Stephen Peacock, who were post-docs at UC Berkeley, and Todd Wipke, a professor at UC Santa Cruz. The company was originally a consultancy service drawing molecular structures on huge specialist graphics terminals. However, the founders soon discovered that there was more demand for the tools they had developed to help them draw chemical structures, so the business evolved into a software company.

In the early days there were few standards in IT, and Oracle databases were still considered new technology. The sophistication of the company's products has grown incrementally since then, leading to the release of the ISIS (Integrated Scientific Information System) platform in 1991. Along the way, MDL developed the Molfile format for representing structures and this has now become a world standard.
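
For readers who have not met a Molfile, it is a plain-text connection table: three header lines, a counts line, one fixed-width line per atom, one per bond, and a closing `M  END`. The following is a minimal sketch that writes and re-reads such a file, assuming the commonly documented V2000 column layout. The function names are hypothetical, and charges, stereo flags and the properties block are simply written as zeros or ignored.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # symbol, x, y, z
Bond = Tuple[int, int, int]             # first atom, second atom, bond order (atoms numbered from 1)


def write_molfile(name: str, atoms: List[Atom], bonds: List[Bond]) -> str:
    """Serialise atoms and bonds as a simplified V2000 Molfile."""
    lines = [name, "  sketch", ""]  # header: title, program line, comment
    # Counts line: atom count and bond count in 3-character fields.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for sym, x, y, z in atoms:
        # Three 10-character coordinates, a space, then a 3-character symbol.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3}" + " 0" + "  0" * 11)
    for a, b, order in bonds:
        lines.append(f"{a:3d}{b:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def read_molfile(text: str) -> Tuple[str, List[Atom], List[Bond]]:
    """Parse the title, atom block and bond block by fixed column positions."""
    lines = text.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds: List[Bond] = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return lines[0], atoms, bonds


# Ethanol, heavy atoms only, with made-up 2D coordinates.
ETHANOL_ATOMS: List[Atom] = [("C", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0), ("O", 2.25, 1.3, 0.0)]
ETHANOL_BONDS: List[Bond] = [(1, 2, 1), (2, 3, 1)]

text = write_molfile("ethanol", ETHANOL_ATOMS, ETHANOL_BONDS)
print(text)
```

Because the fields sit at fixed column positions, the reader slices each line rather than splitting on whitespace; adjacent numbers can run together once counts reach three digits.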

The company was taken over by the Robert Maxwell empire in 1987 and was one of the star assets when that group was liquidated in 1992. MDL then went public in 1993. It was acquired by Reed Elsevier in 1997 and changed its name to Elsevier MDL in 2004. It currently serves about 1,500 customers, but the majority of its business comes from just 30 or so of those, which are the heavy hitters of the pharmaceutical and chemical industries.

Over the years, the company has added capabilities to ISIS and integrated other services, including external data sources, through acquisitions or alliances. This has culminated in the replacement of ISIS with the Isentris platform. Elsevier MDL integrates data sources from its own databases, scientific papers, patents and directories of global sources of chemical compounds. It even acts as a subscription agent for several information providers.

The man at the top
Barfod joined the company in 2001, having previously worked for the Danish diabetes product company, Novo Nordisk, and for Genentech, a biotechnology research firm based in California. He has a background in science and said that he has always wanted to be part of industries that work to improve people's health and quality of life, which these two companies gave him the opportunity to do.

He is also fascinated by the potential for data, information and knowledge to facilitate discovery by improving workflows and enhancing scientists' ability to make decisions. This fascination led him to become vice-president of another Californian company, Deltagen, which produces mouse gene knockout data, and eventually to join MDL.

He became CEO of MDL in July 2004 and sees the company as playing an important role in the drug discovery process. 'Our purpose has always been providing integrated access to critical information and our strongest capability is in lead generation and optimisation in the pharmaceutical industry. We can be used by any chemistry, pharmaceutical or biotechnology company,' he explained.

'We have always been and continue to be financially healthy, despite working in a very tough environment. If you look at our competitors you will see that our customers need to look for cost efficiencies a lot more than they did five or 10 years ago.'

Despite the tough environment, there has been less consolidation in this industry than some people might have expected. 'While it sounds easy to consolidate, many companies have built applications that are difficult to integrate,' said Barfod. 'In life sciences discovery, most applications are geared to a particular scientific discipline, such as gene expression or cheminformatics. We might think it would be good to integrate some of these but the cost of integration is great and the benefit not so apparent.'

Nonetheless, MDL does have a strategy to grow through both organic growth and acquisitions. This was shown in its purchase, in 1994, of Occupational Health Services, which makes safety data sheets, and in its acquisition, in 1998, of the rights to sell the Beilstein databases of chemical reactions. MDL has also acquired Interactive Simulations (a San Diego-based molecular modelling company), Afferent Systems (a San Francisco-based combichem tools company), and SciVision (a Burlington-based QSAR company). And this looks set to continue. Barfod said that he is interested in looking at potential acquisition targets in underlying technology, scientific databases or logistics and procurement, as well as electronic laboratory notebooks.

A labour-intensive task
MDL's business is extremely labour-intensive. Staff must read journals and extract information for the company's databases, redrawing the structures into a searchable format. The company is developing tools to scan information electronically from patents and other documents and turn it into a searchable form, but even then a high degree of human checking is needed.

'We are increasingly employing a machine-reading approach. That has been very difficult because the approach to writing a reaction scheme, protocol or material and methods changes from one person to the next. We have developed semantic capabilities to redraw structures automatically and extract the protocol. People must still sit and re-read the thing to make sure it was done properly but we are saving a lot of time,' he explained.

'Speed is very important because our customers want access to information, particularly patent information, no more than two weeks after publication,' he continued. 'One of the problems with chemistry patents is that people try to cover the widest area possible by using Markush structures to cover any possible variations of a structure that are functionally equivalent.'
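
A Markush structure can be pictured as a fixed core with numbered positions, each of which may carry any substituent from a list, so a single claim stands for the product of all the lists. The following is a minimal sketch of that enumeration. The core, the substituent lists and the SMILES-like notation are invented for illustration and are not taken from any real patent.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical generic claim: a benzene core with two variable positions.
CORE = "c1cc({R1})ccc1{R2}"
R_GROUPS: Dict[str, List[str]] = {
    "R1": ["F", "Cl", "Br"],
    "R2": ["C", "CC", "OC", "N"],
}


def enumerate_markush(core: str, r_groups: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every specific structure covered by the generic one."""
    names = sorted(r_groups)
    for choice in product(*(r_groups[n] for n in names)):
        yield core.format(**dict(zip(names, choice)))


structures = list(enumerate_markush(CORE, R_GROUPS))
print(len(structures))  # 3 choices for R1 times 4 for R2
print(structures[0])
```

Two short lists already give 12 structures. Real claims often have several positions with long or open-ended lists, which is why they cannot simply be expanded and indexed one compound at a time.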

Another big task has involved drawing on other Elsevier resources by scanning the Scopus bibliographic database, which covers 14,000 journals going back 35 years, and setting up links between Scopus articles and Beilstein databases.

But the main focus of MDL's work is in communication and information distribution within the huge pharmaceutical companies. As Barfod pointed out: 'There are examples of research groups in the same company that read about findings from their own company in a journal. There is a tendency amongst scientists not to tell people what they are doing, even internally, and systems have not made it easy for 10 or 15 research sites around the world to freely share information. Most of our competitors are mainly focused on external data sources, but for most pharmaceutical companies their internal data is much more important.'

The reason that so many pharmaceutical companies want this kind of facility is to save time. Barfod said that the whole industry is under huge pressure to find replacements for the cash cows that are coming out of patent, so up to $70 billion a year is spent on research.

With Isentris, a scientist does not have to know the chemical name of a substance they might want to create as a drug candidate. They can simply describe what it might look like and get back every reference to similar substances, how they were synthesised, what toxicology is known and even where to buy the starting materials.

MDL's offerings are given product labels and installed 'out of the box', but in reality every installation is different. It has core packages but most of the value is from integration with whatever data the customer has. Barfod spends considerable time visiting the top management of the company's customers to discuss what features they would like, and to make sure his strategy is aligned with what customers want. There are also several user-group meetings per year, where customers share best practices with each other.

Some may wonder how Elsevier, a company that specialises in publishing information, could end up owning what is effectively an information systems software vendor, but Barfod rejects the idea that the parent company is only interested in a short-term engagement while it develops and extends the Scopus offering.

He said: 'I am confident that Elsevier sees MDL increasingly as a strategic asset, because we are one of the ways that Elsevier can transform from a publishing company to a company that provides information in its broadest sense. The type of technology we have is what Elsevier knows it needs to get the company to the next level. Our strength is in letting our customers look at external information in the context of their own information.'

So could the future include providing the integration of external and internal information to law firms, banks, accountancy or medical practices? 'Exactly,' replied Barfod.


Master of Science, Copenhagen Royal University, Denmark

Product manager, Chr. Hansen's Laboratories
Various positions, Novo Nordisk Pharmaceuticals
Vice-president for marketing and corporate officer, Genentech
Vice-president for commercial development and corporate officer, Deltagen
Executive vice-president and chief business officer, Elsevier MDL
President and chief executive officer, Elsevier MDL