Open-source search tool helps clippings database

The database and technology arm of the Newspaper Licensing Agency (NLA) oversees access to a vast database of clippings from over 140 national and regional UK newspaper titles. The NLA’s subscription services have proved popular with a range of clients, and its database has grown by around half a terabyte per year.

The NLA recently identified a gap in the market for providing newspaper content to a wider audience. Its vision was to use the “Google-esque” search interface built for its existing ClipShare subscription product to allow easy access to its clippings on a more casual pay-as-you-go model. In late 2008, the NLA launched ClipSearch.

ClipSearch provides access to over 11 million articles within the NLA database and is said to offer the only comprehensive single source of major UK newspapers in digital format available to the general public. The content available through ClipSearch is expanding by 480,000 articles per month; NLA services are used daily by 25,000 users in 5,000 organisations.

The NLA has two key aims: first, to build its publishers’ revenue through controlled access to digital newspaper content and, second, to save its publishers money through distributed access to content within the NLA database. Achieving this through even a simple front-end interface required a capable and rich back-end search engine. Different users have different access rights, which had to be taken into account. This resulted in an extensive set of rules that had to be enforced through the application programming interface.
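To make the idea of access rules enforced at the API layer more concrete, the following is a minimal sketch in Python. It is not the NLA's or Flax's actual code: the `User` and `Article` types, the `licensed_titles` field and the `can_view` and `filter_results` functions are all hypothetical names, and the single rule shown (a user may only see articles from titles covered by their licence) is an assumed example of the kind of rule the article describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical subscriber record; field names are assumptions."""
    name: str
    organisation: str
    licensed_titles: frozenset  # newspaper titles this user's licence covers


@dataclass(frozen=True)
class Article:
    """Hypothetical clipping record; field names are assumptions."""
    article_id: int
    newspaper: str
    headline: str


def can_view(user: User, article: Article) -> bool:
    """Assumed rule: access is allowed only for licensed newspaper titles."""
    return article.newspaper in user.licensed_titles


def filter_results(user: User, results: list) -> list:
    """Apply the access rule to raw search results before returning them.

    Enforcing the rule here, rather than in the front end, means every
    client of the API sees the same restrictions.
    """
    return [article for article in results if can_view(user, article)]
```

In a sketch like this, a real system would hold many such rules and combine them; the point is only that filtering happens behind the API, so a simple front end cannot bypass it.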

After short-listing a handful of options that met the technical requirements of ClipShare, the NLA selected the Flax search solution. This open-source tool was developed by the search engine development company Lemur Consulting and is licensed under the GPL open-source licence.

Flax gives users the ability to save searches and set up automatic alerts on topics of particular interest. It also restricts access to particular articles under legal embargo. Lemur Consulting customised the user interface to the NLA’s specific requirements and integrated the solution with the NLA’s existing infrastructure.
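As an illustration only, saved searches with alerts and embargo restrictions could be sketched along the following lines in Python. This does not reflect Flax's real API or data model: `Clipping`, `SavedSearch`, `is_embargoed`, `matches` and `alerts_for` are hypothetical names, the embargo is assumed to be a simple release date, and matching is assumed to be a naive all-terms word match rather than a real search engine query.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Clipping:
    """Hypothetical article record; embargo modelled as a release date."""
    clipping_id: int
    text: str
    embargo_until: Optional[date] = None  # None means no embargo


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical saved search: every term must appear in the article."""
    owner: str
    terms: tuple


def is_embargoed(clipping: Clipping, today: date) -> bool:
    """An article is withheld until its assumed embargo date is reached."""
    return clipping.embargo_until is not None and today < clipping.embargo_until


def matches(search: SavedSearch, clipping: Clipping) -> bool:
    """Naive whole-word match on whitespace-separated, lower-cased text."""
    words = set(clipping.text.lower().split())
    return all(term.lower() in words for term in search.terms)


def alerts_for(search: SavedSearch, new_clippings: list, today: date) -> list:
    """Return IDs of newly added articles that should trigger an alert."""
    return [
        c.clipping_id
        for c in new_clippings
        if not is_embargoed(c, today) and matches(search, c)
    ]
```

Running `alerts_for` over each batch of newly indexed articles would give the list of items to notify a user about, with embargoed articles excluded until their release date.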

Matt Groshong, operations director at NLA, commented: ‘As with a proprietary solution, we have the security of knowing we are fully supported by a team of experts. At the same time we have the cost benefits of open source and the ability to view and modify our own code, should we choose to. Lemur has built a system that is easy to use, easy to control, and has excellent functionality.’

Groshong’s initial opinion of the open-source search model has changed. ‘I spent over 10 years at Microsoft, so my blood and bones told me I shouldn’t back open-source – but since the NLA adopted this model, I have certainly embraced the idea that open-source is a great way of developing fast, effective and thorough collaboration of information. I would recommend taking an approach like Lemur’s, where the company has taken an open-source core and developed a real-world solution and code which you can apply to specific business scenarios or models.’
