Digitisation project connects history
Researchers have a new resource for searching across a wide range of British historical sources. Sharon Howard describes the Connected Histories project and gives advice to people undertaking similar projects.
Time is precious when you are a researcher. Running repeated searches of separate resources is extremely time-consuming. Researchers also have to navigate differing types of search facilities and technologies, which makes it difficult to search for resources systematically. Moreover, they need to record information about the sources they find, not least for citation purposes, and the options for saving relevant results and exporting metadata vary from one resource to the next.
When researchers find links between the materials they uncover within different resources, they will almost certainly have to record those links manually. And where resources are behind a paywall it is often difficult for non-subscribers even to evaluate their potential usefulness for particular research projects.
These are perhaps particularly acute problems for historians, who tend to use a wide range of different types of source material.
A team of specialists from the universities of Hertfordshire, London and Sheffield in the UK have been working on a resource to help historians overcome some of these issues. Connected Histories, a project funded by JISC, is a federated search facility for a wide range of distributed digital resources relating to early modern and nineteenth-century British history.
The process of selecting, planning and drafting the proposal for funding took around two months. The resource choices reflect the interests of the project partners in British history in the period 1500-1900. This is a place and time that is extremely well represented in digital history, and by resources run under a variety of different business models. We were particularly interested in resources that would contain a high proportion of named people and places.
All the resource owners contributing to the project were very willing to participate. The practical process of negotiating formal signed agreements did run into occasional difficulties, and in general took longer than expected. However, we worked with research services staff at the University of Sheffield to create an agreement template, and they provided a crucial support service throughout the process. Although most organisations signed up to the standard agreement with minimal changes, the process of obtaining approval from legal departments and/or governing bodies could be quite protracted.
In developing the user interface, we needed to understand how users were likely to interact with the site, and what features they would be likely to use. The website now brings together 11 major historical web resources - allowing, for instance, Parliamentary Papers to be searched in combination with the records of all the Middlesex county sessions papers.
The site removes the initial barriers to cross searching and allows people to search in a more structured way.
The facility to save 'Connections' makes it easy for researchers to record links they find, and the export functions allow them to transfer information easily into their own databases or other software for analysis. Although Connected Histories cannot give non-subscribers access to restricted sites, it does display snippets of text in search results, which helps researchers to decide whether access to a resource is worth obtaining.
Although it is not immediately obvious to users, Connected Histories uses an API architecture, which means that the search engine and data can be used and re-used independently of the website. This facilitates future expansion, and it will also be of interest to researchers pursuing large-scale data mining projects.
My advice for those planning similar projects is threefold. Firstly, always allow plenty of time for getting agreement from source providers, and use whatever support and advice is available from your institution. Generally, try to build good relationships with source providers and engage them in the process as much as you can. For example, we included representatives from several source providers on our advisory panels and invited them to the launch.
Secondly, be prepared for processing and storing massive amounts of data. The size of some datasets rather took us by surprise, and we had to find much more server and storage space than we originally anticipated. In addition, good quality metadata and documentation from providers, especially for complex structured datasets, can make the job much easier.
Thirdly, make the most of available software and tools to aid collaboration between partners in the project. For example, we used Basecamp extensively for project communication and file storage, and a wiki for collaboratively writing background pages for the website.
We do not know what kind of historical research will be undertaken using Connected Histories, but it has been built intentionally to facilitate both cross-resource searching and the new methodologies associated with text and data mining. It will certainly make it easier to discover links between, say, Parliamentary Papers and newspaper reports, and to incorporate images and ephemera more fully into our research; but we also hope it will allow historians to work together more effectively and to interrogate the ever-expanding archive of online data in new ways.
Sharon Howard is project manager at the Humanities Research Institute of the University of Sheffield