Partnership tackles archiving challenge

There is a growing awareness of the need to preserve digital information. Tim Tamminga of Endeavor Information Systems describes how a new partnership with Sun plans to address this issue.

Last October, Sun Microsystems and Endeavor Information Systems, an Elsevier company, announced an expanded partnership. Their aim was to create data preservation technology to tackle the long-term archiving needs of institutions such as national libraries, universities, research organisations, and museums.

This partnership evolved in response to two intersecting trends. The first is the need for standardisation of preservation efforts. This has been hastened by the growth of the internet and the variety of digital storage options that have emerged in the past decade. Concerns about the range of storage options can be seen, for example, in the 2001 revisions to the American Library Association's preservation policy. The original policy was only developed a decade ago but the recent revisions added the need for institutions to monitor the 'life-cycle management of digital publications to assure their usefulness for future generations.'

The second trend that the companies identified was the proliferation of governmental and institutional mandates requiring the preservation of national and cultural heritage artefacts. The mandates cover images of historical events, sound recordings, and textual documentation, as well as more recent digital creations such as web pages, wikis, and blogs. Such efforts were given a boost with the approval of the Open Archival Information System (OAIS) reference model as a published standard by the International Organization for Standardization (ISO) in 2003. Now technology providers and archivists have a common vocabulary for digital repository and preservation efforts.

However, despite the recent policies and standardisation efforts, there is still plenty to be done. The sheer volume of new content that is now 'born digital' raises many new questions. What are the best practices for migrating new and existing content? How can institutions optimise workflow and automated processes? What digital curation and electronic warehousing options are available? And how do libraries themselves need to evolve? The ultimate goal of this partnership, a best-in-class digital repository platform, should help to address these and other emerging issues.

The companies bring different strengths to this multi-year partnership. Sun's expertise lies in network-computing infrastructure. Endeavor produces library-management software and digital-archiving technology. It also brings Elsevier's heritage in managing large-scale content repositories. This is not the first time the companies have worked together. Sun has been Endeavor's primary technology partner since Endeavor's inception in 1994 and Sun technology supports 90 per cent of all installations involving Endeavor's integrated library system (ILS), Voyager.

Migration versus emulation

The newly expanded collaboration faces numerous challenges. New technologies will emerge and replace existing digital standards, so the framework must address how digital artefacts can be adapted to evolving standards to ensure their long-term viability. In parallel, libraries must adjust to the reality that digital information now carries a lifetime cost, and not just an initial purchase cost.

One option in maintaining archive material is migration, where digital objects are moved from an older technology format to a more current standard. Migrations are relatively easy and cost-effective, as new media types typically store at least twice as much data as the previous version. Additional savings may be realised in decreased storage space (both physical and digital) and decreased operating costs for managing the older standard.

Migration also offers libraries the ability to handle exponential data growth. With many data collections doubling in size in a year or less, migration to newer, higher-density storage media guarantees the accessibility of the archived data.

This approach will not satisfy everyone, however. The ultimate goal is to preserve not just the bits associated with the original data, but also the context that permits the data to be interpreted. For this reason some archivists demand access to the materials in the format in which they were originally created. Emulation, which combines software and hardware to reproduce the essential characteristics and performance of a computer or program in a newer environment, would satisfy many of these concerns.

However, the validity of emulation is still debated. While emulators aim to replicate the complete functionality of a technical environment, there is the potential for the essential characteristic of the record - its inherent 'look and feel' - to be lost in the emulation process.

Intellectual property rights on hardware and software could also impact emulation efforts, as could prohibitive maintenance costs.

At this early stage, the partnership is adopting a hybrid preservation strategy. This approach will support the migration of digital objects as new standards and technologies emerge. However, for each migration, a copy of the archived work in its original format will be maintained as well. Plans regarding emulation modules will continue to be refined as the project moves forward.
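The hybrid strategy can be illustrated with a minimal Python sketch. The function name, the directory layout, and the converter passed in are all hypothetical and are not part of any Sun or Endeavor product. The sketch only shows the principle described above: each migration writes a copy in the newer format while the original bitstream is kept alongside it, with checksums recorded so that later audits can confirm nothing has changed.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def migrate_keep_original(
    source: Path,
    archive_root: Path,
    convert: Callable[[bytes], bytes],
    new_suffix: str,
) -> dict:
    """Migrate one object to a new format and retain the original.

    All names here are illustrative assumptions, not a real product API.
    """
    originals = archive_root / "originals"
    migrated = archive_root / "migrated"
    originals.mkdir(parents=True, exist_ok=True)
    migrated.mkdir(parents=True, exist_ok=True)

    # 1. Preserve the original bitstream untouched.
    kept = originals / source.name
    shutil.copy2(source, kept)

    # 2. Write the migrated copy in the newer format.
    target = migrated / (source.stem + new_suffix)
    target.write_bytes(convert(source.read_bytes()))

    # 3. Record checksums so later audits can detect silent corruption.
    return {
        "original": str(kept),
        "original_sha256": sha256_of(kept),
        "migrated": str(target),
        "migrated_sha256": sha256_of(target),
    }
```

In practice the `convert` argument would be a real format converter; here it is left as a parameter so that the retention logic stays independent of any particular file format.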

Today, national and state libraries are leading the charge to place their collected works online. Some of these collections include millions of digital objects and will require storage measured in petabytes - the equivalent of more than 50,000 desktop computers with 20-gigabyte hard drives.
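The desktop comparison can be checked with a few lines of Python. This sketch assumes decimal (SI) units, in which one petabyte is 10^15 bytes and one gigabyte is 10^9 bytes; the helper name is only illustrative.

```python
PETABYTE = 10**15           # bytes, decimal (SI) definition
GIGABYTE = 10**9            # bytes, decimal (SI) definition
DRIVE_SIZE = 20 * GIGABYTE  # one 20-gigabyte desktop hard drive


def drives_needed(total_bytes: int, drive_bytes: int = DRIVE_SIZE) -> int:
    """Number of whole drives needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // drive_bytes)  # ceiling division


print(drives_needed(1 * PETABYTE))  # 50000
```

A single petabyte therefore fills 50,000 such drives, so any collection measured in several petabytes exceeds that figure.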

As digital libraries evolve and expand in terms of both size and functionality, it is critical that the underlying technology platform delivers the required performance and reliability. Sun has a wealth of practical experience to share, as it has helped a number of organisations transition to digital-media technologies.

In the early 2000s, Sun first introduced its Digital Asset Management Reference Architecture (DAM-RA), providing companies with a framework for managing the transition to an open, standards-based digital-asset management system. Essentially, this reference architecture serves as a set of blueprints that characterises all the elements of a replicable system for a particular application, helping to accelerate planning and deployment of the system.

Institutions that adopt DAM-RA can also benefit from Sun's experience in managing and achieving efficiencies in both workflow and automated processes. For example, Sun can evaluate how patrons of a national library access the terabytes of content stored in digital repositories. The more popular content would be designated as 'higher value'. Because it is accessed more regularly than other content, it should be available to patrons at all times.

Sun's hierarchical file management capabilities would then drive the development of a process - either automatic or manual - that would knowledgeably dictate storage options for all the library's digital archives. It would make the high-value content readily accessible on disk arrays, while moving the low-value content to tape, where it would still be easily accessible. This hierarchy would also generate cost savings, as storing objects on tape is less expensive than storing them on disk arrays or magnetic disk drives, which most libraries are beginning to phase out.
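The tiering idea can be sketched in a few lines of Python. This is not Sun's software: the class, the function names, and the access threshold of 100 requests a year are assumptions chosen purely for illustration. The sketch simply sorts objects into a disk tier or a tape tier according to how often patrons request them.

```python
from dataclasses import dataclass


@dataclass
class ArchivedObject:
    identifier: str
    accesses_last_year: int


def assign_tier(obj: ArchivedObject, threshold: int = 100) -> str:
    """Return 'disk' for frequently accessed objects, otherwise 'tape'."""
    return "disk" if obj.accesses_last_year >= threshold else "tape"


def plan_storage(objects, threshold: int = 100) -> dict:
    """Group object identifiers by the storage tier they are assigned to."""
    plan = {"disk": [], "tape": []}
    for obj in objects:
        plan[assign_tier(obj, threshold)].append(obj.identifier)
    return plan


collection = [
    ArchivedObject("national-newspaper-1901", 4200),
    ArchivedObject("parish-register-scan-17", 3),
]
print(plan_storage(collection))
# {'disk': ['national-newspaper-1901'], 'tape': ['parish-register-scan-17']}
```

A production system would base the decision on richer usage data and would re-evaluate it periodically, but the principle of matching storage cost to access frequency is the same.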

A modular approach

One of the goals of this partnership is to create a digital repository platform that is both modular and open in nature. A component-based framework will make it easier to substitute different parts. This will give institutions the ability to 'plug and play' as patron needs (e.g. customised functionality for students of different ages or access for patrons with special needs) and technology requirements dictate. It also protects against technological obsolescence, as the modular framework will make it easier to integrate replacement software.

Endeavor currently offers two product lines that manage digital and electronic content.

Curator enables local digital-content creation and access, providing a framework for managing diverse collections of papers, audio, art images, and other digital objects. Journals Onsite is an enterprise-level system enabling libraries to locally store, search, and browse commercial electronic journals. Elements of both technologies will be included in the framework, along with new repository functionality. This will provide a broad range of content types and categories, enhanced preservation support, and extensible web services via application program interfaces (APIs).

In particular, Journals Onsite will be employed at institutions that require the handling of locally stored journal content. The technology is designed to integrate with OAIS-based preservation systems, supports high-volume ingest processes, and boasts significant efficiencies in both scalability and speed.

Endeavor's parent company, Elsevier, has an extensive track record in the area of digital repositories and this will help in executing the vision of this partnership. It has invested considerable capital and resources toward preserving published research for future generations of scholars. This began in the late 1970s with the launch of the Adonis project, which scanned journals to optical disks for local use by document delivery services. In the last few years the company has digitised more than three million articles, published over a span of 180 years.

In 2002 and 2005, Elsevier forged agreements with, respectively, the National Library of the Netherlands (KB) and Portico, a US-based, non-profit electronic archiving service. These will create official digital archives for all of Elsevier's 2,100 current and formerly published journals on ScienceDirect, its electronic platform. These initiatives seek to ensure an established digital archive guaranteeing permanent access to critical scientific and medical publications.

Specifically, the Sun-Endeavor partnership will benefit from Elsevier's experience in delivering electronic-warehouse solutions, which manage the storage and dissemination of electronic resources to online journals, as well as the company's specialty in analysing human-computer interaction. This expertise is highlighted by Elsevier's User Centered Design (UCD) Group, a team of human-computer interaction experts that, in addition to numerous Elsevier projects, has also assisted in the development of customer-focused interfaces for several product lines within Endeavor.

The digital library, redefined

As increasing numbers of libraries make the decision to 'go digital,' these institutions must be willing to redefine themselves, not only as repositories and preservers of information, but also as direct providers of digital content to their patrons. Previously, libraries were solely focused on achieving efficiencies in amassing materials and streamlining distribution efforts. Now, the concept of enhanced dissemination of all institutional materials, both physical and digital, must become commonplace for these institutions to evolve.

As the digital libraries of tomorrow take their first baby steps today, this burgeoning technology partnership endeavours to meet the changing needs of this massive undertaking. The long-term goal is to become the de facto standard for all digital repository and preservation efforts throughout the world.

Tim Tamminga is director of strategic development, archive solutions for Endeavor Information Systems.