Data sharing on virtual servers can aid collaboration and help universities cut costs and carbon emissions, writes JISC's James Farnhill
Every day, 8,000 physicists in over 170 locations start work on their individual computers in universities in 34 different countries. Yet they all have near real-time access to the same data, generated at the Large Hadron Collider (LHC), the massive particle accelerator deep below Geneva in Switzerland.
International access to data is just one reason why some researchers are turning to virtualisation to complement the resources available in their institutions. Virtualisation provides research on demand through software or web-based services that run multiple virtual machines side by side on the same hardware, each imitating part of a normal computer system such as a server, application, network or storage. It includes grid computing, such as the LHC Grid project; special software that builds virtual computers; and ‘cloud computing’ services accessed through an internet connection.
Working together and sharing data is often quicker using virtualised resources – rather than sending the same document back and forth, researchers can now collaborate through virtual servers such as Google Docs. Using a virtual world such as Second Life to confer with colleagues, or a social networking site set up specifically to enable the exchange of workflows, such as the Taverna-based MyExperiment, can make communication more intuitive.
An advantage for researchers is that their data is stored securely and backed up on virtual infrastructure, rather than on a laptop or CD that could be lost, stolen or corrupted. A virtual server may be less prone to failure, and saving copies of a document or data across multiple virtual servers means there are more back-up copies if something goes wrong.
JISC has been looking at how we can facilitate storage and also allow others access. For example, researchers using differently formatted data from various databases, both public and private, can use the OGSA-DAI software to bring it up on their computer in a single format – so that all the data can be searched, read and commented on as if it were in one database.
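The federation idea can be illustrated with a short sketch. This is not the OGSA-DAI API – the sources, field names and functions below are invented for illustration – but it shows the principle: records held in different formats are mapped onto one shared structure, so a single search runs across all of them as if they were one database.

```python
import csv
import io
import json

# Two hypothetical data sources holding similar records in different formats.
csv_source = "id,title\n1,Proton collision run 42\n2,Calibration sweep"
json_source = '[{"record_id": 3, "name": "Muon detector log"}]'

def normalise_csv(text):
    # Map each CSV row onto the shared record structure.
    return [{"id": int(row["id"]), "title": row["title"]}
            for row in csv.DictReader(io.StringIO(text))]

def normalise_json(text):
    # Map the JSON source's differing field names onto the same structure.
    return [{"id": rec["record_id"], "title": rec["name"]}
            for rec in json.loads(text)]

# A single, uniformly formatted view over both databases.
federated = normalise_csv(csv_source) + normalise_json(json_source)

def search(records, term):
    # Search all sources at once, as if they were one database.
    return [r for r in records if term.lower() in r["title"].lower()]

print(search(federated, "detector"))  # finds the record from the JSON source
```

The point of the sketch is the mapping step: once each source is translated into the common structure, every downstream tool – search, reading, annotation – only needs to understand one format.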
However, individual universities don’t have the resources to give every interested researcher access to the massive amounts of data they need. Besides, an individual researcher working on, for example, medical instrument data may only be interested in a small subset of it, such as readings from a single scanner, and may not need access to a whole database. Projects like the Australian-led DataMINX initiative allow researchers to move data from one server to another across international boundaries, regardless of differences in hardware, for easier sharing of information.
Virtual resources may be new to some researchers but universities have been making use of virtual machines and virtual desktops for some time to secure intellectual property rights (IPR) and save money and energy. Using virtual servers allows universities to set up and tear down pre-built servers or virtual machines very quickly, reducing the need for them to invest in new physical hardware.
With universities under pressure to improve their green credentials, virtualisation reduces not only the energy needed to run institutional data centres but also their carbon emissions. Moving data to larger facilities takes advantage of economies of scale and massively reduces the energy used in processes such as mechanical chilling. Middlesex University in the UK, for example, announced plans last October to cut its main data centre servers from 250 to 25 in an attempt to reduce its energy consumption by 40 per cent. Virtualisation is therefore a strong candidate for helping to lower the sector’s carbon footprint.
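The arithmetic behind that kind of consolidation is simple to sketch. The per-server wattages below are illustrative assumptions, not figures from Middlesex University: fewer, larger virtualisation hosts each draw more power than an old server, but far less in total.

```python
# Hedged sketch: the wattages are assumed values for illustration,
# not published Middlesex University figures.
physical_servers = 250
virtualised_hosts = 25
watts_per_physical = 400    # assumed average draw of an older physical server
watts_per_host = 2400       # assumed draw of a larger virtualisation host

before_kw = physical_servers * watts_per_physical / 1000   # 100.0 kW
after_kw = virtualised_hosts * watts_per_host / 1000       # 60.0 kW

saving = 1 - after_kw / before_kw
print(f"Estimated energy saving: {saving:.0%}")  # 40% under these assumptions
```

Under these assumed figures the saving matches the 40 per cent target quoted above; in practice the real saving also depends on cooling, utilisation and how densely the virtual machines are packed.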
Admittedly, virtualised computing might represent a radical change of practice for some information services and research computing departments. It means that universities extend some of their infrastructure beyond physical hardware. It also represents choices for the researcher that go beyond the institutional walls. Research on demand is a key driver for virtualisation to create greater efficiency: no more resources are used than are absolutely necessary, and in some cases universities and researchers need not own those resources at all but just rent what is needed. A recent Gartner report predicted that 50 per cent of current IT workloads will be running on virtual machines by the end of 2012. The route is greener, more cost effective and can also offer significantly more flexibility than traditional physical servers and hardware.
James Farnhill is e-Research programme manager at JISC in the UK