Boosting transparency
What trends have you seen with experimental data?
One of the largest changes, which is still underway, is the movement towards data entry at the point of capture rather than at the end of the day. The other big trend is the increase in storage capacity, which means that everything can be captured and analysed later.
Data quantity is a practical challenge. My past role was in genomics, where datasets can be huge and are currently kept somewhat in check because costs escalate quickly with the number of subjects. One of the challenges in that field is getting statistical power at an affordable cost, and as sequencing costs continue to fall, the data storage and analysis problems continue to grow. Once you have the raw data, the bigger challenge is annotating it and making it searchable. Science has to be able to explain what it was you interpreted.
There are three big challenges facing the research community that digital systems can help address. First, there is research productivity. There is now much more of a focus on what kind of output we’re getting for tax dollars, foundation grants, or commercial investment.
Another big challenge that’s recently been a major focus is reproducibility. Studies have found that, conservatively, more than 70 per cent of certain types of studies were not reproducible. This could be for a few reasons, but the one we’re primarily focused on is transparency.
Experimental protocols can vary in their sensitivity to inputs. There are examples where people can’t get something to work unless they use a particular reagent from a particular vendor, or where success depends on verifying that the sample is intact and pure after extraction. The importance of these details is not always clear from the methods section of papers, but it should have been captured in the laboratory notebooks. If we get transparency into the process, we believe we can have a significant impact on this problem.
The other thing with transparency is the lack of laboratory control of materials. For example, in the USA recently there have been cases of samples of smallpox, avian flu, and other biohazardous materials, some decades old, being found in the corners of labs. So, control of materials and reproducibility can actually go hand-in-hand.
These are the things we’re trying to address with LabGuru – helping laboratories plan and organise their work, with an eye to capturing it as it is created. And, alongside this, organising the materials – both biological and consumables – to make them easy to find when scientists are running experiments, and afterwards for safety reasons.
There are some major side-benefits to our approach – when users put things into LabGuru, it tracks where they are, and this immediately cuts out repeat ordering. It also shows users where the item they are looking for is, down to the particular freezer. A lot of duplicate ordering happens because people don’t know somebody already has something, or they can’t find it in its expected place.
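As an illustration of the kind of stock lookup described above, here is a minimal sketch – the data model, item names, and freezer locations are hypothetical, not LabGuru’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class StockItem:
    name: str
    location: str  # e.g. "Freezer B, shelf 2"
    quantity: int

# Hypothetical in-memory inventory; a real system would use a shared database.
inventory = [
    StockItem("anti-GFP antibody", "Freezer B, shelf 2", 1),
    StockItem("Taq polymerase", "Freezer A, shelf 1", 3),
]

def find_item(query: str) -> list:
    """Return matching items so a scientist can check existing stock
    (and where it lives) before placing a duplicate order."""
    return [i for i in inventory if query.lower() in i.name.lower()]

for item in find_item("taq"):
    print(f"{item.name}: {item.quantity} in stock, {item.location}")
```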
What are your thoughts on open data?
The bottom line is that open data is great. It helps with reproducibility. However, going beyond the paper to the experiment presents an extra challenge to scientists. They have to record their data with an eye to how it is going to be presented and read by others. This means that they have to pay more attention to the reader who might want to reanalyse the raw data.
Like most trends, it will have to catch on in the culture. At the moment, scientists feel that their lab notebooks are their property. I hear scientists say ‘yes, people can read my notebook but they won’t know what they are reading’. This isn’t a problem for electronic systems, since virtually all of them let scientists keep their records private. But it is a challenge if we’re asking scientists to open up their notebooks and make them part of the paper.
I’ve never heard them say that they don’t want to be scooped, but I think that is a fear too. Or maybe there is something they have observed in their experiments that they want to take further themselves, so they don’t want to make it public. With LabGuru you can release parts of your notebook, which deals with some of these cultural issues. We’re also looking at making it easier to push data into things like Figshare.
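For readers curious what depositing data programmatically can look like, here is a rough sketch against Figshare’s public v2 REST API; the token and title are placeholders, and this is not the integration LabGuru is building:

```python
import requests

# Placeholder: generate a personal token in your Figshare account settings.
TOKEN = "YOUR_FIGSHARE_TOKEN"
BASE = "https://api.figshare.com/v2"

# Create a private draft article that notebook data could later be attached to.
resp = requests.post(
    f"{BASE}/account/articles",
    headers={"Authorization": f"token {TOKEN}"},
    json={"title": "Raw data for extraction protocol"},
)
resp.raise_for_status()
print("Draft created at:", resp.json()["location"])
```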
Our focus is on life sciences, mainly biology and chemistry. We are fairly biology-heavy right now because that is the area that has been least addressed. As far as organising data goes, there isn’t a massive difference between subjects, but you do have to understand the type of data. LabGuru is a fairly open system: the user organises it how they want and we normalise the backend.
What are the challenges here?
Having data and information in a system other than the notebook is a challenge. Laboratory management systems, for example, don’t generally create a snapshot in time; they tend to present a dynamic picture of the lab as it is now. Things like the calibration dates of instruments might be kept or they might not, and they have to be synchronised with the time the data were generated.
You have one system that is your authority. Sometimes it is easier to snapshot data into the notebook, if the dataset or other record is not huge. Where the data is huge, it makes more sense to link out and make sure that the repository where the data lives is maintained to the same standard. But that should be done judiciously, since it adds cost and maintenance complexity.
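One way to make that judgement concrete is to embed small files and link large ones, keeping a checksum either way so a linked copy can be verified later. A minimal sketch, assuming a 50 MB cut-off and a file-per-entry notebook layout (both assumptions, not anything LabGuru has described):

```python
import hashlib
import json
import shutil
from pathlib import Path

SNAPSHOT_LIMIT = 50 * 1024 * 1024  # assumed 50 MB cut-off; tune per lab

def sha256_of(path: Path) -> str:
    """Hash in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def attach_dataset(notebook_dir: Path, dataset: Path) -> dict:
    """Snapshot small datasets into the notebook entry; for large ones,
    record only a link plus a checksum so the external copy can be
    verified later against the same standard."""
    record = {"sha256": sha256_of(dataset)}
    if dataset.stat().st_size <= SNAPSHOT_LIMIT:
        shutil.copy2(dataset, notebook_dir / dataset.name)
        record.update(mode="snapshot", file=dataset.name)
    else:
        record.update(mode="link", uri=dataset.resolve().as_uri())
    out = notebook_dir / f"{dataset.stem}.attachment.json"
    out.write_text(json.dumps(record, indent=2))
    return record
```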
I think there will be a move to fewer, more capable applications that integrate the various disciplines and elements of the lab process. I think there will be an increase in the amount of data stored and shared – in fewer, more capable tools, with not so many silos.