A messy world: managing research data matters

Jude Towers and David Ellis from Lancaster University discuss data practices in the evolving world of 24/7 data generation and what this means for robust research

What counts as data?

When asked, most people could have a good go at explaining ‘data’. Ask a researcher and the answer (nearly) always starts with ‘…well, it’s complicated…’. In arts, humanities and social sciences we have a particular challenge when we ask ‘what are, or what counts as, data?’ because of the persistent assumption that data are quantitative. In fact, data are (any) information used to progress research, whether that comes from surveys, web scraping or interviews, or is simply a way of conceptualising existing knowledge.

But whether quantitative or qualitative, ‘data science’ or ‘social science’, cross-disciplinary conversations are desperately needed. We need to talk about the collection, use and management of data; how to systematically explicate and take account of the strengths and limitations of data; and how to develop strategies to ensure that the research underpinned by data is ethically and intellectually robust. The siloing of different forms of data within specific academic disciplines is increasingly problematic, especially as data and their use (or abuse) become increasingly central to our social lives and to societal evolution. One only has to think about the potential impact of the Cambridge Analytica scandal on democracy, or the radical changes to the concept of ‘privacy’ being instigated by the Internet of Things.

Messy data

Some research data are relatively less messy (data generated by clinical trials or experimental physics, for example), while other forms are extremely messy. Any measurement or data source contains some element of noise, but this is often well understood and acknowledged in advance. However, data originally collected for non-research purposes and then retrospectively used for research bring a completely new set of problems.

Data from our everyday lives and digital existence are regularly collated, anonymised and shared with academics for research purposes (and sometimes with others, for other purposes!). These data were historically collected or logged with another purpose altogether. This includes administrative health data, police recorded crime, social media profiles or geolocation tagging. So when these data are then used for research, let’s face it, the research process becomes (even more) difficult.

Research conducted using these data is vital for helping inform evidence-based public policy and practice, but to do so robustly and transparently, we must think explicitly about how these data are collected and made available for research. What are the implications of these practices and protocols for research practices and research findings? Without this scrutiny, the research is null and void.

We need such scrutiny to be systematically embedded within everyday research practices, in a way which breaks down the silos and enables researchers across disciplines to build data practices which are findable, accessible, interoperable and reusable (FAIR). Without these criteria, research projects such as Imperial College London’s use of data from a network of 10,000 phones, speeding up cancer research while we sleep, risk remaining science fiction, rather than being current science.

Why now?

Researchers, students and the wider public are increasingly asking questions about the value of research and the ‘ethics’ of the hitherto unimaginable scale of the collection and use of (our) data.

There is a growing movement amongst researchers (and policy makers) arguing that publicly funded research, at least, must be shared as widely as possible. It is not enough for individual researchers or individual university departments to ‘do the right thing’ or ‘do things right’: we need a collective and consistent approach. This is partly what a Jisc initiative sponsoring a group of Research Data Champions in UK universities has been supporting. As data champions, we are working together to find innovative and effective ways to develop robust data usage and management practices and protocols, to share good practice, and to maximise the positive impacts of research for progressive and positive social change.

At Lancaster University, for example, staff within psychology have established an informal support group (PROSPR) which aims to promote open science practices within the department, and beyond.

Similar movements to maximise the impacts of research data require more than just researchers to be developing a shared understanding and common language. In universities, our research data managers are a vital part of this process too. The Research Data Shared Service from Jisc has been piloting approaches to help support the cataloguing of data being produced in universities and is exploring ways in which these resources can best be shared, to enable greater value from both data and research. Whether research data are very messy or less messy, robust research calls for FAIR data, and we all have our part to play.

Jude Towers is doctor of applied social statistics, lecturer in sociology and quantitative methods, associate director of the Violence and Society UNESCO Centre, lead for the N8 Policing Research Partnership Training and Learning strand, Jisc-sponsored data champion, and holds graduate statistician status from the Royal Statistical Society. David Ellis, also from Lancaster University, is a lecturer in computational social science and a 50th anniversary lecturer in the Department of Psychology, an honorary research fellow at the University of Lincoln and a Jisc-sponsored data champion.

