We recently held a small meeting to discuss the teaching of good data management practice to graduate students at the University of Oxford, which at present happens only to a very limited extent. Allowing for the fact that some of us hold more than one role, those attending included researchers at all levels (a graduate student, a research fellow, two research group leaders), three teachers (from an academic department and the University’s Computing Services IT Learning Programme), and two professional project/data managers. We were all members of Oxford University with the exception of Jez Cope (ICT Project Manager, Centre for Sustainable Chemical Technologies, Department of Chemistry, University of Bath), who had been invited to attend after expressing interest in this area to me during a recent JISC Managing Research Data conference. Some important ideas emerged about data management planning and data management plans (DMPs), particularly as they relate to graduate students, that I will try to encapsulate here.
First, appeals to altruism – the virtues of data publication for the greater good of science – are unlikely to succeed. Data management training should first and foremost emphasise the benefits that good data management can bring to the students themselves, for example being better organised, working more efficiently, and being able to assemble the right data quickly for inclusion in figures and tables during later thesis and article writing. Such training should also emphasise the potential dangers of data loss if students do NOT make plans to manage – and particularly to back up – their data, employing salutary horror stories, such as the one described in the previous blog post, and photos of burned-out computers after a laboratory fire, to illustrate the point.
While this discussion was undertaken in the context of graduate students, it was agreed that the following conclusions applied in equal measure to all researchers. In order of decreasing importance, the issues of relevance surrounding data management were seen to be:
- Benefits of data management and data backup – “What’s in it for me?”
- Determining the intrinsic value of data – “How do I decide what data to keep and what to discard?”
- Issues of data confidentiality and data theft – “How do I avoid being scooped?”
- Issues of data publication and data citation – “How can I get personal credit for spending time on data management and for data publication?”
- Issues of data ownership – “Do I own my data, or does my supervisor or the University?”
- Administrative issues, such as compliance with institutional and funders’ requirements – “What is the minimum I need to do to get funded?”
While these might not be considered the ideal questions for researchers to be posing, from the point of view of those of us interested in open data publication, it was agreed that this was the reality of the situation on the ground (or rather, at the lab bench).
For this reason, it was felt that current data management planning tools had the wrong emphasis. They are constructed from the ‘top down’ viewpoint of an institution or a data manager, in a way that is likely to be off-putting to the person completing the plan. They do not start with what is of central importance to the researchers themselves, which would increase the plan’s relevance and gain their enthusiastic engagement.
For example, the DCC’s DMPonline tool, designed to create DMPs that will accompany grant applications, starts by asking the researcher to input information about funders’ requirements, institutional guidelines and other policies, and then proceeds through questions about data types, intellectual property rights, and altruistic data sharing and re-use. Only after 42 other questions (in the DCC’s generic DMP) is the researcher finally asked:
- Where (physically) will you store your data?
- How will you back-up your data?
- How regularly will back-ups be made?
It was thus thought that, to achieve widespread uptake of proactive data management planning across the university, the DMP questions that we use need to be designed, on the basis of the preceding points, to address more clearly those issues of primary concern to researchers, from the point of view not only of the Principal Investigator but also of the lowly graduate student.