This blog post describes various sets of data management planning questions from different sources and their alignment in a spreadsheet to permit their comparison. The following post will compare these different sets of data management planning questions, and will suggest conclusions that can be drawn from such a comparison. Subsequently, I will compare various software tools that might be used to create data management plans (DMPs) based on such questions.
These discussions should be considered in the context of the true purpose of data management planning: not to fulfill the requirements of research funders, but to better manage the research data flowing from research projects – which is the funders’ aim in the first place when requesting submission of data management plans to accompany grant applications.
As an exercise in ‘drinking my own champagne’ when applying to the JISC for funding of the Oxford DMPonline Project in July last year, I used the Digital Curation Centre (DCC)’s DMPonline Tool to create a data management plan (DMP) to accompany my grant application. Here is the first paragraph of that eight-page plan, the full version of which is available here.
In response to this first exposure to the DCC’s questions, I had the following thoughts:
- That the whole questionnaire was longwinded, often requiring discursive answers, e.g.
  - Question 2.4.2 “How will you manage integration between the data being gathered in the project and pre-existing data sources?”
  - Question 2.5.5 “Why have you chosen particular standards and approaches for metadata and contextual documentation?”
- That its primary point of view was that of a data administrator, not a researcher, e.g.
  - Question 1.3.3 “(Define) Other policy-related dependencies.”
  - Question 6.4.2 “In the event of the long-term place of deposit closing, what is the formal process for transferring responsibility for the data?”
  - Question 7.1 “Outline the staff/organisational roles and responsibilities for implementing this data management plan.”
  - Question 8.1.1 “How will adherence to this data management plan be checked or demonstrated?”
- That researchers were unlikely to know the answers to several of the questions, and would thus have to invest considerable time and effort consulting experts in data management to discover them.
- That, even discounting this, it would take longer to complete the many detailed questions asked than most researchers would be prepared to give to the task.
- That the eight-page output that contained my own DMP was far longer than could normally be submitted to accompany a grant application – funders typically have a limit of one or two pages for DMPs.
- Thus, even if the DMP was downloaded in an editable format, rather than the default PDF format, substantial further work would be required to change the DMPonline Tool output into a submittable data management plan.
I concluded that the average researcher would not find the DCC’s DMPonline question set easy to use, and that uptake was likely to be limited (see Footnote).
Others’ experiences when using the DCC’s DMPonline Tool have been similar to my own. Jez Cope gave different groups of University of Bath graduate students one hour to complete the DCC’s DMPonline questionnaire, or one of three alternative question sets. He reports at http://blogs.bath.ac.uk/research360/2012/03/rdm101-data-management-planning/ on their reaction to using the DMPonline Tool:
- “The students were immediately put off by the amount of detail they were asked to input”
- “On a positive note, they definitely felt that this was the most comprehensive template!”
- “None of the students using this template got anywhere near to finishing the questions within the time.”
- “The students reported that very little of what they were being asked felt relevant to them.”
- “They said that for at least some of the questions it was difficult to understand what they were being asked for.”
Twenty Questions for Research Data Management
Recently, many weeks after completing the DMP submitted with my Oxford DMPonline grant application, I had forgotten the specific questions asked by the DCC’s DMPonline tool, but still retained an uneasy feeling that they were too detailed. So I sat down with a blank sheet of paper to write down the most important questions I could think of concerning research data management, from the researcher’s point of view. I reckoned that most people could manage twenty questions without giving up, particularly if I also provided some short example responses.
In my first draft, the resulting Twenty Questions for Research Data Management were arranged under the six headings What? Where? How? When? Who? and Why? Jez Cope kindly gave that question set a test drive with another group of his University of Bath graduate students, at the same time as the first group were answering the DCC’s question set. In the same blog post, he reports these students’ reactions to Twenty Questions for Research Data Management:
- “These questions were considered to be mostly relevant and easy to understand, and the students had no problem completing them in the time available. The example responses made it easier to understand what was required for each question.”
Jez gave me valuable feedback about the poor ordering of that original draft of the Twenty Questions, due to the constraints imposed by the headings I had chosen:
- “Because they were arranged under What, Where, etc., the students found it difficult at times to see how the questions related to each other. Perhaps because of this, the students were undecided as to whether it (the question set) was comprehensive enough.”
Since then, I have put the Twenty Questions through two revisions. The first re-ordered them into a more logical progression, under the new headings:
- The nature of your data
- Data descriptions (metadata, “data about data”)
- Data sharing
- Data storage and backup
- Data archiving
- Data publication
- Future data management
The second revision simplified the wording, removed some redundancy between questions, and split compound questions into single questions. To keep the total number of questions to twenty, I removed two questions about when data would be collected and analyzed, which I considered of lesser importance. The resulting Twenty Questions for Research Data Management, with examples of possible responses for each question, were published in an earlier blog post, and are available as a downloadable Word file here.
Different sets of data management planning questions
Having completed my own Twenty Questions for Research Data Management, I thought it would be useful to compare them side-by-side with the other English-language question sets of which I had knowledge:
- The DCC DMPonline Checklist dated 2 Feb 2012, kindly sent to me by Martin Donnelly in the form of an Excel spreadsheet. (A PDF version of this is available here.)
- Two versions of the questions used in the DMP Tool (developed in the USA to create DMPs tailored for US funding agencies) – I chose to use those questions suggested for NSF-BIO and for the NIH, which were fewer in number.
- The DataTrain Questions for Post-Graduate Research Projects, available from the Archaeology Data Service here.
- Jez Cope’s own draft set of University of Bath Questions for Post-Graduate DMPs, available here.
The questions in the largest corpus, from the DCC, are individually numbered and structured under the following headings:
- Metadata about the DMP itself (unnumbered questions)
- Section 1: Introduction and Context
- Section 2: Data Types, Formats, Standards and Capture Methods
- Section 3: Ethics and Intellectual Property
- Section 4: Access, Data Sharing and Reuse
- Section 5: Short-Term Storage and Data Management
- Section 6: Deposit and Long-Term Preservation
- Section 7: Resourcing
- Section 8: Adherence and Review
Since they don’t actually ask questions, I omit from the following discussion the final sections of the DCC corpus:
- Section 9: Statement of Agreement
- Section 10: Annexes
Re-ordering the DCC’s DMPonline questions
To enable subsequent comparison between different question sets, I first re-ordered the DCC’s questions so as to provide a clear separation between three different types of question:
- First those seeking metadata about the data management plan itself.
- Then those eliciting information about the research project to which the plan relates.
- Finally those questions that are concerned with managing the data per se.
To these, since it was required by one of the other question sets, I added a fourth category into which I moved two of the DCC’s questions:
- Questions about related documents.
With expanded headings, the new order of the DCC’s questions used in my alignment spreadsheet is as follows:
Questions about the data management plan
- Personal details of the plan creator (DCC questions lacking a number)
- About this data management plan (DCC Section 1.4)
- Data management resourcing (DCC Section 7)
- Plan adherence and revision (DCC Section 8)
Questions about the research project
- Research project details (DCC question 1.1.4)
- Project participants (DCC questions 1.1.5, 1.1.6 and 10.1)
- Research funding (DCC questions 1.1.2 and 1.1.3)
- Data management policies (DCC Section 1.3)
Questions about the research data
- Research area (DCC Section 1.2)
- The nature of your data (DCC Sections 2.1 to 2.4)
- Creating data descriptions (metadata, “data about data”) (DCC Sections 2.5 and 3)
- Data sharing – person to person (DCC Section 4)
- Short-term data storage and backup (DCC Section 5)
- Data archiving (DCC Sections 6.1 to 6.2.5)
- Data publication (DCC Sections 6.2.6 to 6.4)
Questions about related documents (DCC questions 6.3.3 and 6.3.4)
Alignment of the data management planning questions from different sources
I then entered the questions from the other sets into different columns in the Excel spreadsheet, re-ordering them against the DCC questions so that similar questions were aligned across the page in the same row. This revealed which questions were common to several sets, and which were unique.
The Excel spreadsheet containing the aligned data management planning questions can be downloaded from here.
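For readers who would rather script this kind of comparison than do it by hand, the alignment idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the spreadsheet itself was produced (that was done manually in Excel). The set names, topic labels and question codes below are all hypothetical placeholders, and the sketch assumes someone has already decided by hand which questions cover the same topic.

```python
# Hypothetical question sets: each maps a hand-assigned topic label to a
# placeholder question code. None of these labels or codes come from the
# real question sets; they only illustrate the alignment idea.
question_sets = {
    "Set A": {"data types": "A1", "metadata": "A2", "backup": "A3", "plan review": "A4"},
    "Set B": {"data types": "B1", "backup": "B2"},
    "Set C": {"data types": "C1", "metadata": "C2", "backup": "C3"},
}


def align(question_sets):
    """Build one row per topic, with one cell per question set ("" if absent)."""
    set_names = list(question_sets)
    topics = []
    for questions in question_sets.values():
        for topic in questions:
            if topic not in topics:
                topics.append(topic)  # keep first-seen order, like spreadsheet rows
    rows = {
        topic: [question_sets[name].get(topic, "") for name in set_names]
        for topic in topics
    }
    return set_names, rows


def coverage(rows):
    """Count how many question sets have a question on each topic."""
    return {topic: sum(1 for cell in cells if cell) for topic, cells in rows.items()}


set_names, rows = align(question_sets)
counts = coverage(rows)
common = [t for t, n in counts.items() if n == len(set_names)]
unique = [t for t, n in counts.items() if n == 1]

# Print the aligned table, one topic per row and one set per column.
for topic, cells in rows.items():
    print(topic.ljust(12), *[cell.ljust(3) for cell in cells])
print("Asked by every set:", common)
print("Asked by only one set:", unique)
```

The hard part, deciding which questions from different sources are really asking the same thing, is a matter of judgement and is not automated here. The code only does the bookkeeping once those decisions have been made.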
A comparison of the question sets was made easier by this alignment, and is described in the following blog post.
[Footnote: Adrian Richardson reported on 23 March 2012 that there are some 1,000 DMPs now lodged in the DCC’s database, although some of these, like six of the seven DMPs saved under my own name, are likely to be test submissions to try out the system, rather than genuine DMPs.
For those DMPs designed to accompany grant applications, we need to put this number into context. In the most recent year for which statistics are available, AHRC received 957 grant applications, BBSRC 1,832 applications, EPSRC 2,568, ESRC 905, MRC 1,377, NERC 1,361 and STFC 415, giving a current annual total of 9,415 RCUK grant applications. I was unable to find current figures for the number of applications funded by UK medical charities, but these are likely to add at least 2,000 to the annual total.]