Twenty Questions for Research Data Management

A Web entry form that permits creation of a data management plan using these questions is now available at http://www.miidi.org/dmp/.

[Notes: These questions were revised on 22 March 2012 and again on 11 June 2012. Further changes to improve the clarity of the questions were made on 9 May 2013 (see Footnote).

This document is also available as a Word file from http://imageweb.zoo.ox.ac.uk/pub/2012/publications/Shotton-Twenty_Questions_for_Research_Data_Management.docx.]

These twenty questions are designed to prompt and assist your thinking, as a research student, a postdoc or an academic researcher at the beginning of a research project, and to form the basis of a workable research data management plan that can both guide your on-going data management activities and inform others about the nature and availability of your research data.

They will help you determine how best to safeguard your data from loss, how to describe your datasets in ways that assist both yourself when returning to them in the future and others in their subsequent interpretation, and how to publish your data in ways that maximize their usefulness to others and bring you maximum scholarly credit, rewarding your efforts in acquiring, analysing, describing, interpreting and publishing them in the first place.

You may not have immediate answers to all these questions.  But, by seeking advice from your research supervisor, colleagues and others in your institution with responsibilities for data management, you should endeavour to discover them.  Then, once in a while, you should revisit these questions and see whether your data management practices can be improved, updating your answers.

More detailed data management planning questions are available online, and a comparison of those with these Twenty Questions will be the subject of a subsequent blog post.

The nature of your data

1       What is the general subject discipline (domain, field) to which your research data relate?

Possible responses:

  • Quantum physics.
  • Cell biology.
  • Ornithology.

2       What is the exact nature (range, scope) of your research data?

Possible responses:

  • Long-distance quantum communication using entangled photons.
  • Protein chemistry and electron microscopy of cell membrane proteins.
  • Video field recordings of avian behaviour, and their quantitative analysis.

3       Who will own the data arising from your research, and the intellectual property rights relating to them?

Possible responses:

  • Myself alone.
  • Myself and my research group leader.
  • My university.

4       If you know at this stage, in what format(s) will you store your data in the short term after acquisition?

Possible responses:

  • Questionnaire response data will be stored on my laptop in a Microsoft Office Access 2007 database.
  • Raw video recordings on digital video tapes on the shelf above my desk; edited videos in .mov format on my laptop; numerical analyses in a spreadsheet (Microsoft Office Excel 2007 format) on my laptop.
  • Numerical analyses in a spreadsheet (Microsoft Office Excel 2007 format) on my laptop.
  • On my research group’s cloud-based secure DataStage research data file store, in Zeiss confocal 3D image format.

Data descriptions, so that someone else can understand what the data are about (i.e. metadata, “data about data”)

5       When and where will you describe each of your research datasets?

Possible responses:

  • The only description will be the filenames on my hard drive.
  • I will describe the data using handwritten notes in my lab notebook if and when I have time, after the experiments have been completed – hopefully I’ll be able to remember all the details.
  • I will describe the data using the column and row labels in my spreadsheets after the data have been analysed.
  • I will create descriptive metadata for each dataset as I create/acquire it, and will save these descriptions with my datasets on my hard drive.

6       How will descriptive metadata be created or captured?

Possible responses:

  • Instrument metadata are automatically included in each data file.
  • I will create a title and short textual description for each dataset using the supplied submission interface when I submit the dataset to my university’s data repository.
  • My data descriptions will be saved in spreadsheets or word processor documents.
  • I will record rich metadata conforming to a Minimal Information Standard appropriate to my research field at the time of data acquisition, using a metadata entry form to ensure I don’t miss any essential information. This metadata file will be saved locally with my dataset, and will eventually be deposited with the dataset when it is submitted to a data repository.
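
The last response can be illustrated with a minimal Python sketch. It assumes that descriptive metadata are kept as a JSON "sidecar" file saved next to each data file; the field names in `REQUIRED_FIELDS` and the helpers `write_metadata_sidecar` and `read_metadata_sidecar` are hypothetical, are not part of DataStage or any repository's interface, and stand in for the checklist that a real Minimal Information standard would define.

```python
import json
from pathlib import Path

# Hypothetical checklist of essential fields; a real Minimal Information
# standard for your research field defines its own required items.
REQUIRED_FIELDS = ("title", "creator", "date_acquired", "description")


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Save descriptive metadata next to a dataset as '<filename>.metadata.json'."""
    # Like a metadata entry form, refuse incomplete descriptions so that
    # no essential information is missed at the time of acquisition.
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))

    # The sidecar sits in the same directory as the data file, so the two
    # can later be deposited together in a data repository.
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


def read_metadata_sidecar(data_file: Path) -> dict:
    """Load the metadata previously saved alongside a dataset."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

For example, calling `write_metadata_sidecar(Path("scan_0001.tif"), metadata)` with a complete `metadata` dictionary creates `scan_0001.tif.metadata.json` in the same directory, while an incomplete dictionary raises a `ValueError` naming the missing fields.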

Data sharing and publication

7       With whom will you share your research data in the short term, before publication of any papers arising from their interpretation?

Possible responses:

  • My research supervisor only.
  • Members of my research group and trusted external collaborators.
  • Anyone who asks for them.
  • Everyone, by publishing the data online, since our research community is committed to the rapid sharing of research results.

8      For how long will you embargo your research data before they are published for others to see and use?

Possible responses:

  • We will allow immediate public access to the data.
  • For one year, to permit us to exploit our hard-won research results.
  • Until the journal article describing our results has been published.

9      Why is public access to your research data to be restricted (if indeed it is)?

Possible responses:

  • We intend to make a patent application, and must avoid prior disclosure.
  • We don’t want to make the locations of members of endangered species available to poachers.
  • The research data are confidential because of the arrangement my research group has made with the commercial partner sponsoring our research.
  • My data form part of a long-term study upon which my research group is entirely reliant for its on-going research publications and academic reputation.  We only share this with trusted colleagues.
  • Confidential human patient data.
  • Questionnaire data collected in confidence from individuals – anonymized averaged data will be published.

10      Under what data-sharing license will you publish your research data?

Possible responses:

  • What is a data-sharing license?
  • Under a Creative Commons Open Data CC Zero public domain dedication and waiver, since my research data are not covered by copyright.
  • Using a Creative Commons Attribution License, since my image data are covered by copyright.

11      What persistent identifier will be used to permit correct citation of your datasets?

Possible responses:

  • This URL: http://****.
  • A Digital Object Identifier (DOI).
  • The accession number for the dataset issued by the European Bioinformatics Institute database to which the dataset is submitted.

12      What metadata will be published with the data to make them interpretable and reusable?

Possible responses:

  • I will expect users to be able to interpret the column and row labels in my spreadsheets.
  • The dataset will be described in the journal article we will publish, but will have no other metadata beyond those required by the repository for data citation: Author, Date, Title, Source, Identifier.
  • An XML metadata file created in conformance with a Minimal Information standard will be submitted to the repository as part of the data package, along with the data files.
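
The five citation elements named in the second response (Author, Date, Title, Source, Identifier) can be assembled into a dataset citation, as in this minimal Python sketch. The `DatasetCitation` class and the "Authors (Year). Title. Source. Identifier" layout are illustrative assumptions, not the format required by any particular repository or journal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """The five citation elements: Author, Date, Title, Source, Identifier."""
    authors: list[str]
    year: int
    title: str
    source: str      # e.g. the repository that publishes the dataset
    identifier: str  # a persistent identifier, such as a DOI

    def format(self) -> str:
        if not self.authors:
            raise ValueError("a dataset citation needs at least one author")
        # Join authors as "A, B and C" (one simple convention among many).
        if len(self.authors) > 1:
            names = ", ".join(self.authors[:-1]) + " and " + self.authors[-1]
        else:
            names = self.authors[0]
        return f"{names} ({self.year}). {self.title}. {self.source}. {self.identifier}"
```

With hypothetical values, `DatasetCitation(["A. Researcher"], 2012, "Video recordings of avian behaviour", "University DataBank", "doi:10.5555/example").format()` returns `A. Researcher (2012). Video recordings of avian behaviour. University DataBank. doi:10.5555/example`.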

Data storage, backup and archiving

13       Where will you store your data in the short term, after acquisition?

Possible responses:

  • On my laptop.
  • On the computer connected to the microscope.
  • On my research group’s DataStage filestore.

14       Who is responsible for the immediate day-to-day management, storage and backup of the data arising from your research?

Possible responses:

  • Myself alone.
  • My research group’s data manager.
  • Our departmental IT staff, who manage our research group’s DataStage research data management system.

15      How frequently will your research data be backed up for short-term data security?

Possible responses:

  • Whenever I remember to do so.
  • Nightly, using our research group’s DataStage research data management system connected to the University’s automated backup service.

16      Where will your research data be archived for long-term preservation?

Possible responses:

  • Selected data will be included in the figures and tables of research papers published by my research group, but we have no plans to archive and publish the full datasets.
  • As supplementary files attached to my journal articles on the publisher’s web site.
  • In the University’s DataBank data repository, run by the library service.
  • In appropriate genomics databases run by the European Bioinformatics Institute.

17      When will your research data be moved to a secure archive for long-term preservation and publication?

Possible responses:

  • Our research data are already securely stored on an institutional data server.
  • Nightly.
  • Upon completion of each set of experiments.
  • When my research group leader decides it is appropriate.
  • Immediately after publication of my thesis.
  • Upon submission of our Nature paper, so that the data are available for reviewers.

18      Who will decide which of your research data are worth preserving?

Possible responses:

  • Myself alone.
  • Myself, in consultation with my research supervisor.
  • My research supervisor alone.

19      How (i.e. by what physical or electronic method) will you transfer your research datasets to their long-term archive, under the curatorial care of a separate third party, e.g. a data repository?

Possible responses:

  • On physical hard drives that I will bring back from my field site by air.
  • By e-mailing files to our librarian.
  • By completion of the selected data repository’s Web-based submission form and uploading of the data files over the Internet.
  • By use of a local data management system such as DataStage that can automatically package and submit data files to the selected repository.

20      Who will be responsible for your data, once you have left your present research group?

Possible responses:

  • At this stage, I have no idea.
  • I’ll take my data with me and maintain responsibility.
  • My supervisor will make appropriate arrangements.
  • I hope the journal will maintain access to the supplementary information files associated with my article.
  • My University will assume long-term responsibility for the data I have chosen to preserve in its data archive.

– – – – –

Notes

Creative Commons: Creative Commons is a non-profit organization that has developed a legal and technical infrastructure for the licensing of copyright material and data in a standardised and machine-readable manner, thereby facilitating open publication, sharing and innovation in the digital age.

DataStage and DataBank: DataStage is a simple research data filestore and repository data submission system, designed for deployment at the research group level.  DataBank is a data repository for archiving and publishing research data, designed for deployment at the institutional level.  Both are open-source services for local or cloud deployment developed together at Oxford University within the JISC University Modernization Fund DataFlow Project, and both are now available for third-party installation and use.  

European Bioinformatics Institute:  The European Bioinformatics Institute (EBI) houses Europe’s primary databases for molecular sequence data, genomics and bioinformatics, and shares data daily with similar institutions in the United States and Japan.

Minimal Information Standards for life science research specify minimal metadata requirements for certain types of research data, are integrated by the MIBBI Project (Minimum Information for Biological and Biomedical Investigations), and are described in Reference [1].

Reference

[1]      Taylor et al. (2008). Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nature Biotechnology 26 (8): 889-896. doi:10.1038/nbt0808-889.

– – – – –
Footnote:

These questions were revised on 22 March 2012, two weeks after they were first published, to simplify the wording, to remove some redundancy between questions, and to split compound questions into single questions. To keep the total number of questions to 20, two questions about when data would be collected and analysed have been removed.  The remaining twenty questions have been slightly re-ordered.

Following suggestions by Sally Rumsey of the Bodleian Library, minor revisions were then made on 11 June to the text of questions 5, 6 and 18, and to the possible responses for questions 14 and 18, in order to add clarity and remove ambiguities. Question 20 was also moved to position 15, and the subsequent questions re-numbered (so that Question 18 is now Question 19, etc.). The Notes were also edited to update the information on DataFlow and to delete the description of SWORDv2, considered to be too specialized.

On 9th May 2013, some questions were slightly changed in wording, and others swapped in position and renumbered to make the flow of questions more logical, to match changes to the online data entry form at http://www.miidi.org/dmp/.  Question 3 was swapped with question 4, and questions 8-12 were swapped with questions 13-20.  Some of the exemplar responses were also revised to make them more useful.

A list of the original questions follows.

Original Twenty Questions published on 7 March 2012

1        What is the subject discipline (domain, field) to which your research data relates?

2        What is the exact nature (range, scope) of your research data?

3        When will your research data be collected?

4        When will your research data be processed and analysed?

5        Who owns the data arising from your research, and the intellectual property rights relating to them?

6        How will your research datasets be described, i.e. with what metadata or accompanying interpretive information will they be accompanied, and how will these metadata be created?

7        Where, and in what format(s), will you store your data in the short term after acquisition?

8        Who is responsible for the immediate day-to-day management, storage and backup of the data arising from your research?

9        How frequently and where will your research data be backed up for short-term data security?

10       With whom will you share your research data in the short term, before publication of any papers arising from their interpretation?

11       Why is access to your research data to be restricted in the short term (if indeed it is)?

12       To whom will you provide access to your research data in the long term, with what limitations as to re-use, and under what license arrangements.

13       Why is access to your research data to be restricted in the long term (if indeed it is)?

14       How (i.e. by what physical or electronic method) are your research datasets to be transferred from short-term storage under the local care of yourself or your research group to their long-term archival and Web publication destination under the curatorial care of a separate third-party, e.g. a data repository?

15       Where will your research data be archived for long-term preservation?

16       When will your research data be moved from your own local storage to a secure archive for long-term preservation (e.g. your institutional library’s data repository)?

17       Who has authority to decide which of your research data are NOT worth preserving and will be deleted?

18       Where will your research data be published for others to see?

19       When will your research data be published in this manner?

20       To whom will responsibility for the long-term preservation of your research data devolve, once you have left your present research group?

This document is licensed under a Creative Commons Attribution 3.0 Unported License.

Responses to Twenty Questions for Research Data Management

  Who has authority to decide which of your research data are NOT worth preserving and will be deleted?

    • davidshotton says:

      Clearly the researcher who created the dataset has primary responsibility during the course of a research project for deciding what research data are not worth preserving. If I create an electron micrograph that is blurred by astigmatism, or shows nothing of interest, there is no scientific reason to preserve it; it should be scrapped. However, some years down the line, when the researcher has moved institutions, retired or died, it becomes the responsibility of the institution presently curating the data to decide whether it is still worth keeping. Because of the falling cost of storage, and the many examples of the serendipitous re-use of data in unforeseen contexts, the default position should always be to preserve!
