Data documentation – what and why

22/09/2016

Just as a map becomes much easier to read when you have a legend explaining the symbols, data is much easier to comprehend when you have descriptive and contextual information. For example, while you’re using your research data, you’ll know what your variable names are short for, and what the different missing value codes mean. But if you want to reuse it in a few years’ time, will you still be certain? Or if another researcher uses your data, will they be able to understand it without needing to ask you questions? Maybe you’ve experienced this frustration yourself: you wanted to incorporate existing data into your research, but it wasn’t made available with sufficient information, and you had to spend time contacting the creator.

Making documentation available alongside your data is therefore a key step to ensuring its long-term usability and value, both for you and others (which might lead to extra citations). Documentation can be as simple as one text file, in which you’ve noted all the key contextual information about the data and the project. Some elements you might want to include are:

Study-level information
- Data collection methods, such as instruments used, sampling methods, scales, geographic/temporal coverage, etc.
- Data processing methods, such as validation, checking, cleaning, calibration procedures, etc.

Data-level information
- Names, labels, descriptions, and units for all variables and fields.
- Explanation of codes, including reasons for missing values (not applicable, not provided, not recorded, error, etc.).
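
If your data is processed with scripts, the explanation of codes can also be kept in a form that both people and programs can read. The following is only a minimal Python sketch: the numeric codes (-7, -8, -9) and the function name summarise_missing are assumptions made up for illustration, not a standard, so substitute whatever codes your own data uses.

```python
from collections import Counter

# Hypothetical missing-value codes; replace with the codes used in your data.
MISSING_CODES = {
    -7: "not applicable",
    -8: "not provided",
    -9: "not recorded",
}


def summarise_missing(values, codes=MISSING_CODES):
    """Count how often each documented missing-value reason occurs."""
    counts = Counter(codes[v] for v in values if v in codes)
    return dict(counts)


if __name__ == "__main__":
    # Two real measurements and three coded missing values.
    print(summarise_missing([3, -9, 5, -8, -9]))
```

Keeping the codes in one place like this means the same table can be copied into your documentation file, so the description and the processing do not drift apart.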

With data-level information, you may be able to include enough information in the data files themselves (e.g. in table headings in a spreadsheet). Otherwise, you might want to store a separate explanatory file alongside your data file(s). You could use our template readme.txt file to note all the necessary elements you want to include, perhaps adding to it as you work through the project.

Or you might benefit from writing what’s called a “data dictionary”, which is essentially a legend for your data. This great blog post on data dictionaries explains it in a nutshell, and highlights the benefits, whether you are sharing data with others (in or outside your project team) or just with your future self. And remember that if you’ve entered this information into a tool such as SPSS, you should be able to export it to a file to store with your data (e.g. in SPSS v23, File > Display Data File Information > Working File > Print, then print to PDF, or use the File > Export option available in many screens to export to a more usable format such as txt).
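
As a rough illustration of what a data dictionary can look like, here is a small Python sketch that writes one as a CSV file and checks that every column in a data file has an entry. The variable names, column layout and function names are all hypothetical; they are one possible arrangement, not a required format.

```python
import csv
import io

# Columns of the data dictionary itself (an assumed layout, adapt as needed).
FIELDS = ["variable", "label", "type", "units", "codes"]

# Hypothetical entries; replace with the variables in your own data.
DATA_DICTIONARY = [
    {"variable": "resp_id", "label": "Respondent identifier",
     "type": "integer", "units": "", "codes": ""},
    {"variable": "temp_c", "label": "Air temperature at sampling time",
     "type": "float", "units": "degrees Celsius",
     "codes": "-9 = not recorded"},
]


def write_data_dictionary(entries, handle):
    """Write the dictionary entries as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def undocumented_columns(data_header, entries):
    """Return the data columns that have no data dictionary entry."""
    documented = {entry["variable"] for entry in entries}
    return [name for name in data_header if name not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    # "site" appears in the data but has no description yet.
    print(undocumented_columns(["resp_id", "temp_c", "site"], DATA_DICTIONARY))
```

A plain CSV or text file is used here because it can be opened without specialist software, which suits documentation that needs to stay readable for years.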

So, when depositing your data in a repository, remember to include sufficient documentation for others (or your future self!) to understand it without needing to ask questions.

Image: City map by Mathieu Pellerin, CC-BY-NC-SA 2.0.

Written By: Georgina Parsons
