To brighten up the return to work on a cold, grey January morning, I delivered a surprise figshare cup to Dr Robert Grabowski in SWEE: a prize for a fantastic example of best practice in data publishing. This starts my new year’s resolution to write about some of the items deposited to CORD, highlighting the ones that have caught my eye for being particularly well created.
The first item in question is a dataset on invertebrate assemblages at https://doi.org/10.17862/cranfield.rd.7539305. As fascinating as that is, I didn’t choose it for the subject matter, so what’s so great about it?
- It’s very clearly described. The data is split into several files and accompanied by “description.txt” and “readme.txt” files to explain them, including an explanation of acronyms and file name coding. This means users can understand the data quickly, and nobody has to spend time asking or answering questions to clarify anything uncertain.
- The files are in open formats. Using csv and txt file formats means that these files are likely to be openable for many years to come. Had they been in Excel and Word formats, for example, they would rely on specific software at particular versions. The researchers can be confident of still being able to access this data themselves in ten years’ time, and the data is accessible to the broadest possible audience, as there are no software barriers for others.
- It’s under embargo because it underpins a journal article that is still in press. You can’t see the files yet: ideally, data used in a paper is released at the same time as the article is published. This is difficult to coordinate, so there are several approaches. In this one, the record is published but the files are embargoed. This means that the DOI is live, so when it’s cited in the paper, peer reviewers can click through and there are no concerns about inactive links. If the article is published before the embargo ends, it’s easy for the researchers (or me) to remove the embargo so the data is public as soon as possible.
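To illustrate the point about open formats, here is a minimal sketch of reading a csv file with nothing but Python’s standard library. The sample content, column names and the `total_count_by_site` helper are invented for illustration; they are not taken from the dataset itself, whose files are still under embargo.

```python
import csv
import io

# Hypothetical sample standing in for a csv file; the column names and
# values are invented for illustration and do not come from the dataset.
SAMPLE_CSV = "site,taxon,count\nA1,Gammarus,12\nA1,Baetis,7\nB2,Gammarus,3\n"


def total_count_by_site(csv_text):
    """Sum the 'count' column per 'site' using only the standard library."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["site"]] = totals.get(row["site"], 0) + int(row["count"])
    return totals


print(total_count_by_site(SAMPLE_CSV))
```

No particular spreadsheet application or software version is needed: any language or tool that can read plain text can do the same.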
Whilst Bob got the free cup, the dataset was actually prepared by a recent student, Chiara Magliozzi, who should share the credit! It was great to hear how easy they found using CORD, simply dragging and dropping the files over. A private link can be generated for all draft items to enable co-authors to preview them, which is a great feature for when your collaborators aren’t at Cranfield.
Everyone’s doing a great job publishing data on CORD, so keep an eye out for more posts in case I pick your next item as a favourite! Feel free to share if you have your own favourites and reasons for choosing them, too.
Georgina Parsons, your Research Data Manager and Deliverer of Figshare Cups