When you submit a paper for publication, you often provide supplementary information, including the data used in the research. It’s important to make this data available to the paper’s readers: it provides the evidence for your findings and allows them to better understand your research and build on it. But it is often not appropriate to submit this data as supplementary information files on the publisher’s website; instead, the data should be deposited in a data repository (such as CORD). Why?
- Supplementary information is usually made available as PDF files, which makes the data difficult to edit or extract. Have you ever read a paper and wanted to reuse its figures, and ended up tracing a graph or retyping a table that wouldn’t copy and paste from the PDF? This is one reason it is much better to provide the data in a reusable, editable format (such as CSV, TXT, RTF, or even XLSX if necessary) on a data repository.
- It is often derived data that is used as supplementary information, such as graphs and charts reporting selected values. However, it is the full primary dataset that is most useful to others and that allows your results to be properly validated. Graphs are certainly useful in demonstrating findings, and appropriate to include in your publication, but the full raw data should be shared via a data repository.
- The full dataset is also the most useful to you in the future, and by depositing it in a data repository, you help ensure its long-term preservation and retrievability. Where the repository has archival storage and carries out digital preservation (both of which we are working hard to implement with CORD), you have a much better chance of accessing and reusing this data should you need to in ten years’ time.
- Similarly, a repository requires you to assign a licence to the data, which sets out the terms under which others can reuse it. The situation is less clear for supplementary information, so publishing the data only there might reduce its reuse value. Is the article open access, and is the supplementary information equally open access? Does it have a licence, or did the publisher request a transfer of copyright, and if so, does the publisher allow others to reuse and redistribute this data?
- The data may even be more interesting than the paper! There may be other uses for the data, and researchers may want to cite your data even if they did not use your article. On a repository, your data gets a DOI, making it citable, with metrics available on its use (CORD gives you view and download figures, citation counts, and altmetrics).
So whilst we’re not advising you to stop using supplementary information, it is worth considering what data you have used in your article, and whether the most appropriate place for it is actually a data repository. And don’t forget to link to the data from the article, in your data access statement or references.
Image: Dice five, CC-BY-NC-SA 2.0