LibGuides: Research Data and Reproducibility: Sharing Data

Preparing to share

Effective data sharing requires data that are organized, well documented, and appropriately preserved. Human subjects research also requires the informed consent of study participants before data can be shared, so make sure data sharing is addressed in your IRB protocol and informed consent forms. It is also crucial to de-identify data before sharing to reduce the risk that individuals can be identified in a dataset.

Guidance on the HIPAA Privacy Rule in Research
Details the HIPAA Privacy Rule, which outlines the circumstances under which covered entities may use or disclose protected health information for research.
Preparing raw clinical data for publication
Many peer-reviewed journals now require authors to be prepared to share their raw, unprocessed data with other scientists, or to state the availability of raw data in published articles, but little guidance has emerged on how such data should be prepared for sharing. In this article from the BMJ, Iain Hrynaszkiewicz and colleagues propose a minimum standard for de-identifying datasets to ensure patient privacy when sharing clinical research data.
J-PAL Guide to De-Identifying Data
Researchers who plan to publish data on human subjects should take careful steps to protect the confidentiality of study participants through data de-identification, a process that reduces the risk of re-identifying individuals within a given dataset. This guide provides further details on the de-identification process, including procedures for de-identifying a dataset, a list of common identifiers that need to be reviewed, and sample code that can be used to de-identify data intended for publication.
Guide for De-identifying Qualitative Research
A guide from the Qualitative Data Repository that discusses different types of potential identifiers and how to deal with them when sharing research data.
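
As a rough illustration of the kinds of steps the guides above describe, the following Python sketch removes direct identifiers, replaces a participant ID with a keyed hash, and coarsens exact ages into bands. It is only a sketch under stated assumptions, not a compliance tool: the column names (name, email, phone, participant_id, age), the 10-year bands, and the helper functions are all hypothetical choices made for this example. Top-coding ages at 90 and over follows the general approach of the HIPAA Safe Harbor method, but which fields count as identifiers in your dataset should be decided with your IRB and the guidance linked above.

```python
import hashlib
import hmac

# Hypothetical column names for this example; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so records remain linkable.

    The key must be stored securely and never shared with the data.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalize_age(age: int) -> str:
    """Coarsen an exact age into a 10-year band; group ages 90 and over."""
    if age >= 90:
        return "90+"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with identifiers removed or coarsened."""
    # Drop columns that identify a person directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the study ID with a pseudonym that cannot be reversed without the key.
    cleaned["participant_id"] = pseudonymize(str(record["participant_id"]), secret_key)
    # Replace exact age with an age band.
    cleaned["age"] = generalize_age(int(record["age"]))
    return cleaned


# Example usage with made-up data.
sample = {
    "participant_id": 17,
    "name": "Example Person",
    "email": "person@example.org",
    "age": 34,
    "score": 5,
}
shareable = deidentify(sample, secret_key=b"replace-with-a-real-secret")
```

A keyed hash is used here instead of a plain hash because short or sequential IDs can otherwise be recovered by hashing every possible value. Removing direct identifiers alone does not guarantee anonymity; combinations of remaining variables (for example, rare diagnoses plus location) may still need review.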

When you share your data, you need to choose a license. A license tells others exactly how they can use your data and how to give you credit.

Licensing data is different from licensing other open access materials. Because datasets are often combined, reused, and built from many sources, requiring detailed attribution can quickly become complicated and make your data harder to reuse. To avoid these issues, many researchers choose a license or public domain dedication that doesn’t require attribution, such as CC0 or the Open Data Commons Public Domain Dedication and License (PDDL). These options make it easier for others to reuse your data without legal uncertainty.

The resources linked below can help you understand what needs to be considered when licensing your data.

Contact Technology Transfer if you have intellectual property questions about your data.

Contact the Library if you have questions about specific licenses.

Licensing Open Data: Resources and Practices
A very thorough discussion of data licensing by Peri Weisberg of DataSF.
DCC: How to License Research Data
This detailed guide from the Digital Curation Centre (DCC) covers the rationale for licensing data, licensing concepts, types of licenses, and mechanisms for licensing data.

Open Data Commons
Open Data Commons is the home of a set of legal tools and licenses to help you publish, provide and use open data.
Public Domain tools from Creative Commons
Information about CC0 and the Public Domain Mark, two popular ways to identify datasets in the public domain.

Where can data be shared?

Domain Specific Repositories

The NIH supports a large number of domain-specific data sharing repositories. These repositories are described in two lists: one for repositories that allow open submission and access, and one for repositories that may restrict submission and access to specific researchers. Best practices and many policies dictate that data should be shared via a domain-specific repository when one is available.

Generalist Repositories

The repositories listed below accept datasets from all research disciplines and are appropriate when a domain-specific repository does not exist. They also accept deposits of other scholarly outputs, such as preprints and software. Consider consulting the generalist repository comparison chart produced by the NIH to help you decide which repository is right for your data. The Library also offers data sharing consultations, where you can receive expert guidance on where to share your data!

Zenodo: general-purpose open-access repository operated by CERN. It allows researchers to deposit research papers, datasets, research software, reports, and many other types of research outputs.
Figshare: online open access repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. It is free to upload content and free to access.
Open Science Framework (OSF): open source software project that facilitates open collaboration in scientific research. It can be used for both research data management and research project management.
Harvard Dataverse: free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data.
Mendeley Data: open repository for sharing research data and a search engine that indexes both domain-specific and cross-domain data repositories.
Dryad: international open-access repository of research data. It is free to access, but submission may involve a Data Publishing Charge (DPC).
Vivli: global clinical research data sharing platform from the Center for Global Clinical Research Data.