Data sharing is the practice of making research data available to others. It is an integral part of open science, and many funding agencies, publishers, and institutions have policies requiring research data sharing.
Effective data sharing requires that data be organized, documented, and preserved. Human subjects research requires the informed consent of study participants to share data, so make sure data sharing is addressed in your IRB application and informed consent forms. It is also crucial to de-identify data before sharing to reduce the risk that individuals can be identified from the dataset.
Applying a license is a necessary part of sharing your data. A license lets those who reuse your data understand precisely how they can reuse it, and how your contributions should be attributed. Releasing data without a license creates uncertainty and inefficiency for data generators and reusers.
It's important to recognize that datasets raise licensing considerations that other open access outputs do not: license interoperability and attribution stacking. Because a dataset may combine data extracted from many other datasets, which may themselves have been extracted from still other datasets, providing complete and accurate acknowledgement of every source quickly becomes complicated and can ultimately limit the reusability of a dataset. Choosing a license that does not require attribution, such as CC0 or the Open Data Commons Public Domain Dedication and License (PDDL), can help those who wish to reuse your data avoid these problems. The resources linked below can help you understand what needs to be considered when licensing your data.
Contact Technology Transfer if you have intellectual property questions about your data.
Contact the Library if you have questions about specific licenses.
The NIH supports a large number of domain-specific data sharing repositories. These repositories are described in two lists: one for repositories that allow open submission and access, and one for repositories that may restrict submission and access to specific researchers. Best practices and many policies dictate that data be shared via a domain-specific repository when one is available.
The repositories listed below accept datasets from all research disciplines and are appropriate when a domain-specific repository does not exist. They also accept deposits of other scholarly outputs, such as preprints and software.
Data papers are peer-reviewed publications that describe datasets: how and why the data were collected, how they have been processed, what format they are in, and where they can be accessed. Unlike research articles, data papers do not report analytical methods, results, or conclusions. Instead, they aim to facilitate access to data so that other researchers can analyze it further. As a result, they often provide more thorough description and context than a data repository does, and they give scholarly credit to research staff who make data-based contributions.
Data journals are specialized publications that focus on publishing data papers, making high-quality datasets easier for other researchers to find. Typically, they do not host data, but they may recommend places to deposit data. Below are some resources for finding data journals.
Here are some sites where additional data repositories may be found.
Some of the tools, websites, and resources used to collect, process, and analyze data may seem repository-like, but they are not appropriate for sharing data.
Exacloud is a computational resource provided by OHSU's Advanced Computing Center to support large-scale computational and data-intensive workflows. It is not intended for data storage or sharing.
The Research Data Storage resource, provided jointly by OHSU's Advanced Computing Center and Information Technology Group, is intended for long-term, large-scale, secure data storage with data replication and redundancy. It is not a publicly accessible resource and therefore is not appropriate for sharing data.
The word "repository" is overloaded with meaning, and the difference between repositories intended for data sharing and OHSU IRB-approved repositories is often unclear. Although IRB repositories are intended to store data and/or specimens for future research use, they are also subject to strict requirements that make them inappropriate for data sharing.
GitHub is a code repository, intended to manage the process of creating and updating software, programs, and scripts. It is not appropriate to use GitHub to store or share data.
GitHub should also not be relied upon for preserving computational research artifacts; instead, use Zenodo to archive code releases and obtain a DOI so that your computational research outputs can be cited and referenced.
REDCap (Research Electronic Data Capture) is a data collection tool. It is a secure, reliable, versatile, and feature-rich web application for building and managing HIPAA- and IRB-compliant online surveys and databases. REDCap is not a data repository, and it is not an appropriate platform for data sharing.
Qualtrics is another data collection tool: a survey platform used for academic and research purposes. Qualtrics is not a data repository, and it is not an appropriate platform for data sharing.