Research Data and Reproducibility

Your guide to research data services at OHSU.

What is data sharing?

Data sharing is the practice of making research data available to others. It is an integral part of open science, and many funding agencies, publishers, and institutions have policies that require research data sharing.

  • Data can be made available for sharing by depositing it in a data repository, describing it in a data paper published in a data journal, or sharing it with individual researchers upon request.
  • Data sharing may be limited by privacy or security concerns.
  • Sharing data facilitates data reuse.

Preparing to share

Effective data sharing requires that the data be organized, documented, and preserved. For human subjects research, study participants must give informed consent before their data can be shared, so make sure data sharing is addressed in your IRB application and informed consent forms. It is also crucial to de-identify data before sharing to reduce the risk that individuals can be identified in the dataset.

Licensing your data

Applying a license is a necessary part of sharing your data. A license lets those who reuse your data understand precisely how they can reuse it, and how your contributions should be attributed. Releasing data without a license creates uncertainty and inefficiency for data generators and reusers.

It's important to recognize that datasets raise licensing considerations that other open access outputs generally do not, notably license interoperability and attribution stacking. Because a dataset may combine data extracted from many other datasets, which may themselves have been extracted from still other datasets, providing complete and accurate acknowledgement of every source quickly becomes complicated and can ultimately limit the reusability of a dataset. Choosing a license that does not require attribution, such as CC0 or the Open Data Commons Public Domain Dedication and License (PDDL), can help those who wish to reuse your data avoid these problems. The resources linked below can help you understand what needs to be considered when licensing your data.

Contact Technology Transfer if you have intellectual property questions about your data.  

Contact the Library if you have questions about specific licenses.  

Where can data be shared?

The NIH supports a large number of domain-specific data sharing repositories. These repositories are described in two lists: one for repositories that allow open submission and access, and one for repositories that may restrict submission and access to specific researchers. Where an appropriate domain-specific repository is available, best practices and many policies dictate that data be shared there.

The repositories listed below accept datasets from all research disciplines and are appropriate when a domain-specific repository does not exist. They also accept deposits of other scholarly outputs, such as preprints and software.

Data papers are peer-reviewed publications that describe datasets: how and why the data were collected, how they have been processed, what format they are in, and where they can be accessed. Data papers do not describe methods, analysis, or conclusions. Instead, they aim to facilitate access to data so that other researchers can analyze it further. As a result, they often provide a more thorough description and context than data repositories do, and they provide scholarly credit for research staff who make data-based contributions.

Data journals are specialized publications that focus on publishing data papers, making high-quality datasets easier for other researchers to find. Typically, they do not host data, but they may recommend places to deposit it. Below are some resources for finding data journals.

Here are some sites where additional data repositories may be found.

What are some things that are not repositories?

Some of the tools, websites, and resources used to collect, process, and analyze data may seem repository-like, but they are not appropriate for sharing data.

Exacloud

Exacloud is a computational resource provided by OHSU's Advanced Computing Center for supporting large-scale computational and data-intensive workflows. It is not intended for data storage or sharing.

OHSU Research Data Storage

The Research Data Storage resource, provided in partnership by OHSU's Advanced Computing Center and Information Technology Group, is intended for long-term, large-scale, secure data storage with data replication and redundancy. It is not publicly accessible and therefore is not appropriate for sharing data.

IRB-Approved Repositories

The word "repository" is overloaded with meaning, and the difference between repositories intended for data sharing and OHSU IRB-approved repositories is often unclear. Although IRB repositories are intended to store data and/or specimens for future research use, they are also subject to strict requirements that make them inappropriate for data sharing.

GitHub

GitHub is a code repository, intended to manage the process of creating and updating software, programs, and scripts. It is not appropriate to use GitHub to store or share data.

GitHub should also not be relied upon for preserving computational research artefacts. Instead, use Zenodo to archive code releases and obtain a DOI so that your computational research outputs can be cited and referenced.

REDCap

REDCap (Research Electronic Data Capture) is a data collection tool. It is a secure, reliable, versatile, and feature-rich web application for building and managing HIPAA- and IRB-compliant online surveys and databases. REDCap is not a data repository, and it is not an appropriate platform for data sharing.

Qualtrics

Qualtrics is another data collection tool: a survey platform used for academic and research purposes. Qualtrics is not a data repository, and it is not an appropriate platform for data sharing.