
Research Data and Reproducibility

Your guide to research data services at OHSU.

What is data documentation?

Data documentation describes your data: what the variable names are and how they are defined, when the data was collected, and other events and decisions that occur over the course of a project.

Data documentation may include:

  • Metadata
  • README files
  • Codebooks and Data Dictionaries
  • Protocols

Metadata and Data Standards

Research data cannot speak for itself; it needs to be accompanied by descriptive information that will allow you and others to understand it, especially in the future. We call this descriptive information metadata. It should include all relevant details about a project, experiments, equipment, researchers, and the data itself. Good metadata facilitates the discovery of and access to your research data over the long term.
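As a rough illustration of the details listed above, a metadata record can be kept as a small structured file alongside the data. The sketch below is a minimal example only: every field name and value in it is hypothetical, and it does not follow any particular metadata standard. A real project would normally adopt the standard used in its discipline.

```python
import json

# Hypothetical metadata record. All field names and values are invented
# for illustration and do not follow a specific metadata standard.
metadata = {
    "project": {"title": "Example sleep study", "funder": "Example Foundation"},
    "experiment": {"description": "Overnight actigraphy recordings", "collected": "2023-01-15"},
    "equipment": [{"type": "wrist actigraph", "model": "hypothetical-model-1"}],
    "researchers": [{"name": "A. Researcher", "role": "principal investigator"}],
    "data": {"files": ["actigraphy.csv"], "format": "CSV", "license": "CC-BY-4.0"},
}

# Sections this sketch treats as required, mirroring the details named
# in the paragraph above (project, experiments, equipment, researchers, data).
REQUIRED_SECTIONS = ["project", "experiment", "equipment", "researchers", "data"]


def missing_sections(record):
    """Return the required sections that are absent or empty in the record."""
    return [name for name in REQUIRED_SECTIONS if not record.get(name)]


# Serialize to JSON so the record can be saved next to the data files.
metadata_json = json.dumps(metadata, indent=2)

print(metadata_json)
print("Missing sections:", missing_sections(metadata))
```

A simple completeness check like `missing_sections` is one way to catch gaps before data is shared; which sections count as required is a choice each project would make for itself.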

Many standards exist for research data metadata; find them via the links below.

Readmes, Data Dictionaries, and Codebooks

Additional documentation about your research data can be provided in README files, Data Dictionaries, and Codebooks.

README files provide information about data files and are intended to help ensure that the data can be correctly interpreted, whether by you at a later date or by others when you share or publish the data. They explain the nuances of your unique data collection.

Data Dictionaries are files that describe each element of your dataset: what all the variable names and values in your spreadsheet really mean.
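One way to start a data dictionary is to generate a skeleton from the dataset itself and then fill in the meanings by hand. The sketch below assumes a small CSV file; the column names, values, and the `build_data_dictionary` helper are all hypothetical and shown only to illustrate the idea.

```python
import csv
import io

# Hypothetical example data; the columns and values are invented for illustration.
SAMPLE_CSV = """participant_id,age_years,smoker
P001,34,0
P002,51,1
P003,,0
"""


def build_data_dictionary(csv_text):
    """Return one entry per column, with blank fields for the researcher to complete."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "description": "",      # to be written by the researcher
            "units_or_codes": "",   # e.g. "years" or "0 = no, 1 = yes"
            "observed_values": sorted({v for v in values if v != ""}),
            "missing_count": sum(1 for v in values if v == ""),
        })
    return entries


data_dictionary = build_data_dictionary(SAMPLE_CSV)
for entry in data_dictionary:
    print(entry)
```

The generated entries only record what can be read from the file (names, observed values, missing counts). The descriptions, units, and code meanings still have to come from the people who collected the data.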

Codebook is a term with several uses: it may be used interchangeably with data dictionary, it may refer to a document that is more detailed than a data dictionary, or it may refer to tools that survey researchers use to provide information about the data from a survey instrument.


Protocols

Researchers are increasingly publishing the protocols that make up their research methods, particularly when article length limits prevent describing methodologies in enough detail to facilitate reproducibility. Here are links to places where protocols can be discovered or published, and some guidance on how to write reproducible protocols.