
Research Data and Reproducibility

Your guide to research data services at OHSU.

Data Organization

What is data organization?

Data organization refers to:

  • Structuring project folders so that files are easy to find
  • Naming files for logical grouping and/or chronological sorting within folders
  • Structuring the contents of files to make analysis easy

Good data organization considers:

  • All of the data and materials associated with a project and the relationships between them
  • How to consistently and meaningfully name files
  • What others need to know to understand your organization methods

Directory Structures

You'll want to plan and use a file and folder organizational structure that is informative and keeps all the research information and materials associated with a project together.

Here are some tips for organizing directories:

  • Organize data hierarchically, and identify ways to divide your data into categories or attributes such as:
    • Project
    • Time
    • Location
    • File Type
  • Files can be arranged chronologically, by classification or code, or alphabetically within folders. The most appropriate arrangement may depend on the types of files.
  • Folder and sub-folder names should reflect what's in the folder, not the names of researchers or staff.
  • Include basic information such as the project title, dates, and some kind of unique identifier, such as a grant number.
  • Document your file directory structure and describe the types of records that should be maintained in those folders in a README file.

 

Directory Structure Examples:

Organized by File Type:

  • Dataset.A
    • Code
      • Step.1
      • Step.2
    • Data
      • Processed
      • Raw
    • Results
      • Figure.1
      • Figure.2
      • Models
    • readme.txt

Organized by Analysis:

  • Dataset.B
    • Figure.1
      • Code
      • Data
      • Results
    • Figure.2
      • Code
      • Data
      • Results
    • Table.1
      • Code
      • Data
      • Results
    • readme.txt
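
A folder layout like the "Organized by File Type" example can be created once by a script, so every project starts from the same structure. The following is a minimal sketch, not a prescribed tool: the function name `create_project_skeleton` and the placeholder README wording are assumptions made for illustration, and the folder names are copied from the Dataset.A example.

```python
import tempfile
from pathlib import Path

# Folder layout copied from the "Organized by File Type" example (Dataset.A).
LAYOUT = [
    "Code/Step.1",
    "Code/Step.2",
    "Data/Processed",
    "Data/Raw",
    "Results/Figure.1",
    "Results/Figure.2",
    "Results/Models",
]


def create_project_skeleton(root, layout=LAYOUT):
    """Create each folder in `layout` under `root`, plus a placeholder readme.txt.

    `create_project_skeleton` is a hypothetical helper written for this example.
    """
    root = Path(root)
    for relative in layout:
        (root / relative).mkdir(parents=True, exist_ok=True)
    readme = root / "readme.txt"
    if not readme.exists():  # never overwrite documentation that already exists
        lines = [f"Directory structure for {root.name}", ""]
        lines += [f"{relative}/: describe the records kept here" for relative in layout]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return root


if __name__ == "__main__":
    # Run the demonstration in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as scratch:
        project = create_project_skeleton(Path(scratch) / "Dataset.A")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

Writing the README stub at the same time as the folders is a small nudge toward documenting the structure from the start; the stub still needs to be filled in by hand.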


File Naming

  • Make file names unique. Avoid using the same file name in different folders—this can cause confusion or even data loss.

  • Include key details in the file name to describe what it is. This might include:

    • Date (use YYYYMMDD or YYYY-MM-DD for easy sorting)

    • Project name

    • Experiment or sample name

    • Data type (raw, processed, final, notes, etc.)

    • Instrument or location

    • Author or creator’s name

    • Version number (e.g., v01, v02)

  • Be clear and consistent:

    • Use short, meaningful terms or abbreviations

    • Use CamelCase, dashes ( - ) or underscores ( _ ) to separate words

    • Avoid spaces and special characters

    • Use leading zeros for numbers (e.g., 01, 02, 03... 10) so files sort properly

  • Arrange details from general to specific to help files group logically in folders.

  • Keep names to under 50 characters if possible, for easier reading and compatibility.

  • Record your naming rules and apply them consistently.
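
One way to apply naming rules consistently is to build file names with a small function instead of typing them by hand. The sketch below is one possible convention, not a required one: the function name `build_file_name`, the order of the parts (date first, so files sort chronologically), and the sample values are all assumptions made for illustration.

```python
import re
import warnings
from datetime import date

MAX_LENGTH = 50  # the guidance above suggests staying under 50 characters


def build_file_name(project, sample, data_type, version, when, extension):
    """Assemble a file name as date_project_sample_type_version.extension.

    `build_file_name` is a hypothetical helper; adapt the part order to your own rules.
    """
    # Zero-padded version numbers (v01, v02, ...) keep files in the right order.
    parts = [when.strftime("%Y%m%d"), project, sample, data_type, f"v{version:02d}"]
    # Drop spaces and special characters; keep letters, digits, and dashes.
    cleaned = [re.sub(r"[^A-Za-z0-9-]", "", part) for part in parts]
    name = "_".join(cleaned) + "." + extension.lstrip(".")
    if len(name) >= MAX_LENGTH:
        warnings.warn(f"{name} is {len(name)} characters long; consider shortening it")
    return name


if __name__ == "__main__":
    print(build_file_name("SleepStudy", "Mouse 07", "raw", 1, date(2024, 3, 5), "csv"))
    # prints 20240305_SleepStudy_Mouse07_raw_v01.csv
```

Underscores separate the parts here, while CamelCase is used within a part, so a name can be split back into its components later.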

Data Documentation

Data documentation is the information you create to explain your research data so that others (and your future self) can understand and use it correctly. It includes details about how the data were collected, what each variable means, how files are organized, and any processing steps that were taken. Common forms of documentation include README files, data dictionaries, and codebooks. Clear, consistent documentation is essential for ensuring your data is reusable, reproducible, and meaningful over time.

Research data doesn’t explain itself. Without context, it can be difficult—or even impossible—for others (or even you in the future) to understand what the data means, how it was collected, or how it should be used. That’s where metadata comes in.

Metadata is descriptive information about your data. It tells the who, what, when, where, why, and how behind your research. Think of it as the user manual for your dataset.

Good metadata should include:

  • The purpose of the project or study

  • Who collected or created the data

  • When and where the data was collected

  • What instruments, tools, or software were used

  • How the data was processed or transformed

  • Descriptions of variables, file formats, and units of measurement

  • Any limitations, assumptions, or known issues with the data

Creating thorough metadata helps:

  • You remember the details of your work months or years later

  • Others understand and reuse your data properly

  • Make your data easier to find in repositories or through search engines

  • Support long-term preservation and reproducibility
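
The metadata elements listed above can be kept in a small machine-readable file stored next to the data. The following sketch checks that a record covers each element before saving it as JSON. The field names and the helpers `missing_fields` and `to_json` are assumptions made for illustration; they do not follow any particular metadata standard, and a disciplinary standard should take precedence where one exists.

```python
import json

# Hypothetical field names, one per element in the "Good metadata" list above.
REQUIRED_FIELDS = [
    "purpose",
    "creators",
    "collection_dates",
    "collection_location",
    "instruments_and_software",
    "processing_steps",
    "variables",
    "known_limitations",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize `record` to JSON, refusing to save an incomplete description."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("Metadata record is missing: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only; replace each one with a real description.
    draft = {field: "TODO" for field in REQUIRED_FIELDS}
    draft["known_limitations"] = ""
    print(missing_fields(draft))
```

Running the check before depositing or sharing data is a cheap way to catch an empty field while the details are still fresh.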

There are many established metadata standards designed for different disciplines. Find them via the links below.

README files explain the contents of your data files. They tell others (or your future self) what the files are, how they were created, and how to use them, so that your data can be interpreted correctly. A README might include:

  • A list of files and what each one contains
  • How the data were collected
  • Any processing steps or tools used
  • Notes about data quality or limitations

Data dictionaries describe each variable in your dataset. They explain:

  • What each variable name means
  • What kinds of values are expected (e.g., categories, numbers, dates)
  • What coded values represent (e.g., 1 = Yes, 0 = No)

This makes your data easier to understand, analyze, and reuse.
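
A data dictionary can be stored as a simple table that both people and scripts can read. The sketch below keeps the three kinds of information described above (meaning, expected values, coded values) for two variables, writes them out as CSV, and uses the same entries to decode a stored value. The variables `visit_date` and `consented` and the helpers `dictionary_to_csv` and `decode` are invented for illustration.

```python
import csv
import io

# Hypothetical variables, invented for illustration only.
DATA_DICTIONARY = [
    {
        "variable": "visit_date",
        "description": "Date of the study visit",
        "type": "date (YYYY-MM-DD)",
        "codes": {},
    },
    {
        "variable": "consented",
        "description": "Whether the participant consented",
        "type": "category",
        "codes": {"1": "Yes", "0": "No"},
    },
]


def dictionary_to_csv(entries):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "description", "type", "coded_values"])
    for entry in entries:
        codes = "; ".join(f"{code} = {label}" for code, label in entry["codes"].items())
        writer.writerow([entry["variable"], entry["description"], entry["type"], codes])
    return buffer.getvalue()


def decode(entries, variable, value):
    """Translate a coded value into its label; uncoded values pass through unchanged."""
    for entry in entries:
        if entry["variable"] == variable:
            return entry["codes"].get(str(value), value)
    raise KeyError(f"{variable} is not documented in the data dictionary")


if __name__ == "__main__":
    print(dictionary_to_csv(DATA_DICTIONARY))
    print(decode(DATA_DICTIONARY, "consented", 1))  # prints Yes
```

Raising an error for an undocumented variable makes gaps in the dictionary visible as soon as the data are used.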


The term codebook is sometimes used interchangeably with data dictionary, but it can also refer to a more detailed document. Codebooks can include:

  • Full survey questions
  • Variable names linked to those questions
  • Value labels and skip logic
  • Response options and how they were coded

Codebooks are extremely helpful when working with complex surveys or instruments.

Researchers are increasingly publishing their research protocols, particularly when publisher word limits leave too little room to describe methods in enough detail for others to reproduce them. Here are links to places where protocols can be discovered or published.