
Research Data and Reproducibility

Your guide to research data services at OHSU.

Data Storage

Data storage refers to where data is kept while it is actively being collected or processed. Stored data should be backed up periodically to a secure location. Storage and backup can be contrasted with preservation and archiving, which focus on handling data when a research project ends.

Good data storage practices consider:

  • How data will be used during a project
  • How to sustainably record and access data during a project
  • Potential disasters and how they can be mitigated
  • The data's longer-term value, access, and preservation needs

Here are some tips for storing research data:

  • "Keep raw data raw"
  • Store data in open formats
  • Practice the 3-2-1 rule:
    • Three copies
    • At least Two media types
    • One copy in an entirely different location
  • Read and understand the terms of service of your storage providers
  • Use an automated service to create regular backups, but don't store those backups on the same local hard drive as the original data
  • Make sure you know how to recover data from your backups!
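
The 3-2-1 rule above can be expressed as a small check. This is a minimal sketch, not an OHSU tool: the `Copy` record, its fields, and the `check_321` function are hypothetical names chosen for illustration, and "media type" and "location" are whatever labels you assign to your own copies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a dataset (all fields are illustrative labels)."""
    name: str      # e.g. "lab workstation"
    media: str     # e.g. "internal disk", "external disk", "cloud"
    location: str  # e.g. "campus", "off-site"


def check_321(copies: list[Copy]) -> list[str]:
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    # Three copies
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    # At least two media types
    if len({c.media for c in copies}) < 2:
        problems.append("all copies are on the same media type")
    # One copy in a different location
    if len({c.location for c in copies}) < 2:
        problems.append("no copy is stored in a different location")
    return problems
```

For example, a plan with a workstation copy and an external drive on campus plus a cloud copy off-site would return an empty list, while dropping the cloud copy would report both a missing third copy and a missing second location.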

Data Preservation

Data preservation involves saving your data in a way that keeps it safe, usable, and accessible for the long term. This includes using reliable file formats, storing files in secure and stable locations, and keeping copies in more than one place—such as on a local drive and in a trusted data repository.

Preservation usually happens at the end of a project and creates a final, unchanging version of the data. This is different from storage and backup, which happen during the research process while data is still being collected, analyzed, and updated.
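
One simple safeguard for a final, unchanging version is to remove write permission from the finished files. This guide does not prescribe the step, so treat the following as a hedged sketch: `make_read_only` is a hypothetical helper, and file permissions only guard against accidental edits (an administrator can still change the files, and on Windows only the read-only flag is affected).

```python
import stat
from pathlib import Path


def make_read_only(folder: Path) -> int:
    """Remove write permission from every file under folder; return the count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in folder.rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode & ~write_bits)  # keep read/execute bits as they were
            count += 1
    return count
```

Calling `make_read_only(Path("final_dataset"))` (a placeholder folder name) would walk the folder recursively and return the number of files it changed.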


File Formats for Data Preservation

Choose file formats that will be easy to open and use in the future. To keep your data readable over time, avoid uncommon, proprietary, or compressed formats that may rely on specific software or hardware.

  • Preferred formats are best for long-term access. These are open, widely used formats that don’t require special software. They often use standard text encoding like ASCII or Unicode, and they are not compressed or encrypted.
  • Acceptable formats can be used if a preferred format isn't possible, but they may take extra effort to preserve over time.
  • Not recommended formats depend on proprietary software and may become unusable if that software is no longer supported. There’s no guarantee these files will be accessible in the future.

Planning ahead by choosing the right formats now helps protect your data from becoming unreadable later.
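
As a rough illustration of the "standard text encoding, not compressed" criterion, the sketch below inspects a file's bytes. It is a heuristic under stated assumptions, not a format validator: `describe_file` and the short signature table are hypothetical, only three common compressed containers are recognized, and the whole file is read into memory, so it suits small files.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common compressed containers.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
}


def describe_file(path: Path) -> str:
    """Classify a file as 'compressed (<kind>)', 'utf-8 text', or 'binary/other'."""
    data = path.read_bytes()
    for signature, kind in COMPRESSED_SIGNATURES.items():
        if data.startswith(signature):
            return f"compressed ({kind})"
    try:
        data.decode("utf-8")  # ASCII is a subset of UTF-8
    except UnicodeDecodeError:
        return "binary/other"
    return "utf-8 text"
```

Note that OOXML and OpenDocument files (.xlsx, .docx, .ods, and similar) are zip containers, so this check reports them as compressed, while a .csv or .txt file saved as UTF-8 is reported as text.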

 

Type Preferred Acceptable Not Recommended
Structured data and spreadsheets

Delimiter-Separated Values (.csv, .tsv)

OpenDocument Spreadsheet (.ods)

Microsoft Excel OOXML (.xlsx)

SQLite (.sqlite3, .sqlite, .db)

Microsoft Excel (.xls)

SPSS (.por, .sav)

Text and word processing documents

PDF/A (.pdf)*

Plain Text (.txt)

Markdown (.md)

XML (.xml)

SGML (.sgm, .sgml)

PDF (.pdf)

Microsoft Word OOXML (.docx)

OpenDocument Text (.odt)

LaTeX (.latex)

EPUB (.epub)

HTML (.htm, .html)

Rich Text Format (.rtf)

PostScript (.eps, .epsf, .ps)

Microsoft Word (.doc)

WordPerfect (.wpd)

Google Docs

All other text document formats not listed here

Photos, images, and vector graphics

Tagged Image File Format (.tiff)

JPEG (.jpeg)

Portable Network Graphics (.png)

Scalable Vector Graphics (.svg)

PDF/A (.pdf)*

Graphics Interchange Format (.gif)

Digital Negative (.dng)

Bitmap (.bmp)

PDF (.pdf)

Adobe Illustrator (.ai)

Adobe Photoshop (.psd)

RAW camera images

All other image formats not listed here

Audio

MPEG-1 or MPEG-2 Audio (.mp3)

Waveform Audio File Format (.wav)

Audio Interchange File Format (.aif, .aiff)

Broadcast Wave (.bwf, .bwav)

Standard MIDI (.mid)

Free Lossless Audio Codec (.flac)

MPEG-4 (.mp4, .m4a)

Ogg Vorbis (.ogg)

Sun Audio (.au)

AIFF Compressed (.aifc)

Windows Media Audio (.asf, .wma)

All other audio formats not listed here

Video

Audio Video Interleave (.avi)

QuickTime Movie (.mov)

MPEG-4 (.mp4)

FFV1/Matroska (.mkv)

 

Windows Media Video (.asf, .wmv)

All other video formats not listed here

Posters, presentations and slide decks

PDF/A (.pdf)*

PDF (.pdf)

OpenDocument Presentation (.odp)

Microsoft PowerPoint OOXML (.pptx)

Microsoft PowerPoint (.ppt)

Google Slides

All other presentation formats not listed here


Maintaining Data Integrity

Digital data are fragile, no matter where you store them—hard drives, servers, or elsewhere. Over time, files can become damaged or unreadable due to a problem known as bit rot. To keep your data safe, use two key strategies: refreshment and replication.

  • Refreshment means copying your data to a new device every 2–5 years to avoid loss from aging hardware.

  • Replication means keeping multiple copies of your data:

    • One main copy
    • One backup copy
    • One off-site or cloud-based copy
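
Replication only helps if the copies stay identical. A common way to check this, which the guide itself does not specify, is to compare checksums. The sketch below assumes SHA-256 and uses two hypothetical helpers, `sha256_of` and `copies_match`; a mismatch tells you that at least one copy has changed, not which one is correct.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: list[Path]) -> bool:
    """True when all given copies share a single SHA-256 digest."""
    digests = {sha256_of(p) for p in paths}
    return len(digests) == 1
```

Recording the digest when a copy is first made, and recomputing it after each refreshment, gives a simple way to notice silent damage such as bit rot.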

Personal computers and external hard drives are not reliable for long-term archiving. For preserving finalized data, networked file servers or trusted data repositories are the best options.