Research Data and Reproducibility

Your guide to research data services at OHSU.

What is data preservation?

Data preservation and archiving refer to practices and procedures that provide long-term storage of, and access to, data by maintaining its safety and integrity. This is achieved by using durable file formats, keeping files in secure locations, and keeping copies in multiple locations, including archiving files locally and/or depositing files in data repositories.

Preservation and archiving typically occur at the end of a project, and preserved data is a static and usually final record of the project. Contrast this with storage and backup, which are activities that occur while research data is still being collected and analyzed.

Preservation file formats

Choose your file formats carefully. For maximum longevity, readability and access, plan for both hardware and software obsolescence and avoid formats that are unusual, closed, proprietary or compressed.

Preferred formats are those most likely to remain preservable for long-term use. They tend to be non-proprietary, open formats that are in common use in research communities, use standard encodings like ASCII or Unicode, and are not compressed or encrypted.

Acceptable formats may require more work than Preferred formats to be preserved and should only be used when it's not possible to use a Preferred format.

Formats that are Not Recommended often require proprietary software to be read and are at risk of becoming obsolete. The future usability and accessibility of files in such formats cannot be guaranteed.
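One property of Preferred formats noted above, the use of a standard encoding such as ASCII or Unicode, can be checked before a text-based file (.csv, .tsv, .txt, .md, .xml) is archived. The following is a minimal sketch in Python, assuming UTF-8 as the target Unicode encoding; the function name `is_utf8_text` is hypothetical and not part of any OHSU tool.

```python
import codecs
from pathlib import Path


def is_utf8_text(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file decodes cleanly as UTF-8.

    ASCII is a strict subset of UTF-8, so plain ASCII files also pass.
    The file is read in chunks so that large data files are not loaded
    into memory all at once.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with path.open("rb") as handle:
            while chunk := handle.read(chunk_size):
                decoder.decode(chunk)
        # final=True raises if the file ends in the middle of a character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check may have been saved in a legacy encoding (for example Latin-1) and could be re-saved as UTF-8 before deposit. The check says nothing about whether the file's contents are otherwise well-formed.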

Type Preferred Acceptable Not Recommended
Structured data and spreadsheets

Delimiter-Separated Values (.csv, .tsv)

OpenDocument Spreadsheet (.ods)

Microsoft Excel OOXML (.xlsx)

SQLite (.sqlite3, .sqlite, .db)

Microsoft Excel (.xls)

SPSS (.por, .sav)

Text and word processing documents

PDF/A (.pdf)*

Plain Text (.txt)

Markdown (.md)

XML (.xml)

SGML (.sgm, .sgml)

PDF (.pdf)

Microsoft Word OOXML (.docx)

OpenDocument Text (.odt)

LaTeX (.latex)

EPUB (.epub)

HTML (.htm, .html)

Rich Text Format (.rtf)

PostScript (.eps, .epsf, .ps)

Microsoft Word (.doc)

WordPerfect (.wpd)

Google Docs

All other text document formats not listed here

Photos, images, and vector graphics

Tagged Image File Format (.tiff)

JPEG (.jpeg)

Portable Network Graphics (.png)

Scalable Vector Graphics (.svg)

PDF/A (.pdf)*

Graphics Interchange Format (.gif)

Digital Negative (.dng)

Bitmap (.bmp)

PDF (.pdf)

Adobe Illustrator (.ai)

Adobe Photoshop (.psd)

RAW camera images

All other image formats not listed here

Audio

MPEG-1 or MPEG-2 Audio (.mp3)

Waveform Audio File Format (.wav)

Audio Interchange File Format (.aif, .aiff)

Broadcast Wave (.bwf, .bwav)

Standard MIDI (.mid)

Free Lossless Audio Codec (.flac)

MPEG-4 (.mp4, .m4a)

Ogg Vorbis (.ogg)

Sun Audio (.au)

AIFF Compressed (.aifc)

Windows Media Audio (.asf, .wma)

All other audio formats not listed here

Video

Audio Video Interleave (.avi)

QuickTime Movie (.mov)

MPEG-4 (.mp4)

FFV1/Matroska (.mkv)

 

Windows Media Video (.asf, .wmv)

All other video formats not listed here

Posters, presentations and slide decks

PDF/A (.pdf)*

PDF (.pdf)

OpenDocument Presentation (.odp)

Microsoft PowerPoint OOXML (.pptx)

Microsoft PowerPoint (.ppt)

Google Slides

All other presentation formats not listed here

Maintaining data integrity

Digital data are fragile, regardless of which storage medium you choose (DVD, hard disk, tape, etc.). They are susceptible to bit rot, the gradual degradation or decay of stored data over time. The recommended methods for combating bit rot are refreshment and replication.

Refreshment: Periodically copy your data onto a new drive or disk (every 2-5 years).
Replication: Maintain your original copy, an external copy, and an external remote copy. Use at least two forms of storage in two different locations.
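When refreshing or replicating data, it helps to confirm that each copy is byte-for-byte identical to the original. This guide does not prescribe a method; comparing checksums is one common approach. The sketch below assumes SHA-256 and two directory trees, and the function names (`sha256_of`, `build_manifest`, `compare_copies`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, replica: Path) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or changed in the replica."""
    a = build_manifest(original)
    b = build_manifest(replica)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Running `compare_copies` after each refreshment, and periodically against the remote copy, would flag a file whose contents have silently changed. A report with three empty lists means the two trees hold the same files with the same contents.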

Personal computers and external storage devices are not recommended for long-term archiving of finalized data; networked file servers are the best choice.