Data preservation and archiving are the practices and procedures that provide long-term storage of, and access to, data while maintaining its safety and integrity. This is achieved by using durable file formats, keeping files in secure locations, and keeping copies in multiple locations, for example by archiving files locally and/or depositing them in data repositories.
Preservation and archiving typically occur at the end of a project, and preserved data is a static and usually final record of the project. Contrast this with storage and backup, which occur while research data is still being collected and analyzed.
Choose your file formats carefully. For maximum longevity, readability and access, plan for both hardware and software obsolescence and avoid formats that are unusual, closed, proprietary or compressed.
Preferred formats are those most likely to remain preservable for long-term use. They tend to be non-proprietary, open formats that are in common use in research communities, use standard encodings like ASCII or Unicode, and are not compressed or encrypted.
Acceptable formats may require more work than Preferred formats to be preserved and should only be used when it's not possible to use a Preferred format.
Formats that are Not Recommended often require proprietary software to be read and are at risk of becoming obsolete. Future usability of, and access to, files in such formats cannot be guaranteed.
Type | Preferred | Acceptable | Not Recommended |
---|---|---|---|
Structured data and spreadsheets | | OpenDocument Spreadsheet (.ods); Microsoft Excel OOXML (.xlsx); SQLite (.sqlite3, .sqlite, .db) | Microsoft Excel (.xls); SPSS (.por, .sav) |
Text and word processing documents | Plain Text (.txt); Markdown (.md); XML (.xml); SGML (.sgm, .sgml) | PDF (.pdf); Microsoft Word OOXML (.docx); OpenDocument Text (.odt); LaTeX (.latex); EPUB (.epub); HTML (.htm, .html); Rich Text Format (.rtf); PostScript (.eps, .epsf, .ps) | Microsoft Word (.doc); WordPerfect (.wpd); Google Docs; All other text document formats not listed here |
Photos, images, and vector graphics | Tagged Image File Format (.tiff); JPEG (.jpeg); Portable Network Graphics (.png); Scalable Vector Graphics (.svg) | Graphics Interchange Format (.gif); Digital Negative (.dng); Bitmap (.bmp); PDF (.pdf) | Adobe Illustrator (.ai); Adobe Photoshop (.psd); All other image formats not listed here |
Audio | MPEG-1 or MPEG-2 Audio (.mp3); Waveform Audio File Format (.wav); Audio Interchange File Format (.aif, .aiff); Broadcast Wave (.bwf, .bwav) | Standard MIDI (.mid); Free Lossless Audio Codec (.flac); MPEG-4 (.mp4, .m4a); Ogg Vorbis (.ogg); Sun Audio (.au) | AIFF Compressed (.aifc); Windows Media Audio (.asf, .wma); All other audio formats not listed here |
Video | | Audio Video Interleave (.avi); QuickTime Movie (.mov); MPEG-4 (.mp4) | Windows Media Video (.asf, .wmv); All other video formats not listed here |
Posters, presentations and slide decks | | PDF (.pdf); OpenDocument Presentation (.odp); Microsoft PowerPoint OOXML (.pptx) | Microsoft PowerPoint (.ppt); Google Slides; All other presentation formats not listed here |
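Before archiving, it can help to audit a project folder against the table above. The following is a minimal sketch, not an official tool: the `FORMAT_TIERS` lookup, the `classify` and `audit` function names, and the `"Unknown"` fallback are all assumptions made for illustration, and the lookup covers only a handful of entries from the text and image rows.

```python
from pathlib import Path

# Hypothetical lookup built from a few entries in the text and image rows
# of the table; extend it with the remaining rows as needed.
FORMAT_TIERS = {
    ".txt": "Preferred", ".md": "Preferred", ".xml": "Preferred",
    ".tiff": "Preferred", ".png": "Preferred", ".svg": "Preferred",
    ".pdf": "Acceptable", ".docx": "Acceptable", ".odt": "Acceptable",
    ".gif": "Acceptable", ".bmp": "Acceptable",
    ".doc": "Not Recommended", ".wpd": "Not Recommended",
    ".ai": "Not Recommended", ".psd": "Not Recommended",
}


def classify(filename: str) -> str:
    """Return the preservation tier for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    # Extensions missing from this partial lookup are reported as "Unknown"
    # so they can be checked against the full table by hand.
    return FORMAT_TIERS.get(suffix, "Unknown")


def audit(filenames):
    """Group file names by tier so files needing conversion are easy to spot."""
    report = {}
    for name in filenames:
        report.setdefault(classify(name), []).append(name)
    return report
```

Calling `audit(["notes.txt", "draft.doc", "figure.psd"])` would group `notes.txt` under Preferred and the other two under Not Recommended, which flags them as candidates for conversion before deposit.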
Digital data are fragile, regardless of which storage medium you choose (DVD, hard disk, tape, etc.). They are susceptible to bit rot, the gradual degradation or decay of stored data over time. The recommended methods for combating bit rot are refreshment and replication.
Refreshment: Periodically copy your data onto a new drive or disk (every 2-5 years).
Replication: Maintain your original copy, an external copy, and an external remote copy. Use at least two forms of storage in two different locations.
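Replication only helps if the copies are actually identical and stay that way. One common way to check this, which the guidance above does not itself prescribe, is to compare checksums. The sketch below assumes SHA-256 as the checksum; the function names `sha256_of`, `replicate`, and `find_damaged` are hypothetical, and the destination directories stand in for your external and remote locations.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy `source` into each destination directory and verify each copy."""
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and returns the new file's path.
        copy = Path(shutil.copy2(source, directory))
        results[str(copy)] = sha256_of(copy) == expected
    return results


def find_damaged(copies: list[Path], expected: str) -> list[Path]:
    """Return the copies whose current checksum no longer matches `expected`."""
    return [path for path in copies if sha256_of(path) != expected]
```

Recording the checksum at archiving time and re-running `find_damaged` at each refreshment (every 2-5 years, as suggested above) gives a simple way to notice a degraded copy while an intact one still exists.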
Personal computers and external storage devices are not recommended for long-term archiving of finalized data; networked file servers are the best choice.