File Formats For Preservation
Introduction
To maximize your ability to share, preserve, and reuse digital files, choose file formats carefully. A well-chosen format reduces the risk that your data becomes inaccessible once a proprietary format is no longer supported.
In general, UW Libraries encourages you to use file formats that are open (free and well documented), non-proprietary, and commonly used within your field or discipline; that use a standard encoding (ASCII, Unicode); and that are not encrypted or compressed.
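The standard-encoding criterion above can be checked mechanically. The following is a minimal sketch, not an official UW Libraries tool: the function name `classify_encoding` is hypothetical, and it only distinguishes ASCII and UTF-8. It does not detect UTF-16 or the ISO 8859-x family, which would need separate handling.

```python
def classify_encoding(data):
    """Return 'ascii' or 'utf-8' if the bytes decode cleanly, else None.

    Hypothetical helper for illustration. ASCII is tried first because
    every ASCII file is also valid UTF-8.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    # Neither encoding fits: the file may be UTF-16, ISO 8859-x, or binary.
    return None
```

To check a file on disk, read it in binary mode and pass the bytes in, for example `classify_encoding(open("notes.txt", "rb").read())`, where `notes.txt` is a placeholder name.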
The tables below highlight selected file formats by tier. The tier indicates how likely it is that the file format will be viewable in the future. The list is not exhaustive. For a more comprehensive list, see Sustainability of Digital Formats: Planning for Library of Congress Collections. For questions regarding file formats, contact [email protected].
Tiers
Tier One: Most likely to render the content of the file successfully in the long term. The file format is open, or it is proprietary with available documentation. Use is ubiquitous, and applications that render the file are widely available.
Tier Two: Reasonably likely to render the content of the file successfully in the long term, provided an appropriate application is available. Use may be common enough that the code or application needed to render the content can be procured. The file format may be open, or it may be proprietary with available documentation. Some proprietary file formats are Tier Two because of their ubiquitous use and lack of complexity.
Tier Three: Least likely to render the content of the file successfully in the long term. Some of these are proprietary formats that require specific applications, or whose applications drop support for older versions of the format.
File Formats by Content Type:
Documents
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Text | PDF/A-1, ISO 19005-1 (.pdf); Plain text: US-ASCII, UTF-8, UTF-16 with BOM (.txt); SGML with included DTD (.sgm, .sgml); XML with included schema (.xml); Plain text: ISO 8859-x (.txt); OpenDocument Text (.odt, .sxw); EPUB (.epub) | PDF with fonts embedded (.pdf); Rich Text Format 1.x (.rtf) | MS Word 2007+ (OOXML) (.docx); Microsoft Word (.doc) |
| Presentation | PDF/A-1, ISO 19005-1 (.pdf) | OpenDocument Presentation (.odp); MS PowerPoint 2007+ (OOXML) (.pptx); PowerPoint (.ppt) | |
Images and Graphics
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Raster Graphics, Lossless | TIFF, uncompressed or CCITT 4 compressed (.tiff); PNG, 24-bit true color (.png); JPEG2000, lossless compression (.jp2); GIF (.gif); PNG, 8-bit indexed (.png) | BMP (.bmp) | Photoshop (.psd); MrSID (.sid) |
| Raster Graphics, Lossy | TIFF, compressed (.tiff); JPEG (.jpg); JPEG2000, lossy compression (.jp2) | HEIF (.heif, .heic) | Photoshop (.psd); MrSID (.sid) |
| Digital Camera File Formats | TIFF, uncompressed or CCITT 4 compressed (.tiff); PNG, 24-bit true color (.png); JPEG2000, lossless compression (.jp2) | Digital Negative DNG (.dng); JPEG (.jpg); JPEG2000, lossy compression (.jp2) | Camera RAW files (e.g., .CR2, .NEF) |
| Vector Graphics | SVG, no JavaScript binding (.svg); PDF/A-1, ISO 19005-1 (.pdf) | Computer Graphics Metafile (.cgm) | Encapsulated PostScript (.eps); Macromedia Flash (.swf) |
Audio and Video
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Born Digital Audio | WAV, PCM (.wav); AIFF, PCM (.aif, .aiff) | Standard MIDI (.mid); Apple Lossless; MP3 (.mp3); AAC (.mp4, .m4a) | AIFC, compressed AIFF (.aifc); RealAudio (.rm, .ra); Windows Media Audio (.wma); WAV, compressed (.wav) |
| Born Digital Video | Matroska using FFV1 codec (.mkv); AVI with uncompressed video (.avi); QuickTime with uncompressed video (.mov); Material Exchange Format (.mxf) | MPEG-1, MPEG-2, MPEG-4 (.mp1, .mp2, .mp4); ProRes (.mov); Motion JPEG 2000 (.jp2) | Windows Media Video (.wmv); RealVideo (.rm, .rv); Flash (.flv) |
Geospatial Data
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Geospatial: Vector Data | GeoJSON (.json, .geojson); Geography Markup Language (GML) (.gml) | ESRI Shapefile, with all component files present (.shp, .shx, .dbf); ESRI Geodatabase (.gdb) (prefer Shapefiles); Keyhole Markup Language (KML) (.kml, .kmz) | Other ESRI files |
| Geospatial: Raster and Georeferenced Images | GeoTIFF (.tif) | GML in JPEG 2000 (.jpx, .jp2) | |
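The table asks that all component files of an ESRI Shapefile be present. A minimal sketch of that check follows, assuming the three components named in the table sit side by side with the same base name and lowercase extensions. The function name `missing_shapefile_components` is hypothetical, and other sidecar files a dataset may carry are not checked.

```python
from pathlib import Path

# Component extensions named in the table above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_components(shp_path):
    """Return the required extensions that have no file next to shp_path.

    Hypothetical helper: it only swaps the suffix and tests for existence,
    so on case-sensitive file systems 'ROADS.SHX' will not satisfy '.shx'.
    """
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]
```

An empty list means the three listed components were found; anything else names what to locate before depositing the dataset.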
Data
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Spreadsheet or Database | Comma- or tab-separated values (.csv, .tsv, .txt); Delimited text | OpenDocument Spreadsheet (.ods); MS Excel 2007+ (OOXML) (.xlsx) | Other ESRI files |
| Quantitative and Statistical Data | Comma- or tab-separated values (.csv, .tsv, .txt); Structured text or markup file containing metadata information: Data Documentation Initiative (.ddi) | SPSS (.sav, .sps, .spv, .spo); SAS (.sas, .sas7bdat); R (.R); HDF4 (.hdf); HDF5 (.hdf) | Excel (.xls); Other proprietary formats |
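Tabular data held in memory can be written straight to a Tier One format. The sketch below uses Python's standard `csv` module to write comma-separated values in UTF-8; the function name `write_csv` and the sample column names in the usage note are hypothetical, and it is one possible approach rather than a prescribed workflow.

```python
import csv


def write_csv(path, header, rows):
    """Write a header row and data rows as UTF-8 comma-separated values."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_csv("sites.csv", ["site", "latitude"], [["Seattle", 47.6]])` produces a plain-text file that any text editor can open. Values containing commas are quoted automatically by the `csv` module.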
CAD Files
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| CAD (See also: Vector Graphics) | Autodesk's Drawing Interchange File Format/Data eXchange Format (.dxf); Industry Foundation Classes (.ifc); Standard for the Exchange of Product Model Data (.step, .stp, .p21); Initial Graphics Exchange Specification (.igs) | Extensible 3D (.x3d); Universal 3D (.u3d) | Proprietary CAD formats |
Other Born Digital Content Types
| Content | Tier One | Tier Two | Tier Three |
|---|---|---|---|
| Computer Programs | Computer program source code | Compiled / Executable files | |
| Containers | Zip, no compression (.zip); Tar (.tar) | Zip, compressed | |
| Email | MBOX; EML | MSG; PST | |
| Websites | WARC with GZIP compression | Web Archive Collection Zipped (WACZ) | |
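The Containers row distinguishes Zip files without compression (Tier One) from compressed ones (Tier Two). As a minimal sketch, Python's standard `zipfile` module can write an archive with the `ZIP_STORED` method, which stores members uncompressed. The function names `make_stored_zip` and `all_members_stored` are hypothetical, and the sketch stores each file under its bare file name, so inputs with identical names in different folders would collide.

```python
import zipfile
from pathlib import Path


def make_stored_zip(archive_path, file_paths):
    """Create a Zip archive whose members are stored without compression."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for file_path in file_paths:
            file_path = Path(file_path)
            # arcname drops the directory part so the archive holds flat names.
            archive.write(file_path, arcname=file_path.name)


def all_members_stored(archive_path):
    """Return True if every member of an existing Zip is uncompressed."""
    with zipfile.ZipFile(archive_path) as archive:
        return all(
            info.compress_type == zipfile.ZIP_STORED
            for info in archive.infolist()
        )
```

`all_members_stored` can also be run on an archive received from someone else to see which tier of the table it falls under.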
