
File Formats For Preservation

Introduction

To maximize your ability to share, preserve, and reuse digital files, choose their file format carefully. A well-chosen format reduces the chance that your data will become inaccessible when a proprietary format is no longer supported.

In general, UW Libraries encourages you to use file formats that are open (free to use and well documented), non-proprietary, and commonly used within your field or discipline; that use a standard character encoding (ASCII, Unicode); and that are not encrypted or compressed.

The tables below group selected file formats by tier. A format's tier indicates how likely it is that files in that format will remain viewable in the future. The list is not exhaustive; for a more comprehensive list, see Sustainability of Digital Formats: Planning for Library of Congress Collections. For questions about file formats, contact [email protected].

Tiers

Tier One: Most likely to render successfully over the long term. The format is open, or the documentation for the proprietary format is available. Use is ubiquitous, and applications that can render the format are widely available.

Tier Two: Reasonably likely to render successfully over the long term, provided an appropriate application is available. The format may be common enough that the code or application needed to render it can still be obtained. The format may be open, or the documentation for the proprietary format is available. Some proprietary formats are Tier Two because they are ubiquitous and relatively simple.

Tier Three: Least likely to render successfully over the long term. Some of these are proprietary formats that require specific applications, or whose applications drop support for older versions of the format.

File Formats by Content Type:

Documents

Text
Tier One: PDF/A-1, ISO 19005-1 (.pdf); Plain text in US-ASCII, UTF-8, or UTF-16 with BOM (.txt); SGML with included DTD (.sgm, .sgml); XML with included schema (.xml); Plain text in ISO 8859-x (.txt)
Tier Two: OpenDocument Text (.odt, .sxw); EPUB (.epub); PDF with fonts embedded (.pdf); Rich Text Format 1.x (.rtf); MS Word 2007+ (OOXML) (.docx)
Tier Three: Microsoft Word (.doc); WordPerfect (.wpd); LaTeX with referenced files (.latex)

Presentation
Tier One: PDF/A-1, ISO 19005-1 (.pdf)
Tier Two: OpenDocument Presentation (.odp); MS PowerPoint 2007+ (OOXML) (.pptx)
Tier Three: PowerPoint (.ppt)
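The plain-text entries are defined by character encoding (US-ASCII, UTF-8, UTF-16 with BOM, or ISO 8859-x), which you can check from a file's bytes. The Python sketch below is one illustrative way to do that, not a UW Libraries tool; the function name tier_one_text_encoding is hypothetical. It assumes the file fits in memory, and it cannot positively identify ISO 8859-x text, because any byte sequence is valid in those 8-bit encodings, so such files come back as None.

```python
from typing import Optional

# UTF-16 byte order marks: little-endian (FF FE) and big-endian (FE FF).
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def tier_one_text_encoding(data: bytes) -> Optional[str]:
    """Return the encoding label that `data` decodes as, or None (hypothetical helper)."""
    if data.startswith(UTF16_BOMS):
        try:
            data.decode("utf-16")  # this codec reads the BOM to pick the byte order
            return "UTF-16 with BOM"
        except UnicodeDecodeError:
            return None
    # Pure 7-bit text is reported as US-ASCII (it is also valid UTF-8).
    for codec, label in (("ascii", "US-ASCII"), ("utf-8", "UTF-8")):
        try:
            data.decode(codec)
            return label
        except UnicodeDecodeError:
            pass
    return None  # e.g. ISO 8859-x or another 8-bit encoding
```

For a file on disk, pass the bytes read in binary mode, for example open(path, "rb").read().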

Images and Graphics

Raster Graphics, Lossless
Tier One: TIFF, uncompressed or CCITT Group 4 compressed (.tiff); PNG, 24-bit true color (.png); JPEG 2000, lossless compression (.jp2)
Tier Two: GIF (.gif); PNG, 8-bit indexed (.png); BMP (.bmp)
Tier Three: Photoshop (.psd); MrSID (.sid)

Raster Graphics, Lossy
Tier One: TIFF, compressed (.tiff); JPEG (.jpg); JPEG 2000, lossy compression (.jp2)
Tier Two: HEIF (.heif, .heic)
Tier Three: Photoshop (.psd); MrSID (.sid)

Digital Camera File Formats
Tier One: TIFF, uncompressed or CCITT Group 4 compressed (.tiff); PNG, 24-bit true color (.png); JPEG 2000, lossless compression (.jp2)
Tier Two: Digital Negative (DNG) (.dng); JPEG (.jpg); JPEG 2000, lossy compression (.jp2)
Tier Three: Camera raw files (e.g., .CR2, .NEF)

Vector Graphics
Tier One: SVG without JavaScript binding (.svg); PDF/A-1, ISO 19005-1 (.pdf)
Tier Two: Computer Graphics Metafile (.cgm)
Tier Three: Encapsulated PostScript (.eps); Macromedia Flash (.swf)
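The lossless TIFF entry covers only uncompressed or CCITT Group 4 files, and a .tiff extension alone does not tell you which compression a file uses. Here is a minimal sketch, assuming a classic (non-BigTIFF) file read into memory. It reads the Compression tag (259) from the first image directory. Per the TIFF 6.0 specification, value 1 means no compression and 4 means CCITT Group 4. The function names are hypothetical, and only the first image of a multi-page TIFF is inspected.

```python
import struct

# Compression tag (259) values from the TIFF 6.0 specification (subset).
COMPRESSION_TAG = 259
NO_COMPRESSION = 1
CCITT_GROUP_4 = 4


def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image directory (IFD)."""
    if data[:2] == b"II":
        endian = "<"  # Intel byte order, little-endian
    elif data[:2] == b"MM":
        endian = ">"  # Motorola byte order, big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return NO_COMPRESSION  # tag absent: the specification's default


def is_uncompressed_or_group4(data: bytes) -> bool:
    """True when the first image is uncompressed or CCITT Group 4 compressed."""
    return tiff_compression(data) in (NO_COMPRESSION, CCITT_GROUP_4)
```

Files using other schemes, such as LZW (value 5), fall under the compressed TIFF entry instead.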

Audio and Video

Born Digital Audio
Tier One: WAV, PCM (.wav); AIFF, PCM (.aif, .aiff); FLAC (.flac)
Tier Two: Standard MIDI (.mid); Apple Lossless Audio Codec (ALAC) (.m4a); MP3 (.mp3); AAC (.mp4, .m4a)
Tier Three: AIFC, compressed AIFF (.aifc); RealAudio (.rm, .ra); Windows Media Audio (.wma); WAV, compressed (.wav)

Born Digital Video
Tier One: Matroska with the FFV1 codec (.mkv); AVI with uncompressed video (.avi); QuickTime with uncompressed video (.mov); Material Exchange Format (.mxf)
Tier Two: MPEG-1, MPEG-2, MPEG-4 (.mp1, .mp2, .mp4); ProRes (.mov); Motion JPEG 2000 (.jp2)
Tier Three: Windows Media Video (.wmv); RealVideo (.rm, .rv); Flash (.flv)
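PCM WAV and compressed WAV both appear in the table under the same .wav extension, so the extension alone does not tell you which one you have. The sketch below uses hypothetical function names. It walks the RIFF chunks of a file held in memory and returns the format code from the fmt chunk. Code 1 is integer PCM, and other codes (for example 0x0055 for MP3) indicate a different encoding. It assumes a well-formed file, and it reports 32-bit float audio (code 3) as non-PCM even though that audio is uncompressed.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def wav_format_code(data: bytes) -> int:
    """Return the audio format code from the fmt chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # chunks start after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            (code,) = struct.unpack("<H", body[:2])
            if code == WAVE_FORMAT_EXTENSIBLE and len(body) >= 26:
                # The actual format code leads the SubFormat GUID at offset 24.
                (code,) = struct.unpack("<H", body[24:26])
            return code
        pos += 8 + size + (size % 2)  # chunk bodies are padded to an even length
    raise ValueError("no fmt chunk found")


def is_pcm_wav(data: bytes) -> bool:
    """True when the fmt chunk declares integer PCM audio."""
    return wav_format_code(data) == WAVE_FORMAT_PCM
```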

Geospatial Data

Geospatial, Vector Data
Tier One: GeoJSON (.json, .geojson); Geography Markup Language (GML) (.gml)
Tier Two: ESRI Shapefile, with all component files present (.shp, .shx, .dbf); ESRI Geodatabase (.gdb) (Shapefiles preferred); Keyhole Markup Language (KML) (.kml, .kmz)
Tier Three: Other ESRI files

Geospatial, Raster and Georeferenced Images
Tier One: GeoTIFF (.tif)
Tier Two: GML in JPEG 2000 (.jpx, .jp2)
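A Shapefile is usable only when its component files are kept together. As an illustrative sketch, with a hypothetical function name, the check below looks for the .shx and .dbf files that share a .shp file's base name. It assumes lowercase extensions, so on a case-sensitive file system a file named ROADS.SHX, for example, would be reported as missing.

```python
from pathlib import Path
from typing import List

# Component files listed for ESRI Shapefiles in the table above.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_shapefile_components(shp_path) -> List[Path]:
    """Return the required component files that are absent next to `shp_path`."""
    shp = Path(shp_path)
    # with_suffix replaces only the last suffix, so "roads.v2.shp" -> "roads.v2.shx".
    expected = [shp.with_suffix(suffix) for suffix in REQUIRED_SUFFIXES]
    return [path for path in expected if not path.exists()]
```

An empty result means all three components are present. Run the check before packaging a dataset for deposit.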

Data

Spreadsheet or Database
Tier One: Comma- or tab-separated values (.csv, .tsv, .txt); Delimited text; Platform-independent open database formats (.db, .db3, .sqlite, .sqlite3)
Tier Two: OpenDocument Spreadsheet (.ods); MS Excel 2007+ (OOXML) (.xlsx); dBASE (.dbf)
Tier Three: Other ESRI files

Quantitative and Statistical Data
Tier One: Comma- or tab-separated values (.csv, .tsv, .txt); Structured text or markup files containing metadata: Data Documentation Initiative (.ddi), XML (.xml), JSON (.json)
Tier Two: SPSS (.sav, .sps, .spv, .spo); SAS (.sas, .sas7bdat); R (.R); HDF4 (.hdf); HDF5 (.hdf)
Tier Three: Excel (.xls); Other proprietary formats
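Comma- or tab-separated values head both data rows in the table. To export a table from a SQLite database (one of the open database formats listed) to CSV, Python's standard sqlite3 and csv modules are enough. This is a minimal sketch with a hypothetical function name. It writes UTF-8 with a header row and turns NULLs into empty fields. Every value becomes text, so column types are lost. Record them separately, for example in one of the metadata formats listed above, if you need to keep that information.

```python
import csv
import sqlite3


def export_table_to_csv(db_path: str, table: str, csv_path: str) -> None:
    """Write every row of `table` to a UTF-8 CSV file with a header row."""
    # Quote the table name as an SQL identifier (embedded quotes are doubled).
    quoted = '"' + table.replace('"', '""') + '"'
    con = sqlite3.connect(db_path)
    try:
        cursor = con.execute(f"SELECT * FROM {quoted}")
        header = [column[0] for column in cursor.description]
        # newline="" lets the csv module control line endings itself.
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        con.close()
```

For a tab-separated file, pass delimiter="\t" to csv.writer.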

CAD Files

CAD (see also: Vector Graphics)
Tier One: Autodesk Drawing Interchange Format (DXF) (.dxf); Industry Foundation Classes (.ifc); Standard for the Exchange of Product Model Data (STEP) (.step, .stp, .p21); Initial Graphics Exchange Specification (IGES) (.igs); Extensible 3D (.x3d); Universal 3D (.u3d)
Tier Two: PDF/E (PDF for engineering) or 3D PDF (.pdf)
Tier Three: Proprietary CAD formats

Other Born Digital Content Types

Computer Programs
Tier One: Computer program source code
Tier Two: Compiled or executable files

Containers
Tier One: Zip without compression (.zip); Tar (.tar)
Tier Two: Zip with compression (.zip)

Email
Tier One: MBOX; EML
Tier Two: MSG
Tier Three: PST

Websites
Tier One: WARC with GZIP compression
Tier Two: Web Archive Collection Zipped (WACZ)
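The container entries distinguish Zip without compression and Tar from compressed Zip. Python's standard library can produce both uncompressed forms: zipfile.ZIP_STORED writes Zip members without compression, and tarfile mode "w" writes an uncompressed tar. The function below is an illustrative sketch with a hypothetical name. It packages every regular file under a directory and stores paths relative to that directory.

```python
import tarfile
import zipfile
from pathlib import Path


def package_uncompressed(src_dir, zip_path, tar_path) -> None:
    """Package every regular file under src_dir into an uncompressed Zip and Tar."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    # ZIP_STORED stores each member as-is, with no compression.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in files:
            zf.write(path, arcname=path.relative_to(src).as_posix())
    # Mode "w" (no ":gz", ":bz2", or ":xz" suffix) writes a plain, uncompressed tar.
    with tarfile.open(tar_path, "w") as tf:
        for path in files:
            tf.add(path, arcname=path.relative_to(src).as_posix())
```

Storing relative paths keeps the archive independent of where the files sat on the original computer.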