File Formats

"Working" file formats, those used in the course of collecting and working with project data, are not always ideal for re-use or long-term preservation, and may not meet the requirements of data archives or repositories or satisfy the expectations of research funders.

In the absence of specific directives from funders or repositories, we offer Cornell University's general guidelines for selecting file formats for preservation and reuse (see "Recommended File Formats").

Principles for Selecting File Formats

Select open, non-proprietary formats

Open, non-proprietary formats are far more likely to remain usable even if the software that created them is unavailable or no longer functional. Formats whose documentation is complete and freely available are also more likely to be preserved over the long term. If the program that created a file is the only option for reading or accessing the data, the file is likely in a proprietary, non-open format. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.

Example of a proprietary format: Photoshop .psd file
Example of an open format: .tiff image file
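
As an illustration of the plain text option, the following is a minimal sketch of writing tabular data to comma- or tab-delimited text with Python's standard csv module and reading it back. The function names and sample values are hypothetical and are not part of the original guidelines. Exporting from a specific proprietary application would require that application's own export feature or a suitable reader.

```python
import csv
import io


def rows_to_delimited_text(header, rows, delimiter=","):
    """Serialize a header and data rows to delimited plain text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def delimited_text_to_rows(text, delimiter=","):
    """Parse delimited plain text back into a list of rows."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader]


# Hypothetical sample data, used only for illustration.
header = ["site_id", "date", "temperature_c"]
rows = [["A1", "2020-01-15", "3.2"], ["B7", "2020-01-16", "4.0"]]

csv_text = rows_to_delimited_text(header, rows)                  # comma-delimited
tsv_text = rows_to_delimited_text(header, rows, delimiter="\t")  # tab-delimited

# The text can be read by any tool that understands delimited files.
recovered = delimited_text_to_rows(csv_text)
print(recovered == [header] + rows)
```

Because the result is ordinary text, it can be opened in any text editor or spreadsheet program, independent of the software that produced it.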

Select "lossless" formats

Formats that compress the information in a file are often smaller, but some compression methods permanently discard data from the file. Such formats are "lossy," while formats that allow the original information to be recovered exactly when the file is uncompressed are "lossless."

Examples of lossy formats: .mp3 audio file, .jpeg image file
Examples of lossless formats: .wav audio file, .tiff image file
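
The distinction can be sketched in a few lines of Python. The lossless half uses the standard zlib module, where decompression returns the original bytes exactly. The lossy half is a deliberately simplified stand-in (rounding values to a coarse step) and is an assumption made for illustration; it is not how .mp3 or .jpeg encoders actually work. The function names and sample values are hypothetical.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and then decompress with zlib; nothing is discarded."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples, step=10):
    """Toy lossy scheme: keep only the nearest multiple of `step`.

    The discarded detail cannot be recovered afterwards.
    """
    return [step * round(s / step) for s in samples]


# Hypothetical sample data, used only for illustration.
original_bytes = b"temperature,3.2\n" * 100
compressed = zlib.compress(original_bytes)
restored_bytes = lossless_round_trip(original_bytes)

original_samples = [3, 12, 27, 101]
degraded_samples = lossy_round_trip(original_samples)

print(len(compressed) < len(original_bytes))  # smaller on disk
print(restored_bytes == original_bytes)       # lossless: identical
print(degraded_samples == original_samples)   # lossy: detail is gone
```

In the lossless case the file shrinks yet every byte comes back; in the lossy case the stored values are only approximations of the originals.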

Select unencrypted and uncompiled formats

If the encryption key, passphrase, or password to a file is lost, there may be no way to retrieve the data from the file later, rendering it unusable to others. Uncompiled source code is more readily re-usable by others and is far more likely to remain usable over time, because it can be recompiled for different architectures and platforms.

Creative Commons License

Adapted from the Research Data Management Service Group website (https://data.research.cornell.edu), Cornell University. Made available under a Creative Commons Attribution 4.0 International License. Retrieved from https://data.research.cornell.edu/content/file-formats.