Data Management

Storing and Preserving Data

Life After the Project

Even after the end of a project, data may continue to be very useful.  You might use it to initiate a new project, or you may wish to share it to validate your findings.  When your project has concluded, you can take a few final steps to maintain the long-term accessibility of your data. Take care to store your data in a suitable location, and in a stable format.

Data Back-Up

Perhaps most importantly, researchers should back up their data regularly. Data lost to hardware failure, theft, or accident may be difficult and time-consuming to recover, and some data cannot be replicated at all. Keeping multiple copies of your data is the best protection against loss. Take care to save copies of your raw data in addition to your analyzed data.

It is recommended that you maintain at least three copies:

  • Working version on a Principal Investigator's machine
  • An external hard drive or other local, on-campus machine
  • An off-campus location

It can also be helpful to maintain a backup copy in a stable, common-use file format. More on file formats for storage can be found in the sections on data preservation.
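The three-copy recommendation above can be partly automated. The following is a minimal sketch, not a complete backup system: it copies every file from a working folder into each backup location and then compares SHA-256 checksums to confirm the copies are intact. The function names `sha256_of` and `back_up` and all folder paths are hypothetical, chosen only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy every file under source_dir into each backup folder.

    Returns the copies whose checksum differs from the original,
    so an empty list means every copy was verified.
    """
    mismatches = []
    # Collect the file list first so the walk is not affected by copying.
    source_files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_dir)
        expected = sha256_of(source_file)
        for backup_dir in backup_dirs:
            target = backup_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source_file, target)  # copy2 also keeps timestamps
            if sha256_of(target) != expected:
                mismatches.append(target)
    return mismatches
```

A call might look like `back_up(Path("project_data"), [Path("/mnt/external/project_data"), Path("/mnt/offsite/project_data")])`, where both destination paths are placeholders for an external drive and an off-campus location. The sketch assumes the backup folders are not inside the source folder.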

Data Cleanup

OpenRefine is an open-source tool that can be used to clean data and transform it from one format to another.

File Formats for Long-Term Access

As technology changes, researchers should plan for both hardware and software obsolescence and consider the longevity of their file format choices to ensure long-term readability and access.

File formats more likely to be accessible in the future have the following characteristics:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Examples of preferred file format choices include:

  • ODF, not Word
  • ASCII, not Excel
  • MPEG-4, not Quicktime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

From MIT - Creative Commons Attribution Non-Commercial License
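As a rough illustration of how the preferences above might be applied to an existing project folder, the sketch below lists files whose extensions suggest a less durable format, along with the preferred alternative. The mapping from file extensions to formats is an assumption made for this example (the list above names formats, not extensions), and the name `files_to_convert` is hypothetical. Databases are left out because they have no single file extension.

```python
from pathlib import Path

# Assumed mapping from common file extensions to the preferred
# alternatives named in the list above. Adjust it to suit your data.
SUGGESTED_FORMAT = {
    ".doc": "ODF",
    ".docx": "ODF",
    ".xls": "ASCII text",
    ".xlsx": "ASCII text",
    ".mov": "MPEG-4",
    ".gif": "TIFF or JPEG2000",
    ".jpg": "TIFF or JPEG2000",
    ".jpeg": "TIFF or JPEG2000",
}


def files_to_convert(data_dir: Path) -> dict[Path, str]:
    """Map each flagged file (relative to data_dir) to a suggested format."""
    suggestions = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # Compare extensions case-insensitively (.JPG and .jpg alike).
            suggestion = SUGGESTED_FORMAT.get(path.suffix.lower())
            if suggestion:
                suggestions[path.relative_to(data_dir)] = suggestion
    return suggestions
```

The function only reports candidates; it does not convert anything. Conversion is best done with the software that created the file, followed by a check that nothing was lost.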

Long-term Storage

It is important to ensure that you will be able to retrieve your data after the project if necessary. While you may have access to your personal machine, hardware failure may hinder that access. Consider finding a more permanent and secure home for your data. If your project was grant-funded, you may be required under the terms of the award to store the data long-term on servers within your department. If departmental storage is not available to you, consider submitting your data to a repository (for more information on data repositories, see the Data Sharing section of this guide).

If no alternative can be found, researchers may also wish to store their datasets on Indigo, UIC's institutional repository. Indigo will accept datasets smaller than 1 GB.
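Before preparing a submission, it can help to check whether a dataset falls under the size limit. The sketch below totals the size of every file in a folder and compares it with the limit. It assumes that 1 GB means 10^9 bytes, which the guide does not specify, and the names `dataset_size` and `fits_size_limit` are hypothetical.

```python
from pathlib import Path

# Assumption: "1 GB" is read as 10**9 bytes. Use 2**30 instead if the
# repository counts in binary units.
SIZE_LIMIT_BYTES = 10**9


def dataset_size(data_dir: Path) -> int:
    """Return the combined size in bytes of all files under data_dir."""
    return sum(p.stat().st_size for p in data_dir.rglob("*") if p.is_file())


def fits_size_limit(data_dir: Path, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the dataset is strictly smaller than the limit."""
    return dataset_size(data_dir) < limit
```

For example, `fits_size_limit(Path("project_data"))` reports whether the placeholder folder `project_data` is small enough under these assumptions.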