Data can remain useful long after a project ends. You might use it to start a new project, or share it so that others can validate your findings. Once your project has concluded, a few final steps will help keep your data accessible over the long term: store it in a suitable location and in a stable format.
After analysis, or even after your initial data collection, you may find some of your data disorganized or messy. OpenRefine can help you clean your datasets quickly. It is an open-source, browser-based tool that can be used for tasks as simple as standardizing terms and inputs, or as sophisticated as harvesting Twitter data.
Data is lost not only when hardware is lost or damaged; files that can’t be opened are just as permanently lost as files on a hard drive that has failed. While commercial software can be indispensable during research and analysis, future changes to the software or limited availability can make files stored long-term difficult to access. Researchers are strongly encouraged to convert their files to stable, non-proprietary formats for long-term storage. For example, if you completed your analysis in Microsoft Excel, convert those files to .csv files.
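After exporting a spreadsheet to .csv, it can be worth confirming that the file reads back cleanly without the original software. The sketch below is one possible check, not a prescribed procedure. It assumes the export is UTF-8 encoded, and the function name `check_csv` is hypothetical.

```python
import csv
from pathlib import Path


def check_csv(path):
    """Return (row_count, column_count) if every row has the same width.

    The row count includes the header row. Raises ValueError if the file
    is empty or if any row has a different number of fields than the first.
    A file that is not valid UTF-8 raises UnicodeDecodeError (assumption:
    the export was saved as UTF-8).
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        raise ValueError(f"{path} is empty")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                f"{path}: row {number} has {len(row)} fields, expected {width}"
            )
    return len(rows), width
```

A ragged or unreadable file found at this stage is usually easier to fix while the original spreadsheet can still be opened.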
It is important to ensure that you will be able to retrieve your data after the project ends, if necessary. While you may have access to your personal machine, hardware failure may hinder that access. Consider finding a more permanent and secure home for your data. If your project was grant-funded, you may be required under the terms of the award to store the data long-term on servers within your department. If departmental storage is not available to you, consider submitting your data to a repository (for more information on data repositories, see the Data Sharing section of this guide).
If an alternative can't be found, researchers may also wish to store their datasets on Indigo, UIC's institutional repository. Indigo will accept datasets smaller than 1 GB.
Perhaps most importantly, researchers should regularly back up their data. Data lost to hardware failure, theft, or misplacement may be difficult and time-consuming to recover, and some data cannot be replicated at all. Keeping multiple copies of your data is the best protection against loss; the more copies you have, the safer your data is. Take care to save copies of your raw data in addition to your analyzed data.
It is recommended that you maintain at least three copies.
It can also be helpful to maintain a backup copy in a stable, common-use file format. More on file formats for storage can be found in the sections on data preservation.
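The backup advice above can be partly automated. The following is a minimal sketch, assuming each backup location is reachable as an ordinary directory. The function names `sha256_of` and `back_up` are hypothetical. It copies one file to several destinations and compares SHA-256 checksums, so that a corrupted copy is caught right away.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination directory and verify each copy.

    Returns the list of verified copy paths. Raises OSError if the
    checksum of a copy does not match the checksum of the original.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

A call such as `back_up("raw_data.csv", ["/Volumes/external/backup", "/mnt/department/backup"])` would make two verified copies; the file name and both paths are placeholders for your own storage locations. Re-running `sha256_of` on stored copies from time to time is one way to notice silent corruption.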