
Data Management

Data Documentation

Even with excellent organization structures, files and the data they contain can still be unclear, particularly if they have not been examined for long periods of time. Saving additional documentation to the folder in the form of plain text files can preserve greater context and meaning behind the dataset, providing you with an explanation when you do reuse the data. You should document whatever you will need to know in order to understand and reuse your own data later. Documenting your data means creating metadata that can be used to retrieve, reuse, and increase the longevity of your data. Metadata, often defined as "data about data" and also known as data documentation, is used to describe and document research data.

Descriptor information might include elements such as:

  • Creator
  • Title
  • Source
  • Methodology
  • Description
  • Location
  • Dates
  • Rights
  • Funder
  • Subjects
  • Format
  • Identifier (DOI or Handle)
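One way to keep these elements together in a plain text file is a simple "Element: value" record saved in the dataset folder. The Python sketch below is a minimal illustration, not a metadata standard. The file name `metadata.txt`, the helper names `format_metadata` and `save_metadata`, and the example values are all hypothetical. Any element you leave blank is marked "(not provided)", so the gaps stay visible when you return to the data.

```python
from pathlib import Path

# Descriptor elements from the list above, in the order they are written out.
# "Identifier" would hold a DOI or Handle.
ELEMENTS = [
    "Creator", "Title", "Source", "Methodology", "Description", "Location",
    "Dates", "Rights", "Funder", "Subjects", "Format", "Identifier",
]


def format_metadata(record: dict) -> str:
    """Render a record as one 'Element: value' line per descriptor element.

    Missing or blank elements are marked '(not provided)' so gaps stay visible.
    """
    lines = []
    for element in ELEMENTS:
        value = str(record.get(element, "")).strip()
        lines.append(f"{element}: {value or '(not provided)'}")
    return "\n".join(lines) + "\n"


def save_metadata(record: dict, folder: Path) -> Path:
    """Write the formatted record to a plain text file in the dataset folder."""
    path = folder / "metadata.txt"  # hypothetical file name
    path.write_text(format_metadata(record), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = {
        "Creator": "Jane Researcher",
        "Title": "Stream temperature readings, 2021",
        "Format": "CSV",
    }
    print(format_metadata(example), end="")
```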

You may want to use a data dictionary to add context and explain tabulated data, and this may come in the form of a "Readme" file. Cornell University has developed a template for creating readme files.
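If your tabulated data is stored in a CSV file, a short script can draft the skeleton of a data dictionary for you to finish by hand. The sketch below assumes a CSV with a header row. The function names and the type labels (integer, decimal, text) are illustrative choices and do not come from Cornell's template or any standard. The script only guesses each column's type and pulls an example value. Someone who knows the data still has to write the descriptions, units, and codes.

```python
import csv
import io


def infer_type(values: list) -> str:
    """Guess a column type from its non-empty values."""
    filled = [v for v in values if v.strip()]
    if not filled:
        return "empty"
    # Try the stricter type first; fall back to text if nothing fits every value.
    for label, convert in (("integer", int), ("decimal", float)):
        try:
            for v in filled:
                convert(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text: str) -> list:
    """Return one entry per column: name, guessed type, example, blank description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        values = [row.get(column) or "" for row in rows]
        example = next((v for v in values if v.strip()), "")
        entries.append({
            "variable": column,
            "type": infer_type(values),
            "example": example,
            "description": "",  # to be written by hand: meaning, units, codes
        })
    return entries


if __name__ == "__main__":
    sample = "site,temp_c,count\nA1,12.5,3\nA2,13,4\n"  # hypothetical data
    for entry in draft_data_dictionary(sample):
        print(f"{entry['variable']}\t{entry['type']}\t{entry['example']}")
```

The resulting table can be pasted into the data dictionary section of a readme file and completed there.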

Click below for a guide on documentation:

Colectica for Excel: a Microsoft Excel plug-in to add descriptive and background information to your spreadsheets.

Data Cleanup

Even after the end of a project, data may continue to be very useful.  You might use it to initiate a new project, or you may wish to share it to validate your findings.  When your project has concluded, you can take a few final steps to maintain the long-term accessibility of your data. Take care to store your data in a suitable location, and in a stable format.

OpenRefine is an open source tool that can be used to clean data and transform it from one format to another.
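OpenRefine's clustering feature groups values that probably refer to the same thing but are written differently, such as "Cornell University" and "University, Cornell". The Python sketch below imitates a simplified version of its "fingerprint" key-collision method: trim, lowercase, strip accents and punctuation, then sort the unique words. It is not OpenRefine's own code. The helper names `fingerprint` and `cluster` are hypothetical. As in OpenRefine, the script only proposes clusters, and you still decide which spelling to keep.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key modeled on OpenRefine's fingerprint method."""
    text = value.strip().lower()
    # Split accented letters into base letter + accent mark, then drop the marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Unique words in sorted order, so word order and repeats do not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values: list) -> dict:
    """Group values sharing a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


if __name__ == "__main__":
    # Hypothetical column values, for illustration only.
    column = ["Cornell University", "cornell university ", "University, Cornell", "Ithaca"]
    for key, members in cluster(column).items():
        print(key, "->", members)
```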

Controlled Vocabularies Quick Guide

Controlled vocabularies are systems of consistent terms for denoting particular entities or relationships within a given domain.   If you’ve browsed through the subject headings in a library catalog, for example, you’ve used a controlled vocabulary; those established terms make it possible for researchers to reliably recognize what they need and understand the relationships between different resources.  But controlled vocabularies describe more than just subject headings; they offer a consistent language for the background, composition, or methodology of a dataset.  Standardized terminology within a field allows for enhanced communication among researchers sharing their data, and can improve the re-discovery of data.  
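Applying a controlled vocabulary to your own data often comes down to mapping the free-text labels you recorded onto a list of preferred terms. The Python sketch below does this with a small, hypothetical vocabulary. Its terms and variants are invented for illustration and do not come from any of the vocabularies listed on this page. Labels are compared after lowercasing and collapsing whitespace. Any label without a match is returned for manual review rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-vocabulary: preferred term -> variant labels seen in raw data.
VOCABULARY = {
    "Precipitation": ["rain", "rainfall", "precip"],
    "Air temperature": ["air temp", "temperature (air)", "ambient temperature"],
}


def normalize(label: str) -> str:
    """Lowercase and collapse runs of whitespace for comparison."""
    return " ".join(label.lower().split())


def build_lookup(vocabulary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every preferred term and variant (normalized) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        for label in [preferred, *variants]:
            lookup[normalize(label)] = preferred
    return lookup


def standardize(
    labels: List[str], lookup: Dict[str, str]
) -> Tuple[List[Optional[str]], List[str]]:
    """Return preferred terms (None where unmatched) and the unmatched labels."""
    mapped, unmatched = [], []
    for label in labels:
        term = lookup.get(normalize(label))
        mapped.append(term)
        if term is None:
            unmatched.append(label)
    return mapped, unmatched


if __name__ == "__main__":
    lookup = build_lookup(VOCABULARY)
    terms, review = standardize(["Rainfall", " air  temp", "humidity"], lookup)
    print(terms)   # → ['Precipitation', 'Air temperature', None]
    print(review)  # → ['humidity']
```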

Below are different systems of controlled vocabularies designed for various fields of research.   

Data Documentation Initiative

  • DDI Controlled Vocabularies
  • This metadata development group focuses on a standard descriptive language for use across the social sciences.  Their set of controlled vocabularies offers guidance on describing several different aspects of datasets, such as dates or the subjects of analyses.

Plant Ontology

  • Plant Ontology
  • This controlled vocabulary establishes terms to describe anatomy, morphology, and development for all plants.  

Chemical Entities of Biological Interest

  • ChEBI term lookup
  • This resource gives a vocabulary for molecular entities in chemical compounds; additionally, the vocabulary establishes terms for relationships among entities.

PRO Protein Ontology

  • PRO browser
  • This vocabulary defines protein types and their relationships.

USGS Thesaurus

  • USGS Thesaurus browser and lookup
  • This resource from the United States Geological Survey provides tabs to find vocabularies for geographic features and areas, lithology, and marine planning.

Medical Subject Headings

  • MeSH browser
  • The National Library of Medicine allows researchers to browse the terms used in medical information classification.


Library of Congress

  • Library of Congress Subject Headings lookup
  • The Library of Congress maintains a series of vocabularies, used in libraries and institutions across the United States, for subjects, proper names, languages, relationships, and other characteristics.