Metadata

A guide providing an overview of metadata, gathering different metadata standards and best data documentation practices.

Data Documentation

Basic elements of data documentation and best practices to follow while conducting your research. 

Basic Elements of Data Documentation

  1. Title - The name of the research project or dataset.
  2. Creator - Names of individuals who created the dataset, including organizational affiliation.
  3. Dates - The date range during which the data were collected, processed, and/or modified, as well as the dates to which the data pertain.
  4. Methodology - The process by which the data were created or captured, including any code, software, equipment, or protocols used.
  5. Subjects or Keywords - Words or phrases which describe the type or content of the data, the location in which it was collected, as well as the discipline or domain to which it pertains.
  6. Funders - The agencies or organizations which funded the research that produced the dataset.
  7. Rights Statement - Information regarding conditions governing access to or use of the data, as well as who holds the intellectual property rights for the data.
  8. Unique Identifiers - Any name, number, or alpha-numeric text string used to uniquely identify the project or dataset, including grant numbers or internal reference numbers.
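
As an illustration only, the eight elements above could be kept as a small structured record that is checked for gaps before a dataset is shared. This is a minimal sketch, not a metadata standard: the field names, the `missing_elements` helper, and every value in the record are hypothetical placeholders.

```python
import json

# The eight basic elements listed above, used here as hypothetical field names.
REQUIRED_ELEMENTS = [
    "title", "creator", "dates", "methodology",
    "keywords", "funders", "rights", "identifiers",
]

def missing_elements(record):
    """Return the basic elements that are absent or empty in a record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]

# Placeholder values for illustration only; this is not a real dataset.
record = {
    "title": "Example stream temperature dataset",
    "creator": [{"name": "A. Researcher", "affiliation": "Example University"}],
    "dates": {"collected": "2020-01-01/2020-12-31"},
    "methodology": "Hourly readings from hypothetical temperature loggers.",
    "keywords": ["hydrology", "water temperature"],
    "funders": [],  # left empty on purpose so the check reports it
    "rights": "CC BY 4.0",
    "identifiers": ["internal-ref-0001"],
}

# List the elements that still need to be filled in, then show the record.
print(missing_elements(record))
print(json.dumps(record, indent=2))
```

Storing the record as JSON keeps it readable by both people and software, and the check makes it easy to spot an element that was never filled in.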

Levels of Metadata

For a given research project, metadata are generally created at two levels: project-level and dataset-level. Project-level metadata describe the “who, what, where, when, how and why” of the dataset, which provides context for understanding why the data were collected and how they were used.

Examples of project-level metadata are: 

  1. Name of the project
  2. Dataset title
  3. Project description
  4. Dataset abstract
  5. Principal investigator and collaborators
  6. Contact information
  7. Dataset handle (DOI or URL)
  8. Dataset citation
  9. Data publication date
  10. Geographic description
  11. Time period of data collection
  12. Subject/keywords
  13. Project sponsor
  14. Dataset usage rights

Dataset-level metadata are more granular. They describe the data and the dataset in much greater detail.

Dataset-level metadata might include: 

  1. Data origin: experimental, observational, raw or derived, physical collections, models, images, etc.
  2. Data type: integer, Boolean, character, floating point, etc.
  3. Specialized tools: microscopes, cameras, etc.
  4. Data acquisition details: sensor deployment methods, experimental design, sensor calibration methods, etc.
  5. File type: CSV, MAT, XLSX, TIFF, HDF, NetCDF, etc.
  6. Data processing methods, software used
  7. Data processing scripts or code
  8. Dataset parameter list, including
    • Variable names
    • Description of each variable
    • Units

Data Documentation Good Practice

During your research, document every data format used in your project. Research data come in many formats, such as:

  • Text - flat text files, Word, Portable Document Format (PDF), Rich Text Format (RTF), Extensible Markup Language (XML).
  • Numerical - Statistical Package for the Social Sciences (SPSS), Stata, Excel.
  • Multimedia - JPEG, TIFF, DICOM, MPEG, QuickTime.
  • Models - 3D, statistical.
  • Software - Java, C.
  • Discipline specific - Flexible Image Transport System (FITS) in astronomy, Crystallographic Information File (CIF) in chemistry.
  • Instrument specific - Olympus Confocal Microscope Data Format, Carl Zeiss Digital Microscopic Image Format (ZVI).

Dataset Documentation:

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File format
  • Software used, including version and operating system
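
Some of the items above, such as the file format, software version, and operating system, can be captured automatically when the data are processed. The following is a minimal sketch that assumes the processing is done in Python. The `describe_environment` helper and the file path are hypothetical, and the result should be supplemented by hand with details the script cannot know, such as instrument software.

```python
import platform
from pathlib import Path

def describe_environment(data_file):
    """Collect file format, Python version, and operating system details."""
    # Use the file extension as a rough indicator of the file format.
    extension = Path(data_file).suffix.lstrip(".").lower()
    return {
        "file_format": extension or "unknown",
        "python_version": platform.python_version(),
        "operating_system": f"{platform.system()} {platform.release()}",
    }

# Hypothetical file path; the file does not need to exist for this sketch.
info = describe_environment("results/measurements.CSV")
for key, value in info.items():
    print(f"{key}: {value}")
```

The file extension is only a hint about the format, so it is worth confirming the recorded value against the actual file contents.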

Additional Resources