INDIGO

Selecting the Data set

Selecting Data

The first step to depositing data is determining what data to deposit. Research projects often generate a lot of data throughout the life of the project. It is not always feasible to deposit all the data from the project. When selecting data to deposit in INDIGO, you should consider:

  • the importance of the data
  • the reusability of the data
  • the necessity of the data to validating research results

In addition, you must address whether the data includes personally identifiable information and whether you have the rights to make the dataset public.

Describing your Data in INDIGO (metadata)

As part of the actual submission process in INDIGO, you will be required to provide metadata in specific form fields (see the Step by Step Guide on Uploading Datasets in INDIGO). This includes:

  • Title (Name of the project) -  Enter a title that is meaningful and reflects the content that you are uploading.  
    • e.g.  Kilimanjaro glacier melting data
    • Do not enter titles such as “data set for article” or “supplementary data”; these do not describe the content and will not support data reuse.
  • Authors: Enter the names of the dataset authors/owners
  • Categories [Subject/Discipline categories]
  • Item Type: Select Dataset
  • Keywords [Author selected keywords]
  • Description: Provide sufficient detail to enable others to easily understand whether the data is of interest. It is also good to include information about your use policies, data characteristics and preservation plan. Information to include here:

    • How the data was collected, what type of data it is (i.e. what was measured and how it was reported or measured), the time period(s) of data collection, and examples of data fields found in the data.
    • Contact information (e.g. an email address) for requesting permission to access the data, or the location of the data and information on how to request access.
      • Required for metadata only records
    • Data usage rights or data use agreement requirements, including IRB approval requirements 
      • Required for metadata only records
    • Information on how long the data set will be stored in its current location (i.e. is there a date when support to maintain the data will expire).  
      • Required for metadata only records
  • License (copyright, creative commons agreement)

Optional fields include:

  • Related Materials (strongly recommended): This field is optional, but it is a great way to link to material that provides more context for your data, such as the article(s) published in relation to your data. Select Identifier Type - DOI to enter the title and DOI of the articles that the dataset you uploaded supplements.
  • Funding (recommended if dataset resulted from funded research)

To the right of the INDIGO submission screen, there are Item actions.  

  • You can Add an embargo to delay your dataset from being publicly available.
  • By clicking Share with private link, you can obtain a link that will allow someone to access the files, even when the item is not published.   This is useful if you are uploading data for a journal publication and you don’t want to publish the data until the article is published.   This allows the publisher to see the data is ready to be published.
  • Click Manage Identifiers to reserve a DOI. DOIs are automatically assigned when items are published in INDIGO. However, you can obtain the DOI in advance, which is useful for sharing with publishers and grant funders before the data becomes available.
  • Click Edit timeline to enter the publication date of the item. If it has not been published before, it is OK to select today’s date to reflect the date it was uploaded to INDIGO.

Also review Preparing Documentation (README files, Data Dictionaries, sample data) for further information on how to provide sufficient information and details to facilitate data re-use.  

Preparing Documentation (README file, Data Dictionaries, data use policies, sample data)

README file

A README file is expected as part of any data or software file deposit to INDIGO. A README file provides context for your dataset, independent of any explanation that may be found elsewhere, such as in a publication or other web-based resources. You may have a README file for each dataset, or one file that explains all datasets. A README file contains information such as data origin (how the data was collected), data type, instruments used to collect the data, data acquisition details, file type (csv, mat, xlsx, tiff, etc.), data processing methods and software used, data processing scripts or code, and data parameters (variable names, descriptions, units). You may also use a data dictionary to explain some of this information. See What is metadata? (Oregon State) for more information.

Cornell University has developed a template for creating README files that you may use to prepare your deposit for INDIGO.  Oregon State has also created a template for creating README files that you may use.  
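
As a starting point before adapting one of these templates, the topics listed above can be laid out as an empty plain-text skeleton. The following is a minimal sketch in Python; the function name `readme_skeleton` and the layout are illustrative assumptions, not the Cornell or Oregon State template and not an INDIGO requirement.

```python
# Section headings follow the topics named in the README description above;
# the layout itself is an assumption made for illustration.
README_SECTIONS = [
    "Data origin (how the data was collected)",
    "Data type",
    "Instruments used to collect the data",
    "Data acquisition details",
    "File types",
    "Data processing methods and software",
    "Data processing scripts or code",
    "Data parameters (variable names, descriptions, units)",
]


def readme_skeleton(title: str) -> str:
    """Return plain-text README content with one empty section per topic."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        # Each heading is followed by blank lines to be filled in by hand.
        lines += [section + ":", "", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(readme_skeleton("Kilimanjaro glacier melting data"))
```

The output can be saved as README.txt and completed by hand for each dataset.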

Data Dictionaries

  • It may also be appropriate to include a data dictionary with your data deposit. If your data uses codes or abbreviations, or you need to explain the meaning of terms, the relationship of the data to other data, the data origin, or the format of the data, you will want a data dictionary to define these terms and explain the source of the data.
  • For more on data dictionaries and for examples, see: Data Dictionaries.
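
A data dictionary can be started from the column headers of the data file itself. The sketch below assumes the data is CSV text with a header row; the function name `dictionary_skeleton` and the chosen columns (variable, description, units, allowed_values) are illustrative assumptions, not a format required by INDIGO.

```python
import csv
import io


def dictionary_skeleton(csv_text: str) -> str:
    """Return CSV text with one row per variable and empty fields to fill in."""
    # Read only the header row of the data file.
    header = next(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed dictionary columns; add or rename to suit your data.
    writer.writerow(["variable", "description", "units", "allowed_values"])
    for name in header:
        writer.writerow([name.strip(), "", "", ""])
    return out.getvalue()


if __name__ == "__main__":
    sample = "site_id,date,temp_c\nA1,2021-03-15,-2.5\n"
    print(dictionary_skeleton(sample))
```

Each row of the result names one variable; the description, units, and allowed values still need to be written by someone who knows the data.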

Data Use Policies or Agreements

Include information on any data use policies associated with the dataset. This may be a requirement of the funder, or there may be expectations for using the dataset (e.g. signing a data use agreement, receiving IRB approval).

Sample Data  (for metadata only records created in INDIGO)

If you are not supplying the data itself in INDIGO (either because it is too big or there are privacy issues), it is a good idea to share a small sample of the data so that users interested in the data have a sense of what is available before taking the steps to request it. If there are privacy issues related to your data, be sure to de-identify and fully anonymize the sample data before sharing it.

Formatting and Deposit Size

INDIGO will accept any file format. To facilitate basic preservation services, compressed data is discouraged (.zip, .gz, .tar.gz). File format recommendations for preservation can be found in the Library of Congress Recommended Formats Statement (https://www.loc.gov/preservation/resources/rfs/).

If you are working with proprietary or less-sustainable formats, consider converting your data to an open, widely-used format when you save and share your data. Many software programs allow for converting datasets into open formats (e.g. save SPSS dataset as CSV). This will better ensure that your data remains accessible and usable by yourself and others into the future.

There are some constraints on the size of files deposited:

  • The maximum size of a file uploaded through the online interface is 5 GB.
  • Each individual has a 2 GB space limit by default in INDIGO. It is possible to increase your space up to 10 GB. Please contact indigo@uic.edu to request an increase in your space allotment in INDIGO.
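
A quick local check before uploading can flag files that exceed these limits or that are compressed. The following is a minimal sketch in Python, assuming a hypothetical function `review_deposit`, a local folder holding the files to deposit, and that 1 GB means 1024³ bytes (how INDIGO counts a gigabyte is not stated here).

```python
from pathlib import Path

GB = 1024 ** 3                # assumption: binary gigabytes
MAX_FILE_BYTES = 5 * GB       # per-file limit for the online interface
DEFAULT_QUOTA_BYTES = 2 * GB  # default per-person space limit
COMPRESSED_SUFFIXES = {".zip", ".gz"}  # a .tar.gz file also ends in .gz


def review_deposit(folder: str, quota_bytes: int = DEFAULT_QUOTA_BYTES) -> list[str]:
    """Return warnings about file sizes and compressed files in a folder."""
    warnings = []
    total = 0
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        if size > MAX_FILE_BYTES:
            warnings.append(f"{path.name}: larger than 5 GB")
        if path.suffix.lower() in COMPRESSED_SUFFIXES:
            warnings.append(f"{path.name}: compressed format is discouraged")
    if total > quota_bytes:
        warnings.append(f"total size {total} bytes exceeds quota of {quota_bytes} bytes")
    return warnings
```

If your space allotment has been increased, pass the new limit as `quota_bytes`, for example `review_deposit("my_dataset", quota_bytes=10 * GB)`.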

File Names

File names should have meaning. This means that the content of a file can be identified from its name, and that the name indicates how the file differs from other files in the data set. Your file names could be based on important elements of your project such as: specimen, dates and times, location, testing conditions or variables, file version numbers, or other relevant information. If you have multiple files, provide an explanation of your file naming convention and/or a description of the contents of each file as part of your README file (described above).

Some other considerations for your file names:

  • avoid special characters
  • limit the use of periods. Use periods (.) only to separate the file name from the file extension.
  • limit the use of spaces. Use dashes (-) or underscores (_) instead.
  • use existing standards when possible. For example, ISO 8601 is a standard for how dates and times should be recorded.
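
The considerations above can be checked mechanically before deposit. The following is a minimal sketch in Python; the helper names `check_file_name` and `dated_name` are hypothetical, and treating everything other than letters, digits, dashes, and underscores as a "special character" is an assumption rather than an INDIGO rule.

```python
import re
from datetime import date

# Assumption: only letters, digits, dashes, and underscores are "safe".
ALLOWED_STEM = re.compile(r"^[A-Za-z0-9_-]+$")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces; use dashes or underscores")
    if name.count(".") > 1:
        problems.append("contains more than one period")
    # Drop the extension, then ignore spaces and periods already reported.
    stem = name.rsplit(".", 1)[0].replace(" ", "").replace(".", "")
    if not ALLOWED_STEM.match(stem):
        problems.append("contains special characters")
    return problems


def dated_name(prefix: str, day: date, extension: str) -> str:
    """Build a file name that embeds an ISO 8601 date (YYYY-MM-DD)."""
    return f"{prefix}_{day.isoformat()}.{extension}"
```

For example, `dated_name("kilimanjaro-glacier", date(2021, 3, 15), "csv")` produces `kilimanjaro-glacier_2021-03-15.csv`, for which `check_file_name` reports no problems. A side benefit of ISO 8601 dates is that file names sort chronologically.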

Requirements for Metadata Only Records

Please see Metadata Only Records for additional requirements for metadata only records.

Conditions for Deposit

Research data from all fields, subjects, and disciplines at the University of Illinois at Chicago may be published and/or archived in INDIGO, provided the above conditions are met. All submissions will undergo Curation Review before being permanently deposited to INDIGO. During Curation Review, UIC librarians review the submitted files, focusing on how the submitted data is documented and organized for preservation and re-use purposes. If your submission is declined, this indicates either that revisions are required before it can be deposited to INDIGO or that there may be a confidentiality or other concern. You will be contacted with information about the required revisions. You may request the assistance of a librarian for advice on needed revisions. If you have any questions about this, please email INDIGO@uic.edu.

FAIR Principles

Please follow the FAIR Principles as you prepare your data for deposit: https://www.go-fair.org/fair-principles/

  • Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. 
  • Accessible: Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation.
  • Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
  • Reusable: The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

Step by Step Guide for Depositing Data in INDIGO