HathiTrust Digital Library: Glossary

The HathiTrust Digital Library brings together the immense collections of partner institutions in digital form, preserving them securely so they can be accessed and used today and by future generations.

Glossary of Terms

  • Algorithms are executable programs that you can run on your workset. You can customize each algorithm's parameters.
  • An API, or Application Programming Interface, is a set of procedures that make data available for exchange. Users can retrieve HTRC volumes in bulk using the HTRC Data API within the HTRC Data Capsule environment.
  • A corpus is a collection of texts. For example, HathiTrust has nearly 4 million volumes in its public domain corpus.
  • Jobs are what you submit when you run algorithms in HTRC. You can view the status of the jobs that you have submitted and delete jobs.
  • Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. With this analytical approach, you can detect trends in a corpus (e.g. 19th-century literature) through machine processing instead of reading a book or collection of books. See Franco Moretti's Graphs, Maps, Trees for more information.
  • Results are the results of your job(s). You can either view the results in HTRC or download them.
  • The Sandbox is a good place to begin working with HathiTrust data and tools. It has hundreds of thousands of public domain volumes available as data.
  • Topic Modeling is a process that involves locating the major themes of a large body of texts by identifying topics, or groups of words that frequently appear together.
  • Worksets are collections of volumes and other data to be processed.
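
To make the idea of "groups of words that frequently appear together" more concrete, here is a minimal Python sketch that counts how often pairs of words occur in the same document. It is only an illustration of the intuition behind topic modeling, not a topic model and not an HTRC algorithm. The function name cooccurrence_counts, the toy_workset strings, and the stopword list are all hypothetical and chosen for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, stopwords=frozenset()):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for text in documents:
        # Strip basic punctuation, lowercase, and keep each distinct word once per document.
        words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
        # Drop stopwords and empty strings, then sort so each pair has one canonical order.
        words = sorted(words - set(stopwords) - {""})
        counts.update(combinations(words, 2))
    return counts


# Hypothetical toy workset: real worksets hold whole volumes, not short strings.
toy_workset = [
    "The whale and the ship sailed the sea",
    "A ship at sea met the whale",
    "The garden and the rose in spring",
]
STOPWORDS = frozenset({"the", "and", "a", "at", "in"})

pair_counts = cooccurrence_counts(toy_workset, STOPWORDS)
top_pairs = pair_counts.most_common(3)
```

In this toy data the pairs drawn from "sea", "ship", and "whale" each occur in two documents, while every other pair occurs in only one, which hints at a shared seafaring theme. Actual topic modeling relies on statistical models over far larger collections rather than raw pair counts.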

Credit and Licensing

Credit

Adapted from A Guide to the HathiTrust Research Center from the University of Illinois Urbana-Champaign Library Scholarly Commons.

Licensing

Except where otherwise indicated, original content in this guide is licensed under a Creative Commons Attribution (CC BY) 4.0 license. You are free to share, adopt, or adapt the materials. We encourage broad adoption of these materials for teaching and other professional development purposes, and invite you to customize them for your own needs.