HathiTrust Digital Library: Algorithms

The HathiTrust Digital Library brings together the immense collections of partner institutions in digital form, preserving them securely to be accessed and used today, and in future generations.

Using Algorithms

After you have created a workset, return to the HTRC Portal to run analyses on your workset(s):

1. From the HTRC Portal homepage, click "Text Analysis Algorithms", then click "Execute an Algorithm" near the top of the page. You will be taken to the "Algorithms" page.

The "Algorithms" page gives a brief description of what each algorithm does. Select the algorithm you want by clicking its name.

2. For this example, click "Meandre_OpenNLP_Entities_List" to open that algorithm's page.

3. Enter a Job Name for this analysis. It will appear later as the "Job Title" when you view the results.

4. Select your workset (in this example, THATCamptest@harrigreen) from the drop-down list below the message "Please select a workset for analysis."

Note: Avoid running an algorithm against an arbitrarily chosen public workset. Many public worksets are very large, and large worksets may not finish within a reasonable amount of time or may crash by running out of memory.

5. Click "Submit" to execute the algorithm. You will be taken to the "Job Staging" screen, where you may need to refresh the page to see the status of your job.

Viewing Your Results

Your workset may take a few minutes to process. On the analysis page, the status of your job appears under "Active Jobs".

When your results are ready, the job will appear under "Completed Jobs". Click the blue link under "Job Name" to view the results.

Here is an example result using the TagCloud algorithm:

Now you can perform topic modeling, generate word clouds, and more! 

Try experimenting with different worksets. Run different algorithms and, where possible, vary their parameters to see how your results change.

Credit and Licensing

Credit

Adapted from A Guide to the HathiTrust Research Center from the University of Illinois Urbana-Champaign Library Scholarly Commons.

Licensing

Except where otherwise indicated, original content in this guide is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) license. You are free to share, adopt, or adapt the materials. We encourage broad adoption of these materials for teaching and other professional development purposes, and invite you to customize them for your own needs.