Tutorials
Distant Reader
The Distant Reader is a web-based text analysis toolset for “reading” and analyzing texts. It takes unstructured data (text) as input and outputs sets of structured data for analysis. The Distant Reader is intended to supplement traditional reading by making it easier to identify trends and anomalies in large volumes of text.
By the end of this tutorial, you will be able to:
- Identify the types of problems the Distant Reader addresses
- Submit content to the Distant Reader and download results
- Compare some methods for interpreting your downloaded results
OpenRefine
OpenRefine (previously Google Refine) is a powerful, open-source tool for working with messy data. Key features include correcting mistakes in bulk, whether they affect a few cells or many, converting data from one format to another, and extending a dataset with data pulled from an online source.
By the end of this tutorial, you will be able to:
- Use Facets to clean up messy data
- Filter and export a specific portion of your dataset
Voyant
Voyant Tools is an open-source, web-based text reading and analysis environment. When you upload or submit text to Voyant, it generates a corpus and opens it in a set of built-in tools that help you quickly “see through your data,” as the Voyant catchphrase states.
By the end of this tutorial, you will be able to:
- Understand the basic functions of Voyant
- Create a default corpus using Voyant
Distant Reader + OpenRefine
This tutorial gives an overview of how to use the Distant Reader and OpenRefine together. Please complete or review the Distant Reader and OpenRefine tutorials before proceeding.
By the end of this tutorial, you will be able to:
- Identify and locate tab-delimited files in a Distant Reader study carrel
- Facet, filter, and export study carrel data using OpenRefine
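For readers who prefer scripting, the facet, filter, and export steps can also be approximated in plain Python. The following is a minimal sketch, not the tutorial's method: the sample data, the column names (`id`, `entity`, `type`), and the helper functions are all hypothetical and do not reflect the actual layout of any study carrel file.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited data; a real study carrel file will have
# its own columns and values.
SAMPLE_TSV = (
    "id\tentity\ttype\n"
    "doc1\tPlato\tPERSON\n"
    "doc1\tAthens\tGPE\n"
    "doc2\tSocrates\tPERSON\n"
    "doc2\tAthens\tGPE\n"
    "doc3\tAristotle\tPERSON\n"
)


def read_rows(handle):
    """Read a tab-delimited file into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def facet(rows, column):
    """Count how often each value appears in a column (a text facet)."""
    return Counter(row[column] for row in rows)


def filter_rows(rows, column, value):
    """Keep only the rows whose column matches the given value."""
    return [row for row in rows if row[column] == value]


def export_rows(rows, handle):
    """Write rows back out as tab-delimited text, header included."""
    if not rows:
        return
    writer = csv.DictWriter(
        handle,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


rows = read_rows(io.StringIO(SAMPLE_TSV))
type_counts = facet(rows, "type")            # counts per entity type
people = filter_rows(rows, "type", "PERSON")  # keep only the PERSON rows

out = io.StringIO()
export_rows(people, out)
exported = out.getvalue()
print(type_counts)
print(exported)
```

To try this on a real file, replace `io.StringIO(SAMPLE_TSV)` with `open(path, newline="", encoding="utf-8")` and substitute the column names found in that file's header row.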
spaCy
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about that text: what it is about, what its words mean in context, and who or what it mentions.
By the end of this tutorial, you will be able to:
- Understand spaCy’s approach to natural language processing
- Pre-process a string of text using spaCy
- Pull linguistic annotations for further analysis
- Use spaCy’s built-in features to visualize dependencies and entities
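As a taste of the pre-processing objective above, here is a minimal sketch that assumes spaCy is installed. It uses a blank English pipeline, which only tokenizes, so no trained model download is needed; part-of-speech tags, dependencies, and entities require a trained pipeline and are not shown. The sample sentence is made up for illustration.

```python
import spacy

# A blank pipeline contains only the tokenizer; it does not produce
# part-of-speech tags, dependency parses, or named entities.
nlp = spacy.blank("en")

text = "The Distant Reader analyzes texts."  # hypothetical sample sentence
doc = nlp(text)

# Keep alphabetic tokens only and lowercase them, a common
# pre-processing step before counting or comparing words.
tokens = [token.lower_ for token in doc if token.is_alpha]
print(tokens)
```

Filtering on `is_alpha` drops punctuation and numbers; whether that is appropriate depends on the analysis you plan to do.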