Tutorials

Distant Reader

The Distant Reader is a web-based text analysis toolset for “reading” and analyzing texts. It takes unstructured data (text) as input and outputs sets of structured data for analysis. The Distant Reader is intended to supplement traditional reading by making it easier to identify trends and anomalies in large volumes of text.
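The core idea of turning unstructured text into structured data can be sketched in a few lines of Python. The snippet below is an illustration only, not how the Distant Reader itself is implemented: the function name `word_frequencies` and the simple letters-and-apostrophes tokenizer are assumptions made for this example. It converts a raw string into a table of (word, count) rows.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Turn raw text into a (word, count) table, most frequent first.

    Hypothetical helper for illustration; not part of the Distant Reader.
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each token; most_common sorts by count (ties keep first-seen order).
    return Counter(tokens).most_common(top_n)


sample = "The reader reads. The reader counts words, and the words become data."
for word, count in word_frequencies(sample, top_n=3):
    print(f"{word}\t{count}")
```

Real tools go much further (stop-word removal, lemmatization, named entities), but the shape of the output, rows of structured values derived from free text, is the same.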

By the end of this tutorial, you will be able to:

OpenRefine

OpenRefine (previously Google Refine) is a powerful, open-source tool for working with messy data. Key features include fixing errors, small and large, across an entire dataset at once; converting data from one format to another; and enriching a dataset with data pulled from an online source.
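One way OpenRefine fixes mistakes en masse is by clustering values that are probably the same thing written differently. The sketch below approximates the key-collision "fingerprint" idea described in OpenRefine's clustering documentation: trim, lowercase, strip accents and punctuation, then sort the unique words. It is plain Python written for this tutorial, not OpenRefine's own code, and the helper names `fingerprint` and `cluster` are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a key-collision fingerprint (hypothetical helper)."""
    # Trim surrounding whitespace and lowercase.
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" -> "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique words so word order no longer matters, then rejoin.
    return " ".join(sorted(set(value.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep only groups of 2+."""
    groups: dict[str, list[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(group) > 1]


messy = ["New York", "new york ", "York, New", "Montréal", "Montreal", "Chicago"]
for group in cluster(messy):
    print(group)
```

Each printed group is a candidate for merging into a single canonical value, which is the kind of bulk correction OpenRefine lets you review and apply in one step.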

By the end of this tutorial, you will be able to:

Voyant

Voyant Tools is an open-source, web-based text reading and analysis environment. When you upload or enter text in Voyant, it builds a corpus and opens it in a set of built-in tools that help you quickly “see through your data,” as the Voyant catchphrase puts it.

By the end of this tutorial, you will be able to:

Distant Reader + OpenRefine

This tutorial shows how to use the Distant Reader and OpenRefine together. Please complete or review the Distant Reader and OpenRefine tutorials before proceeding.

By the end of this tutorial, you will be able to:

spaCy

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about it.

By the end of this tutorial, you will be able to: