Tutorials

Distant Reader

The Distant Reader is a web-based text analysis toolset for “reading” and analyzing texts. It takes unstructured data (text) as input and outputs sets of structured data for analysis. The Distant Reader is intended to supplement traditional reading by making it easier to identify trends and anomalies in large volumes of text.

By the end of this tutorial, you will be able to:

Identify the types of problems the Distant Reader addresses
Submit content to the Distant Reader and download results
Compare some methods for interpreting your downloaded results

OpenRefine

OpenRefine (previously Google Refine) is a powerful, open-source tool for working with messy data. Key features include correcting mistakes in bulk, whether small or large, converting data from one format to another, and extending a dataset with data pulled from an online source.

By the end of this tutorial, you will be able to:

Use Facets to clean up messy data
Filter and export a specific portion of your dataset

Voyant

Voyant Tools is an open-source, web-based text reading and analysis environment. When you upload or submit text to Voyant, it generates a corpus and opens a set of built-in tools that help you quickly “see through your data,” as the Voyant catchphrase states.

By the end of this tutorial, you will be able to:

Understand the basic functions of Voyant
Create a default corpus using Voyant

Distant Reader + OpenRefine

This tutorial gives an overview of how to use the Distant Reader and OpenRefine together. Please complete or review the Distant Reader and OpenRefine tutorials before proceeding.

By the end of this tutorial, you will be able to:

Identify and locate tab-delimited files in a Distant Reader study carrel
Facet, filter, and export study carrel data using OpenRefine
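
For readers who want to see what faceting, filtering, and exporting amount to outside of OpenRefine, here is a minimal Python sketch using only the standard library. It is not how OpenRefine works internally. The sample data, the column names (id, token, pos), and the helper functions are all hypothetical stand-ins for a tab-delimited study carrel file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a tab-delimited study carrel file.
# The column names (id, token, pos) are assumptions for illustration only.
SAMPLE_TSV = (
    "id\ttoken\tpos\n"
    "doc1\twhale\tNOUN\n"
    "doc1\tswims\tVERB\n"
    "doc2\tocean\tNOUN\n"
    "doc2\tdeep\tADJ\n"
)


def read_rows(handle):
    """Read tab-delimited text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def facet(rows, column):
    """Count how often each distinct value occurs in a column, like a text facet."""
    return Counter(row[column] for row in rows)


def filter_rows(rows, column, value):
    """Keep only the rows whose column matches the selected facet value."""
    return [row for row in rows if row[column] == value]


def export_rows(rows, handle):
    """Write the rows back out as tab-delimited text with a header line."""
    if not rows:
        return
    writer = csv.DictWriter(
        handle,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


rows = read_rows(io.StringIO(SAMPLE_TSV))
pos_facet = facet(rows, "pos")            # value counts for the "pos" column
nouns = filter_rows(rows, "pos", "NOUN")  # subset selected from the facet

out = io.StringIO()
export_rows(nouns, out)
print(pos_facet.most_common())
print(out.getvalue())
```

To try this on a real file, replace `io.StringIO(SAMPLE_TSV)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to a tab-delimited file from your own study carrel.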

spaCy

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about that text, such as what it is about and what its words mean in context.

By the end of this tutorial, you will be able to:

Understand spaCy’s approach to natural language processing
Pre-process a string of text using spaCy
Pull linguistic annotations for further analysis
Use spaCy’s built-in features to visualize dependencies and entities
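
As a taste of the pre-processing outcome above, here is a minimal sketch, assuming spaCy is installed. It uses a blank English pipeline, which only tokenizes and needs no model download. The `preprocess` helper and the sample sentence are hypothetical. Part-of-speech tags, dependencies, and entities require a trained pipeline and are not shown here.

```python
import spacy

# spacy.blank("en") creates a tokenizer-only English pipeline, so no trained
# model has to be downloaded. Linguistic annotations such as part-of-speech
# tags or named entities would need a trained pipeline instead.
nlp = spacy.blank("en")


def preprocess(text):
    """Tokenize text, then drop stop words, punctuation, and whitespace tokens.

    Returns the remaining tokens in lowercase. This helper is a hypothetical
    illustration, not part of spaCy itself.
    """
    doc = nlp(text)
    return [
        token.lower_
        for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    ]


tokens = preprocess("The Distant Reader outputs structured data.")
print(tokens)
```

Filtering on `is_stop` and `is_punct` is one common choice for word-frequency work. Other analyses, such as dependency parsing, need the full sentence, so whether to filter depends on what you plan to do next.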

VMW DH Tool Tutorials

This site contains introductory tutorials for working with selected DH tools.
