Text Mining with Voyant

1. Download the file of transcriptions of documents in your soldier’s pension file

NB The transcription files do not include every document in the pension file, and they vary considerably in length. In addition, some of the files were created from images of typewritten pages using OCR software, which does not always recognize words correctly, producing errors in the transcriptions.

2. Open Voyant in your browser: http://voyant-tools.org

3. Upload the file (it's a txt file; Voyant can also read PDF, DOC and other file types)

4. When Voyant opens, the screen is divided into 5 windows, each containing a tool: Cirrus, Reader, Trends, Summary & Contexts

Opening

  • Rolling your cursor over the bar that appears next to each tool reveals the options for that tool

Stopwords

  • the left-hand button is for export (which includes opening that window in a tab of its own)
  • the middle button allows you to choose another tool to open in this window
  • the right-hand button – which is only available for some tools – provides options to adjust the tool
  • ‘?’ provides help for each tool
CIRRUS (WORD CLOUD)

cirrus

  • Left unfiltered, a word cloud would display every word in the document, including common words that appear in every text. To produce a more revealing visualization, Voyant automatically removes a list of common words (called stop words) from the cloud. You can edit that list by clicking on the options button in the Cirrus tool. The window below will appear:

cirrus options

  • Click Edit List
  • Scroll to the bottom of the list
  • Add additional stop words, each on its own line — your soldier’s first and last name, and any other words that appear that do not seem significant (your soldier’s state of origin or regiment?)
  • Click Save
  • Click Confirm
  • The Word cloud will now reformat with the words you added removed – and since we left the ‘apply globally’ box ticked, the words you added will also disappear from the other tools
  • The Terms slider on the bottom left of the Cirrus tool allows you to adjust the number of words that appear in the cloud
  • If you roll your cursor over a word in the cloud, the number of times the word appears in the document will appear
    • If you click on Terms in the menu bar (next to Cirrus), the window changes to a list of words and their frequency in the document

Terms
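The stop-word filtering and word counts behind the cloud and the Terms list can be sketched outside Voyant. This is a minimal illustration in Python, not Voyant's own code: the short `STOP_WORDS` set, the sample text, and the soldier's name are all made up for the example, and Voyant's real stop-word list is much longer.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "was", "his", "he"}

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequencies(text, extra_stop_words=()):
    """Count each word, skipping stop words and any extra words supplied."""
    stop = STOP_WORDS | {w.lower() for w in extra_stop_words}
    return Counter(w for w in tokenize(text) if w not in stop)

# Made-up sample text standing in for a transcription file.
sample = (
    "John Smith enlisted in the regiment. Smith was wounded in the arm, "
    "and the wound left his arm weak. The pension was granted to Smith."
)

# Adding the soldier's name as extra stop words mirrors the Edit List step.
counts = term_frequencies(sample, extra_stop_words=["John", "Smith"])
print(counts.most_common(3))
```

Passing the soldier's name in `extra_stop_words` plays the same role as adding it to the list in the Cirrus options: the name drops out of the counts, so less obvious words move up.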

  • If you click on a word in the cloud, a graph of its frequency across the document will appear in the Reader and Trends windows, and phrases containing it will appear in the Phrases tool in the Summary window
  • If you click on Links in the menu bar, the window changes to a visualization of the highest frequency words that occur close to the specified search terms
    • Click on a word to highlight its collocated words
    • Double click on a word to fetch more words that appear close to it
    • The Context slider on the bottom right adjusts how close a word must be to the search term to be included – if you slide it to the right, words further away are included

Links
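The idea behind Links, counting the words that occur close to a search term, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `collocates` function, its `context` parameter (standing in for the Context slider), and the sample sentence are hypothetical, and Voyant may tokenize and count differently.

```python
import re
from collections import Counter

def collocates(text, keyword, context=5, stop_words=frozenset()):
    """Count words found within `context` words of each occurrence of `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, word in enumerate(words):
        if word != keyword:
            continue
        # Words to the left and right of this occurrence, up to `context` each side.
        window = words[max(0, i - context):i] + words[i + 1:i + 1 + context]
        counts.update(w for w in window if w not in stop_words and w != keyword)
    return counts

# Made-up sample sentence.
sample = ("the wound in his arm never healed and the wound in his leg "
          "kept him from work")
stops = {"the", "in", "his", "and"}

near = collocates(sample, "wound", context=3, stop_words=stops)
wide = collocates(sample, "wound", context=5, stop_words=stops)
print(near)
print(wide)
```

Raising `context` from 3 to 5 pulls in words further from the search term, which is the effect of sliding the Context slider to the right.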

Don’t stop with the word cloud – the other tools can help make sense of the words that appear in the word cloud
SUMMARY
  • Provides a summary of the words in the document including the most frequent (you can adjust how many words appear using the Item slider on the bottom right)

Summary

  • If you click on Phrases, the window changes to a list of the phrases that recur in the document, starting with the longest and most frequent
    • If you select a word in the word cloud, the Phrases window will display a list of phrases that feature that word

Phrases
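A rough way to see what the Phrases tool is looking for is to count every run of consecutive words and keep the runs that occur more than once. The sketch below is an assumption-laden illustration: the `repeated_phrases` function and its parameters are hypothetical, and unlike Voyant it makes no attempt to hide shorter phrases that are contained in longer ones.

```python
import re
from collections import Counter

def repeated_phrases(text, min_len=2, max_len=6, min_count=2):
    """Return (phrase, count) pairs for word sequences that occur at least `min_count` times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    repeated = [(p, c) for p, c in counts.items() if c >= min_count]
    # Longest phrases first, then most frequent, then alphabetical.
    return sorted(repeated, key=lambda pc: (-len(pc[0].split()), -pc[1], pc[0]))

# Made-up sample text.
sample = ("gunshot wound of the left arm received at Gettysburg; "
          "gunshot wound of the left arm has disabled him")

phrases = repeated_phrases(sample)
print(phrases[0])

# Keep only phrases that feature a chosen word, as when a word is selected in the cloud.
with_arm = [p for p, c in phrases if "arm" in p.split()]
print(with_arm[:3])
```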

CONTEXTS
  • Shows all of the appearances of a selected word in context – with the words to its left and right

Contexts

  • Select a word by clicking on it in the Reader window (clicking on a word in the Cirrus word cloud does not adjust this tool)
  • Clicking on the + to the left of an example opens a window showing more of the text surrounding that example of a word

Context_opened

  • The Context slider on the bottom of the window adjusts how many words surrounding the search term are displayed; the Expand slider sets how many words of surrounding text are shown if you click on the ‘+’ for each appearance of the word
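
What the Contexts tool displays is often called a keyword-in-context (KWIC) listing. A minimal sketch of the idea in Python follows; the `keyword_in_context` function, its `context` parameter (standing in for the Context slider), and the sample text are hypothetical and not taken from Voyant.

```python
import re

def keyword_in_context(text, keyword, context=5):
    """Return (left, keyword, right) for each occurrence of `keyword` in `text`."""
    words = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - context):i])
            right = " ".join(words[i + 1:i + 1 + context])
            rows.append((left, word, right))
    return rows

# Made-up sample text.
sample = "He received a pension of eight dollars. The pension was later increased."

rows = keyword_in_context(sample, "pension", context=2)
for left, word, right in rows:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} | {word} | {right}")
```

A larger `context` value shows more of the sentence on each side of the word, at the cost of wider rows.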
READER
  • Shows all the text of the document — when you double click on a word:
    • all examples of the word will be highlighted in the document, and the frequency will be graphed at the bottom of the window
    • all examples of that word will also appear in the Contexts window
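
The frequency graph for a word can be approximated by cutting the document into equal parts and counting the word in each part. This is a sketch under assumptions: the `frequency_by_segment` function, the number of segments, and the sample text are hypothetical, and Voyant's own graphs may divide the text differently.

```python
import re

def frequency_by_segment(text, keyword, segments=10):
    """Split the words into `segments` equal parts and count `keyword` in each part."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for i, word in enumerate(words):
        if word == keyword.lower():
            # Map the word's position to the segment it falls in.
            counts[i * segments // len(words)] += 1
    return counts

# Made-up sample text.
sample = ("He was wounded at the battle and the wound never healed. "
          "Years later he applied for a pension and the pension was granted.")

pension_counts = frequency_by_segment(sample, "pension", segments=2)
wound_counts = frequency_by_segment(sample, "wound", segments=2)
print(pension_counts, wound_counts)
```

A word that clusters in one part of a file (for example, only in the later correspondence) shows up as a peak in that segment.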
COMPARING DOCUMENTS

Voyant also allows you to compare documents – so you could compare your soldier’s transcriptions with those of a soldier being researched by another group

  • To add another document, select Documents in the Summary window

Documents

  • Click the Modify button on the bottom of the window
  • A window opens which allows you to edit which documents you are analyzing – click the Add button at the bottom to add another soldier’s transcriptions

Modify

  • Download another transcription file using the links at the top of this tutorial, and then upload it

 

  • The Summary window will now show separate information on the two documents, and on the combined documents (which Voyant calls the corpus)
    • If you click on the title of a specific document, or a word from a specific document, that document will appear in the Reader window, and the word will appear in the Contexts window
    • A new summary appears showing the most frequent words that are distinctive to each document (not working on 6/21/16)
  • A Scale button now appears at the bottom of the Cirrus, Trends and Contexts windows – click on that button to select whether the tool will display the corpus or a specific document
    • For example, you can use the Scale button to create a word cloud of a specific document – and export each to a separate window to allow you to compare them
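
One simple way to think about words that are distinctive to a document is to compare how often each word occurs, relative to the length of the text, in one document and in the other. The sketch below does only that; the `distinctive_words` function and the two tiny sample documents are hypothetical, and this is not necessarily the measure Voyant uses.

```python
import re
from collections import Counter

def distinctive_words(doc, other, top=5):
    """Rank words in `doc` by how much more often they occur (per word of text) than in `other`."""
    a = Counter(re.findall(r"[a-z']+", doc.lower()))
    b = Counter(re.findall(r"[a-z']+", other.lower()))
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    # Difference in relative frequency; positive means more typical of `doc`.
    scores = {w: a[w] / total_a - b[w] / total_b for w in a}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top]

# Made-up stand-ins for two soldiers' transcription files.
soldier_a = "wound arm wound hospital pension"
soldier_b = "fever hospital fever pension pension"

top_a = distinctive_words(soldier_a, soldier_b, top=2)
top_b = distinctive_words(soldier_b, soldier_a, top=2)
print(top_a, top_b)
```

Words that both files use about equally, such as "hospital" here, score near zero and drop out of the top of each list.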

Scale

EXPORT

export button

  • To export the contents of a window, roll over the menu bar for that window and select the left-hand icon

Export

  • The window opens with one option already selected: opening a new browser tab containing just this tool
  • The Export View option provides code to allow you to embed an interactive view of this window in a web page (including your Omeka exhibit)
  • The Export Visualization option generates a static image of the window – note, many operating systems do not read the SVG format

See also: http://docs.voyant-tools.org/workshops/dh2015/