Text Mining

What is Text Mining?

Text mining is the process or practice of examining large collections of written resources in order to generate new information or insights, typically using specialized computer software.

Why use Text Mining?

Text mining can be used to analyze amounts of text too large for a person to read and retain. It can also be used to compare two works to each other, looking for similarities and differences, or to track language change over time in a corpus.

What can I do with Text Mining?

Compare texts

Text mining can be an excellent way to get a sense of how similar texts are to each other. Using the cosine similarity score algorithm in the text mining tool SameDiff, Elizabeth Crowley Webber, Tess Henthorne, and Bridget Sellers, students in ENGL 693: Intro to Digital Humanities, compared the scripts of several witchcraft films to determine which films were the most and least similar.
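To make the idea of a cosine similarity score concrete, here is a minimal Python sketch. It is an illustration only, not SameDiff's implementation: the tokenizer, the raw word counts, and the function names `tokenize` and `cosine_similarity` are all assumptions made for this example. Each text is treated as a vector of word counts, and the score is the cosine of the angle between the two vectors, from 0 (no words in common) to 1 (identical word proportions).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed, very simple tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    # Represent each text as a bag of words (word -> count).
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the words the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)

    # Vector lengths (Euclidean norms) of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so report zero similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity(script_one, script_two)` on two film scripts loaded as strings would return a single score for that pair. Real tools often also remove common words or weight rare ones more heavily, which this sketch does not do.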

Look for language patterns in a big chunk of text

Text mining can also be used for distant reading, where a researcher uses software to look for patterns in a corpus too large to analyze all at once through traditional reading. Students in Dr. Jenifer Rosales’ JUPS 299: Research Methods used the tool Voyant to do a distant reading of all of Martin Luther King Jr.’s speeches as part of Georgetown’s Teach the Speech initiative.
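One of the simplest distant reading techniques is counting which words appear most often across a whole corpus. The Python sketch below shows the idea under stated assumptions: it is not how Voyant works internally, and the function name `top_terms`, the tokenizer, and the short stopword list are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; real tools use much longer ones.
DEFAULT_STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def top_terms(texts, n=10, stopwords=DEFAULT_STOPWORDS):
    # Count every non-stopword across all documents in the corpus.
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    # Return the n most frequent words with their counts.
    return counts.most_common(n)
```

Given a list of speeches loaded as strings, `top_terms(speeches, n=20)` would return the twenty most frequent words, a starting point for spotting recurring themes before reading closely.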

How do I get started with Text Mining?

The Library's Text Mining webpage has more tools and resources. Workshops on various text mining topics and tools are hosted each Fall and Spring semester.