Text mining is the process or practice of examining large collections of written resources in order to generate new information or insights, typically using specialized computer software.
Text mining can be used to analyze amounts of text too large for a person to read and remember. It can also be used to compare two works and identify their similarities and differences, or to track language change over time in a corpus.
Text mining can be an excellent way to get a sense of how similar texts are to each other. Using the cosine similarity score algorithm in the text mining tool SameDiff, Elizabeth Crowley Webber, Tess Henthorne, and Bridget Sellers, students in ENGL 693: Intro to Digital Humanities, compared the scripts of several witchcraft films to determine which films were the most and least similar.
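To give a sense of what a cosine similarity score measures, here is a minimal sketch in Python. It is not how SameDiff is implemented; it assumes a simple approach in which each text is reduced to word counts and the score is the cosine of the angle between the two count vectors. The function names and the sample sentences are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Return a score from 0.0 (no shared words) to 1.0 (identical word proportions)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the words that appear in both texts.
    shared_words = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)

    # Length (Euclidean norm) of each word-count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)


# Hypothetical sample lines, not taken from any actual film script.
score = cosine_similarity("the witch flies at night", "the witch sleeps at night")
print(round(score, 2))
```

A score near 1 means the two texts use the same words in similar proportions, and a score near 0 means they share almost no vocabulary. Dedicated tools typically add refinements, such as removing very common words, that this sketch leaves out.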
Text mining can also be used for distant reading, in which a researcher uses software to look for patterns in a corpus too large to analyze all at once through traditional reading. Students in Dr. Jenifer Rosales’ JUPS 299: Research Methods used the tool Voyant to do a distant reading of all of Martin Luther King Jr.’s speeches as part of Georgetown's Teach the Speech initiative.
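One of the simplest patterns a distant reading can surface is which words occur most often across a whole corpus. The sketch below illustrates that idea in Python. It is not Voyant's method; the function name, the short stopword list, and the placeholder documents are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "that"}


def top_terms(documents, n=10):
    """Count words across every document and return the n most frequent."""
    totals = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        # Skip very common words so that content words stand out.
        totals.update(w for w in words if w not in STOPWORDS)
    return totals.most_common(n)


# Placeholder documents standing in for a corpus; not quotations from any speech.
corpus = [
    "The march for justice continues in the city.",
    "Justice and freedom for all is the goal of the march.",
]
for word, count in top_terms(corpus, n=3):
    print(word, count)
```

On a real corpus the same count could be run separately for each year or each author, which makes it possible to compare how vocabulary shifts between them.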
The Library's Text Mining webpage has more tools and resources. Workshops on various text mining topics and tools are hosted each Fall and Spring semester. For a one-on-one consultation on data visualization, email firstname.lastname@example.org.