
Text Mining

What Support is Available for Text Mining?

Are you looking for patterns in large sets of text or researching ways to make sense of textual data using sentiment analysis, topic modeling, or more? Whether you’re new to text mining or stuck with text mining questions, we’re here to help!

Ask Us!

Get help from librarians by email, phone, or 24/7 chat, or make an appointment with a subject expert.

Resources

Text Mining Guide

A step-by-step guide to getting started with your text mining project, along with examples of past text mining projects from UW researchers and students.

List of UW Libraries Databases With Text Mining Permissions

Need a corpus for your text mining project? Not every UW Libraries database allows users to mine its content, but many do. Using the databases on this list helps ensure your text mining project complies with copyright law.

Tools

Web Scraping

Programming-based – Beautiful Soup, Scrapy, Selenium
Commercial software (free/paid) – ParseHub, Dexi.io, Scraping-bot.io

Learn more about web scraping
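The programming-based tools above follow the same basic pattern: fetch a page, parse its HTML, and pull out the pieces you need. The sketch below shows only the parsing step, using Python's built-in html.parser module as a stand-in for Beautiful Soup so it runs without extra installs. The class name LinkAndTextExtractor and the choice to collect links and paragraph text are illustrative assumptions; downloading the page (for example with Scrapy, Selenium, or an HTTP library) is left out.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Hypothetical extractor: collects link targets and paragraph text."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values found on <a> tags
        self.paragraphs = []   # whitespace-normalized text of each <p>
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "p":
            # Start collecting text for a new paragraph
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join the collected pieces and collapse runs of whitespace
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> within <p>) is kept as well
        if self._in_p:
            self._buffer.append(data)


def scrape(html: str):
    """Parse an already-downloaded HTML string; returns (links, paragraphs)."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.paragraphs


if __name__ == "__main__":
    sample = """
    <html><body>
      <p>First <a href="/doc1">document</a> text.</p>
      <p>Second   paragraph.</p>
    </body></html>
    """
    links, paragraphs = scrape(sample)
    print(links)       # ['/doc1']
    print(paragraphs)  # ['First document text.', 'Second paragraph.']
```

Beautiful Soup wraps this kind of event-driven parsing in a searchable tree, which is usually more convenient once pages get messy.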

Text Cleaning

textclean – Open-source R package for cleaning and normalizing text
OpenRefine – Open-source data cleaning tool, originally developed at Google as Google Refine and now community maintained
Trifacta Wrangler – Free tool for data preparation
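The cleaning steps these tools automate can be sketched in a few lines of Python. This is a minimal, hypothetical pipeline: the function names clean_text and clean_tokens and the particular steps are assumptions, not the behavior of textclean or OpenRefine. It normalizes Unicode, lowercases, strips URLs and punctuation, collapses whitespace, and optionally drops stop words.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    # NFKC folds compatibility characters, e.g. the "ﬁ" ligature -> "fi"
    # and full-width letters -> ASCII letters
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Remove URLs first, before punctuation is stripped
    text = re.sub(r"https?://\S+", " ", text)
    # Replace anything that is not a word character or whitespace with a space
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


def clean_tokens(text: str, stopwords=frozenset()) -> list:
    # Split cleaned text into tokens, skipping any caller-supplied stop words
    return [t for t in clean_text(text).split() if t not in stopwords]


if __name__ == "__main__":
    raw = "Visit https://example.org NOW!!  Text-mining   rocks."
    print(clean_text(raw))                       # visit now text mining rocks
    print(clean_tokens(raw, frozenset({"now"})))  # ['visit', 'text', 'mining', 'rocks']
```

Step order matters: if punctuation were stripped before URLs were removed, fragments such as "https", "example", and "org" would leak into the token list.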

Text Analytics & Visualization

Gale Digital Scholar Lab – Apply natural language processing tools to the raw OCR text of Gale Primary Sources within a single research platform.

ProQuest TDM Studio – Text mine large sets of news, scholarly, and other publications that UW Libraries licenses from ProQuest.

Natural Language Toolkit (NLTK) – Open-source Python library for natural language processing, widely used in teaching and research

Orange Data Mining – Open-source, visual drag-and-drop tool for building interactive text mining workflows

WordStat – Commercial content analysis and text mining software from Provalis Research

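Apache OpenNLP – Java toolkit for common NLP tasks such as tokenization, sentence detection, named entity recognition, and document categorization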

Rosette Text Analytics – Suite of interoperable components for text analytics

Learn more about text analytics and visualizations
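Many of the analytics tools above start by turning documents into weighted term counts. As a rough illustration of what happens under the hood, here is a from-scratch TF-IDF sketch in Python. The regex tokenizer and the unsmoothed weighting tf × log(N / df) are simplifying assumptions; dedicated tools offer more robust tokenizers and smoothed weighting variants.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Simplified tokenizer (assumption): lowercase runs of letters/digits,
    # keeping an apostrophe suffix such as "don't"
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def tf_idf(docs: list) -> list:
    """Return one {term: weight} dict per document.

    tf  = count of term in doc / number of tokens in doc
    idf = log(N / df), where df = number of docs containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: count each term at most once per document
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)  # empty docs yield an empty dict, so no division by zero
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the dog barked"]
    weights = tf_idf(docs)
    top = sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)[:2]
    print([term for term, _ in top])  # ['cat', 'sat']
```

With these three documents, "the" appears everywhere and gets a weight of 0, while "cat", which appears only in the first document, outranks "sat" there. That is the intuition behind using TF-IDF to surface distinctive terms before topic modeling or visualization.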

Computing Resources

UW-IT Research Computing – For projects with large datasets or intensive workflows, UW-IT offers secure access to high-performance clusters, GPUs, cloud platforms, and expert guidance to help you plan and choose the right computing platform for your project.

Google Colab – Browser-based Jupyter notebooks (R and Python) that require no setup; great for quick prototypes and small projects. The free tier includes access to CPUs and, when available, GPUs/TPUs.

Workshops

Software Carpentry Workshops

Quarterly workshops to build skills in R or Python are available through the UW eScience Institute, free of charge.

Text Mining Crash Course

An asynchronous course that will equip you with foundational concepts, common tools, and essential techniques in text mining. 

Includes examples of past text mining projects from UW researchers and students.

For help with any of the topics mentioned above…