Text Mining

What Support is Available for Text Mining?

Are you looking for patterns in large sets of text, or researching ways to make sense of textual data using sentiment analysis, topic modeling, or other methods? Whether you’re new to text mining or stuck on a text mining question, we’re here to help!

Contact our Text Mining Specialist

Ask Us!

Get help from librarians by email, phone, or 24/7 chat, or make an appointment with a subject expert.

Ask Us!

Resources

Text Mining Guide

A step-by-step guide to getting started with your text mining project, with examples of past text mining projects by UW researchers and students.

View the Text Mining Guide

List of Libraries Databases With Text Mining Permissions

Need a corpus for your text mining project? While not every UW Libraries database allows users to mine content, many do; ensure your text mining project follows copyright law by using these databases.

View List of Databases With Permissions

Tools

Web Scraping

Programming-based – Beautiful Soup, Scrapy, Selenium
Commercial software (free/paid) – ParseHub, Dexi.io, Scraping-bot.io

Learn more about web scraping
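
To give a sense of what the programming-based route involves, here is a minimal sketch that pulls paragraph text out of an HTML page. It uses Python's standard-library html.parser rather than Beautiful Soup, Scrapy, or Selenium so that it runs with no installation; those libraries offer more convenient interfaces for the same task. The class name, function name, and sample page are hypothetical, and a real project would download pages and check the site's terms of use first.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside every <p> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start a new, empty paragraph buffer when a <p> opens.
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the paragraph.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Hypothetical page content standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Archive</h1>
  <p>First <b>article</b> text.</p>
  <p>Second article text.</p>
</body></html>
"""

print(extract_paragraphs(SAMPLE_HTML))
# ['First article text.', 'Second article text.']
```

The heading is skipped because only text inside paragraph tags is collected; the same pattern extends to other tags by changing the tag check.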

Text Cleaning

textclean – Collection of open-source tools for cleaning and normalizing text documents in R
OpenRefine – Open-source data cleaning tool (formerly Google Refine)
Trifacta Wrangler – Free tool for data preparation
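
As a rough illustration of what text cleaning usually means in practice, the sketch below normalizes a raw string using only Python's standard library. It is not how any of the tools above work internally; the function name and the particular steps (Unicode normalization, tag stripping, lowercasing, punctuation removal, whitespace collapsing) are assumptions chosen for the example, and the right steps depend on your corpus and research question.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalize a raw string into lowercase words separated by single spaces."""
    # Fold compatibility characters (ligatures, full-width forms) to plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Replace anything that looks like an HTML tag with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.lower()
    # Keep word characters (letters, digits, underscore), apostrophes, and whitespace.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace, including line breaks, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical OCR-like input: a ligature, stray markup, and uneven spacing.
sample = "<p>The \ufb01nal   DRAFT,\n revised twice!</p>"
print(clean_text(sample))
# the final draft revised twice
```

Lowercasing and punctuation removal discard information that some analyses need (for example, named-entity recognition or sentiment cues), so steps like these are best treated as options rather than defaults.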

Text Analytics & Visualization

Gale Digital Scholar Lab – Apply natural language processing tools to raw text data (OCR) from Gale Primary Sources in a single research platform.

ProQuest TDM Studio – Text mine large sets of news, scholarly, and other publications that UW Libraries licenses from ProQuest.

Natural Language Toolkit (NLTK) – Python libraries for tokenizing, tagging, parsing, and classifying text

Orange Data Mining – Open-source and visual drag-and-drop tool to build interactive text-mining workflows

WordStat – Advanced Content Analysis

Apache OpenNLP – Document Categorizer and more

Rosette Text Analytics – Suite of interoperable components for text analytics

Learn more about text analytics and visualizations

Computing Resources

UW-IT Research Computing – For projects with large datasets or intensive workflows, UW-IT offers secure access to high-performance clusters, GPUs, cloud platforms, and expert guidance to help you plan and choose the right computing platform for your project.

Google Colab – Browser-based Jupyter notebooks (R and Python) that require no setup; great for quick prototypes and small projects. The free tier includes access to CPUs and, when available, GPUs and TPUs.