Text Mining
What Support is Available for Text Mining?
Are you looking for patterns in large sets of text or researching ways to make sense of textual data using sentiment analysis, topic modeling, or more? Whether you’re new to text mining or stuck with text mining questions, we’re here to help!
Ask Us!
Get help from librarians by email, phone, or 24/7 chat, or make an appointment with a subject expert.
Resources
Text Mining Guide
Step-by-step guide on how to get started with your text mining project along with examples of past text mining projects from UW researchers and students.
List of Libraries Databases With Text Mining Permissions
Need a corpus for your text mining project? While not every UW Libraries database allows users to mine content, many do; ensure your text mining project follows copyright law by using these databases.
Tools
Web Scraping
Programming-based – Beautiful Soup, Scrapy, Selenium
Commercial software (free/paid) – ParseHub, Dexi.io, Scraping-Bot.io
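For a sense of what the programming-based tools above do, here is a minimal sketch that pulls the visible text out of an HTML page using only Python's standard library. It is not how Beautiful Soup, Scrapy, or Selenium work internally; the class name TextExtractor, the function extract_text, and the sample HTML are all made up for illustration. A real project would also need to fetch pages and respect each site's terms of use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element
        self.chunks = []       # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, used only to demonstrate the function.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Text Mining</h1>"
    "<p>Find patterns in <b>large</b> sets of text.</p>"
    "<script>var x = 1;</script>"
    "</body></html>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML))
```

The dedicated libraries listed above add much more, such as CSS selectors, crawling, and browser automation, so this sketch is best read as a picture of the basic idea.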
Text Cleaning
TextClean – Collection of open-source tools for cleaning & normalizing text documents in R
OpenRefine – Open-source data cleaning tool (formerly Google Refine)
Trifacta Wrangler – Free tool for data preparation
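As a rough illustration of what "cleaning and normalizing" text usually involves, the sketch below lowercases a string, strips punctuation, splits it into tokens, and drops a few common stopwords, using only Python's standard library. The function name clean_text and the tiny stopword list are assumptions made for this example and are not taken from any of the tools listed above. The pattern also assumes English text and would discard accented letters.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative stopword list; real projects use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def clean_text(raw):
    """Normalize a string and return its tokens with stopwords removed."""
    # Fold Unicode compatibility forms (e.g. full-width letters) to plain ones.
    text = unicodedata.normalize("NFKC", raw)
    text = text.lower()
    # Replace anything that is not a-z, 0-9, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    tokens = clean_text("The Quick, quick fox -- and the LAZY dog!")
    print(tokens)
    # Count how often each remaining token appears.
    print(Counter(tokens).most_common(1))
```

The cleaned tokens can then be passed to an analysis step such as word counts, topic modeling, or sentiment analysis.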
Text Analytics & Visualization
Gale Digital Scholar Lab – Apply natural language processing tools to raw text data (OCR) from Gale Primary Sources in a single research platform.
ProQuest TDM Studio – Text mine large sets of news, scholarly, and other publications that UW Libraries licenses from ProQuest.
Natural Language Toolkit – Industrial-strength NLP libraries in Python
Orange Data Mining – Open-source and visual drag-and-drop tool to build interactive text-mining workflows
WordStat – Advanced Content Analysis
Apache OpenNLP – Document Categorizer and more
Rosette Text Analytics – Suite of interoperable components for text analytics
Computing Resources
UW-IT Research Computing – For projects with large datasets or intensive workflows, UW-IT offers secure access to high-performance clusters, GPUs, cloud platforms, and expert guidance to help you plan and choose the right computing platform for your project.
Google Colab – Browser-based Jupyter notebooks (R and Python) that require no setup; great for quick prototypes and small projects. The free tier includes access to CPUs and, when available, GPUs/TPUs.
Workshops
Software Carpentry Workshops
Quarterly workshops to build skills in R or Python are available free of charge through the UW eScience Institute.
Text Mining Crash Course
An asynchronous course that will equip you with foundational concepts, common tools, and essential techniques in text mining.
Includes examples of past text mining projects from UW researchers and students.
