Text Mining with Bag-of-Words in R

Learn the bag of words technique for text mining with R.

4 hours | 15 videos | 69 exercises | 43,091 learners | Statement of Accomplishment

Course Description

It is estimated that over 70% of potentially usable business information is unstructured, often in the form of text data. Text mining provides a collection of techniques that allow us to derive actionable insights from unstructured data. In this course, we explore the basics of text mining using the bag-of-words method. The first three chapters introduce a variety of essential topics for analyzing and visualizing text data. In the final chapter, you apply everything you've learned in a real-world case study, extracting insights from employee reviews of two major tech companies.
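
As a rough illustration of the bag-of-words idea, the sketch below counts how often each word appears in each document and ignores word order. It uses base R only so that it runs without extra packages; the chapter outline indicates the course itself works with the tm and qdap packages. The two sentences and the `tokenize` helper are made up for this example and are not taken from the course material.

```r
# Two toy documents (invented for illustration; not the course datasets)
docs <- c(
  doc1 = "Coffee is great. I love coffee!",
  doc2 = "Chardonnay is great with dinner."
)

# Hypothetical helper: lowercase, strip punctuation, split on whitespace
tokenize <- function(text) {
  cleaned <- gsub("[[:punct:]]", "", tolower(text))
  words <- strsplit(cleaned, "[[:space:]]+")[[1]]
  words[nzchar(words)]
}

tokens <- lapply(docs, tokenize)

# Vocabulary: every distinct word across all documents
vocab <- sort(unique(unlist(tokens)))

# Term-document matrix: rows are terms, columns are documents
tdm <- sapply(tokens, function(w) as.integer(table(factor(w, levels = vocab))))
rownames(tdm) <- vocab

# Transposing gives the document-term matrix: rows are documents
dtm <- t(tdm)

print(tdm)
```

Each cell holds a raw count, so `tdm["coffee", "doc1"]` is the number of times "coffee" occurs in the first document. A real workflow would typically also remove stop words and stem terms, which this sketch leaves out.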

In the following Tracks

Text Mining in R

1
Jumping into Text Mining with Bag-of-Words
Free
In this chapter, you'll learn the basics of using the bag-of-words method for analyzing text data.
What is text mining?
50 xp
Understanding text mining
50 xp
Quick taste of text mining
100 xp
Getting started
50 xp
Load some text
100 xp
Make the vector a VCorpus object (1)
100 xp
Make the vector a VCorpus object (2)
100 xp
Make a VCorpus from a data frame
100 xp
Cleaning and preprocessing text
50 xp
Common cleaning functions from tm
100 xp
Cleaning with qdap
100 xp
All about stop words
100 xp
Intro to word stemming and stem completion
100 xp
Word stemming and stem completion on a sentence
100 xp
Apply preprocessing steps to a corpus
100 xp
The TDM & DTM
50 xp
Understanding TDM and DTM
50 xp
Make a document-term matrix
100 xp
Make a term-document matrix
100 xp
2
Word Clouds and More Interesting Visuals
This chapter will teach you how to visualize text data in a way that's both informative and engaging.
Common text mining visuals
50 xp
Test your understanding of text mining
50 xp
Frequent terms with tm
100 xp
Frequent terms with qdap
100 xp
Intro to word clouds
50 xp
A simple word cloud
100 xp
Stop words and word clouds
100 xp
Plot the better word cloud
100 xp
Improve word cloud colors
100 xp
Use prebuilt color palettes
100 xp
Other word clouds and word networks
50 xp
Find common words
100 xp
Visualize common words
100 xp
Visualize dissimilar words
100 xp
Polarized tag cloud
100 xp
Visualize word networks
100 xp
Teaser: simple word clustering
100 xp
3
Adding to Your TM Skills
In this chapter, you'll learn more basic text mining techniques based on the bag of words method.
Simple word clustering
50 xp
Test your understanding of text mining
50 xp
Distance matrix and dendrogram
100 xp
Make a dendrogram friendly TDM
100 xp
Put it all together: a text-based dendrogram
100 xp
Dendrogram aesthetics
100 xp
Using word association
100 xp
Getting past single words
50 xp
N-gram tokenization
50 xp
Changing n-grams
100 xp
How do bigrams affect word clouds?
100 xp
Different frequency criteria
50 xp
Changing frequency weights
100 xp
Capturing metadata in tm
100 xp
4
Battle of the Tech Giants for Talent
This chapter ties everything together with a case study in text mining for HR analytics.
Amazon vs. Google
50 xp
Organizing a text mining project
50 xp
Step 1: Problem definition
50 xp
Step 2: Identifying the text sources
100 xp
Step 3: Text organization
50 xp
Text organization
100 xp
Working with Google reviews
100 xp
Steps 4 & 5: Feature extraction & analysis
50 xp
Feature extraction & analysis: amzn_pros
100 xp
Feature extraction & analysis: amzn_cons
100 xp
amzn_cons dendrogram
100 xp
Word association
100 xp
Quick review of Google reviews
100 xp
Cage match! Amazon vs. Google pro reviews
100 xp
Cage match, part 2! Negative reviews
100 xp
Step 6: Reach a conclusion
50 xp
Draw conclusions, insights, or recommendations
50 xp
Draw another conclusion, insight, or recommendation
50 xp
Finished!
50 xp

Datasets

Coffee tweets

Chardonnay tweets

Anonymous online reviews: Amazon

Anonymous online reviews: Google

Collaborators

Nick Carchedi

Tom Jeon

Prerequisites

Intermediate R

Ted Kwartler

Adjunct Professor, Harvard University

Ted Kwartler is the VP of Trusted AI at DataRobot, where he sets product strategy for explainable and ethical uses of data technology in the company's application. Ted brings unique insights and experience in data, business acumen, and ethics to his current role and to his previous positions at Liberty Mutual Insurance and Amazon. In addition to having four DataCamp courses, he teaches graduate courses at the Harvard Extension School and is the author of Text Mining in Practice with R.
