Cluster Analysis in R

Intermediate

4.8+

Updated 12/2024

Develop a strong intuition for how hierarchical and k-means clustering work and learn how to apply them to extract insights from your data.

R | Machine Learning | 4 hours | 16 videos | 52 exercises | 3,800 XP | 41,605 | Statement of Accomplishment

Course Description

Learn How to Perform Cluster Analysis

Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; for example, in marketing, it is used to identify distinct groups of customers for which advertisements can be tailored.

Explore Hierarchical and K-Means Clustering Techniques

In this course, you will learn about two commonly used clustering methods: hierarchical clustering and k-means clustering. You won't just learn how to use these methods; you'll also build a strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.
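To give a rough feel for the two methods, here is a minimal sketch in base R. It is not taken from the course materials: the toy data frame `pts` and its values are made up for illustration, and the choices of Euclidean distance, complete linkage, and k = 2 are assumptions rather than the course's settings.

```r
# Hypothetical toy data: two well-separated groups of points on a 2-D plane
set.seed(42)
pts <- data.frame(
  x = c(1, 2, 1.5, 10, 11, 10.5),
  y = c(1, 1.5, 2, 10, 10.5, 11)
)

# Hierarchical clustering: pairwise distances, then build the tree
d <- dist(pts, method = "euclidean")    # distance between every pair of rows
hc <- hclust(d, method = "complete")    # complete linkage (an assumed choice)
hc_clusters <- cutree(hc, k = 2)        # cut the tree into 2 clusters

# K-means clustering with 2 centers; nstart repeats the random initialization
km <- kmeans(pts, centers = 2, nstart = 10)

# Cross-tabulate the two assignments to see whether the methods agree
print(table(hierarchical = hc_clusters, kmeans = km$cluster))
```

On data this cleanly separated, both methods should recover the same two groups, although the k-means cluster labels are arbitrary and may be swapped relative to the hierarchical ones.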

Hone Your Skills with a Hands-On Case Study

You’ll finish the course by applying your new skills to a case study based around average salaries and how they have changed over time. This will combine hierarchical clustering techniques such as occupation trees, preparing for exploration, and plotting occupational clusters, with k-means techniques including elbow analysis and average silhouette widths.
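As a hedged illustration of what elbow analysis involves, the sketch below fits k-means for several values of k and plots the total within-cluster sum of squares. The simulated matrix `toy` stands in for real data and is not the course's wage dataset; the range of k and `nstart = 20` are arbitrary assumptions.

```r
set.seed(1)

# Made-up data: three loose groups of 20 points each in two dimensions
toy <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2)
)

# Fit k-means for each candidate k and record the total within-cluster sum of squares
ks <- 1:6
tot_withinss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens out
plot(ks, tot_withinss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

With three simulated groups, the curve would be expected to drop steeply up to k = 3 and flatten afterwards; on real data the elbow is often less clear-cut, which is one reason to also look at a second measure such as average silhouette width.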

DataCamp courses combine videos, articles, and practice exercises so that you can test and cement your new skills and feel confident applying them outside a course setting.

Prerequisites

1

Calculating Distance Between Observations

What is cluster analysis?

When to cluster?

Distance between two observations

Calculate & plot the distance between two players

Using the dist() function

Who are the closest players?

The importance of scale

Effects of scale

When to scale data?

Measuring distance for categorical data

Calculating distance between categorical variables

The closest observation to a pair

2

Hierarchical Clustering

Comparing more than two observations

Calculating linkage

Revisited: The closest observation to a pair

Capturing K clusters

Assign cluster membership

Exploring the clusters

Validating the clusters

Visualizing the dendrogram

Comparing average, single & complete linkage

Height of the tree

Cutting the tree

Clusters based on height

Exploring the branches cut from the tree

What do we know about our clusters?

Making sense of the clusters

Segment wholesale customers

Explore wholesale customer clusters

Interpreting the wholesale customer clusters

3

K-means Clustering

Introduction to K-means

K-means on a soccer field

K-means on a soccer field (part 2)

Evaluating different values of K by eye

Many K's many models

Elbow (Scree) plot

Interpreting the elbow plot

Silhouette analysis: observation level performance

Silhouette analysis

Making sense of the K-means clusters

Revisiting wholesale data: "Best" k

Revisiting wholesale data: Exploration

4

Case Study: National Occupational Mean Wage

Occupational wage data

Initial exploration of the data

Hierarchical clustering: Occupation trees

Hierarchical clustering: Preparing for exploration

Hierarchical clustering: Plotting occupational clusters

Reviewing the HC results

K-means: Elbow analysis

K-means: Average Silhouette Widths

The "best" number of clusters

Review K-means results

Don’t just take our word for it

4.8 from 13 reviews

Rating distribution: 85%, 15%, 0%, 0%, 0%

Júlia S.

5 months

Really good course! I definitely plan to apply the knowledge :) good proportion of theory and practice

Daniel L.

8 months

It is a good course, but felt too repetitive from the course “Unsupervised Learning in R”. I would suggest this course to focus more on other clustering methods: GMM, DBSCAN, etc… Also other methods to evaluate the clustering performance.

Milan F.

12 months

Thank you

Daniel S.

about 1 year

Very clear and well explained

Anil G.

over 1 year

Good
