Data Privacy and Anonymization in Python

Learn to process sensitive information with privacy-preserving techniques.

4 hours | 16 videos | 49 exercises | 2,964 learners | Statement of Accomplishment

Course Description

Data privacy has never been more important. But how do you balance privacy with the need to gather and share valuable business insights? In this course, you'll learn how to do just that, using the same methods as Google and Amazon, including data generalization and privacy models such as k-anonymity and differential privacy. In addition to touching on topics such as GDPR, you'll also discover how to build and train machine learning models in Python while protecting users' sensitive information, such as employee and income data. Let's get started!

1
Introduction to Data Privacy
Free
Get ready to apply anonymization techniques such as data suppression, masking, synthetic data generation, and generalization. In this chapter, you’ll learn how to distinguish between sensitive and non-sensitive personally identifiable information (PII), recognize quasi-identifiers, and understand the basics of the GDPR. You'll also encounter real-life examples of what can go wrong if you don't follow these best practices.
What's private, and why do we care?
50 xp
Privacy is power
50 xp
Is it sensitive or non-sensitive?
100 xp
Suppression of sensitive attributes
100 xp
Data masking and data generation with Faker
50 xp
Masking sensitive PII
100 xp
Removing names with Faker
100 xp
Anonymizing with data generalization
50 xp
Reducing identification risk with generalization
100 xp
Data aggregation and data generalization
50 xp
Top and bottom coding White House salaries
100 xp
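
The techniques named in this chapter (suppression, masking, generalization, and top and bottom coding) can be sketched in a few lines. This is a minimal, dependency-free illustration and not the course's own code: the records, the helper names (`suppress`, `mask_email`, `generalize_age`, `top_bottom_code`), and the salary bounds are all assumptions made up for the example.

```python
# Illustrative records; every name and value here is invented for the sketch.
records = [
    {"name": "Alice Smith", "email": "alice@example.com", "age": 34, "salary": 52000},
    {"name": "Bob Jones", "email": "bob@example.com", "age": 47, "salary": 250000},
    {"name": "Carol White", "email": "carol@example.com", "age": 29, "salary": 18000},
]


def suppress(record, column):
    """Suppression: drop a sensitive attribute entirely."""
    return {key: value for key, value in record.items() if key != column}


def mask_email(email):
    """Masking: replace the local part of an email with asterisks, keep the domain."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, low, high):
    """Top and bottom coding: clip extreme values to chosen bounds."""
    return max(low, min(value, high))


anonymized = []
for rec in records:
    out = suppress(rec, "name")
    out["email"] = mask_email(out["email"])
    out["age"] = generalize_age(out["age"])
    # Hypothetical bounds; in practice they would come from the data's distribution.
    out["salary"] = top_bottom_code(out["salary"], 20000, 150000)
    anonymized.append(out)

for row in anonymized:
    print(row)
```

Each helper trades detail for lower identification risk: the wider the age range or the tighter the salary bounds, the less an individual row stands out, and the less precise any later analysis becomes.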
2
More on Privacy-Preserving Techniques
Discover how to anonymize data by sampling from datasets following the probability distribution of the columns. You’ll then learn how to apply the k-anonymity privacy model to prevent linkage or re-identification attacks and use hierarchies to perform data generalization in categorical variables.
Anonymizing categorical data
50 xp
Explore the distribution of data
100 xp
Sampling from the same probability distribution
100 xp
Anonymizing continuous data
50 xp
Different distributions
50 xp
Sampling from the best continuous distribution
100 xp
Introduction to K-anonymity
50 xp
Privacy attributes
100 xp
Generalizing into ranges
100 xp
Generalizing data using hierarchies
50 xp
Using hierarchies for categorical data
100 xp
K-anonymizing a dataset
100 xp
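
Two ideas from this chapter lend themselves to a short sketch: sampling synthetic values that follow a column's observed distribution, and checking k-anonymity before and after generalizing quasi-identifiers. The code below is a hedged, standard-library illustration; the rows, the choice of `age` and `zip` as quasi-identifiers, and the helper names (`sample_like`, `k_anonymity`, `generalize`) are assumptions for the example, not material from the course.

```python
import random
from collections import Counter

# Made-up rows: age and zip act as quasi-identifiers, condition is sensitive.
rows = [
    {"age": 23, "zip": "47677", "condition": "flu"},
    {"age": 27, "zip": "47602", "condition": "asthma"},
    {"age": 35, "zip": "47905", "condition": "flu"},
    {"age": 38, "zip": "47909", "condition": "diabetes"},
]


def sample_like(values, n, seed=0):
    """Draw n synthetic values, using the observed frequencies as sampling weights."""
    counts = Counter(values)
    rng = random.Random(seed)
    return rng.choices(list(counts), weights=list(counts.values()), k=n)


def k_anonymity(table, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in table)
    return min(groups.values())


def generalize(row):
    """Generalize age into a 10-year range and keep only the first 3 zip digits."""
    low = (row["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": row["zip"][:3] + "**",
        "condition": row["condition"],
    }


synthetic_conditions = sample_like([r["condition"] for r in rows], n=len(rows))
generalized = [generalize(r) for r in rows]

k_before = k_anonymity(rows, ["age", "zip"])
k_after = k_anonymity(generalized, ["age", "zip"])
print(k_before, k_after)
```

In the raw rows every combination of age and zip is unique, so each person is re-identifiable by linkage. After generalization, each combination is shared by two rows, which is what a 2-anonymous table requires.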
3
Differential Privacy
Learn about differential privacy, the model used by major technology companies such as Apple, Google, and Uber. In this chapter, you’ll explore data by generating private histograms and computing private averages. You’ll also create differentially private machine learning models that allow businesses to increase the utility of their data.
Introduction to differential privacy
50 xp
Epsilon (ϵ): the magic number
50 xp
Histograms with differential privacy
100 xp
Privacy budgets
50 xp
Using privacy budgets
100 xp
When no budget is left
100 xp
Exploring data with a privacy budget accountant
100 xp
Differentially private machine learning models
50 xp
Build a differentially private classifier
100 xp
Predicting salaries
100 xp
Differentially private clustering models
50 xp
Pre-processing data
100 xp
Segmenting customers
100 xp
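
To make epsilon and privacy budgets concrete, here is a rough sketch of a differentially private mean using the Laplace mechanism, a standard building block for differential privacy. It is written with the standard library only and is not the library or code the course uses. The class `SimpleBudget`, the functions `laplace_noise` and `private_mean`, the age bounds, and the data are hypothetical. It assumes the number of records is public and that neighboring datasets differ in one record's value.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Mean of values clipped to [lower, upper], plus Laplace noise.

    Changing one clipped value moves the mean by at most (upper - lower) / n,
    so that is the sensitivity used to scale the noise.
    """
    clipped = [max(lower, min(v, upper)) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)


class SimpleBudget:
    """Hypothetical budget tracker: epsilons of successive queries add up."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon > self.remaining + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon


rng = random.Random(42)
budget = SimpleBudget(total_epsilon=1.0)
ages = [23, 35, 41, 52, 29, 64, 38, 47]  # made-up data

budget.spend(0.5)
first_answer = private_mean(ages, lower=18, upper=90, epsilon=0.5, rng=rng)

budget.spend(0.5)
second_answer = private_mean(ages, lower=18, upper=90, epsilon=0.5, rng=rng)

try:
    budget.spend(0.1)  # nothing is left, so this query is refused
    exhausted = False
except RuntimeError:
    exhausted = True

print(first_answer, second_answer, exhausted)
```

A smaller epsilon means a larger noise scale and therefore stronger privacy but less accurate answers. The budget tracker captures the other half of the trade-off: every query spends part of a fixed total, and once it is gone no further queries should be answered.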
4
Anonymizing and Releasing Datasets
In this final chapter, you’ll learn how to apply dimensionality reduction methods such as principal component analysis (PCA) to anonymize large multi-column datasets. You’ll then use Faker to generate realistic and consistent datasets, and scikit-learn to create synthetic datasets that follow a normal distribution. Lastly, you’ll tie everything you learned in this course together as you combine multiple techniques to safely release datasets to the public.
PCA for anonymization
50 xp
Anonymization of high-dimensional data
50 xp
Data masking with PCA
100 xp
Generating realistic datasets with Faker
50 xp
Consistent synthetic dataset
100 xp
Datasets with the same probabilistic distribution
100 xp
Creating synthetic datasets using scikit-learn
50 xp
Generating datasets for classification
100 xp
Generating datasets for clustering
100 xp
Safely release datasets to the public
50 xp
Exploring and pseudonymizing a dataset
100 xp
Preparing employee data for safe release
100 xp
Great work!
50 xp
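
As a rough illustration of PCA-based masking, the sketch below projects a numeric table onto all of its principal components, so the output has the same shape as the input but its columns no longer correspond to the original attributes. It uses NumPy's SVD rather than any particular PCA class, and the data is randomly generated for the example; none of it is taken from the course. Note that this transformation on its own carries no formal privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up numeric table: 200 rows, 4 correlated columns.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

# PCA via SVD: center the data, then the rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep every component: the data is rotated into new axes, not reduced.
X_masked = X_centered @ Vt.T

# The masked columns are uncorrelated, and distances between rows are preserved.
covariance = np.cov(X_masked, rowvar=False)
print(X_masked.shape)
print(np.round(covariance, 6))
```

Because the rotation preserves distances between rows, distance-based models such as clustering behave the same on the masked table as on the centered original, which is why this kind of masking keeps much of the data's utility.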

datasets

IBM HR Analytics Employee Attrition & Performance

US Adult Income

Mall Customers

2017-2018 NBA Salaries

collaborators

Richie Cotton

Justin Saddlemyer

prerequisites

Unsupervised Learning in Python

Rebeca Gonzalez

Data Scientist, Hiberus Tecnologia

Rebeca is a data scientist and an entrepreneurial spirit. She has worked in companies like Ayesa and is now co-founder of Alio.li and APTIC, a startup that focuses on helping visually impaired people to see through Artificial Vision. Besides this, she loves animals, brainstorming sessions, and meeting new people. You can follow or contact her on Twitter and LinkedIn.

