Data Privacy and Anonymization in Python
Learn to process sensitive information with privacy-preserving techniques.
4 hours | 16 videos | 49 exercises | 2,967 learners | Statement of Accomplishment
Course Description
Data privacy has never been more important. But how do you balance privacy with the need to gather and share valuable business insights? In this course, you'll learn how to do just that, using the same methods as Google and Amazon—including data generalization and privacy models, like k-Anonymity and differential privacy. In addition to touching on topics such as GDPR, you'll also discover how to build and train machine learning models in Python while protecting users’ sensitive information such as employee and income data. Let’s get started!
1. Introduction to Data Privacy
Free chapter. Get ready to apply anonymization techniques such as data suppression, masking, synthetic data generation, and generalization. In this chapter, you’ll learn how to distinguish between sensitive and non-sensitive personally identifiable information (PII), and you'll be introduced to quasi-identifiers and the basics of the GDPR. You'll also encounter real-life examples of what can go wrong if you don't follow these best practices.
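As a rough illustration of the techniques named above, the following sketch applies suppression, masking, generalization, and top and bottom coding to a few toy records. It uses only the Python standard library; the records, field names, thresholds, and helper functions (`mask_email`, `generalize_age`, `top_bottom_code`, `anonymize`) are all hypothetical and are not taken from the course exercises.

```python
# Hypothetical toy records; every name and value here is invented for illustration.
records = [
    {"name": "Ana Ruiz", "email": "ana.ruiz@example.com", "age": 34, "salary": 52000},
    {"name": "Ben Okafor", "email": "ben@example.org", "age": 47, "salary": 310000},
    {"name": "Chloe Tan", "email": "chloe.tan@example.net", "age": 29, "salary": 18000},
]


def mask_email(email):
    """Masking: keep the domain, replace the local part with asterisks."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, low, high):
    """Top and bottom coding: clamp extreme values to chosen thresholds."""
    return max(low, min(high, value))


def anonymize(record):
    """Build a released record; the 'name' field is suppressed (left out)."""
    return {
        "email": mask_email(record["email"]),
        "age_range": generalize_age(record["age"]),
        # Assumed thresholds; in practice they depend on the data.
        "salary": top_bottom_code(record["salary"], low=20000, high=200000),
    }


anonymized = [anonymize(r) for r in records]
for row in anonymized:
    print(row)
```

Each helper handles one technique, so they can be combined or swapped per column depending on how sensitive that column is.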
- What's private, and why do we care? (50 XP)
- Privacy is power (50 XP)
- Is it sensitive or non-sensitive? (100 XP)
- Suppression of sensitive attributes (100 XP)
- Data masking and data generation with Faker (50 XP)
- Masking sensitive PII (100 XP)
- Removing names with Faker (100 XP)
- Anonymizing with data generalization (50 XP)
- Reducing identification risk with generalization (100 XP)
- Data aggregation and data generalization (50 XP)
- Top and bottom coding White House salaries (100 XP)

2. More on Privacy-Preserving Techniques
Discover how to anonymize data by sampling from datasets following the probability distribution of the columns. You’ll then learn how to apply the k-anonymity privacy model to prevent linkage or re-identification attacks and use hierarchies to perform data generalization in categorical variables.
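To make the k-anonymity idea concrete, here is a minimal sketch that measures k before and after generalization. It assumes that `age` and `zip` are the quasi-identifiers and `income` is the sensitive attribute. The rows, the ZIP-prefix hierarchy, and the helpers `generalize` and `smallest_group` are hypothetical; the course itself may use different data and tooling.

```python
from collections import Counter

# Hypothetical records: 'age' and 'zip' are treated as quasi-identifiers,
# 'income' as the sensitive attribute.
rows = [
    {"age": 23, "zip": "28001", "income": 31000},
    {"age": 27, "zip": "28004", "income": 45000},
    {"age": 25, "zip": "28009", "income": 38000},
    {"age": 41, "zip": "08012", "income": 72000},
    {"age": 48, "zip": "08019", "income": 64000},
]

QUASI_IDENTIFIERS = ("age", "zip")


def generalize(row):
    """Generalize age into a 10-year range and ZIP code one level up a
    simple hierarchy (full code -> two-digit prefix)."""
    low = (row["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": row["zip"][:2] + "***",
        "income": row["income"],
    }


def smallest_group(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share the
    same combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


k_raw = smallest_group(rows, QUASI_IDENTIFIERS)
generalized = [generalize(r) for r in rows]
k_generalized = smallest_group(generalized, QUASI_IDENTIFIERS)

print("k before generalization:", k_raw)
print("k after generalization:", k_generalized)
```

A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, so a larger k makes it harder to link a row to one person.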
- Anonymizing categorical data (50 XP)
- Explore the distribution of data (100 XP)
- Sampling from the same probability distribution (100 XP)
- Anonymizing continuous data (50 XP)
- Different distributions (50 XP)
- Sampling from the best continuous distribution (100 XP)
- Introduction to K-anonymity (50 XP)
- Privacy attributes (100 XP)
- Generalizing into ranges (100 XP)
- Generalizing data using hierarchies (50 XP)
- Using hierarchies for categorical data (100 XP)
- K-anonymizing a dataset (100 XP)

3. Differential Privacy
Learn about differential privacy, a privacy model used by major technology companies such as Apple, Google, and Uber. In this chapter, you’ll explore data by generating private histograms and computing private averages. You’ll also create differentially private machine learning models that allow businesses to increase the utility of their data.
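One common way to build a private histogram is the Laplace mechanism, sketched below together with a simple privacy budget. This is an illustration under stated assumptions, not the course's implementation. The `BudgetAccountant` class, the function names, and the department data are hypothetical, and the sketch assumes that adding or removing one person changes exactly one bin count by 1 (sensitivity 1).

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class BudgetAccountant:
    """Hypothetical helper that tracks how much epsilon has been spent."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        # Small tolerance so floating-point rounding does not trip the check.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def private_histogram(values, categories, epsilon, accountant, rng):
    """Release noisy counts for a fixed, data-independent list of categories.
    With sensitivity 1, the Laplace noise scale per bin is 1 / epsilon."""
    accountant.spend(epsilon)
    counts = Counter(values)
    return {c: counts[c] + laplace_noise(1 / epsilon, rng) for c in categories}


rng = random.Random(0)  # seeded only to make the demo repeatable
accountant = BudgetAccountant(total_epsilon=1.0)
departments = ["sales"] * 40 + ["research"] * 25 + ["hr"] * 10
categories = ["sales", "research", "hr"]

first = private_histogram(departments, categories, 0.5, accountant, rng)
second = private_histogram(departments, categories, 0.5, accountant, rng)
print(first)
print(second)

# A third query would exceed the total budget of 1.0.
exhausted = False
try:
    private_histogram(departments, categories, 0.1, accountant, rng)
except RuntimeError as err:
    exhausted = True
    print("refused:", err)
```

A smaller epsilon means more noise and stronger privacy. The accountant turns that trade-off into a fixed budget: each query spends part of it, and further queries are refused once it runs out.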
- Introduction to differential privacy (50 XP)
- Epsilon (ϵ): the magic number (50 XP)
- Histograms with differential privacy (100 XP)
- Privacy budgets (50 XP)
- Using privacy budgets (100 XP)
- When no budget is left (100 XP)
- Exploring data with a privacy budget accountant (100 XP)
- Differentially private machine learning models (50 XP)
- Build a differentially private classifier (100 XP)
- Predicting salaries (100 XP)
- Differentially private clustering models (50 XP)
- Pre-processing data (100 XP)
- Segmenting customers (100 XP)

4. Anonymizing and Releasing Datasets
In this final chapter, you’ll learn how to apply dimensionality reduction methods such as principal component analysis (PCA) to anonymize large multi-column datasets. You’ll then use Faker to generate realistic and consistent datasets, and scikit-learn to create synthetic datasets that follow a normal distribution. Lastly, you’ll tie everything you learned in this course together as you combine multiple techniques to safely release datasets to the public.
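The chapter uses scikit-learn for synthetic data. As a lighter stand-in, the sketch below fits a normal distribution to one numeric column and samples synthetic values from it, using `statistics.NormalDist` from the standard library. The salary values are invented, and the sketch assumes the column is roughly normal.

```python
from statistics import NormalDist, mean

# Hypothetical salary column (in thousands); the values are invented.
salaries = [42.0, 55.5, 61.2, 48.9, 73.4, 50.1, 66.8, 58.3]

# Estimate the mean and standard deviation of the real column.
fitted = NormalDist.from_samples(salaries)

# Draw as many synthetic values as there are real ones.
# The seed only makes the demo repeatable.
synthetic = [round(v, 1) for v in fitted.samples(len(salaries), seed=42)]

print("fitted mean:", round(fitted.mean, 2))
print("fitted stdev:", round(fitted.stdev, 2))
print("synthetic column:", synthetic)
```

Sampling from a fitted distribution keeps the column's overall shape without copying individual values. It carries no formal privacy guarantee on its own, which is one reason the chapter combines several techniques before release.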
- PCA for anonymization (50 XP)
- Anonymization of high-dimensional data (50 XP)
- Data masking with PCA (100 XP)
- Generating realistic datasets with Faker (50 XP)
- Consistent synthetic dataset (100 XP)
- Datasets with the same probabilistic distribution (100 XP)
- Creating synthetic datasets using scikit-learn (50 XP)
- Generating datasets for classification (100 XP)
- Generating datasets for clustering (100 XP)
- Safely release datasets to the public (50 XP)
- Exploring and pseudonymizing a dataset (100 XP)
- Preparing employee data for safe release (100 XP)
- Great work! (50 XP)
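Pseudonymization, mentioned in the lessons above, replaces a direct identifier with a stable token. One possible approach is a keyed hash, sketched here with the standard library's `hmac` module. The course works with Faker, so this is a swapped-in technique. The key, the `emp_` prefix, the `pseudonymize` helper, and the employee records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real key would be stored separately from the released data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY, length=12):
    """Map a value to a stable pseudonym using HMAC-SHA256.
    The same input and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "emp_" + digest[:length]


# Invented employee records; the first and third refer to the same person.
employees = [
    {"name": "Ana Ruiz", "department": "sales"},
    {"name": "Ben Okafor", "department": "research"},
    {"name": "Ana Ruiz", "department": "sales"},
]

released = [
    {"employee_id": pseudonymize(e["name"]), "department": e["department"]}
    for e in employees
]

for row in released:
    print(row)
```

Because the mapping is consistent, records that belong to the same person stay linked in the released data even though the name is gone. Whoever holds the key can reproduce the tokens, so pseudonymized data still needs protection.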

Datasets
- IBM HR Analytics Employee Attrition & Performance
- US Adult Income
- Mall Customers
- 2017-2018 NBA Salaries

Prerequisites
- Unsupervised Learning in Python

Contributors
- Rebeca Gonzalez, Data Scientist, Hiberus Tecnologia