Cleaning Data in PostgreSQL Databases
Learn to tame your raw, messy data stored in a PostgreSQL database to extract accurate insights.
4 hours, 15 videos, 49 exercises, 10,695 learners, Statement of Accomplishment
Course Description
If you surveyed a large number of data scientists and data analysts about which tasks are most common in their workday, cleaning data would likely be in almost all responses. This is the case because real-world data is messy. To help you tame messy data, this course teaches you how to clean data stored in a PostgreSQL database. You’ll learn how to solve common problems such as how to clean messy strings, deal with empty values, compare the similarity between strings, and much more. You’ll get hands-on practice with these tasks using interesting (but messy) datasets made available by New York City's Open Data program. Are you ready to whip that messy data into shape?
1. Data Cleaning Basics
Free
In this chapter, you’ll gain an understanding of data cleaning approaches when working with PostgreSQL databases and learn the value of cleaning data as early as possible in the pipeline. You’ll also learn basic string editing approaches, such as removing unnecessary spaces, as well as more involved topics, such as pattern matching and string similarity, to identify string values in need of cleaning.
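As a rough illustration of the string cleaning and pattern matching described above, the sketch below trims stray spaces, strips unwanted characters, and flags values with a regular expression. The table name, column names, and sample rows are hypothetical and are not taken from the course datasets.

```sql
-- Hypothetical sample rows standing in for a messy table.
WITH raw_violations (plate_id, vehicle_color) AS (
    VALUES
        ('  abc1234 ', 'GRY'),
        ('XYZ-9876',   ' gray '),
        ('lmn 5555',   'Gray')
)
SELECT
    -- Drop every non-alphanumeric character, then upper-case the result.
    UPPER(REGEXP_REPLACE(plate_id, '[^A-Za-z0-9]', '', 'g')) AS clean_plate,
    -- Remove leading/trailing spaces and normalize capitalization.
    INITCAP(TRIM(vehicle_color)) AS clean_color,
    -- Case-insensitive regular expression match for gray/grey/gry spellings.
    TRIM(vehicle_color) ~* '^gr[ae]?y$' AS looks_gray
FROM raw_violations;
```

The `looks_gray` flag only identifies candidates for standardization. Rewriting them to one canonical spelling would be a separate step.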
Introduction to data cleaning (50 xp)
Developing a data cleaning mindset (50 xp)
Applying functions for string cleaning (100 xp)
Pattern matching (50 xp)
Classifying parking violations by time of day (100 xp)
Masking identifying information with regular expressions (100 xp)
Matching similar strings (50 xp)
Matching inconsistent color names (100 xp)
Standardizing color names (100 xp)
Standardizing multiple colors (100 xp)
Formatting text for colleagues (100 xp)
2. Missing, Duplicate, and Invalid Data
You’ll learn how to write queries to solve common problems of missing, duplicate, and invalid data in the context of PostgreSQL database tables. Through hands-on exercises, you’ll use the COALESCE() function, SELECT query, and WHERE clause to clean messy data.
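A minimal sketch of the ideas above: `COALESCE()` supplies a fill-in value for missing data, and a `WHERE` clause keeps one copy of each duplicated row. The sample data and column names are hypothetical, and the use of `ROW_NUMBER()` to number duplicates is one common approach assumed here, not necessarily the one the course teaches.

```sql
-- Hypothetical sample rows: one missing make, one exact duplicate.
WITH parking (summons_number, plate_id, vehicle_make) AS (
    VALUES
        (1001, 'ABC1234', 'FORD'),
        (1002, 'XYZ9876', NULL),
        (1002, 'XYZ9876', NULL),
        (1003, 'LMN5555', 'HONDA')
),
numbered AS (
    SELECT
        summons_number,
        plate_id,
        -- Replace missing makes with a fill-in value.
        COALESCE(vehicle_make, 'Unknown') AS vehicle_make,
        -- Number the copies of each identical row; the first copy gets 1.
        ROW_NUMBER() OVER (
            PARTITION BY summons_number, plate_id, vehicle_make
            ORDER BY summons_number
        ) AS copy_number
    FROM parking
)
SELECT summons_number, plate_id, vehicle_make
FROM numbered
WHERE copy_number = 1      -- keep a single copy of each row
ORDER BY summons_number;
```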
Handling missing data (50 xp)
Quantifying completeness (50 xp)
Using a fill-in value (100 xp)
Analyzing incomplete records (100 xp)
Handling duplicated data (50 xp)
Duplicate parking violations (100 xp)
Resolving impartial duplicates (100 xp)
Detecting invalid values (50 xp)
Detecting invalid values with regular expressions (100 xp)
Identifying out-of-range vehicle model years (100 xp)
Detecting inconsistent data (50 xp)
Identifying invalid parking violations (100 xp)
Invalid violations with overnight parking restrictions (100 xp)
Recovering deleted data (100 xp)
3. Converting Data
Sometimes you need to convert data stored in a PostgreSQL database from one data type to another. In this chapter, you’ll explore the expressions you need to convert text to numeric types and how to format strings for temporal data.
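The following sketch shows what such conversions can look like: a `CASE` expression guards a text-to-integer cast, and `TO_DATE()` and `TO_CHAR()` parse and reformat a date stored as text. The column names, sample values, and format strings are assumptions chosen for illustration.

```sql
-- Hypothetical sample rows with numbers and dates stored as text.
WITH raw_tickets (fine_amount, issue_date) AS (
    VALUES
        ('115', '03/15/2019'),
        ('N/A', '12/01/2019'),
        ('65',  '07/04/2019')
)
SELECT
    -- Cast only values made up entirely of digits; others become NULL.
    CASE
        WHEN fine_amount ~ '^[0-9]+$' THEN fine_amount::INTEGER
    END AS fine_amount_int,
    -- Parse the text into a DATE using an explicit input format.
    TO_DATE(issue_date, 'MM/DD/YYYY') AS issue_date,
    -- Format the parsed date back into a display string.
    TO_CHAR(TO_DATE(issue_date, 'MM/DD/YYYY'), 'FMDay, Mon DD, YYYY') AS issue_date_label
FROM raw_tickets;
```

Guarding the cast avoids the error PostgreSQL would otherwise raise when a non-numeric string such as `'N/A'` is cast to `INTEGER`.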
Data type conversions (50 xp)
Type conversion with a CASE clause (100 xp)
Applying aggregate functions to converted values (100 xp)
Date parsing and formatting (50 xp)
Cleaning invalid dates (100 xp)
Converting and displaying dates (100 xp)
Timestamp parsing and formatting (50 xp)
Extracting hours from a time value (100 xp)
A parking violation report by day of the month (100 xp)
Risky parking behavior (100 xp)
4. Transforming Data
In the final chapter, you’ll learn how to transform your data and construct pivot tables. Working with real-world postal data, you’ll discover how to combine and split addresses into city, state, and zip codes using a multitude of powerful functions including CONCAT(), SUBSTRING(), and REGEXP_SPLIT_TO_TABLE().
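To make these functions concrete, here is a small sketch with two independent queries. The first combines and splits address fields with `CONCAT()`, `SUBSTRING()`, and `REGEXP_SPLIT_TO_TABLE()`. The second builds a simple pivot table with `FILTER`. All table names, columns, and values are hypothetical, and `SPLIT_PART()` is an extra PostgreSQL function that the description does not mention.

```sql
-- Query 1: combining and splitting address data (hypothetical rows).
WITH addresses (house_number, street, city_state_zip) AS (
    VALUES
        ('12-34', 'Main St',  'New York, NY 10001'),
        ('56',    'Broadway', 'Brooklyn, NY 11201')
)
SELECT
    -- Combine two columns into one street address.
    CONCAT(house_number, ' ', street) AS street_address,
    -- Extract the trailing five digits as the zip code.
    SUBSTRING(city_state_zip FROM '[0-9]{5}$') AS zip_code,
    -- Take the text before the first comma as the city.
    SPLIT_PART(city_state_zip, ',', 1) AS city,
    -- Emit one output row per hyphen-separated house number part.
    REGEXP_SPLIT_TO_TABLE(house_number, '-') AS house_number_part
FROM addresses;

-- Query 2: a pivot table with one column per violation code.
WITH violations (issuing_agency, violation_code) AS (
    VALUES ('P', 21), ('P', 38), ('T', 21), ('P', 21)
)
SELECT
    issuing_agency,
    COUNT(*) FILTER (WHERE violation_code = 21) AS code_21,
    COUNT(*) FILTER (WHERE violation_code = 38) AS code_38
FROM violations
GROUP BY issuing_agency
ORDER BY issuing_agency;
```

Because `REGEXP_SPLIT_TO_TABLE()` returns a set, the first query produces one row per split part, so an address with a hyphenated house number appears more than once.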
Combining columns (50 xp)
Tallying corner parking violations (100 xp)
Creating a TIMESTAMP with concatenation (100 xp)
Splitting column data (50 xp)
Extracting time units with SUBSTRING() (100 xp)
Extracting house numbers from a string (100 xp)
Splitting data with delimiters (50 xp)
Splitting house numbers with a delimiter (100 xp)
Mapping parking restrictions (100 xp)
Creating pivot tables (50 xp)
Selecting data for a pivot table (100 xp)
Using FILTER to create a pivot table (100 xp)
Aggregating film categories (100 xp)
Course wrap-up (50 xp)
Prerequisites
Data Manipulation in SQL

Darryl Reeves, Ph.D.
Industry Assistant Professor, NYU Tandon School of Engineering