Intermediate Regular Expressions in R
Manipulate text data, analyze it and more by mastering regular expressions and string distances in R.
4 hours, 14 videos, 48 exercises, 4,178 learners, Statement of Accomplishment
Course Description
Analyzing data that comes in tables is fun. But what if the most interesting information is not available as a neatly organized dataset, only as plain text? Do not despair: in this course, you'll learn to write powerful regular expressions that pull the information you need for your analyses out of a blob of text. Beyond that, using the concept of string distances, you will learn to work with text that contains typos or scanning errors by matching it to its correct counterpart from other data sources (record linkage). As learning material, we will analyze real documents about box office figures in Swiss cinemas.
1. Regular Expressions: Writing Custom Patterns
This chapter is free. Regular expressions can look intimidating at first because they contain so many special characters. In this chapter, you'll learn to decipher these and write your own patterns to find exactly what you're looking for.
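To give a feel for the special characters this chapter covers (anchors, character classes, repetitions, the pipe and the question mark), here is a minimal sketch in base R. The movie titles are invented for illustration, and the course itself may use different helper functions for the same patterns.

```r
# Hypothetical sample data: movie titles invented for illustration
movies <- c("Star Wars 7", "The Matrix", "Karate Kid 2", "Cats")

# ^ anchors the pattern to the start of the string
starts_with_the <- grepl("^The", movies)

# \\d matches a digit; $ requires it to be the last character
ends_with_digit <- grepl("\\d$", movies)

# . matches any single character; {3} repeats it exactly three times,
# so this matches only "C" followed by exactly three more characters
four_chars_with_c <- grepl("^C.{3}$", movies)

# | means "or"; ? makes the preceding character optional
wars_or_kid <- grepl("Wars|Kids?", movies)

print(movies[starts_with_the])
print(movies[ends_with_digit])
print(movies[four_chars_with_c])
print(movies[wars_or_kid])
```

Each call returns a logical vector the same length as `movies`, so it can be used directly to subset the titles that match.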
- Welcome (50 xp)
- Starts with, ends with (100 xp)
- If you don't know what you're looking for (100 xp)
- Character classes and repetitions (50 xp)
- Digits, words and spaces (100 xp)
- Match repetitions (100 xp)
- Which special character did what again? (100 xp)
- The pipe and the question mark (50 xp)
- This or that (100 xp)
- The question mark and its two meanings (100 xp)
- You can now read this! (50 xp)

2. Creating Strings with Data
In this chapter, we will move slightly away from regular expressions and focus on string manipulation: creating strings from other data structures such as vectors or lists.
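As a rough illustration of building strings from data, the sketch below uses `glue()` and `glue_collapse()` from the glue package, which the exercise titles of this chapter refer to. All values and variable names are invented for illustration, and the sketch assumes the glue package is installed.

```r
library(glue)

# Hypothetical values, invented for illustration
title <- "The Matrix"
year <- 1999

# Expressions inside {} are evaluated and inserted into the string
sentence <- glue("{title} was released in {year}.")

# glue_collapse() joins the elements of a vector into a single string
first_names <- c("Anna", "Ben", "Cleo")
name_list <- glue_collapse(first_names, sep = ", ", last = " and ")

# Collapsing with "|" builds an "or pattern" for a regular expression
or_pattern <- glue_collapse(first_names, sep = "|")
has_name <- grepl(as.character(or_pattern), c("Ben was here", "Nobody home"))

print(sentence)
print(name_list)
print(has_name)
```

The last step shows why this chapter still belongs in a regular expressions course: a pattern assembled from a vector stays readable and updates itself when the vector changes.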
- Getting to know glue (50 xp)
- Stop pasting, start gluing (100 xp)
- Gluing data frames (100 xp)
- How many arguments can glue take? (50 xp)
- Collapsing multiple elements into a string (50 xp)
- Formulating a question from a list (100 xp)
- Collapsing data frames (100 xp)
- Glue and Collapse, what's the difference? (50 xp)
- Gluing regular expressions (50 xp)
- Construct "or patterns" with glue (100 xp)
- Using the "or pattern" with a larger dataset (100 xp)
- Make advanced patterns more readable (100 xp)

3. Extracting Structured Data From Text
One task where regular expressions really shine is making sense of a blob of text. In this chapter, you'll learn to extract information from messy data that comes not in neatly arranged tables but in plain text.
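The idea of capturing groups can be sketched in base R as follows. The input line and its layout are invented for illustration; the chapter's exercises also mention tidyr's `extract`, which this sketch does not use.

```r
# Hypothetical line of plain text, invented for illustration
line <- "The Matrix (1999): 136 minutes"

# Three capturing groups: title, four-digit year, running time
pattern <- "^(.+) \\((\\d{4})\\): (\\d+) minutes$"

# regexec() locates the full match and each group;
# regmatches() returns them as a character vector
parts <- regmatches(line, regexec(pattern, line))[[1]]

# Element 1 is the full match, elements 2 to 4 are the groups
movie <- data.frame(
  title = parts[2],
  year = as.integer(parts[3]),
  minutes = as.integer(parts[4])
)

# Backreferences \\1 and \\2 reuse captured groups in a replacement
swapped <- sub("^(.+) \\((\\d{4})\\).*$", "\\2: \\1", line)

print(movie)
print(swapped)
```

The same pattern serves two purposes here: turning free text into a one-row table, and rearranging the captured pieces in a search-and-replace.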
- Capturing groups (50 xp)
- Match all capturing groups (100 xp)
- Search and replace (100 xp)
- Can you nest capturing groups? (50 xp)
- tidyr's extract (50 xp)
- Creating a regex that matches your needs (100 xp)
- Why does this fail? (50 xp)
- Extracting an advanced regular expression (100 xp)
- Extracting matches and surroundings from a text (50 xp)
- Extract names with context (100 xp)
- So many special characters (100 xp)

4. Similarities Between Strings
In this final chapter, we will shift gears away from regular expressions to understanding string distances. By calculating the distances between strings, we can match those that are similar. This helps us find duplicates even when they contain small errors like typos. It is an important part of record linkage, where we combine datasets from multiple sources.
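One way to picture matching by string distance is the sketch below, which uses `adist()` from base R's utils package as a stand-in for whichever distance functions the course teaches. The titles and the misspelled search term are invented for illustration.

```r
# Hypothetical data: a clean reference list and a misspelled search term
titles <- c("The Matrix", "Titanic", "Inception")
typo <- "Teh Matrx"

# adist() computes the generalized Levenshtein (edit) distance:
# the minimal number of insertions, deletions and substitutions
# needed to turn one string into another
distances <- drop(adist(typo, titles))

# The closest title is the one with the smallest distance
best_match <- titles[which.min(distances)]

print(distances)
print(best_match)
```

Picking the minimum always returns some candidate, so in practice one would probably also set a maximum acceptable distance to avoid matching unrelated strings.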
- Understanding string distances (50 xp)
- Calculating a string distance (50 xp)
- Finding a match to a search typo (100 xp)
- Methods of string distances (50 xp)
- Edit distances vs. q-gram methods (100 xp)
- Trying out different methods (100 xp)
- Is one distance better than the other? (50 xp)
- Fuzzy joins (50 xp)
- Performing a string distance join (100 xp)
- String distances of short strings (50 xp)
- Custom Fuzzy Matching (50 xp)
- Finding matches based on two conditions (100 xp)
- Why join on multiple columns? (50 xp)
- Congratulations (50 xp)
Contributors
Benja Zehr
Data Journalist