Scalable Data Processing in R
Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.
4 hours · 15 videos · 49 exercises · 5,841 learners · Statement of Accomplishment
Course Description
Data sets are often larger than available RAM, which causes problems for R programmers because, by default, all variables are stored in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll make use of the Federal Housing Finance Agency's data, a publicly available data set chronicling all mortgages held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.
In the following track: Big Data with R

1. Working with increasingly large data sets
In this chapter, we cover why you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data using base R functions can be slow, and present some easy ways to remedy this. Finally, we introduce the bigmemory package.
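The core bigmemory workflow the chapter introduces can be sketched as follows. This is illustrative only: it assumes the bigmemory package is installed and that a delimited file named `mortgage-sample.csv` (a hypothetical filename) exists on disk.

```r
# Sketch only: assumes the bigmemory package is installed and a
# CSV file "mortgage-sample.csv" (hypothetical name) is on disk.
library(bigmemory)

# read.big.matrix() imports the file into a file-backed big.matrix,
# so the data live on disk rather than in RAM.
mort <- read.big.matrix("mortgage-sample.csv", header = TRUE,
                        type = "integer",
                        backingfile = "mortgage.bin",
                        descriptorfile = "mortgage.desc")

# Later R sessions (or other R processes) can attach the same backing
# file almost instantly, without re-importing the CSV.
mort <- attach.big.matrix("mortgage.desc")
dim(mort)   # dimensions are available without loading the data into RAM
```

Because the matrix is file-backed, the expensive import happens once; subsequent work attaches to the descriptor file.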
- What is Scalable Data Processing? (50 xp)
- Why is your code slow? (50 xp)
- How does processing time vary by data size? (100 xp)
- Working with "Out-of-Core" Objects using the Bigmemory Project (50 xp)
- Reading a big.matrix object (100 xp)
- Attaching a big.matrix object (100 xp)
- Creating tables with big.matrix objects (100 xp)
- Data summary using bigsummary (100 xp)
- References vs. Copies (50 xp)
- Copying matrices and big matrices (100 xp)

2. Processing and Analyzing Data with bigmemory
Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
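The split-apply-combine approach described above can be sketched in base R, so the snippet runs without the course's big.matrix data. The two vectors below are made-up stand-ins for the mortgage data's year and female-borrower columns.

```r
# Toy data standing in for two columns of the mortgage matrix:
# year of origination and a 0/1 flag for female borrowers.
year   <- c(2009, 2009, 2010, 2010, 2010, 2011)
female <- c(1,    0,    1,    1,    0,    0)

# Split: group the row indices by year.
groups <- split(seq_along(year), year)

# Apply: compute the proportion of female borrowers in each group.
props <- lapply(groups, function(idx) mean(female[idx]))

# Combine: collapse the per-group results into a single named vector.
result <- unlist(props)
result
```

With a big.matrix, the same pattern applies: split on row indices (which are cheap to hold in RAM), then read only the needed rows from disk in the apply step.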
- The Bigmemory Suite of Packages (50 xp)
- Tabulating using bigtable (100 xp)
- Borrower Race and Ethnicity by Year (I) (100 xp)
- Split-Apply-Combine (50 xp)
- Female Proportion Borrowing (100 xp)
- Split (100 xp)
- Apply (100 xp)
- Combine (100 xp)
- Visualize your results using the tidyverse (50 xp)
- Visualizing Female Proportion Borrowing (100 xp)
- The Borrower Income Ratio (100 xp)
- Tidy Big Tables (100 xp)
- Limitations of bigmemory (50 xp)
- Where should you use bigmemory? (50 xp)

3. Working with iotools
We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.
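The idea behind foldable, chunk-wise computation can be shown without iotools itself: reduce each chunk to a small summary that combines additively, then fold the summaries together. This is the kind of operation chunk.apply distributes over pieces of a file; the base-R sketch below is illustrative and is not the package's API.

```r
x <- 1:100                                        # stand-in for a column too big for RAM
chunks <- split(x, ceiling(seq_along(x) / 30))    # break into 30-element chunks

# Apply: each chunk is reduced to a small, combinable summary.
partial <- lapply(chunks, function(ch) c(sum = sum(ch), n = length(ch)))

# Combine (fold): sums and counts add across chunks,
# so the global mean is recoverable from the per-chunk summaries.
total <- Reduce(`+`, partial)
total[["sum"]] / total[["n"]]                     # 50.5, identical to mean(x)
```

Note that the mean itself is not foldable (you cannot average the chunk means when chunks differ in size), which is why the chunk summary carries the sum and the count instead.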
- Introduction to chunk-wise processing (50 xp)
- Can you split-compute-combine it? (50 xp)
- Foldable operations (I) (100 xp)
- Foldable operations (II) (100 xp)
- A first look at iotools: Importing data (50 xp)
- Compare read.delim() and read.delim.raw() (100 xp)
- Reading raw data and turning it into a data structure (100 xp)
- chunk.apply (50 xp)
- Reading chunks in as a matrix (100 xp)
- Reading chunks in as a data.frame (100 xp)
- Parallelizing calls to chunk.apply (100 xp)

4. Case Study: A Preliminary Analysis of the Housing Data
In the previous chapters, we've introduced the housing data and shown how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.
- Overview of types of analysis for this chapter (50 xp)
- Race and Ethnic Representation in the Mortgage Data (100 xp)
- Comparing the Borrower Race/Ethnicity and their Proportions (100 xp)
- Are the data missing at random? (50 xp)
- Looking for Predictable Missingness (100 xp)
- A little more about missingness (50 xp)
- Analyzing the Housing Data (50 xp)
- Borrower Race and Ethnicity by Year (II) (100 xp)
- Visualizing the Adjusted Demographic Trends (100 xp)
- Relative change in demographic trend (100 xp)
- Borrower Lending Trends: City vs. Rural (50 xp)
- Borrower Region by Year (100 xp)
- Who is securing federally guaranteed loans? (100 xp)
- Congratulations! (50 xp)
Instructors

Michael Kane, Assistant Professor at Yale University

Simon Urbanek, Member of R-Core; Lead Inventive Scientist at AT&T Labs Research