Scalable Data Processing in R
Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.
4 hours, 15 videos, 49 exercises, 5,839 learners, Statement of Accomplishment
Course Description
Datasets are often larger than available RAM, which causes problems for R programmers because, by default, R stores all variables in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll use the Federal Housing Finance Agency's data, a publicly available dataset chronicling all mortgages that were held or securitized by both the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.
In the following tracks: Big Data in R

1. Working with increasingly large data sets
Free. In this chapter, we cover the reasons you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data with the base R functions can be slow, and present some easy ways to remedy this. Finally, we introduce the bigmemory package.
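As a rough illustration of why data size matters, the following base R sketch estimates the in-memory footprint of a numeric matrix before loading it. R stores each double-precision value in 8 bytes; the helper name `estimate_bytes` and the dimensions are hypothetical and are not taken from the course materials.

```r
# Hypothetical helper: estimated bytes for a numeric (double) matrix.
estimate_bytes <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell
}

# Made-up dimensions: one million rows, 16 columns.
est <- estimate_bytes(1e6, 16)
est_mib <- est / 2^20  # the same estimate expressed in MiB

# Sanity check against a real, much smaller matrix.
# object.size() reports the payload plus a small fixed header.
m <- matrix(0, nrow = 1000, ncol = 16)
actual <- as.numeric(object.size(m))
```

Comparing an estimate like `est` with the RAM available on the machine gives a first indication of whether an out-of-core approach is worth considering.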
What is Scalable Data Processing? (50 xp)
Why is your code slow? (50 xp)
How does processing time vary by data size? (100 xp)
Working with "Out-of-Core" Objects using the Bigmemory Project (50 xp)
Reading a big.matrix object (100 xp)
Attaching a big.matrix object (100 xp)
Creating tables with big.matrix objects (100 xp)
Data summary using bigsummary (100 xp)
References vs. Copies (50 xp)
Copying matrices and big matrices (100 xp)

2. Processing and Analyzing Data with bigmemory
Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
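The split-apply-combine approach can be sketched in base R as follows. This is a minimal illustration on a made-up toy data frame, not the course's code: the column names `year` and `female` and all values are assumptions, and no bigmemory functions are used.

```r
# Toy stand-in for the mortgage data; all values are invented.
loans <- data.frame(
  year   = c(2009, 2009, 2010, 2010, 2010),
  female = c(1, 0, 1, 1, 0)
)

# Split: group the row indices by year.
rows_by_year <- split(seq_len(nrow(loans)), loans$year)

# Apply: compute the proportion of female borrowers within each group.
prop_by_year <- sapply(rows_by_year, function(idx) mean(loans$female[idx]))

# Combine: sapply() has already collected the results into one named vector.
prop_by_year
```

Splitting row indices rather than the data itself keeps the pieces small, which is useful when the underlying table is too large to copy.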
The Bigmemory Suite of Packages (50 xp)
Tabulating using bigtable (100 xp)
Borrower Race and Ethnicity by Year (I) (100 xp)
Split-Apply-Combine (50 xp)
Female Proportion Borrowing (100 xp)
Split (100 xp)
Apply (100 xp)
Combine (100 xp)
Visualize your results using the tidyverse (50 xp)
Visualizing Female Proportion Borrowing (100 xp)
The Borrower Income Ratio (100 xp)
Tidy Big Tables (100 xp)
Limitations of bigmemory (50 xp)
Where should you use bigmemory? (50 xp)

3. Working with iotools
We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.
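To make chunk-wise processing concrete, here is a minimal base R sketch that computes a mean without ever holding the whole file in memory. It deliberately uses only base R connections as a stand-in, not the iotools API; the temporary file, the chunk size of 3 lines, and the variable names are assumptions chosen for illustration.

```r
# Write a small temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(as.character(1:10), path)

con <- file(path, open = "r")
total <- 0  # running sum across chunks
n <- 0      # running count across chunks

repeat {
  lines <- readLines(con, n = 3)  # read up to 3 lines per chunk
  if (length(lines) == 0) break   # stop at end of file
  values <- as.numeric(lines)
  total <- total + sum(values)    # fold this chunk into the running sum
  n <- n + length(values)         # fold this chunk into the running count
}
close(con)

# The mean is recovered from the two folded quantities.
chunk_mean <- total / n
```

The sum and the count are "foldable": each chunk's contribution can be merged into a running result, so only one chunk needs to be in memory at a time.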
Introduction to chunk-wise processing (50 xp)
Can you split-compute-combine it? (50 xp)
Foldable operations (I) (100 xp)
Foldable operations (II) (100 xp)
A first look at iotools: Importing data (50 xp)
Compare read.delim() and read.delim.raw() (100 xp)
Reading raw data and turning it into a data structure (100 xp)
chunk.apply (50 xp)
Reading chunks in as a matrix (100 xp)
Reading chunks in as a data.frame (100 xp)
Parallelizing calls to chunk.apply (100 xp)

4. Case Study: A Preliminary Analysis of the Housing Data
In the previous chapters, we introduced the housing data and showed how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.
Overview of types of analysis for this chapter (50 xp)
Race and Ethnic Representation in the Mortgage Data (100 xp)
Comparing the Borrower Race/Ethnicity and their Proportions (100 xp)
Are the data missing at random? (50 xp)
Looking for Predictable Missingness (100 xp)
A little more about missingness (50 xp)
Analyzing the Housing Data (50 xp)
Borrower Race and Ethnicity by Year (II) (100 xp)
Visualizing the Adjusted Demographic Trends (100 xp)
Relative change in demographic trend (100 xp)
Borrower Lending Trends: City vs. Rural (50 xp)
Borrower Region by Year (100 xp)
Who is securing federally guaranteed loans? (100 xp)
Congratulations! (50 xp)
Michael Kane
Assistant Professor at Yale University
Simon Urbanek
Member of the R-Core; Lead Inventive Scientist at AT&T Labs Research