Scalable Data Processing in R
Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.
4 hours · 15 videos · 49 exercises · 5,841 learners · Statement of Accomplishment
Course Description
Data sets are often larger than available RAM, which causes problems for R programmers because, by default, all variables are stored in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll make use of the Federal Housing Finance Agency's data, a publicly available data set chronicling all mortgages held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.
In the following track: Big Data with R

1. Working with increasingly large data sets
In this chapter, we cover why you need to apply new techniques when data sets are larger than available RAM. We show that importing and exporting data using base R functions can be slow, and present some easy ways to remedy this. Finally, we introduce the bigmemory package.
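The core bigmemory workflow the chapter introduces can be sketched as follows. This is illustrative only: it assumes the bigmemory package is installed and that a delimited file named `mortgage-sample.csv` (a hypothetical filename) exists on disk.

```r
# Sketch only: assumes the bigmemory package is installed and a
# CSV file "mortgage-sample.csv" (hypothetical name) is on disk.
library(bigmemory)

# read.big.matrix() imports the file into a file-backed big.matrix,
# so the data live on disk rather than in RAM.
mort <- read.big.matrix("mortgage-sample.csv", header = TRUE,
                        type = "integer",
                        backingfile = "mortgage.bin",
                        descriptorfile = "mortgage.desc")

# Later R sessions (or other R processes) can attach the same backing
# file almost instantly, without re-importing the CSV.
mort <- attach.big.matrix("mortgage.desc")
dim(mort)   # dimensions are available without loading the data into RAM
```

Because the matrix is file-backed, the expensive import happens once; subsequent work attaches to the descriptor file.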
- What is Scalable Data Processing? (50 xp)
- Why is your code slow? (50 xp)
- How does processing time vary by data size? (100 xp)
- Working with "Out-of-Core" Objects using the Bigmemory Project (50 xp)
- Reading a big.matrix object (100 xp)
- Attaching a big.matrix object (100 xp)
- Creating tables with big.matrix objects (100 xp)
- Data summary using bigsummary (100 xp)
- References vs. Copies (50 xp)
- Copying matrices and big matrices (100 xp)

2. Processing and Analyzing Data with bigmemory
Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
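The split-apply-combine approach described above can be sketched in base R, so the snippet runs without the course's big.matrix data. The two vectors below are made-up stand-ins for the mortgage data's year and female-borrower columns.

```r
# Toy data standing in for two columns of the mortgage matrix:
# year of origination and a 0/1 flag for female borrowers.
year   <- c(2009, 2009, 2010, 2010, 2010, 2011)
female <- c(1,    0,    1,    1,    0,    0)

# Split: group the row indices by year.
groups <- split(seq_along(year), year)

# Apply: compute the proportion of female borrowers in each group.
props <- lapply(groups, function(idx) mean(female[idx]))

# Combine: collapse the per-group results into a single named vector.
result <- unlist(props)
result
```

With a big.matrix, the same pattern applies: split on row indices (which are cheap to hold in RAM), then read only the needed rows from disk in the apply step.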
- The Bigmemory Suite of Packages (50 xp)
- Tabulating using bigtable (100 xp)
- Borrower Race and Ethnicity by Year (I) (100 xp)
- Split-Apply-Combine (50 xp)
- Female Proportion Borrowing (100 xp)
- Split (100 xp)
- Apply (100 xp)
- Combine (100 xp)
- Visualize your results using the tidyverse (50 xp)
- Visualizing Female Proportion Borrowing (100 xp)
- The Borrower Income Ratio (100 xp)
- Tidy Big Tables (100 xp)
- Limitations of bigmemory (50 xp)
- Where should you use bigmemory? (50 xp)

3. Working with iotools
We'll use the iotools package, which can process both numeric and string data, and introduce the concept of chunk-wise processing.
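The idea behind foldable, chunk-wise computation can be shown without iotools itself: reduce each chunk to a small summary that combines additively, then fold the summaries together. This is the kind of operation chunk.apply distributes over pieces of a file; the base-R sketch below is illustrative and is not the package's API.

```r
x <- 1:100                                        # stand-in for a column too big for RAM
chunks <- split(x, ceiling(seq_along(x) / 30))    # break into 30-element chunks

# Apply: each chunk is reduced to a small, combinable summary.
partial <- lapply(chunks, function(ch) c(sum = sum(ch), n = length(ch)))

# Combine (fold): sums and counts add across chunks,
# so the global mean is recoverable from the per-chunk summaries.
total <- Reduce(`+`, partial)
total[["sum"]] / total[["n"]]                     # 50.5, identical to mean(x)
```

Note that the mean itself is not foldable (you cannot average the chunk means when chunks differ in size), which is why the chunk summary carries the sum and the count instead.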
- Introduction to chunk-wise processing (50 xp)
- Can you split-compute-combine it? (50 xp)
- Foldable operations (I) (100 xp)
- Foldable operations (II) (100 xp)
- A first look at iotools: Importing data (50 xp)
- Compare read.delim() and read.delim.raw() (100 xp)
- Reading raw data and turning it into a data structure (100 xp)
- chunk.apply (50 xp)
- Reading chunks in as a matrix (100 xp)
- Reading chunks in as a data.frame (100 xp)
- Parallelizing calls to chunk.apply (100 xp)

4. Case Study: A Preliminary Analysis of the Housing Data
In the previous chapters, we've introduced the housing data and shown how to compute with data that is about as big as, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.
- Overview of types of analysis for this chapter (50 xp)
- Race and Ethnic Representation in the Mortgage Data (100 xp)
- Comparing the Borrower Race/Ethnicity and their Proportions (100 xp)
- Are the data missing at random? (50 xp)
- Looking for Predictable Missingness (100 xp)
- A little more about missingness (50 xp)
- Analyzing the Housing Data (50 xp)
- Borrower Race and Ethnicity by Year (II) (100 xp)
- Visualizing the Adjusted Demographic Trends (100 xp)
- Relative change in demographic trend (100 xp)
- Borrower Lending Trends: City vs. Rural (50 xp)
- Borrower Region by Year (100 xp)
- Who is securing federally guaranteed loans? (100 xp)
- Congratulations! (50 xp)
Instructors

Michael Kane, Assistant Professor at Yale University

Simon Urbanek, Member of R-Core; Lead Inventive Scientist at AT&T Labs Research