
Course

Scalable Data Processing in R

Advanced

Updated 12/2024

Learn how to write scalable code for working with big data in R using the bigmemory and iotools packages.


R | Software Development | 4 hours | 15 videos | 49 exercises | 3,950 XP | 5,852 | Statement of Accomplishment


Course Description

Datasets are often larger than available RAM, which causes problems for R programmers because, by default, R stores all variables in memory. You'll learn tools for processing, exploring, and analyzing data directly from disk. You'll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll use the Federal Housing Finance Agency's data, a publicly available data set chronicling all mortgages that were held or securitized by the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009 to 2015.
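
As a rough illustration of the split-apply-combine approach mentioned above, here is a minimal sketch in base R. It does not use bigmemory or iotools, and the tiny data frame, its column names (`year`, `borrower_gender`), and its values are hypothetical stand-ins for the mortgage data, not taken from the course.

```r
# Toy stand-in for the mortgage data; column names and values are hypothetical.
mort <- data.frame(
  year = c(2009, 2009, 2010, 2010, 2010, 2011),
  borrower_gender = c("F", "M", "F", "F", "M", "M")
)

# Split: group the row indices by year.
idx_by_year <- split(seq_len(nrow(mort)), mort$year)

# Apply: compute the proportion of female borrowers within each group.
female_prop <- sapply(idx_by_year, function(rows) {
  mean(mort$borrower_gender[rows] == "F")
})

# Combine: sapply has already collected the results into a named vector.
print(female_prop)
```

Splitting row indices rather than the rows themselves keeps the pattern cheap, since only integer positions are copied. The same idea carries over to data that lives on disk, where copying the rows would be expensive.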

Prerequisites

Writing Efficient R Code

1

Working with increasingly large data sets


What is Scalable Data Processing?

Why is your code slow?

How does processing time vary by data size?

Working with "Out-of-Core" Objects using the Bigmemory Project

Reading a big.matrix object

Attaching a big.matrix object

Creating tables with big.matrix objects

Data summary using bigsummary

References vs. Copies

Copying matrices and big matrices

2

Processing and Analyzing Data with bigmemory


The Bigmemory Suite of Packages

Tabulating using bigtable

Borrower Race and Ethnicity by Year (I)

Split-Apply-Combine

Female Proportion Borrowing

Visualize your results using the tidyverse

Visualizing Female Proportion Borrowing

The Borrower Income Ratio

Tidy Big Tables

Limitations of bigmemory

Where should you use bigmemory?

3

Working with iotools


Introduction to chunk-wise processing

Can you split-compute-combine it?

Foldable operations (I)

Foldable operations (II)

A first look at iotools: Importing data

Compare read.delim() and read.delim.raw()

Reading raw data and turning it into a data structure

chunk.apply

Reading chunks in as a matrix

Reading chunks in as a data.frame

Parallelizing calls to chunk.apply
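
The chunk-wise processing and foldable operations listed in this chapter can be sketched in base R. This is only an illustration under stated assumptions: it reads a temporary file with `readLines()` rather than using iotools or `chunk.apply`, and the file contents, the `income` column, and the `chunk_size` value are hypothetical.

```r
# Write a small temporary file to stand in for a data set too large for RAM.
path <- tempfile(fileext = ".csv")
writeLines(c("income", as.character(1:10)), path)

con <- file(path, open = "r")
invisible(readLines(con, n = 1))  # skip the header line

total <- 0
count <- 0
chunk_size <- 4  # lines per chunk; hypothetical, tune to available memory

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break  # end of file reached
  values <- as.numeric(lines)
  # Fold: keep only the running sum and count, never the whole column.
  total <- total + sum(values)
  count <- count + length(values)
}
close(con)

# Combine the folded pieces into the final statistic.
chunk_mean <- total / count
print(chunk_mean)
```

The mean works here because it can be rebuilt from per-chunk sums and counts. A statistic such as the median cannot be folded this way, since it cannot be reconstructed from per-chunk medians.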

4

Case Study: A Preliminary Analysis of the Housing Data


Overview of types of analysis for this chapter

Race and Ethnic Representation in the Mortgage Data

Comparing the Borrower Race/Ethnicity and their Proportions

Are the data missing at random?

Looking for Predictable Missingness

A little more about missingness

Analyzing the Housing Data

Borrower Race and Ethnicity by Year (II)

Visualizing the Adjusted Demographic Trends

Relative change in demographic trend

Borrower Lending Trends: City vs. Rural

Borrower Region by Year

Who is securing federally guaranteed loans?

Congratulations!
