Data Manipulation with data.table in R
Master core concepts about data manipulation such as filtering, selecting and calculating groupwise statistics using data.table.
4 Hours, 15 Videos, 59 Exercises
Course Description
The data.table package provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience, and programming speed. This course shows you how to create, subset, and manipulate data.tables. You'll also learn about the database-inspired features of data.tables, including built-in groupwise operations. The course concludes with fast methods of importing and exporting tabular text data such as CSV files. After completing the course, you will be able to use data.table in R to manipulate and analyze data more efficiently. Throughout the course you'll explore the San Francisco Bay Area bike share trip dataset from 2014.
1. Introduction to data.table
(Free chapter) This chapter introduces data.tables as a drop-in replacement for data.frames and shows how to use data.table's i argument to filter rows.
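As a rough illustration of filtering with the i argument, the sketch below builds a small toy table and applies the kinds of filters this chapter's exercises name. The table `trips` and its columns (`trip_id`, `start_station`, `duration`) are made up for this example and are not the course dataset's actual column names.

```r
library(data.table)

# Hypothetical toy data; column names are assumptions for illustration only
trips <- data.table(
  trip_id       = 1:6,
  start_station = c("Market St", "Townsend St", "Market St",
                    "2nd St", "Howard St", "Market St"),
  duration      = c(300L, 1200L, 450L, 90L, 2400L, 600L)
)

# A data.table is also a data.frame
is_df <- is.data.frame(trips)

# i with positive integers: keep rows 1 and 2
first_two <- trips[1:2]

# i with negative integers: drop row 1
without_first <- trips[-1]

# i with a logical expression: columns are referenced by bare name
long_trips <- trips[duration > 500]

# Helper operators usable inside i
market   <- trips[start_station %like% "^Market"]               # regex match
selected <- trips[start_station %in% c("2nd St", "Howard St")]  # set membership
mid      <- trips[duration %between% c(300, 600)]               # closed interval
chin_sel <- trips[start_station %chin% c("2nd St")]             # %in% for character vectors
```

Unlike a data.frame, no trailing comma is needed when only rows are filtered, and there is no need to prefix column names with `trips$` inside the brackets.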
- Welcome to the course! (50 xp)
- data.table pop quiz (50 xp)
- Creating a data.table (100 xp)
- Introducing bikes data (100 xp)
- Filtering rows in a data.table (50 xp)
- Filtering rows using positive integers (100 xp)
- Filtering rows using negative integers (100 xp)
- Filtering rows using logical vectors (100 xp)
- Helpers for filtering (50 xp)
- I %like% data.tables (100 xp)
- Filtering with %in% (100 xp)
- Filtering with %between% and %chin% (100 xp)
2. Selecting and Computing on Columns
Just as the i argument lets you filter rows, the j argument of data.table lets you select columns and also perform computations. The syntax is far more convenient and flexible when compared to data.frames.
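A minimal sketch of what selecting and computing in j can look like, again on a made-up table (`trips` and its column names are assumptions, not the course data):

```r
library(data.table)

# Hypothetical toy data; column names are assumptions for illustration only
trips <- data.table(
  trip_id       = 1:4,
  start_station = c("A", "B", "A", "C"),
  duration      = c(300L, 1200L, 450L, 90L)
)

# Select a single column as a plain vector
dur_vec <- trips[, duration]

# Select a single column as a one-column data.table
dur_dt <- trips[, .(duration)]

# Select columns by name
sel <- trips[, c("trip_id", "duration")]

# Deselect columns by name
no_id <- trips[, !c("trip_id")]

# Compute directly in j
mean_dur <- trips[, mean(duration)]

# Compute several named summaries at once; .() is an alias for list()
stats <- trips[, .(mean_dur = mean(duration), max_dur = max(duration))]

# Combine i and j: filter rows first, then compute on what remains
n_from_a    <- trips[start_station == "A", .N]
mean_from_a <- trips[start_station == "A", mean(duration)]
```

Because j is evaluated within the table, any R expression that uses the column names can appear there, not only column selections.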
- Selecting columns from a data.table (50 xp)
- Selecting a single column (50 xp)
- Selecting columns by name (100 xp)
- Deselecting specific columns (100 xp)
- Computing on columns the data.table way (50 xp)
- Computing in j (I) (100 xp)
- Computing in j (II) (100 xp)
- Advanced computations in j (50 xp)
- Computing in j (III) (100 xp)
- Combining i and j (100 xp)
3. Groupwise Operations
This chapter introduces data.table's by argument that lets you perform computations by groups. By the end of this chapter, you will master the concise DT[i, j, by] syntax of data.table.
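The DT[i, j, by] form can be sketched as follows. The data are invented for this example (`trips`, `start_station`, `end_station`, and `duration` are assumed names), so the results say nothing about the real bike share data.

```r
library(data.table)

# Hypothetical toy data; column names are assumptions for illustration only
trips <- data.table(
  start_station = c("A", "A", "B", "B", "B", "C"),
  end_station   = c("B", "C", "A", "A", "C", "A"),
  duration      = c(300L, 500L, 200L, 400L, 600L, 100L)
)

# by: compute j once per group
by_start <- trips[, .(mean_dur = mean(duration)), by = start_station]

# Chaining: count trips per destination, sort descending, keep the top 2
top_dest <- trips[, .N, by = end_station][order(-N)][1:2]

# i, j, and by together: count trips of at least 300 seconds per start station
long_by_start <- trips[duration >= 300, .N, by = start_station]

# .SD is the subset of data for each group; here, the first row per group
first_per_start <- trips[, .SD[1], by = start_station]

# .SDcols restricts .SD to the named columns
max_by_start <- trips[, lapply(.SD, max), by = start_station, .SDcols = "duration"]
```

Each bracket in a chain receives the result of the previous one, which avoids naming intermediate tables.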
- Computations by groups (50 xp)
- Computing stats by groups (I) (100 xp)
- Computing stats by groups (II) (100 xp)
- Computing multiple stats (100 xp)
- Chaining data.table expressions (50 xp)
- Ordering rows (100 xp)
- What are the top 5 destinations? (100 xp)
- What is the most popular destination from each start station? (100 xp)
- Combining i, j, and by (I) (100 xp)
- Computations in j using .SD (50 xp)
- Using .SD (I) (100 xp)
- Using .SD (II) (100 xp)
4. Reference Semantics
You will learn about a unique feature of data.table in this chapter: modifying existing data.tables in place. Modifying a data.table in place avoids copying the whole table, which makes operations fast, and the syntax is easy to learn.
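In-place modification is done with the `:=` operator. The sketch below uses an invented table (`trips` and its columns are assumptions) to show adding, updating, and removing columns without reassigning the result.

```r
library(data.table)

# Hypothetical toy data; column names are assumptions for illustration only
trips <- data.table(
  start_station = c("A", "A", "B"),
  duration      = c(300L, 600L, 120L)
)

# Add a new column by reference; no `trips <- ...` assignment is needed
trips[, duration_min := duration / 60]

# Update an existing column only for the rows selected in i
trips[start_station == "B", start_station := "Station B"]

# Add a column computed per group; each row receives its group's value
trips[, mean_dur := mean(duration), by = start_station]

# Add multiple columns at once
trips[, c("is_long", "dur_hr") := .(duration > 300, duration / 3600)]

# Remove a column by assigning NULL
trips[, dur_hr := NULL]
```

Since `:=` changes the table it is called on, other variables that refer to the same table see the change as well; `copy()` creates an independent table when that is not wanted.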
- Adding and updating columns by reference (50 xp)
- Adding a new column (100 xp)
- Updating an existing column (I) (100 xp)
- Updating an existing column (II) (100 xp)
- Grouped aggregations (50 xp)
- Adding columns by group (100 xp)
- Updating columns by group (100 xp)
- Advanced aggregations (50 xp)
- Adding multiple columns (I) (100 xp)
- Adding multiple columns (II) (100 xp)
- Combining i, j, and by (II) (100 xp)
5. Importing and Exporting Data
Not only does the data.table package help you perform incredibly fast computations, it can also help you read and write data to disk with amazing speeds. This chapter focuses on data.table's fread() and fwrite() functions which let you import and export flat files quickly and easily!
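A small round trip with fwrite() and fread() might look like the following. The table contents and the temporary file are made up for this sketch; only the `select`, `nrows`, and `colClasses` arguments of fread() are used.

```r
library(data.table)

# Hypothetical toy data; column names are assumptions for illustration only
trips <- data.table(
  trip_id       = 1:3,
  start_station = c("A", "B", "C"),
  duration      = c(300L, NA, 120L)
)

# Write to a temporary CSV file; missing values are written as empty fields by default
csv_path <- tempfile(fileext = ".csv")
fwrite(trips, csv_path)

# Read the whole file; the separator and column types are detected automatically
all_cols <- fread(csv_path)

# Read only selected columns
some_cols <- fread(csv_path, select = c("trip_id", "duration"))

# Read only the first two data rows
first_rows <- fread(csv_path, nrows = 2)

# Override the detected class of one column
typed <- fread(csv_path, colClasses = c(trip_id = "character"))
```

Reading only the columns and rows that are needed reduces both read time and memory use on large files.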
- Fast data reading with fread() (50 xp)
- Fast reading from disk (100 xp)
- Importing a CSV file (100 xp)
- Importing selected columns (100 xp)
- Importing selected rows (100 xp)
- Advanced file reading (50 xp)
- Reading large integers (100 xp)
- Specifying column classes (100 xp)
- Dealing with empty and incomplete lines (100 xp)
- Dealing with missing values (100 xp)
- Fast data writing with fwrite() (50 xp)
- Writing files to disk (100 xp)
- Writing date and time columns (100 xp)
- Fast writing to disk (100 xp)
Prerequisites
Intermediate R
Matt Dowle
Author of data.table
Matt Dowle is the main author of the data.table package. Matt has worked for some of the world’s largest financial organizations and has been programming in R for over a decade.
Arun Srinivasan
R's data.table co-developer
Arun Srinivasan is originally from Tamil Nadu, India. He holds a bachelor's degree in electronics engineering and a master's degree in bioinformatics. He started using R in 2010 and has contributed to R's data.table package since late 2013. He currently lives in London, where he works as a developer and analyst in finance. He has a passion for developing tools and algorithms that facilitate analyses of large data.