Data Processing in Shell
Learn powerful command-line skills to download, process, and transform data, and to build a machine learning pipeline.
4 hours, 13 videos, 46 exercises, 19,275 learners, Statement of Accomplishment
Course Description
We live in a busy world with tight deadlines. As a result, we fall back on what is familiar and easy, favoring graphical interfaces like Visual Studio and RStudio. However, taking the time to learn data analysis on the command line is a great long-term investment because it makes us stronger and more productive data people.
In this course, we will take a practical approach to learning simple, powerful, and data-specific command-line skills. Using publicly available Spotify datasets, we will learn how to download, process, clean, and transform data, all via the command line. We will also learn advanced techniques such as command-line-based SQL database operations. Finally, we will combine the power of the command line and Python to build a data pipeline for automating a predictive model.
1. Downloading Data on the Command Line
In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.
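As a rough illustration of the option flags and multi-file downloads this chapter covers, here is a minimal sketch using curl. It assumes curl is installed, and it uses local `file://` URLs as a hypothetical stand-in for a web server so that it runs without a network; the `curl_demo` directory and the `data1.csv` to `data3.csv` file names are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip politely if curl is not available on this machine.
command -v curl >/dev/null || { echo "curl not found"; exit 0; }

# Hypothetical stand-in for a web server: three small local files
# that we address with file:// URLs (assumption, not course data).
base="$PWD/curl_demo"
mkdir -p "$base/server" "$base/downloads"
for i in 1 2 3; do
  printf 'track_id,danceability\n%s,0.5\n' "$i" > "$base/server/data$i.csv"
done

(
  cd "$base/downloads"

  # -o saves the download under a name of our choosing.
  curl -s -o renamed.csv "file://$base/server/data1.csv"

  # -O keeps the remote file name; [1-3] is curl's URL globbing,
  # which expands to data1.csv, data2.csv and data3.csv.
  curl -s -O "file://$base/server/data[1-3].csv"
)

ls "$base/downloads"

# With a real server, wget offers similar controls, for example:
#   wget -c --wait=1 --limit-rate=200k -i url_list.txt
# (-c resumes partial downloads, --wait pauses between files,
#  --limit-rate caps bandwidth, -i reads URLs from a file;
#  url_list.txt is a hypothetical file of URLs).
```

Against a real server, the same flags apply once the `file://` URLs are replaced with `https://` ones; adding `-L` tells curl to follow redirects.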
- Downloading data using curl (50 xp)
- Using curl documentation (50 xp)
- Downloading single file using curl (100 xp)
- Downloading multiple files using curl (100 xp)
- Downloading data using Wget (50 xp)
- Installing Wget (50 xp)
- Downloading single file using wget (100 xp)
- Advanced downloading using Wget (50 xp)
- Setting constraints for multiple file downloads (50 xp)
- Creating wait time using Wget (100 xp)
- Data downloading with Wget and curl (100 xp)
2. Data Cleaning and Munging on the Command Line
We continue our data journey from data downloading to data processing. In this chapter, we use the command-line library csvkit to convert, preview, filter, and manipulate files to prepare our data for further analysis.
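To make the filtering, stacking, and chaining steps concrete, the following is a minimal sketch with csvkit. It assumes csvkit is installed (for example via `pip install csvkit`); the `csvkit_demo` directory, the file names, and the sample rows are hypothetical and are not the course's Spotify data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip politely if csvkit is not available on this machine.
command -v csvcut >/dev/null || { echo "csvkit not found; try: pip install csvkit"; exit 0; }

# Hypothetical sample data (assumption for illustration only).
mkdir -p csvkit_demo
cat > csvkit_demo/tracks_a.csv <<'EOF'
track_id,artist,danceability
1,Artist A,0.81
2,Artist B,0.42
EOF
cat > csvkit_demo/tracks_b.csv <<'EOF'
track_id,artist,danceability
3,Artist A,0.67
EOF

# Print the column names with their positions.
csvcut -n csvkit_demo/tracks_a.csv

# Stack both files, filter rows by a string in the artist column,
# keep two columns, and write the result; each step feeds the next
# through a pipe.
csvstack csvkit_demo/tracks_a.csv csvkit_demo/tracks_b.csv \
  | csvgrep -c artist -m "Artist A" \
  | csvcut -c track_id,danceability \
  > csvkit_demo/artist_a.csv

# Preview the result as a table and print summary statistics.
csvlook csvkit_demo/artist_a.csv
csvstat csvkit_demo/artist_a.csv
```

The pipe keeps intermediate results out of the file system; only the final redirect writes a file.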
- Getting started with csvkit (50 xp)
- Installation and documentation for csvkit (100 xp)
- Converting and previewing data with csvkit (100 xp)
- File conversion and summary statistics with csvkit (100 xp)
- Filtering data using csvkit (50 xp)
- Printing column headers with csvkit (100 xp)
- Filtering data by column with csvkit (100 xp)
- Filtering data by row with csvkit (100 xp)
- Stacking data and chaining commands with csvkit (50 xp)
- Stacking files with csvkit (100 xp)
- Chaining commands using operators (100 xp)
- Data processing with csvkit (100 xp)
3. Database Operations on the Command Line
In this chapter, we dig deeper into all that the csvkit library has to offer. In particular, we focus on database operations we can do on the command line, including table creation, data pulls, and various ETL transformations.
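The three operations named above (querying a local CSV with SQL, pushing data into a database, and pulling it back out) might look roughly like the sketch below. It assumes csvkit is installed and uses a local SQLite file as the database; the `db_demo` directory, `spotify.db`, the `tracks` table, and the sample rows are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip politely if csvkit is not available on this machine.
command -v csvsql >/dev/null || { echo "csvkit not found; try: pip install csvkit"; exit 0; }

# Hypothetical sample data and database file (assumptions).
mkdir -p db_demo
rm -f db_demo/spotify.db   # start clean so --insert can create the table
cat > db_demo/tracks.csv <<'EOF'
track_id,artist,danceability
1,Artist A,0.81
2,Artist B,0.42
3,Artist A,0.67
EOF

# Keeping the query in a shell variable makes the command easier to read.
sql_query="SELECT artist, COUNT(*) AS n_tracks FROM tracks GROUP BY artist ORDER BY artist"

# Apply SQL directly to a local CSV file; the table name is the
# file name without its extension.
csvsql --query "$sql_query" db_demo/tracks.csv

# Push the CSV into a SQLite database, creating the tracks table.
csvsql --db sqlite:///db_demo/spotify.db --insert db_demo/tracks.csv

# Pull data back out of the database into a CSV file.
sql2csv --db sqlite:///db_demo/spotify.db \
        --query "SELECT artist FROM tracks WHERE artist = 'Artist A'" \
        > db_demo/artist_a.csv

cat db_demo/artist_a.csv
```

Other databases should work the same way by changing the `--db` connection string, provided the matching database driver is installed.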
- Pulling data from database (50 xp)
- Using sql2csv documentation (50 xp)
- Understand sql2csv connectors (50 xp)
- Practice pulling data from database (100 xp)
- Manipulating data using SQL syntax (50 xp)
- Applying SQL to a local CSV file (100 xp)
- Cleaner scripting via shell variables (100 xp)
- Joining local CSV files using SQL (100 xp)
- Pushing data back to database (50 xp)
- Practice pushing data back to database (100 xp)
- Database and SQL with csvkit (100 xp)
4. Data Pipeline on the Command Line
In the last chapter, we bridge the gap between the command line and other data science languages and learn how they can work together. Using Python as a case study, we learn to execute Python on the command line, to install dependencies using the package manager pip, and to build an entire model pipeline using the command line.
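A minimal sketch of such a pipeline is shown below, assuming `python3` is on the PATH. The script `create_model.py`, the `pipeline_demo` directory, and the toy least-squares fit are hypothetical stand-ins for a real model; the cron entry is only written to a text file here and is not installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

mkdir -p pipeline_demo

# Check which Python version the command line will use.
python3 --version

# Report the pip version if pip is available. Dependencies would
# typically be installed with: python3 -m pip install -r requirements.txt
python3 -m pip --version || echo "pip not available"

# Hypothetical stand-in for a model script, written from the shell.
cat > pipeline_demo/create_model.py <<'EOF'
# Fit y = slope * x by least squares through the origin (toy data).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
with open("pipeline_demo/model_results.txt", "w") as f:
    f.write(f"slope={slope:.2f}\n")
EOF

# Execute the Python script from the command line.
python3 pipeline_demo/create_model.py
cat pipeline_demo/model_results.txt

# A crontab entry has five schedule fields:
#   minute hour day-of-month month day-of-week
# This entry would run the script at minute 0 of every hour.
cron_entry="0 * * * * cd $PWD && python3 pipeline_demo/create_model.py"
echo "$cron_entry" > pipeline_demo/crontab_entry.txt
# To schedule it for real, add that line via: crontab -e
```

Writing the entry to a file first makes it easy to review the schedule before handing it to cron.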
- Python on the command line (50 xp)
- Finding Python version on the command line (50 xp)
- Executing Python script on the command line (100 xp)
- Python package installation with pip (50 xp)
- Understanding pip's capabilities (50 xp)
- Installing Python dependencies (100 xp)
- Running a Python model (100 xp)
- Data job automation with cron (50 xp)
- Understanding cron scheduling syntax (50 xp)
- Scheduling a job with crontab (100 xp)
- Model production on the command line (100 xp)
- Course recap (50 xp)