Skip to main content
HomeCheat sheetsData Science

Docker for Data Science Cheat Sheet

In this cheat sheet, learn how to apply Docker in your Data Science projects
Mar 2023  · 5 min read

Have this cheat sheet at your fingertips

Download PDF

Advantages and disadvantages of containers

Frame 799.png

Definitions

  • Docker image: A read-only template containing all the necessary files, libraries, and dependencies required to run an application in a Docker container.

  • Docker container: A running instance of an image. That is, an executable package including the code, libraries, and dependencies needed to run the application.

  • Dockerfile: A script containing instructions to create the image.

  • Docker registry: A repository for storing, sharing, and managing Docker images.  These include Docker Hub, Amazon Elastic Container Registry, Microsoft Azure Container Registry, and Google Cloud Container Registry.

  • Docker Engine: An application for managing Docker containers. It includes a server (the "Docker daemon"),  a command line tool (the Docker client), and an API for other software to interact with the Docker Daemon.

  • Docker client: A command-line tool to interact with Docker Engine to manage Docker images and containers.

  • Docker daemon (a.ka. Docker server): A background process that manages Docker images and containers according to the commands sent from the Docker client.

Getting help

# Display Docker version with docker --version
docker --version

# Display Docker system info with docker info
docker info

# Get help on Docker with docker --help
docker --help

# Get help on Docker command usage with docker {command} --help
docker run --help

Running Containers

# Run a container with docker run {image}
docker run hello-world # Runs a test container to check your installation works

# Run a container then use it to run a command with docker run {image} {command}
docker run python python -c "print('Python in Docker')" # Run Python & print text
docker run rocker/r-base r -e "print(lm(dist~speed, cars))" # Run R & print a model

# Run a container interactively with docker run --interactive --tty
docker run --interactive --tty rocker/r-base # Run R interactively

# Run a container, and remove it once you've finished with docker run --rm
docker run --rm mysql # Run MySQL, then clean up

# Run an image in the background with docker run --detach
docker run --detach postgres

# Run an image, assigning a name, with docker --name {name} run
docker run --name red1 redis # Run redis, naming the container as red1

# Run an image as a user with docker run --user {username}
docker run --user doctordocker mongo

Inspecting Containers

# List all running containers with docker ps
docker ps

# List all containers with docker ps --all
docker ps --all

# List all containers matching a conditions with docker ps --filter '{key}={value}'
docker ps --filter 'name=red1'

# Show container log output with docker logs --follow {container}
docker run --name bb busybox sh -c "$(echo date)" # Print current datetime
docker logs --follow bb # Print what bb container printed

Managing Containers

# docker run is equivalent to docker create + docker start
# Create a container from an image with docker create {image}
# Start a container with docker start {container}
docker create --name py --interactive --tty python
docker start --interactive --attach py
# Same as docker run --name py --interactive --tty python

# Create a new image from a container with docker container commit {container}
docker container commit

# Stop a container with docker stop {container}
# Container has option to save state or ignore request
docker stop py

# Kill a container with docker kill {container}# Container process finished immediately
docker kill py

# Kill and remove a container with docker rm --force {container}
docker rm --force py

# ​​Stop then start a container with docker restart {container}
docker restart py

# Delete stopped containers with docker container prune
docker container prune

# Create an image from a container with docker container commit {container_id} {image}
# Find the container ID with docker ps --all
docker container commit 123456789abc newimage

Building Images

# Build an image with docker build {path}
docker build .

# Build a tagged image with docker build --tag {name:tag} {path}
docker build --tag myimage:2023-edition .

# Build an image without using the cache docker build -no-cache {path}
docker build --no-cache .

Image Management

# List all local images with docker images
docker images

# Show Docker disk usage with docker system df
docker system df

# Show image creation steps from intermediate layers with docker history {image}
docker history alpine

# Save an image to a file with docker save --output {filename}
# Usually combined with a compression tool like gzip
docker save julia | gzip > julia.tar.gz

# Load an image from a file with docker load --input {filename}
docker load --input julia.tar.gz

# Delete an image with docker rmi {image}
docker rmi rocker/r-base

Working with Registries

# Log in to Docker with docker login --username {username}
docker login --username doctordocker

# Pull an image from a registry with docker pull {image}
docker pull julia

# Pull a version of an image from a registry with docker pull {image}:{tag}
docker pull julia:1.8.5-bullseye

# Tag an image to a repo with docker tag {image} {user}/{repo}
docker tag python doctordocker/myrepo

# Push an image to a registry with docker push {repo_tag}
docker push doctordocker/myrepo

# Search for an image with docker search "{image-search-text}
"docker search "py"

Creating Dockerfiles

# Derive image from another image with FROM{image}
FROM ubuntu:jammy-20230301

# Set a build and runtime environment variable with ENV {name}={value}
ENV TZ="America/New_York"

# Set a build-time variable with ARG {name}={default_value}
ARG VERBOSE=1

# Set the working directory with WORKDIR {path}
WORKDIR /home

# Switch to the user with USER {username}
USER doctordocker

# Copy a local file into the image with COPY {existing_path} {image_path}
COPY ./settings/config.yml ./settings/config.yml

# Run a shell command during the build step with RUN {command}
# \ lets commands continue across multiple lines
# && means run this command only if the preceding command succeeded
RUN apt-get update \
  && install -y libxml2-dev

# Run a shell command on launch with CMD ["{executable}", "{param1}"]
# Each Dockerfile should only have 1 CMD statement
CMD ["python", "-i"] # Start Python interactively
Topics
Related
Data Cleaning Checklist@1x.png

blog

[Infographic] Data Science Learning Checklist

Use this handy checklist to guide your data science learning journey.
DataCamp Team's photo

DataCamp Team

4 min

cheat sheet

Markdown Cheat Sheet

Learn everything you need to know about Markdown in this convenient cheat sheet!
Richie Cotton's photo

Richie Cotton

cheat sheet

Python for Data Science - A Cheat Sheet for Beginners

This handy one-page reference presents the Python basics that you need to do data science
Karlijn Willems's photo

Karlijn Willems

4 min

tutorial

Docker for Data Science: An Introduction

In this Docker tutorial, discover the setup, common Docker commands, dockerizing machine learning applications, and industry-wide best practices.
Arunn Thevapalan's photo

Arunn Thevapalan

15 min

tutorial

Learn Data Science - Infographic

Start your data science journey now and follow these 8 steps to learn data science.
Karlijn Willems's photo

Karlijn Willems

3 min

tutorial

Become a Data Scientist in 8 Steps: Infographic

This DataCamp infographic outlines how you can become a data scientist in 8 steps. This guide is for everyone interested in Data Science.
Karlijn Willems's photo

Karlijn Willems

3 min

See MoreSee More