How to Make a Histogram in Base R: 6 Steps With Examples

Learn how to create a histogram with base R using the hist() function. In 6 simple steps (with examples) you can make a base R histogram for exploratory analysis.

Updated Feb 2023 · 10 min read

In this tutorial, we will be visualizing distributions of data by plotting histograms using the R programming language. We will cover what a histogram is, how to read data in R, how to create a histogram, and how to customize the plot.

We will be using the base R programming language with no additional packages. This approach is only recommended when additional packages cannot be used or for quick exploratory analyses. In most cases, you should use ggplot2, as covered in the How to make a histogram in R in ggplot2 tutorial.

What is a histogram?

A histogram is a very popular graph that is used to show frequency distributions across continuous (numeric) variables. Histograms allow us to see the count of observations in data within ranges that the variable spans.

Histograms look similar to bar charts. A key difference between the two is that bar charts have a value associated with a specific category or discrete variable, while a histogram visualizes frequencies for continuous variables.
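
To make the idea of counting observations within ranges concrete, here is a minimal sketch using a small made-up vector (the values and bin edges are illustrative and do not come from the housing data). Calling hist() with plot = FALSE returns the bin counts without drawing anything.

```r
# Hypothetical example values (not from the housing dataset)
sizes <- c(1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.7, 4.5)

# Bin edges: 1-2, 2-3, 3-4, 4-5
edges <- c(1, 2, 3, 4, 5)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(sizes, breaks = edges, plot = FALSE)

h$counts  # number of observations in each bin: 2 3 2 1
```

Each bar in the plotted histogram has a height equal to one of these counts.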

Housing Data

We will be using this housing dataset, which includes details about different house listings, including the size of the house, the number of rooms, the price, and location information. We can read the data using the read.csv() function, either directly from the URL or by downloading the CSV file into a directory and reading it from our local storage. We can also keep only the columns we are interested in for this tutorial: price and condition.

home_data <- read.csv("https://raw.githubusercontent.com/rashida048/Datasets/master/home_data.csv")[ ,c('price', 'condition')]

Let’s look at the first few rows of data using the head() function:

head(home_data, 5)
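
Reading from local storage works the same way as reading from the URL. The sketch below is self-contained, so it first writes a tiny made-up CSV to a temporary file; the rows, the csv_path variable, and the local_data name are assumptions for illustration. In practice, csv_path would point at the downloaded file.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained.
# In practice, csv_path would be the path to the downloaded home_data.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,price,condition,bedrooms",
             "1,221900,3,3",
             "2,538000,3,3",
             "3,180000,4,2"), csv_path)

# Read the file, then keep only the two columns used in this tutorial
local_data <- read.csv(csv_path)[, c("price", "condition")]

head(local_data, 5)
str(local_data)  # check that price was read as a number, not as text
```

Checking the column types with str() before plotting is worthwhile, because hist() only accepts numeric input.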

How to Make a Histogram with base R using the hist() function

Next, we will create a histogram using the hist() function to look at the distribution of prices in our dataset.

hist(home_data$price)

Add descriptive statistics to histogram

We can add descriptive statistics to the histogram using the abline() function. This adds a vertical line to the plot.

Set the v argument to the position on the x-axis for the vertical line. Here, we get the mean house price using mean().
The col argument sets the line color, in this case to red.
The lwd argument sets the line width. A value of 3 increases the thickness of the line to make it easier to see.

hist(home_data$price)

abline(v = mean(home_data$price), col='red', lwd = 3)
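
The same approach extends to other statistics. As a sketch, the code below adds the median alongside the mean and labels both with legend(). It uses simulated right-skewed values as a stand-in for home_data$price (an assumption, so the example runs without downloading the data).

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

hist(prices)

# Mean in red (solid) and median in blue (dashed, lty = 2)
abline(v = mean(prices), col = "red", lwd = 3)
abline(v = median(prices), col = "blue", lwd = 3, lty = 2)

# Label the two lines
legend("topright", legend = c("Mean", "Median"),
       col = c("red", "blue"), lwd = 3, lty = c(1, 2))
```

For right-skewed data such as house prices, the mean is typically pulled above the median by the long tail, so showing both gives a quick sense of the skew.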

Plotting probability densities instead of counts

To add a probability density line to the histogram, we first change the y-axis to be scaled to density. In the call to hist(), we set the probability argument to TRUE.

The probability density line is made with a combination of density(), which calculates the position of the probability density curve, and lines(), which adds the line to the existing plot.

hist(home_data$price, probability = TRUE)
abline(v = mean(home_data$price), col='red', lwd = 3)
lines(density(home_data$price), col = 'green', lwd = 3)

Notice that the numbers on the y-axis have changed: they now show densities rather than counts, scaled so that the total area of the bars is 1.
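
To see how the density scale relates to the counts, the following sketch inspects the object returned by hist() with plot = FALSE. It uses simulated values as a stand-in for home_data$price (an assumption, so the example runs offline).

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

h <- hist(prices, plot = FALSE)

# Area of each bar = density * bin width; the areas sum to 1
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)

# Counts can be recovered from the densities
all.equal(bar_areas * length(prices), as.numeric(h$counts))
```

Because the bars and the density() curve both enclose an area of 1, they can be drawn on the same y-axis.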

Customize the color of the histogram using col

We can change the colors inside of the bins on the histogram using the col parameter of the hist() function. We will change the fill to blue. We can also change the outline color of the bars using the border parameter. We will change the color of the outlines to white.

hist(home_data$price, col = 'blue', border = "white")
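
Color names are not the only option. As a small sketch on simulated data (a stand-in for home_data$price, which is an assumption), col and border also accept hex codes, and adjustcolor() can make a fill semi-transparent. The faded_blue name is only for illustration.

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# col and border accept color names or hex codes
hist(prices, col = "#1F77B4", border = "white")

# adjustcolor() returns a semi-transparent version of a named color
faded_blue <- adjustcolor("blue", alpha.f = 0.4)
hist(prices, col = faded_blue, border = "white")
```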

Add labels and titles using ylab, xlab, and main

We can change the labels on the plot to make it more readable and presentable. This is useful if you share the plot with others.

xlab sets the x-axis label
ylab sets the y-axis label
main sets the plot title

hist(home_data$price, xlab = 'Price (USD)', ylab = 'Number of Listings', main = 'Distribution of House Prices')

Update binning using breaks

With the default arguments, it is hard to see the full distribution of housing prices. Most listings fall into the first few bins, which are too wide to show much detail.

We can add more bins using the breaks parameter. With this argument, we can pass a vector of specific breakpoints to use, a function to compute the breakpoints, a number of breaks we would like, or a function to compute the number of cells.

For this example, we will pass the number of bins we would like. This number is context-specific based on what you are trying to show in your graph.

hist(home_data$price, breaks = 100)

With breaks set to 100, we have significantly more visibility into the distribution in the first few buckets.
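
One detail worth knowing: a single number passed to breaks is treated as a suggestion, and R picks round ("pretty") breakpoints near it, so the resulting number of bins can differ. The sketch below, on simulated data standing in for home_data$price (an assumption), compares this with passing an explicit vector of breakpoints, which is used exactly.

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# A single number is a suggestion; R picks "pretty" round breakpoints
h_suggested <- hist(prices, breaks = 100, plot = FALSE)
length(h_suggested$counts)  # may differ from 100

# A vector of breakpoints is used exactly: 101 edges give exactly 100 bins.
# floor() and ceiling() make sure the edges span the full range of the data.
edges <- seq(floor(min(prices)), ceiling(max(prices)), length.out = 101)
h_exact <- hist(prices, breaks = edges, plot = FALSE)
length(h_exact$counts)
```

The explicit vector must cover every value in the data, otherwise hist() stops with an error.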

We can also pass the name of a common rule for choosing the number of bins. By default, hist() uses the “Sturges” method (passing it as an argument is optional because it is the default).

hist(home_data$price, breaks = "Sturges")

We can also pass “Scott” as the breaks argument to use the Scott method.

hist(home_data$price, breaks = "Scott")

Finally, we could also use the Freedman-Diaconis (FD) method.

hist(home_data$price, breaks = "Freedman-Diaconis")
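
To compare the three rules without drawing three plots, the functions that hist() relies on can be called directly. This is a minimal sketch on simulated data standing in for home_data$price (an assumption); nclass.Sturges(), nclass.scott(), and nclass.FD() ship with base R in the grDevices package.

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# Suggested number of bins under each rule
nclass.Sturges(prices)  # 11 for 1000 observations
nclass.scott(prices)
nclass.FD(prices)

# "FD" is accepted as a shorthand for "Freedman-Diaconis"
h_fd <- hist(prices, breaks = "FD", plot = FALSE)
length(h_fd$counts)  # actual bin count after R rounds to pretty breakpoints
```

Sturges depends only on the number of observations, while Scott and Freedman-Diaconis also use the spread of the data, so the latter two usually suggest more bins for large, skewed datasets.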

Setting x-axis limits

We can set the x-axis limits of our plot using the xlim argument to zoom in on the data we are interested in. For example, it is sometimes helpful to focus on the central part of the distribution, rather than on the long tail we currently see when we view the whole plot.

Changing the y-axis limits is also possible (using the ylim argument), but this is less useful for histograms since the automatically calculated limits are usually suitable.

We will zoom in on prices between $0 and $2M.

hist(home_data$price, breaks = 100, xlim = c(0, 2000000))
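
Note that xlim only changes the visible window: the bins are still computed from all of the data. An alternative, sketched below on simulated data standing in for home_data$price (an assumption, as are the 1,000,000 cutoff and the below_1m name), is to subset the values first so the bins are recomputed over the narrower range.

```r
# Simulated right-skewed prices, a stand-in for home_data$price (assumption)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# xlim only changes the visible window; bins are still computed on all data
hist(prices, breaks = 100, xlim = c(0, 1000000))

# Subsetting first recomputes the bins over the narrower range
below_1m <- prices[prices <= 1000000]
hist(below_1m, breaks = 100)
```

Subsetting gives finer bins in the region of interest, but the plot no longer represents the dropped observations, so it is worth stating the cutoff in the title or axis label.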

Take it to the next level

As you get more comfortable with R, you can explore more powerful packages that make it easier to build more interesting and useful visualizations. A very popular and easy-to-use library for plotting in R is called ggplot2. It can, for example, show the distribution of prices split by the number of bedrooms in the house.

ggplot2 is the best way to visualize data in R, and you can learn about using it to create histograms in the How to make a histogram in R in ggplot2 tutorial. Check out our Introduction to ggplot2 course and our Intermediate ggplot2 course to learn how to make more interesting visualizations in R.

In summary

In this tutorial, we learned that histograms are great visualizations for looking at distributions of continuous variables. We learned how to make a histogram in R, how to plot summary statistics on top of it, and how to customize the plot: its axis labels and title, its colors, its binning, and its axis limits. Finally, we pointed to ggplot2 as a next step.
