HomeTutorialsPython

How to Create a Histogram with Plotly

Learn how to implement histograms in Python using the Plotly data visualization library.
Jan 2023  · 12 min read

Histograms are popular graphical displays of data points organized into specified ranges. We typically see them in statistics, used to display the amount of a specific type of variable that appears within a certain range. For example, your old math teacher may have used a histogram to visualize the number of marks obtained for all the people in your class.

Figure 1: An example Histogram displaying the distribution of marks for a class.

[Source: Example Histogram with Matplotlib]

Their distinctive bars of varying heights resemble the aesthetics of bar charts, except histograms are used to visualize quantitative data instead of categorical data – which is the case for bar charts.

Generally, numerical data visualized in histograms are continuous, meaning they have infinite values. Attempting to plot the values of continuous variables is often impractical. Histograms mitigate this issue by grouping several data points into logical ranges (known as bins) and allowing them to be visualized.

In this tutorial, we will cover how to implement histograms in Python using the Plotly data visualization library. We will also touch on different ways to customize your Plotly histogram and why data visualization is important in the first place.

Check out this DataCamp Workspace to follow along with the code in this tutorial.

The Importance of Data Visualization in Data Analysis

No business can claim it will not benefit from understanding its data better. The role of data analysis is to help businesses optimize their performance and improve their bottom line; this science spans its wings to all fields where data exists since insight is generally regarded as leverage.

Data visualizations are the graphical representations analysts use to represent information and data. Its purpose is to summarize data to be presented in an easily understandable format, highlighting vital lessons from the data to its readers.

In other words, businesses use data visualizations to aid in rapidly identifying data trends which would otherwise be difficult without visual assistance. Graphical representations make it easy to communicate information in a manner that is universal, fast, and effective.

An Introduction to Plotly

Plotly is an interactive, open-source data visualization library for Python and R written in JavaScript. For this article, we will focus on Plotly Python, occasionally referred to as “plotly.py,” to distinguish it from its JavaScript counterpart.

Tip: Take the Introduction to data visualization with Plotly in Python or Interactive data visualization with Plotly in R courses to get to grips with Plotly.

Many users of Ploty often remark they were initially “captivated” by the modern aesthetics the library enables storytellers to implement quite easily. Plotly Python allows users to create attractive, interactive web-based visualizations that can be:

• displayed in Jupyter notebooks,
• saved to standalone HTML files,
• served as part of pure Python-built web applications using Dash.

Another appeal of Plotly is the vast range of chart types available. The library supports more than 40 unique charts spanning various use cases, including statistics, finance, geography, science, and 3-dimensional charts.

It’s easy to export visualizations created with Plotly due to the library's deep integration with Kaleido – an image export utility. There’s also extensive support for non-web contexts, including desktop editors (i.e., Spyder, PyCharm, etc.) and publishing static documents (i.e., exporting Jupyter notebooks to PDF format).

Installing plotly.py is reasonably straightforward; It may be installed using pip:

``pip install plotly>=5.11.0, <5.12.0``

Or using Conda:

``conda install -c plotly plotly>=5.11.0, <5.12.0``

Note: It’s recommended to install libraries in a virtual environment to keep the dependencies for different projects separate. You can also get started with a DataCamp Workspace - a Python & R Data Science IDE in the cloud.

Using DataCamp Workspaces

The DataCamp Workspace is an online Integrated Development Environment (IDE) that allows users to write code, analyze data individually or collectively, and share data insights. In other words, you can think of DataCamp Workspace as a kind of Google Docs specifically conceived for data science; learn more about how they work in our article on Using DataCamp Workspace in the Classroom.

For this task, we must create a workspace from a dataset. Fortunately, DataCamp provides a rich and growing body of interesting datasets to analyze. When we create a workspace from a dataset, the environment will contain all the pre-configured data and a brief description of the data so you can get your hands dirty from the offset.

The steps to create a workspace using a DataCamp dataset are as follows:

• From the Workspace dashboard, select Use a dataset in the sidebar.
• Select a dataset you're interested in. A dialog box displays. It provides a preview of the selected dataset, including a data dictionary and a link to the dataset source, a possible choice of language (Python or R), and an overview of publications that other Workspace users created with this dataset.
• Select Use Dataset.

Figure 2: A GIF demonstrating the steps to create a DataCamp Workspace from a dataset.

Note: View Creating a Plotly Histogram with Olympics Data to follow along with the code.

We will use the Olympics Data dataset from DataCamp’s internal data collection. According to the description, this is a historical dataset on the modern Olympic Games, from Athens 1896 to Rio 2016. Each row consists of an individual athlete (271116 instances) competing in an Olympic event and which medal was won (if any).

The columns consist of the following:

 Column Explanation id Unique number for each athlete name Athlete's name sex M or F age Age of the athlete height In centimeters weight In kilograms team Team name noc National Olympic Committee 3 games Year and season year Integer season Summer or Winter city Host city sport Sport event Event medal Gold, Silver, Bronze, or NA

Given our current understanding, the columns we can expect to build a histogram from are: age, height, weight, and year. This is because histograms are constructed using continuous numerical values - they are used to represent the distribution of numerical data, where the data are binned, and the count for each bin is displayed.

We will use the age column to create our histogram.

Creating a Plotly Histogram

Creating a histogram in Python with Plotly is pretty simple; We can use Plotly Express, which is an easy-to-use, high-level interface…

``````import plotly.express as px

# Create a histogram
fig = px.histogram(olympic_data.age, x="age",
title="Distribution of Athletes age")
fig.show()``````

Figure 3: A histogram created using Plotly Express

We also have the option to use Graph Objects.

``````import plotly.graph_objects as go

fig = go.Figure(data=[go.Histogram(x=olympic_data.age)])

fig.update_layout(title=dict(text="Distribution of Athletes age"))
fig.show()``````

Figure 4: A histogram created using Plotly Graph Objects.

Plotly `express` functions internally make calls to `graph_objects`, which returns a value. Therefore, Plotly express functions are useful because they enable users to draw data visualizations that would typically take more lines of code if they were to be drawn using `graph_objects`

“The `plotly.express` module (usually imported as px) contains functions that can create entire figures at once and is referred to as Plotly Express or PX. Plotly Express is a built-in part of the Plotly library and is the recommended starting point for creating most common figures. Every Plotly Express function uses graph objects internally and returns a `plotly.graph_objects.Figure` instance.”

Source: Plotly Express in Python

Despite Plotly `express` being the recommended entry point to the Plotly library, we will use `graph_objects` to customize the appearance of our histogram as it has more flexibility.

Customizing the appearance of the Plotly histogram

Currently, our histogram is pretty bland. Let’s spice it up by changing the fonts we use for our title and adding axis labels.

``````# Create histogram
fig = go.Figure(data=[go.Histogram(x=olympic_data.age)])

fig.update_layout(
# Set the global font
font = {
"family":"Times new Roman",
"size":16
},
# Update title font
title = {
"text": "Distribution of Athletes age",
"y": 0.9, # Sets the y position with respect to `yref`
"x": 0.5, # Sets the x position of title with respect to `xref`
"xanchor":"center", # Sets the title's horizontal alignment with respect to its x position
"yanchor": "top", # Sets the title's vertical alignment with respect to its y position. "
"font": { # Only configures font for title
"family":"Arial",
"size":20,
"color": "red"
}
}
)

# Add X and Y labels
fig.update_xaxes(title_text="Age")
fig.update_yaxes(title_text="Number of Athletes")

# Display plot
fig.show()``````

Figure 5: Plotly histogram with an updated title and axis labels added in.

Next, let’s change the size of each bin.

``````# Create histogram
fig = go.Figure(data = [
go.Histogram(
x = olympic_data.age,
xbins=go.histogram.XBins(size=5) # Change the bin size to 5
)
]
)``````

Figure 6: Plotly histogram with bin size equal to 5.

In the code above, we set the `xbins` parameter in the `go.Histogram` instance to a `go.histogram.Xbins` instance and defined the bin size we want as a parameter of that instance. Note we could have also performed this step using a `dict` object with compatible properties.

Lastly, let’s change the color of the bins to orange to make the visualization stand out a little.

``````# Create histogram
fig = go.Figure(data = [
go.Histogram(
x = olympic_data.age,
xbins=go.histogram.XBins(size=5), # Change the bin size
marker=go.histogram.Marker(color="orange") # Change the color
)
]
)``````

Figure 7: Plotly histogram with orange bins.

Much better.

In the code above, we set the `marker` parameter in the `go.Histogram` instance to a `go.histogram.Marker` instance and defined the color we want as a parameter of that instance.

However, there are many other customizations we can make to our Plotly histogram, such as adding hover text.

Enabling hover text

Text detailing what each bar represents is displayed whenever you hover over a bar in the current histogram - see Figure 8.

Figure 8: The current hover text displayed when hovering over our histogram.

This is not very intuitive. When we perform data visualization, we want our plots to be self-explanatory - even down to the hover text. Therefore, let’s update the information displayed when we hover over a bin.

``````# Create histogram
fig = go.Figure(data = [
go.Histogram(
x = olympic_data.age,
xbins=go.histogram.XBins(size=5), # Change the bin size
marker=go.histogram.Marker(color="orange"), # Change the color
hovertemplate="<br>".join([
"Age Range: %{x}",
)
]
)``````

Figure 9: The updated hover text information.

The text shown when we hover over a bin is much more informative, which makes our visualization much easier to interpret. We did this by updating the `hovertemplate` parameter in our `go.Histogram` instance.

Earlier in the tutorial, we said we could expect to build a histogram including the age, height, weight, and year features. There’s no need for us to build an individual histogram for each one.

Instead, we could build a single histogram and add filter buttons that allow you to toggle through which feature to display on the visualization.

Here’s how it looks in code.

``````"""
Starter code provided by vestland on StackOverflow
"""
# continuous features
continuous_vars = ["age", "height", "weight", "year"]

# plotly
fig = go.Figure()

# set up ONE trace
go.Histogram(x = olympic_data.age,
xbins=go.histogram.XBins(size=5), # Change the bin size
marker=go.histogram.Marker(color="orange"), # Change the color
)
)

buttons = []

# button with one option for each dataframe
for col in continuous_vars:
buttons.append(dict(method='restyle',
label=col,
visible=True,
args=[{"x":[olympic_data[col]],
"type":'histogram', [0]],
)
)

fig.show()``````

Figure 10: The histogram with filters applied that enable users to toggle through various features on a single visualization.

And there you have it; be sure to check out the Creating a Plotly Histogram with Olympics Data workspace, and play around with what we’ve built in this tutorial.

Wrap up

You now know how to create a histogram in Plotly Python; In this tutorial, we covered:

• What a histogram is and how it works
• The importance of data visualization in data analysis
• How to get started with DataCamp workspaces
• How to create a histogram in Plotly
• Different ways to customize your histogram
• How to apply filters

If you’re looking for further resources about creating histograms or using Plotly, check out the links below:

Topics

Plotly Courses

Course

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Introduction to Data Visualization with Plotly in Python

4 hr
12.1K
Create interactive data visualizations in Python using Plotly.
See Details
Start Course

Course

Building Dashboards with Dash and Plotly

4 hr
10.1K
Learn how to build interactive and insight-rich dashboards with Dash and Plotly.

Course

Interactive Data Visualization with plotly in R

4 hr
11.8K
Learn how to use plotly in R to create interactive data visualizations to enhance your data storytelling.
See More
Related

Mastering the Pandas .explode() Method: A Comprehensive Guide

Learn all you need to know about the pandas .explode() method, covering single and multiple columns, handling nested data, and common pitfalls with practical Python code examples.

5 min

Python NaN: 4 Ways to Check for Missing Values in Python

Explore 4 ways to detect NaN values in Python, using NumPy and Pandas. Learn key differences between NaN and None to clean and analyze data efficiently.

5 min

Seaborn Heatmaps: A Guide to Data Visualization

Learn how to create eye-catching Seaborn heatmaps

Joleen Bothma

9 min

Test-Driven Development in Python: A Beginner's Guide

Dive into test-driven development (TDD) with our comprehensive Python tutorial. Learn how to write robust tests before coding with practical examples.

Amina Edmunds

7 min

Exponents in Python: A Comprehensive Guide for Beginners

Master exponents in Python using various methods, from built-in functions to powerful libraries like NumPy, and leverage them in real-world scenarios to gain a deeper understanding.

Satyam Tripathi

9 min

Python Linked Lists: Tutorial With Examples

Learn everything you need to know about linked lists: when to use them, their types, and implementation in Python.

Natassha Selvaraj

9 min

See MoreSee More