How to Make a Histogram with Plotly

Let’s build a histogram!

Import Data

import pandas as pd
from plotly import express as px

# Palmer Penguins data set hosted on GitHub
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)

Basic Information About the Data

We should always check which variables our data contains and how many observations it has.

penguins.head()
| | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 11/11/07 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE | NaN | NaN | Not enough blood for isotopes. |
| 1 | PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 11/11/07 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 | NaN |
| 2 | PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 11/16/07 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 | NaN |
| 3 | PAL0708 | 4 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A2 | Yes | 11/16/07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Adult not sampled. |
| 4 | PAL0708 | 5 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 11/16/07 | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE | 8.76651 | -25.32426 | NaN |
penguins.shape
(344, 17)
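
Besides head() and shape, counting the missing values in each column is a quick way to see which columns will need cleaning. The sketch below runs on a small hypothetical frame (the name mini and all of its values are made up for illustration); on the real data you would call the same methods on penguins.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the penguins data (illustrative values only)
mini = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo"],
    "Body Mass (g)": [3750.0, np.nan, 5000.0],
    "Sex": ["MALE", np.nan, "."],
})

print(mini.shape)          # (number of rows, number of columns)
print(list(mini.columns))  # variable names
print(mini.isna().sum())   # number of missing values in each column
```

Note that isna() only flags real missing values: the "." entry in Sex is not counted, which is why it needs its own filter during cleaning.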

Data Cleaning

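# Reload the raw data so this step can be run on its own
penguins = pd.read_csv(url)
# Drop rows with no body mass or no sex recorded
penguins = penguins.dropna(subset = ["Body Mass (g)", "Sex"])
# Keep only the first word of the species name, e.g. "Adelie"
penguins["Species"] = penguins["Species"].str.split().str.get(0)
# Remove rows whose sex is recorded as "."
penguins = penguins[penguins["Sex"] != "."]

# Keep only the columns used in the rest of this post
cols = ["Species", "Island", "Sex", "Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)", "Body Mass (g)"]
penguins = penguins[cols]
penguins.head()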
| | Species | Island | Sex | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | MALE | 39.1 | 18.7 | 181.0 | 3750.0 |
| 1 | Adelie | Torgersen | FEMALE | 39.5 | 17.4 | 186.0 | 3800.0 |
| 2 | Adelie | Torgersen | FEMALE | 40.3 | 18.0 | 195.0 | 3250.0 |
| 4 | Adelie | Torgersen | FEMALE | 36.7 | 19.3 | 193.0 | 3450.0 |
| 5 | Adelie | Torgersen | MALE | 39.3 | 20.6 | 190.0 | 3650.0 |

The raw data set has 344 observations and 17 variables. After cleaning, 333 observations (146 + 68 + 119, as counted below) and the 7 selected variables remain.
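
The cleaning steps can be traced on a tiny hypothetical frame. The Adelie species string matches the one shown by head() on the raw data; the Chinstrap string, the body masses, and the frame name raw are assumptions for illustration. This sketch uses assign() for the species column, which returns a new frame instead of modifying one in place.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows (only the Adelie species string is taken from the real data)
raw = pd.DataFrame({
    "Species": [
        "Adelie Penguin (Pygoscelis adeliae)",
        "Chinstrap penguin (Pygoscelis antarctica)",
        "Adelie Penguin (Pygoscelis adeliae)",
        "Adelie Penguin (Pygoscelis adeliae)",
    ],
    "Body Mass (g)": [3750.0, 3500.0, np.nan, 3450.0],
    "Sex": ["MALE", "FEMALE", "MALE", "."],
})

# Step 1: drop row 2, which has no body mass
clean = raw.dropna(subset=["Body Mass (g)", "Sex"])
# Step 2: keep only the first word of each species name
clean = clean.assign(Species=clean["Species"].str.split().str.get(0))
# Step 3: drop row 3, whose sex is recorded as "."
clean = clean[clean["Sex"] != "."]

print(clean)
```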

More About the Data

For example, to find how many observations there are for each penguin species, we can use groupby():

penguins.groupby("Species").count()
| Species | Island | Sex | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) |
|---|---|---|---|---|---|---|
| Adelie | 146 | 146 | 146 | 146 | 146 | 146 |
| Chinstrap | 68 | 68 | 68 | 68 | 68 | 68 |
| Gentoo | 119 | 119 | 119 | 119 | 119 | 119 |
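
count() reports the number of non-missing values in each column, which is why the same number repeats across every column here. When only the number of rows per group is needed, groupby(...).size() or value_counts() return a single Series. A minimal sketch on a hypothetical frame (mini and its rows are made up):

```python
import pandas as pd

# Hypothetical mini data set with one row per penguin
mini = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"],
    "Island": ["Torgersen", "Biscoe", "Biscoe", "Dream", "Biscoe", "Dream"],
})

sizes = mini.groupby("Species").size()   # rows per species, sorted by species name
counts = mini["Species"].value_counts()  # same numbers, sorted by frequency
print(sizes)
print(counts)
```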

Make a Histogram Plot Using Plotly

A histogram is one of the most common ways to show a frequency distribution. To plot a simple histogram, we need a data frame and the variable we want to observe.

fig = px.histogram(data_frame = penguins, 
                   x = "Culmen Length (mm)", 
                   )

# show the plot
fig.show()

[Figure: histogram of Culmen Length (mm)]
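
Under the hood, a histogram splits the range of the variable into bins and counts how many observations fall into each bin. NumPy's np.histogram does the same counting without drawing anything. The sketch below uses the five culmen lengths shown by the cleaned head() output above, with hand-picked bin edges (an assumption; Plotly chooses its own bins).

```python
import numpy as np

# Culmen lengths (mm) of the five penguins shown by the cleaned head() output
lengths = np.array([39.1, 39.5, 40.3, 36.7, 39.3])

# Hand-picked bin edges: [35, 37), [37, 39), [39, 41]
counts, edges = np.histogram(lengths, bins=[35, 37, 39, 41])
print(counts)  # → [1 0 4]
print(edges)
```

Each bar height in the Plotly histogram is one of these counts; only the choice of bins differs.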

The plot looks too plain, so let's add more components:

  • The **color** argument takes the name of a column in data_frame and gives each distinct value in that column its own color. Here, we want a different color for each penguin species. Using this argument also creates a legend.
  • If the bars look too bright, the **opacity** argument, which takes a value from 0 to 1, adjusts their transparency.
fig = px.histogram(data_frame = penguins, 
                   x = "Culmen Length (mm)", 
                   color = "Species",
                   opacity = 0.5
                   )

# show the plot
fig.show()

[Figure: histogram of Culmen Length (mm), colored by Species]
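
Conceptually, color="Species" splits each bin's count by species, so each colored segment corresponds to one cell of a bins-by-species count table. The sketch below builds such a table with pd.cut and pd.crosstab on a hypothetical mini frame with hand-picked bin edges (the Gentoo and Chinstrap values are made up); Plotly computes its bins independently, so its bars will not match these exactly.

```python
import pandas as pd

# Hypothetical mini data set: species labels and culmen lengths (mm)
mini = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "Culmen Length (mm)": [39.1, 36.7, 47.5, 49.0, 48.2],
})

# Assign each length to a left-closed bin: [35, 40), [40, 45), [45, 50)
bins = pd.cut(mini["Culmen Length (mm)"], bins=[35, 40, 45, 50], right=False)

# Rows are bins, columns are species, cells are counts
table = pd.crosstab(bins, mini["Species"])
print(table)
```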

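Our plot now looks better than the first version, but it may be too wide relative to its height. The width and height arguments set the figure size in pixels, which lets us adjust its aspect ratio. The nbins argument sets the number of bins; Plotly treats it as an upper limit and may draw fewer. Finally, barmode='stack' stacks the bars of the different species on top of one another.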

fig = px.histogram(data_frame = penguins, 
                   x = "Culmen Length (mm)", 
                   color = "Species",
                   opacity = 0.5,
                   nbins = 30, 
                   width = 900,
                   height = 450,
                   barmode='stack'
                   )

# show the plot
fig.show()

[Figure: stacked histogram of Culmen Length (mm) by Species, 30 bins, 900 x 450 px]

Finally, let's add a centered title to the plot.

fig = px.histogram(data_frame = penguins, 
                   x = "Culmen Length (mm)", 
                   color = "Species",
                   opacity = 0.5,
                   nbins = 30, 
                   width = 900,
                   height = 450,
                   barmode='stack'
                   )

# set title and its alignment
fig.update_layout(title_text="Histogram of Penguins' Culmen Length", title_x=0.5, title_y=0.92)
# show the plot
fig.show()

[Figure: Histogram of Penguins' Culmen Length]

Written on April 4, 2022