Tutorial on Creating Histograms in Matplotlib

Histograms are a powerful tool for visualizing the distribution of a dataset.

They allow us to see the frequency of data points within specified ranges or bins, making them useful for understanding patterns, outliers, and the overall distribution of values.

Matplotlib provides extensive customization options for histograms, allowing you to adjust bin size, colors, transparency, and more.

In this tutorial, we’ll explore how to create and customize histograms in Matplotlib with examples, covering the basics, binning options, customizing colors, adding density, plotting multiple histograms, and more.

1. Basic Histogram

A basic histogram can be created using the hist function in Matplotlib. The data is divided into bins, and the height of each bar represents the frequency of data points within each bin.

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
data = np.random.randn(1000)  # Normally distributed data

# Create a basic histogram
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Basic Histogram")
plt.show()

In this example:

  • data contains 1000 random numbers from a normal distribution.
  • plt.hist(data, bins=30) creates a histogram with 30 bins.
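plt.hist uses NumPy's np.histogram to count how many values fall in each bin, so you can compute the same counts without drawing anything. The sketch below draws its sample from a seeded generator (np.random.default_rng) so the numbers are reproducible. That is an assumption for illustration; the tutorial itself uses np.random.randn.

```python
import numpy as np

# Seeded generator so the sample is reproducible (illustrative choice)
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# 30 bins are bounded by 31 edges running from data.min() to data.max()
counts, bin_edges = np.histogram(data, bins=30)

print(len(counts), len(bin_edges))  # 30 31
print(int(counts.sum()))            # 1000: every point falls in exactly one bin

# With an integer bins value, every bin has the same width
widths = np.diff(bin_edges)
print(np.allclose(widths, widths[0]))  # True
```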

2. Adjusting Bin Size

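The bins parameter of the hist function controls how the data is divided. When bins is an integer, it sets the number of equal-width bins spanning the range of the data, so fewer bins means wider bins. Adjusting this value changes the level of detail in the histogram.

# Histogram with different bin counts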
plt.figure(figsize=(12, 5))

# Plot with fewer bins
plt.subplot(1, 2, 1)
plt.hist(data, bins=10, color='skyblue', edgecolor='black')
plt.title("Histogram with 10 Bins")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Plot with more bins
plt.subplot(1, 2, 2)
plt.hist(data, bins=50, color='salmon', edgecolor='black')
plt.title("Histogram with 50 Bins")
plt.xlabel("Value")

plt.tight_layout()
plt.show()

In this example:

  • The first plot has 10 bins, providing a broad view of the distribution.
  • The second plot has 50 bins, showing more detail.
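The link between bin count and bin width is simple arithmetic: with an integer bins value, each bin spans (max - min) / bins. The sketch below checks that with np.histogram on an illustrative seeded sample. It then lets NumPy choose the count with bins='auto'. plt.hist accepts the same named binning rules ('auto', 'fd', 'sturges', and others) as its bins argument.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)
data_range = data.max() - data.min()

# For an integer bin count, each bin is (max - min) / bins wide
bin_widths = {}
for n_bins in (10, 50):
    _, edges = np.histogram(data, bins=n_bins)
    bin_widths[n_bins] = edges[1] - edges[0]
    print(f"{n_bins} bins -> width {bin_widths[n_bins]:.3f}")

# NumPy can also pick the bin count from the data with a named rule
auto_counts, auto_edges = np.histogram(data, bins='auto')
print(f"'auto' chose {len(auto_counts)} bins")
```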

3. Customizing Colors and Transparency

You can customize the color of the histogram bars and add transparency to make overlapping histograms more visible.

# Histogram with custom color and transparency
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='purple', edgecolor='black', alpha=0.7)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram with Custom Color and Transparency")
plt.show()

In this example:

  • color='purple' sets the color of the bars.
  • edgecolor='black' adds a black outline around each bar.
  • alpha=0.7 makes the bars slightly transparent.
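Because plt.hist also returns the bar patches, you can color bars individually after drawing them. The sketch below maps each bar's count onto the viridis colormap using Normalize from matplotlib.colors. It is an illustration rather than part of the tutorial's example. It selects the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)

fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, patches = ax.hist(data, bins=30, edgecolor='black', alpha=0.7)

# Scale each count into [0, 1], then look up a color in the colormap
norm = Normalize(vmin=counts.min(), vmax=counts.max())
cmap = plt.get_cmap('viridis')
for count, patch in zip(counts, patches):
    patch.set_facecolor(cmap(norm(count)))

ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram Colored by Count")
# In an interactive session, call plt.show() here
```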

4. Density Histogram

You can normalize the histogram to show density instead of frequency by setting density=True. Each bar's height then becomes its count divided by the total number of data points times the bin width, so the total area under the histogram equals 1.

# Histogram with density
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='skyblue', edgecolor='black', density=True)
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Histogram with Density")
plt.show()

In this example:

  • density=True scales the histogram so that the area of the bars adds up to 1.
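The scaling is count / (total count × bin width), which is why the bar areas, not the bar heights, sum to 1. The NumPy check below tests that formula on an illustrative seeded standard-normal sample.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=30)
density, _ = np.histogram(data, bins=30, density=True)
widths = np.diff(bin_edges)

# Density = count / (number of points * bin width)
manual_density = counts / (counts.sum() * widths)
print(np.allclose(density, manual_density))  # True

# Heights times widths (the bar areas) add up to 1
total_area = np.sum(density * widths)
```

Individual bar heights can exceed 1 when bins are narrower than one unit. Only the total area is fixed at 1.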

5. Overlaying a Density Line on a Histogram

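You can draw a smooth curve on top of a density histogram to show the shape of the distribution more clearly. The example fits a normal distribution to the data using its mean and standard deviation and plots that distribution's probability density function. It requires SciPy for scipy.stats.norm.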

from scipy.stats import norm

# Overlay density line
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='lightgray', edgecolor='black', density=True)
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, np.mean(data), np.std(data))
plt.plot(x, p, color='blue', linewidth=2, label="Density")
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Histogram with Density Line")
plt.legend()
plt.show()

In this example:

  • plt.plot(x, p) overlays a density line using a normal distribution.
  • norm.pdf(x, np.mean(data), np.std(data)) computes the probability density function for a normal distribution fitted to the data.
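A fitted normal curve only matches the histogram when the data really are roughly normal. For other shapes, SciPy's gaussian_kde is an alternative. It computes a kernel density estimate, a different technique from the tutorial's fitted normal, and follows the histogram's actual shape. The sketch below uses an illustrative skewed exponential sample. It measures how far each curve sits from the density-histogram heights at the bin centers.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=1000)  # illustrative non-normal sample

density, bin_edges = np.histogram(skewed, bins=30, density=True)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Fitted normal, as in the tutorial's example
normal_curve = norm.pdf(bin_centers, np.mean(skewed), np.std(skewed))
# Kernel density estimate with SciPy's default (Scott's rule) bandwidth
kde_curve = gaussian_kde(skewed)(bin_centers)

# Average distance between each curve and the histogram bar heights
normal_error = np.mean(np.abs(normal_curve - density))
kde_error = np.mean(np.abs(kde_curve - density))
print(f"normal fit error: {normal_error:.3f}, KDE error: {kde_error:.3f}")
```

To draw the KDE, evaluate it on a fine grid such as np.linspace(xmin, xmax, 100). Then pass the result to plt.plot, exactly as with norm.pdf.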

6. Multiple Histograms on the Same Plot

You can plot multiple histograms on the same plot to compare different datasets. Use transparency (alpha) to see overlapping data more clearly.

# Generate two sample datasets
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(2, 1.5, 1000)

# Plot multiple histograms
plt.figure(figsize=(8, 5))
plt.hist(data1, bins=30, color='blue', alpha=0.6, label="Data 1")
plt.hist(data2, bins=30, color='orange', alpha=0.6, label="Data 2")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Multiple Histograms")
plt.legend()
plt.show()

In this example:

  • alpha=0.6 makes each dataset partially transparent, making it easier to see overlap.
  • plt.legend() adds a legend to differentiate between datasets.
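Each plt.hist call computes its own 30 bins from its own dataset's range. The two sets of bars therefore have different widths and do not line up. The NumPy sketch below confirms this on illustrative seeded samples. It then shows a common fix: compute one set of edges over both datasets with np.histogram_bin_edges and pass it as bins= to both calls.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded samples
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1.5, 1000)

# Separate calls: each dataset gets edges from its own min and max
_, edges1 = np.histogram(data1, bins=30)
_, edges2 = np.histogram(data2, bins=30)
print(np.allclose(edges1, edges2))  # False

# One shared set of 31 edges spanning both datasets
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)
counts1, _ = np.histogram(data1, bins=shared_edges)
counts2, _ = np.histogram(data2, bins=shared_edges)

# With Matplotlib: plt.hist(data1, bins=shared_edges, ...) and plt.hist(data2, bins=shared_edges, ...)
```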

7. Stacked Histogram

A stacked histogram draws the bars of several datasets on top of one another, so the total height of each stack is the combined count for that bin. This is useful for seeing both the overall distribution and each dataset's contribution to it.

# Stacked histogram
plt.figure(figsize=(8, 5))
plt.hist([data1, data2], bins=30, stacked=True, color=['blue', 'orange'], label=['Data 1', 'Data 2'])
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Stacked Histogram")
plt.legend()
plt.show()

In this example:

  • stacked=True stacks the histograms on top of each other.
  • [data1, data2] allows both datasets to be plotted together in a stacked manner.
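Stacking places each dataset's bars on top of the previous dataset's counts. The sketch below rebuilds the stacked picture by hand with ax.bar and its bottom parameter. It assumes shared bin edges across both datasets; plt.hist likewise computes one set of edges when given a list of datasets. This is an illustration of the mechanism, not a replacement for stacked=True.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded samples
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1.5, 1000)

# One set of edges for both datasets so the stacked bars line up
edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)
counts1, _ = np.histogram(data1, bins=edges)
counts2, _ = np.histogram(data2, bins=edges)
widths = np.diff(edges)

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(edges[:-1], counts1, width=widths, align='edge', color='blue', label='Data 1')
# bottom=counts1 starts each Data 2 bar where the Data 1 bar ends
top_bars = ax.bar(edges[:-1], counts2, width=widths, align='edge',
                  bottom=counts1, color='orange', label='Data 2')
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.legend()

stack_totals = counts1 + counts2  # height of each full stack
```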

8. Cumulative Histogram

A cumulative histogram shows the cumulative frequency for each bin, indicating the running total of data points up to that bin.

# Cumulative histogram
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, cumulative=True, color='teal', edgecolor='black')
plt.xlabel("Value")
plt.ylabel("Cumulative Frequency")
plt.title("Cumulative Histogram")
plt.show()

In this example:

  • cumulative=True makes the histogram cumulative, showing the total frequency up to each bin.
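A cumulative histogram is the running sum of the ordinary bin counts, so the last bar always equals the number of data points. The NumPy sketch below shows that with np.cumsum on an illustrative seeded sample. Dividing by the sample size gives the fraction of points up to each bin's right edge. plt.hist draws that fraction when you combine cumulative=True with density=True, and the last bar then equals 1. Passing cumulative=-1 instead reverses the direction of accumulation.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=30)
running_total = np.cumsum(counts)   # the bar heights cumulative=True draws
print(int(running_total[-1]))       # 1000

# Fraction of points up to each bin's right edge (an empirical CDF at the edges)
fraction = running_total / len(data)
print(fraction[-1])                 # 1.0
```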

9. Horizontal Histogram

You can create a horizontal histogram by passing orientation='horizontal'. This is useful when you want the value axis to run vertically, for example when placing a histogram beside another plot that shares the same y-axis.

# Horizontal histogram
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='salmon', edgecolor='black', orientation='horizontal')
plt.xlabel("Frequency")
plt.ylabel("Value")
plt.title("Horizontal Histogram")
plt.show()

In this example:

  • orientation='horizontal' rotates the histogram so that bars extend horizontally.
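A common reason to rotate a histogram is to attach it to another plot as a marginal distribution. The sketch below uses made-up x/y data for illustration (an assumption, not from the tutorial). A scatter plot and a horizontal histogram of the same y values share one y-axis via sharey=True, so each histogram bar lines up with the points it counts.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)                 # illustrative data
y = 0.5 * x + rng.standard_normal(1000)      # illustrative values that rise with x

# Two panels sharing the y-axis; the histogram panel is a third as wide
fig, (ax_scatter, ax_hist) = plt.subplots(
    1, 2, sharey=True, figsize=(10, 5), gridspec_kw={'width_ratios': [3, 1]}
)
ax_scatter.scatter(x, y, s=5)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("Value")

# orientation='horizontal' lays the bins along the shared y-axis
counts, bin_edges, patches = ax_hist.hist(
    y, bins=30, orientation='horizontal', color='salmon', edgecolor='black'
)
ax_hist.set_xlabel("Frequency")
fig.tight_layout()
```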

10. Histogram with Logarithmic Scale

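You can use a logarithmic scale on the y-axis when the bin counts span several orders of magnitude. This happens with skewed or heavy-tailed data, where a few bins hold most of the points and the tail bins hold only a handful. On a linear axis those small bars are barely visible.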

# Histogram with log scale
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='purple', edgecolor='black', log=True)
plt.xlabel("Value")
plt.ylabel("Frequency (log scale)")
plt.title("Histogram with Logarithmic Scale")
plt.show()

In this example:

  • log=True applies a logarithmic scale to the y-axis, making it easier to interpret data with a large range of frequencies.
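Whether log=True helps depends on the spread of the bin counts, not of the data values. The sketch below uses an illustrative right-skewed exponential sample of 100,000 points. It measures how many times taller the fullest bin is than the emptiest non-empty one. A ratio in the hundreds or more means a linear axis flattens the tail bars to almost nothing. Calling plt.yscale('log') after a regular plt.hist call is another way to get a logarithmic count axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative right-skewed sample: most values are small, a few are large
skewed = rng.exponential(scale=1.0, size=100_000)

counts, bin_edges = np.histogram(skewed, bins=30)
nonzero = counts[counts > 0]          # empty bins cannot be drawn on a log axis
ratio = nonzero.max() / nonzero.min()
print(f"tallest bin is {ratio:.0f}x the shortest non-empty bin")
```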

11. Histogram with Custom Bin Ranges

You can specify custom bin edges by passing a list to the bins parameter. This gives you full control over the range and size of each bin.

# Custom bin ranges
custom_bins = [-3, -1, 0, 1, 3]

plt.figure(figsize=(8, 5))
plt.hist(data, bins=custom_bins, color='darkcyan', edgecolor='black')
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram with Custom Bin Ranges")
plt.show()

In this example:

  • bins=custom_bins uses custom bin edges to define the bins.
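Custom edges also decide what happens at the boundaries. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. Values outside the outermost edges are not counted at all. The small NumPy example below uses hand-picked values, several sitting exactly on an edge, to make this concrete.

```python
import numpy as np

# Hand-picked illustrative values, several sitting exactly on an edge
values = np.array([-3.5, -3.0, -1.0, -0.5, 0.0, 0.99, 1.0, 3.0, 4.2])
custom_bins = [-3, -1, 0, 1, 3]   # bins: [-3, -1), [-1, 0), [0, 1), [1, 3]

counts, edges = np.histogram(values, bins=custom_bins)
print(counts)             # [1 2 2 2]
print(int(counts.sum()))  # 7: -3.5 and 4.2 fall outside [-3, 3]
```

These bins have unequal widths, so a taller bar does not necessarily mean denser data. Adding density=True divides each count by its bin width, which makes the bars comparable.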

12. Adding Annotations to a Histogram

Annotations help in displaying additional information, such as the count of each bin directly on the bars.

# Histogram with annotations
plt.figure(figsize=(8, 5))
counts, bin_edges, patches = plt.hist(data, bins=30, color='royalblue', edgecolor='black')

# Annotate each bar with its count, centered over the bar
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2  # midpoints of the bins
for count, center in zip(counts, bin_centers):
    plt.text(center, count + 1, str(int(count)), ha='center', va='bottom', fontsize=8)

plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram with Annotations")
plt.show()

In this example:

  • plt.text() places the count of each bin above the corresponding bar.
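Recent Matplotlib versions (3.4 and later) also provide Axes.bar_label, which labels every bar of a bar container in one call. The third value returned by ax.hist for a single dataset is such a container. The sketch below uses an illustrative seeded sample and fewer bins than above so the labels have room.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)

fig, ax = plt.subplots(figsize=(8, 5))
# Fewer bins so the count labels do not overlap
counts, bin_edges, bars = ax.hist(data, bins=10, color='royalblue', edgecolor='black')

# bar_label writes each bar's value just above it; fmt='%d' drops the decimal
labels = ax.bar_label(bars, fmt='%d', padding=2)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram with bar_label Annotations")
```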

13. Histogram with a Legend

Adding a legend to a histogram helps when displaying multiple datasets or providing more context to a single histogram.

# Histogram with legend
plt.figure(figsize=(8, 5))
plt.hist(data1, bins=30, color='skyblue', alpha=0.6, label="Dataset 1")
plt.hist(data2, bins=30, color='salmon', alpha=0.6, label="Dataset 2")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram with Legend")
plt.legend()
plt.show()

In this example:

  • plt.legend() provides a legend to differentiate between data1 and data2.
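For a single histogram, the legend can carry context such as the sample size or a summary statistic. One way to add that context is to put the numbers directly into the label strings. The sketch below does this with an illustrative seeded sample and a dashed mean line drawn by ax.axvline, an addition that is not part of the tutorial's example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded sample
data = rng.standard_normal(1000)
mean = data.mean()

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(data, bins=30, color='skyblue', edgecolor='black', label=f"Data (n={len(data)})")
# A dashed vertical line at the mean gets its own legend entry
ax.axvline(mean, color='red', linestyle='--', label=f"Mean = {mean:.2f}")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
legend = ax.legend()

legend_texts = [text.get_text() for text in legend.get_texts()]
```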

14. Multiple Histograms with Subplots

If you want to compare several histograms side by side, using subplots is an effective approach.

# Subplots for multiple histograms
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# First histogram
ax1.hist(data1, bins=30, color='skyblue', edgecolor='black')
ax1.set_title("Histogram of Dataset 1")
ax1.set_xlabel("Value")
ax1.set_ylabel("Frequency")

# Second histogram
ax2.hist(data2, bins=30, color='salmon', edgecolor='black')
ax2.set_title("Histogram of Dataset 2")
ax2.set_xlabel("Value")

plt.suptitle("Comparison of Multiple Histograms")
plt.tight_layout()
plt.show()

In this example:

  • plt.subplots(1, 2) creates a figure with two axes side by side, and each ax.hist() call draws one histogram on its own axes, allowing for easy comparison.
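Side-by-side panels are easiest to compare when they use the same scale. The variation below assumes you want identical bins and axis limits in both panels. It shares the axes with sharex/sharey and passes one set of bin edges to both ax.hist calls. The loop also extends to more panels without repeating code. The data are illustrative seeded samples.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative seeded samples
datasets = {
    "Dataset 1": rng.normal(0, 1, 1000),
    "Dataset 2": rng.normal(2, 1.5, 1000),
}
colors = ['skyblue', 'salmon']

# Common edges so both panels use identical bins
edges = np.histogram_bin_edges(np.concatenate(list(datasets.values())), bins=30)

# sharex/sharey give both panels the same axis limits
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
for ax, (name, values), color in zip(axes, datasets.items(), colors):
    ax.hist(values, bins=edges, color=color, edgecolor='black')
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel("Value")
axes[0].set_ylabel("Frequency")

fig.suptitle("Comparison of Multiple Histograms")
fig.tight_layout()
```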

Summary

In this tutorial, we covered various ways to create and customize histograms in Matplotlib:

  1. Basic Histogram to show the distribution of values.
  2. Adjusting Bin Size to change the level of detail.
  3. Custom Colors and Transparency for visual customization.
  4. Density Histogram to show probability density instead of raw counts.
  5. Overlaying a Density Line to compare the histogram with a fitted normal curve.
  6. Multiple Histograms on the same plot with transparency.
  7. Stacked Histogram to show the combined counts of several datasets.
  8. Cumulative Histogram for cumulative distribution.
  9. Horizontal Histogram for a rotated view.
  10. Logarithmic Scale for bin counts that span several orders of magnitude.
  11. Custom Bin Ranges to control specific intervals.
  12. Adding Annotations to show the count for each bin.
  13. Legend for Multiple Datasets to provide context.
  14. Multiple Histograms with Subplots for side-by-side comparisons.

These examples show how a histogram can be tailored to the question you are asking of the data, whether that is its overall shape, its tails, or how it compares with another dataset.
