Histograms are a powerful tool for visualizing the distribution of a dataset.
They allow us to see the frequency of data points within specified ranges or bins, making them useful for understanding patterns, outliers, and the overall distribution of values.
Matplotlib provides extensive customization options for histograms, allowing you to adjust bin size, colors, transparency, and more.
In this tutorial, we’ll explore how to create and customize histograms in Matplotlib with examples, covering the basics, binning options, customizing colors, adding density, plotting multiple histograms, and more.
1. Basic Histogram
A basic histogram can be created using the hist function in Matplotlib. The data is divided into bins, and the height of each bar represents the frequency of data points within each bin.
    import matplotlib.pyplot as plt
    import numpy as np

    # Generate sample data
    data = np.random.randn(1000)  # Normally distributed data

    # Create a basic histogram
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30)
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Basic Histogram")
    plt.show()
In this example:
- data contains 1000 random numbers from a normal distribution.
- plt.hist(data, bins=30) creates a histogram with 30 bins.
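To see the numbers behind the bars, the same binning can be computed without plotting. The following is a minimal sketch using np.histogram; the seeded generator and the names rng, counts, and edges are assumptions added here for reproducibility and are not part of the original example.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (an assumption, not in the original)
rng = np.random.default_rng(42)
data = rng.standard_normal(1000)

# 30 equal-width bins spanning the data range, as with bins=30 in plt.hist
counts, edges = np.histogram(data, bins=30)

print(len(counts))   # number of bins
print(len(edges))    # bins are delimited by one more edge than there are bins
print(counts.sum())  # each data point lands in exactly one bin
```

The counts array holds the bar heights, and consecutive pairs of values in edges give the left and right boundary of each bar.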
2. Adjusting Bin Size
When given an integer, the bins parameter of the hist function sets the number of equal-width bins. Fewer bins smooth over detail, while more bins reveal finer structure at the cost of noisier bars.
    # Histogram with different bin sizes
    plt.figure(figsize=(12, 5))

    # Plot with fewer bins
    plt.subplot(1, 2, 1)
    plt.hist(data, bins=10, color='skyblue', edgecolor='black')
    plt.title("Histogram with 10 Bins")
    plt.xlabel("Value")
    plt.ylabel("Frequency")

    # Plot with more bins
    plt.subplot(1, 2, 2)
    plt.hist(data, bins=50, color='salmon', edgecolor='black')
    plt.title("Histogram with 50 Bins")
    plt.xlabel("Value")

    plt.tight_layout()
    plt.show()
In this example:
- The first plot has 10 bins, providing a broad view of the distribution.
- The second plot has 50 bins, showing more detail.
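If you would rather not pick the number of bins by hand, NumPy can estimate one from the data. The sketch below compares a few of NumPy's named rules via np.histogram_bin_edges; the seeded data and the bin_counts dictionary are assumptions made for illustration. The same rule names can, as far as I know, also be passed as a string to the bins parameter of plt.hist.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Number of bins suggested by a few of NumPy's built-in rules
bin_counts = {}
for rule in ("sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1

print(bin_counts)
```

The rules usually disagree somewhat, which is a reminder that the bin count is a modeling choice rather than a property of the data.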
3. Customizing Colors and Transparency
You can customize the color of the histogram bars and add transparency to make overlapping histograms more visible.
    # Histogram with custom color and transparency
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, color='purple', edgecolor='black', alpha=0.7)
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Histogram with Custom Color and Transparency")
    plt.show()
In this example:
- color='purple' sets the color of the bars.
- edgecolor='black' adds a black outline around each bar.
- alpha=0.7 makes the bars slightly transparent.
4. Density Histogram
You can normalize the histogram to show probability density instead of raw counts by setting density=True. Each bar's height becomes its count divided by the total count times the bin width, so the total area under the histogram equals 1.
    # Histogram with density
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, color='skyblue', edgecolor='black', density=True)
    plt.xlabel("Value")
    plt.ylabel("Density")
    plt.title("Histogram with Density")
    plt.show()
In this example:
- density=True scales the histogram so that the area of the bars adds up to 1.
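The normalization can be verified numerically. This sketch, which assumes seeded sample data and uses np.histogram in place of a plot, computes the density by hand and checks that the bar areas (height times width) sum to 1. Note that the bar heights alone generally do not sum to 1.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(1)
data = rng.standard_normal(1000)

counts, edges = np.histogram(data, bins=30)
widths = np.diff(edges)

# Density height = count / (total count * bin width)
density = counts / (counts.sum() * widths)

# NumPy's own density normalization, for comparison
density_np, _ = np.histogram(data, bins=30, density=True)

# Total area of all bars
area = np.sum(density * widths)
print(area)
```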
5. Overlaying a Density Line on a Histogram
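You can draw a fitted density curve on top of a histogram to show the shape of the distribution more clearly. The example here fits a normal distribution to the data, so the curve is a good summary only when the data are roughly normal.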
    from scipy.stats import norm

    # Overlay density line
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, color='lightgray', edgecolor='black', density=True)

    # Evaluate the fitted normal PDF across the visible x-range
    xmin, xmax = plt.xlim()
    x = np.linspace(xmin, xmax, 100)
    p = norm.pdf(x, np.mean(data), np.std(data))
    plt.plot(x, p, color='blue', linewidth=2, label="Density")

    plt.xlabel("Value")
    plt.ylabel("Density")
    plt.title("Histogram with Density Line")
    plt.legend()
    plt.show()
In this example:
- plt.plot(x, p) overlays a density line using a normal distribution.
- norm.pdf(x, np.mean(data), np.std(data)) computes the probability density function for a normal distribution fitted to the data.
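When the data are not close to normal, a kernel density estimate is a common alternative to a fitted normal curve. The following is a hand-rolled sketch in plain NumPy, not the method used in the example above; the rule-of-thumb bandwidth and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(4)
data = rng.standard_normal(1000)
n = data.size

# Rule-of-thumb bandwidth (an assumption; other choices are equally valid)
bandwidth = 1.06 * data.std(ddof=1) * n ** (-1 / 5)

# Grid on which to evaluate the estimate, padded beyond the data range
x = np.linspace(data.min() - 1, data.max() + 1, 200)

# Place one Gaussian bump on every data point and average them
z = (x[:, None] - data[None, :]) / bandwidth
kde = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# The estimate should integrate to roughly 1, like any density
area = kde.sum() * (x[1] - x[0])
print(area)
```

The resulting x and kde arrays can be passed to plt.plot in place of x and p from the normal fit.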
6. Multiple Histograms on the Same Plot
You can plot multiple histograms on the same plot to compare different datasets. Use transparency (alpha) to see overlapping data more clearly.
    # Generate two sample datasets
    data1 = np.random.normal(0, 1, 1000)
    data2 = np.random.normal(2, 1.5, 1000)

    # Plot multiple histograms
    plt.figure(figsize=(8, 5))
    plt.hist(data1, bins=30, color='blue', alpha=0.6, label="Data 1")
    plt.hist(data2, bins=30, color='orange', alpha=0.6, label="Data 2")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Multiple Histograms")
    plt.legend()
    plt.show()
In this example:
- alpha=0.6 makes each dataset partially transparent, making it easier to see overlap.
- plt.legend() adds a legend to differentiate between datasets.
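One caveat with separate hist calls is that bins=30 is computed from each dataset's own range, so the two sets of bars can have different widths. A possible refinement, sketched here with seeded data and a hypothetical shared_edges variable, is to compute one set of bin edges from the combined data and pass it to both calls.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(2)
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1.5, 1000)

# One set of 30 bins covering the range of both datasets
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)

plt.figure(figsize=(8, 5))
_, edges1, _ = plt.hist(data1, bins=shared_edges, color='blue', alpha=0.6, label="Data 1")
_, edges2, _ = plt.hist(data2, bins=shared_edges, color='orange', alpha=0.6, label="Data 2")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Multiple Histograms with Shared Bins")
plt.legend()
plt.show()
```

With identical edges, bar heights in the two histograms are directly comparable bin by bin.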
7. Stacked Histogram
A stacked histogram draws the bars of several datasets on top of one another within each bin, so the total bar height is the combined count. This is useful for showing how different categories contribute to an overall distribution.
    # Stacked histogram
    plt.figure(figsize=(8, 5))
    plt.hist([data1, data2], bins=30, stacked=True,
             color=['blue', 'orange'], label=['Data 1', 'Data 2'])
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Stacked Histogram")
    plt.legend()
    plt.show()
In this example:
- stacked=True stacks the histograms on top of each other.
- [data1, data2] allows both datasets to be plotted together in a stacked manner.
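The arithmetic behind stacking can be sketched without a plot. The example below assumes seeded data and computes per-dataset counts on common bin edges with np.histogram; it illustrates the idea and is not a description of Matplotlib's internals.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(3)
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1.5, 1000)
combined = np.concatenate([data1, data2])

# Stacking only makes sense when both datasets use the same bin edges
edges = np.histogram_bin_edges(combined, bins=30)
counts1, _ = np.histogram(data1, bins=edges)
counts2, _ = np.histogram(data2, bins=edges)

# The lower segment of each bar is counts1; the top of the stack is the sum
stack_top = counts1 + counts2

# The stacked heights match a single histogram of the combined data
counts_combined, _ = np.histogram(combined, bins=edges)
print(np.array_equal(stack_top, counts_combined))
```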
8. Cumulative Histogram
A cumulative histogram shows the cumulative frequency for each bin, indicating the running total of data points up to that bin.
    # Cumulative histogram
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, cumulative=True, color='teal', edgecolor='black')
    plt.xlabel("Value")
    plt.ylabel("Cumulative Frequency")
    plt.title("Cumulative Histogram")
    plt.show()
In this example:
- cumulative=True makes the histogram cumulative, showing the total frequency up to each bin.
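The running total is simply a cumulative sum over the per-bin counts. This sketch, assuming seeded data and using np.histogram in place of a plot, shows the relationship and also derives the cumulative fraction, which is what you get in Matplotlib by combining cumulative=True with density=True.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(8)
data = rng.standard_normal(1000)

counts, edges = np.histogram(data, bins=30)

# Height of each cumulative bar: all points in this bin and every bin before it
running_total = np.cumsum(counts)

# Dividing by the total gives the fraction of data at or below each bin
cumulative_fraction = running_total / counts.sum()

print(running_total[-1])        # equals the number of data points
print(cumulative_fraction[-1])  # the last bin reaches 1.0
```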
9. Horizontal Histogram
You can create a horizontal histogram using the orientation='horizontal' parameter. This is useful when the measured values belong on the vertical axis, for example when the histogram sits beside another plot that shares the same y-axis.
    # Horizontal histogram
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, color='salmon', edgecolor='black', orientation='horizontal')
    plt.xlabel("Frequency")
    plt.ylabel("Value")
    plt.title("Horizontal Histogram")
    plt.show()
In this example:
- orientation='horizontal' rotates the histogram so that bars extend horizontally.
10. Histogram with Logarithmic Scale
You can apply a logarithmic scale to the frequency axis when bin counts span several orders of magnitude. This keeps sparsely populated bins, such as those in the tails, visible next to heavily populated ones.
    # Histogram with log scale
    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=30, color='purple', edgecolor='black', log=True)
    plt.xlabel("Value")
    plt.ylabel("Frequency (log scale)")
    plt.title("Histogram with Logarithmic Scale")
    plt.show()
In this example:
- log=True applies a logarithmic scale to the y-axis, making it easier to interpret data with a large range of frequencies.
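The log parameter affects only the frequency axis. When the data values themselves span orders of magnitude, a different approach is to use logarithmically spaced bins together with a log-scaled value axis. The sketch below assumes a seeded log-normal sample and hypothetical names (values, log_edges); it is one way to do this, not the only one.

```python
import matplotlib.pyplot as plt
import numpy as np

# Log-normal sample: positive values spanning several orders of magnitude
# (seeded data is an assumption for reproducibility)
rng = np.random.default_rng(5)
values = rng.lognormal(mean=0.0, sigma=2.0, size=1000)

# 30 bins whose edges grow by a constant factor from the minimum to the maximum
log_edges = np.geomspace(values.min(), values.max(), 31)

plt.figure(figsize=(8, 5))
counts, _, _ = plt.hist(values, bins=log_edges, color='purple', edgecolor='black')
plt.xscale('log')  # bars appear equally wide on a log-scaled x-axis
plt.xlabel("Value (log scale)")
plt.ylabel("Frequency")
plt.title("Histogram with Logarithmic Bins")
plt.show()
```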
11. Histogram with Custom Bin Ranges
You can specify custom bin edges by passing a list to the bins parameter. This gives you full control over the range and size of each bin.
    # Custom bin ranges
    custom_bins = [-3, -1, 0, 1, 3]

    plt.figure(figsize=(8, 5))
    plt.hist(data, bins=custom_bins, color='darkcyan', edgecolor='black')
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Histogram with Custom Bin Ranges")
    plt.show()
In this example:
- bins=custom_bins uses custom bin edges to define the bins.
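Two details are easy to miss with explicit edges: values outside the first and last edge are not counted at all, and wider bins naturally collect more points. The sketch below assumes seeded data and uses np.histogram to make the first point visible; the names inside and dropped are introduced for illustration only.

```python
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(6)
data = rng.standard_normal(1000)

custom_bins = [-3, -1, 0, 1, 3]
counts, edges = np.histogram(data, bins=custom_bins)

# Points within [-3, 3] are counted; anything outside is silently ignored
inside = np.count_nonzero((data >= -3) & (data <= 3))
dropped = data.size - counts.sum()

print(counts)   # four bins from five edges
print(dropped)  # number of points that fell outside the edges
```

Because the outer bins here are twice as wide as the inner ones, setting density=True in plt.hist is worth considering so that bar heights reflect density and not just bin width.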
12. Adding Annotations to a Histogram
Annotations display additional information, such as the count of each bin, directly on the bars.
    # Histogram with annotations
    plt.figure(figsize=(8, 5))
    counts, bin_edges, patches = plt.hist(data, bins=30, color='royalblue', edgecolor='black')

    # Annotate each bar with its count, centered horizontally above the bar
    for count, patch in zip(counts, patches):
        x_center = patch.get_x() + patch.get_width() / 2
        plt.text(x_center, count + 1, str(int(count)), ha='center', va='bottom', color='black')

    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Histogram with Annotations")
    plt.show()
In this example:
- plt.text() places the count of each bin above the corresponding bar.
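As an alternative to a manual loop, newer Matplotlib versions (3.4 and later, to my knowledge) provide bar_label, which accepts the bar container returned by hist. The following sketch assumes seeded data and the default bar histogram type; the fontsize value is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(7)
data = rng.standard_normal(1000)

fig, ax = plt.subplots(figsize=(8, 5))
counts, edges, bars = ax.hist(data, bins=30, color='royalblue', edgecolor='black')

# bar_label places one text label per bar, centered above it
labels = ax.bar_label(bars, fontsize=8)

ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram with bar_label Annotations")
plt.show()
```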
13. Histogram with a Legend
Adding a legend to a histogram helps when displaying multiple datasets or providing more context to a single histogram.
    # Histogram with legend
    plt.figure(figsize=(8, 5))
    plt.hist(data1, bins=30, color='skyblue', alpha=0.6, label="Dataset 1")
    plt.hist(data2, bins=30, color='salmon', alpha=0.6, label="Dataset 2")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Histogram with Legend")
    plt.legend()
    plt.show()
In this example:
- plt.legend() provides a legend to differentiate between data1 and data2.
14. Multiple Histograms with Subplots
If you want to compare several histograms side by side, using subplots is an effective approach.
    # Subplots for multiple histograms
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # First histogram
    ax1.hist(data1, bins=30, color='skyblue', edgecolor='black')
    ax1.set_title("Histogram of Dataset 1")
    ax1.set_xlabel("Value")
    ax1.set_ylabel("Frequency")

    # Second histogram
    ax2.hist(data2, bins=30, color='salmon', edgecolor='black')
    ax2.set_title("Histogram of Dataset 2")
    ax2.set_xlabel("Value")

    plt.suptitle("Comparison of Multiple Histograms")
    plt.tight_layout()
    plt.show()
In this example:
- plt.subplots(1, 2) creates two axes side by side, one for each histogram, allowing for easy comparison.
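Side-by-side panels are easier to compare when they share axis limits and bin edges. The sketch below is one possible refinement, assuming seeded data and a hypothetical shared_edges variable; it uses the sharex and sharey options of plt.subplots so both panels use the same scales.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded sample data (an assumption for reproducibility)
rng = np.random.default_rng(9)
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1.5, 1000)

# Common bin edges so bars in both panels cover the same intervals
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)

# sharex/sharey keep the axis limits identical across the two panels
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)

ax1.hist(data1, bins=shared_edges, color='skyblue', edgecolor='black')
ax1.set_title("Histogram of Dataset 1")
ax1.set_xlabel("Value")
ax1.set_ylabel("Frequency")

ax2.hist(data2, bins=shared_edges, color='salmon', edgecolor='black')
ax2.set_title("Histogram of Dataset 2")
ax2.set_xlabel("Value")

fig.suptitle("Comparison with Shared Axes and Bins")
fig.tight_layout()
plt.show()
```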
Summary
In this tutorial, we covered various ways to create and customize histograms in Matplotlib:
- Basic Histogram to show the distribution of values.
- Adjusting Bin Size to change the level of detail.
- Custom Colors and Transparency for visual customization.
- Density Histogram to show probability density instead of counts.
- Overlaying a Density Line on a histogram for a smoothed view.
- Multiple Histograms on the same plot with transparency.
- Stacked Histogram to add up categories.
- Cumulative Histogram for cumulative distribution.
- Horizontal Histogram for a rotated view.
- Logarithmic Scale for bin counts that span several orders of magnitude.
- Custom Bin Ranges to control specific intervals.
- Adding Annotations to show the count for each bin.
- Legend for Multiple Datasets to provide context.
- Multiple Histograms with Subplots for side-by-side comparisons.
These examples demonstrate how histograms can be tailored to effectively represent data distributions, allowing for in-depth and visually clear analysis of various data characteristics.