Box plots (or box-and-whisker plots) are useful for displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
They provide insights into the spread, skewness, and outliers of the data, making them valuable for data analysis and exploratory data visualization.
In this tutorial, we’ll explore how to create and customize box plots in Matplotlib, covering the basics, horizontal and grouped box plots, adding custom colors, displaying individual data points, handling multiple datasets, and more.
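Before plotting anything, it can help to see the five-number summary as plain numbers. The following is a minimal sketch using NumPy only; the dictionary name five_number_summary is just an illustrative choice, and the sample matches the one used in the first plotting example.

```python
import numpy as np

# Same sample as the first plotting example: 100 draws from a standard normal
np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Quartiles from NumPy's default (linearly interpolated) percentiles
q1, median, q3 = np.percentile(data, [25, 50, 75])

five_number_summary = {
    "minimum": data.min(),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "maximum": data.max(),
}

for name, value in five_number_summary.items():
    print(f"{name:>8}: {value: .3f}")
```

Note that with Matplotlib's default settings the whiskers do not necessarily reach the minimum and maximum: they stop at the most extreme points within 1.5 times the interquartile range of the box, and anything beyond is drawn as an outlier.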
1. Basic Box Plot
A basic box plot can be created using the boxplot function in Matplotlib, which takes a list or array of data.
```python
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Create a basic box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data)
plt.ylabel("Values")
plt.title("Basic Box Plot")
plt.show()
```
In this example:
- data contains 100 random numbers generated from a normal distribution.
- plt.boxplot(data) creates a box plot of the data.
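The boxplot call also returns a dictionary of the artists it drew, which is handy for later styling or for reading values back from the plot. The sketch below uses the Axes-level ax.boxplot instead of plt.boxplot; the variable names bp and median_from_plot are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
bp = ax.boxplot(data)

# The keys name the plot components; each value is a list of artists
print(sorted(bp.keys()))

# The median line is horizontal, so either of its y-values is the median
median_from_plot = bp["medians"][0].get_ydata()[0]
print(median_from_plot, np.median(data))

ax.set_ylabel("Values")
ax.set_title("Basic Box Plot (Axes interface)")
plt.show()
```

Keeping the returned dictionary around is what makes per-box styling possible after the plot has been drawn.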
2. Customizing Box Plot Appearance
You can customize the appearance of a box plot by modifying its colors and adding labels.
```python
# Customized box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True,
            boxprops=dict(facecolor='lightblue', color='blue'),
            whiskerprops=dict(color='blue'),
            capprops=dict(color='blue'),
            medianprops=dict(color='red', linewidth=2))
plt.ylabel("Values")
plt.title("Customized Box Plot")
plt.show()
```
In this example:
- patch_artist=True allows the box to be filled with color.
- boxprops, whiskerprops, capprops, and medianprops are dictionaries that style the box, whiskers, caps, and median line, respectively (here they set colors and the median line width).
3. Horizontal Box Plot
A horizontal box plot is useful when you have long category names or when you prefer to display data horizontally.
```python
# Horizontal box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data, vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightgreen'))
plt.xlabel("Values")
plt.title("Horizontal Box Plot")
plt.show()
```
In this example:
- vert=False displays the box plot horizontally. Matplotlib 3.10 and later also accept orientation='horizontal', which is intended to replace vert.
- boxprops=dict(facecolor='lightgreen') sets the color of the box.
4. Box Plot for Multiple Datasets
You can visualize multiple datasets side by side by passing a list of arrays to boxplot.
```python
# Generate multiple datasets
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(2, 1.5, 100)
data3 = np.random.normal(-2, 1, 100)

# Create box plots for multiple datasets
plt.figure(figsize=(8, 6))
plt.boxplot([data1, data2, data3], patch_artist=True)
plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Box Plot for Multiple Datasets")
plt.show()
```
In this example:
- [data1, data2, data3] passes multiple datasets to plt.boxplot.
- plt.xticks() labels each box for clarity.
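When several datasets share one plot, giving each box its own fill color often makes them easier to tell apart. One possible approach, sketched here, is to loop over the patches in the dictionary returned by boxplot; the color list is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
datasets = [
    np.random.normal(0, 1, 100),
    np.random.normal(2, 1.5, 100),
    np.random.normal(-2, 1, 100),
]
colors = ["lightblue", "lightgreen", "lightcoral"]  # arbitrary example colors

fig, ax = plt.subplots(figsize=(8, 6))
# patch_artist=True makes each box a fillable patch
bp = ax.boxplot(datasets, patch_artist=True)

# Assign one fill color per box
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Dataset 1", "Dataset 2", "Dataset 3"])
ax.set_ylabel("Values")
ax.set_title("Box Plot with One Color per Dataset")
plt.show()
```

Passing facecolor through boxprops would color every box the same, which is why the loop is used here.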
5. Grouped Box Plot
A grouped box plot is useful for comparing several groups on one set of axes. Here’s an example that uses np.random.normal to create data for each group and draws one box per group.
```python
# Data for grouped box plot
np.random.seed(0)
data_grouped = [np.random.normal(loc, 0.5, 100) for loc in [1, 2, 3, 4]]

# Create grouped box plot
plt.figure(figsize=(10, 6))
plt.boxplot(data_grouped, patch_artist=True)
plt.xticks([1, 2, 3, 4], ['Group 1', 'Group 2', 'Group 3', 'Group 4'])
plt.xlabel("Groups")
plt.ylabel("Values")
plt.title("Grouped Box Plot")
plt.show()
```
In this example:
- data_grouped contains four arrays of random data centered at different locations (1, 2, 3, and 4).
- plt.xticks() labels each group.
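To compare categories within each group, the boxes for one group can be placed next to each other with the positions and widths arguments. The sketch below assumes two hypothetical categories, "A" and "B", with made-up data; the offset and width values are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
groups = ["Group 1", "Group 2", "Group 3"]

# Hypothetical categories A and B: one array per group for each category
cat_a = [np.random.normal(loc, 0.5, 100) for loc in [1, 2, 3]]
cat_b = [np.random.normal(loc + 0.5, 0.5, 100) for loc in [1, 2, 3]]

centers = np.arange(1, len(groups) + 1)  # one tick per group
offset = 0.2                             # shift of each category from the tick
width = 0.3                              # box width

fig, ax = plt.subplots(figsize=(10, 6))
bp_a = ax.boxplot(cat_a, positions=centers - offset, widths=width,
                  patch_artist=True, boxprops=dict(facecolor="lightblue"))
bp_b = ax.boxplot(cat_b, positions=centers + offset, widths=width,
                  patch_artist=True, boxprops=dict(facecolor="lightcoral"))

# Each boxplot call adjusts the ticks, so set them once at the end
ax.set_xticks(centers)
ax.set_xticklabels(groups)
ax.legend([bp_a["boxes"][0], bp_b["boxes"][0]], ["Category A", "Category B"])
ax.set_xlabel("Groups")
ax.set_ylabel("Values")
ax.set_title("Grouped Box Plot with Two Categories per Group")
plt.show()
```

The first box of each call serves as the legend handle, so the legend shows one colored patch per category.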
6. Adding Individual Data Points to Box Plot
You can add individual data points on top of a box plot using a scatter plot. This can help visualize the distribution of individual data points.
```python
# Create box plot with individual data points
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor='lightblue'))
plt.scatter(np.ones(data.size) + 0.1 * np.random.rand(data.size) - 0.05,
            data, color='red', alpha=0.6)
plt.ylabel("Values")
plt.title("Box Plot with Individual Data Points")
plt.show()
```
In this example:
- plt.scatter() adds red points, slightly offset horizontally, to show individual data points.
- np.ones(data.size) + 0.1 * np.random.rand(data.size) - 0.05 places each point at x = 1 (the position of the box) plus a random offset between -0.05 and 0.05, so overlapping points are easier to see.
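The same jitter idea extends to several boxes by centering the random offsets on each box position. This is a rough sketch: the names jitter_width and x_all are illustrative, and showfliers=False is used only so that outliers are not drawn twice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
datasets = [rng.normal(0, 1, 100), rng.normal(2, 1.5, 100), rng.normal(-2, 1, 100)]

fig, ax = plt.subplots(figsize=(8, 6))
# Hide the default outlier markers because every point is plotted below
ax.boxplot(datasets, showfliers=False)

jitter_width = 0.05  # maximum horizontal offset on either side of the box
x_all = []
for position, values in enumerate(datasets, start=1):
    # Boxes sit at x = 1, 2, 3 by default, so jitter around those positions
    x = position + rng.uniform(-jitter_width, jitter_width, size=values.size)
    ax.scatter(x, values, color="red", alpha=0.6, s=15, zorder=3)
    x_all.append(x)

ax.set_ylabel("Values")
ax.set_title("Box Plots with Jittered Data Points")
plt.show()
```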
7. Adding Notches to the Box Plot
Notches in a box plot mark an approximate confidence interval around the median. When the notches of two boxes do not overlap, their medians are likely to differ, which makes medians easier to compare between boxes.
```python
# Box plot with notches
plt.figure(figsize=(8, 6))
plt.boxplot(data, notch=True, patch_artist=True,
            boxprops=dict(facecolor='lightcoral'))
plt.ylabel("Values")
plt.title("Box Plot with Notches")
plt.show()
```
In this example:
- notch=True adds notches to the box plot, indicating the confidence interval around the median.
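To see where the notch limits come from, the statistics behind a box plot can be inspected with matplotlib.cbook.boxplot_stats. The sketch below assumes the default, non-bootstrap setting, where the notch is taken to extend 1.57 × IQR / √n on each side of the median; the manual_lo and manual_hi names are illustrative.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(0)
data = np.random.normal(0, 1, 100)

# boxplot_stats returns one dictionary of statistics per dataset
stats = cbook.boxplot_stats(data)[0]

# Manual version of the assumed default notch formula
half_width = 1.57 * stats["iqr"] / np.sqrt(len(data))
manual_lo = stats["med"] - half_width
manual_hi = stats["med"] + half_width

print("notch from boxplot_stats:", stats["cilo"], stats["cihi"])
print("notch computed by hand:  ", manual_lo, manual_hi)
```

If the two printed pairs agree, the notches drawn with notch=True correspond to this approximation and not to a bootstrap estimate.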
8. Displaying Outliers
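By default, points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles are treated as outliers and drawn as individual points outside the whiskers. You can customize the appearance of these points.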
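```python
# Box plot with customized outlier symbols
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True,
            flierprops=dict(marker='o', markerfacecolor='red',
                            markeredgecolor='red', markersize=8))
plt.ylabel("Values")
plt.title("Box Plot with Customized Outliers")
plt.show()
```
In this example:
- flierprops customizes the appearance of outliers. markerfacecolor and markeredgecolor make them red circles, and markersize sets their size to 8. Setting only color would not change the markers, because it applies to the line style, and outlier markers are drawn without a connecting line.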
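The outlier rule itself is easy to reproduce by hand. The sketch below applies the usual 1.5 × IQR fences with NumPy only; whis mirrors the name of the boxplot parameter that controls this factor, and the other variable names are illustrative.

```python
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
whis = 1.5  # same default factor as boxplot's whis parameter

lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Points outside the fences are the ones drawn as outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

# The whiskers end at the most extreme points still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print("fences:  ", lower_fence, upper_fence)
print("whiskers:", whisker_low, whisker_high)
print("number of outliers:", outliers.size)
```

Passing a larger whis value to boxplot widens the fences, so fewer points are flagged as outliers.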
9. Box Plot with Multiple Box Properties
You can create a box plot with different colors and line styles for each part of the box to add more style to the plot.
```python
# Box plot with custom properties for each component
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True,
            boxprops=dict(facecolor='lightblue', color='navy'),
            whiskerprops=dict(color='darkblue', linestyle='--'),
            capprops=dict(color='blue'),
            medianprops=dict(color='red', linewidth=2),
            # Marker colors must be set explicitly; color alone leaves them unchanged
            flierprops=dict(marker='D', markerfacecolor='red',
                            markeredgecolor='red', markersize=6))
plt.ylabel("Values")
plt.title("Box Plot with Multiple Custom Properties")
plt.show()
```
In this example:
- Different components of the box plot, including the box, whiskers, caps, median line, and outliers, are customized using various properties.
10. Horizontal Box Plot with Multiple Datasets
If you prefer a horizontal layout for multiple datasets, pass vert=False once in the boxplot call; it applies to every box.
```python
# Horizontal box plot with multiple datasets
plt.figure(figsize=(8, 6))
plt.boxplot([data1, data2, data3], vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightgreen'))
plt.yticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.xlabel("Values")
plt.title("Horizontal Box Plot for Multiple Datasets")
plt.show()
```
In this example:
- vert=False displays each box horizontally.
- plt.yticks() labels each box with the dataset name.
11. Overlaying a Box Plot on Top of a Violin Plot
A violin plot provides additional information by showing the kernel density of the data along with the box plot. This allows for a more comprehensive view of the data distribution.
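```python
import seaborn as sns

# Create violin plot with overlaid box plot
plt.figure(figsize=(8, 6))
sns.violinplot(data=[data1, data2, data3], inner=None, color='lightgrey')
# Seaborn places the violins at x = 0, 1, 2, so put the boxes there too
plt.boxplot([data1, data2, data3], positions=[0, 1, 2],
            patch_artist=True, widths=0.2)
plt.xticks([0, 1, 2], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.xlabel("Datasets")
plt.title("Violin Plot with Overlaid Box Plot")
plt.show()
```
In this example:
- sns.violinplot() from Seaborn creates a violin plot; inner=None removes the violin's own inner markings.
- plt.boxplot() overlays a box plot on top of the violin plot to provide the summary statistics. positions=[0, 1, 2] aligns the boxes with the violins, which would otherwise sit one unit to the left of the default box positions 1, 2, 3.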
12. Box Plot with Subplots for Different Groups
You can use subplots to display multiple box plots side by side for easy comparison between different groups or datasets.
```python
# Create data for different subplots
data_group1 = np.random.normal(0, 1, 100)
data_group2 = np.random.normal(1, 1.2, 100)
data_group3 = np.random.normal(-1, 0.8, 100)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))

# Plot each group in a separate subplot
ax1.boxplot(data_group1, patch_artist=True, boxprops=dict(facecolor='skyblue'))
ax1.set_title("Group 1")
ax2.boxplot(data_group2, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
ax2.set_title("Group 2")
ax3.boxplot(data_group3, patch_artist=True, boxprops=dict(facecolor='salmon'))
ax3.set_title("Group 3")

plt.suptitle("Box Plots for Different Groups")
plt.show()
```
In this example:
- plt.subplots(1, 3) creates a figure with three side-by-side subplots, each displaying a box plot for a different dataset.
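Separate subplots each pick their own y-axis range by default, which can make boxes look more similar than they are. One possible variation, sketched below, shares the y-axis across subplots with sharey=True; the dictionary of groups and the color list are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
groups = {
    "Group 1": np.random.normal(0, 1, 100),
    "Group 2": np.random.normal(1, 1.2, 100),
    "Group 3": np.random.normal(-1, 0.8, 100),
}
colors = ["skyblue", "lightgreen", "salmon"]

# sharey=True gives all three subplots the same y-axis limits
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

for ax, (name, values), color in zip(axes, groups.items(), colors):
    ax.boxplot(values, patch_artist=True, boxprops=dict(facecolor=color))
    ax.set_title(name)

axes[0].set_ylabel("Values")
fig.suptitle("Box Plots for Different Groups (Shared Y-Axis)")
plt.show()
```

With a shared axis, differences in median and spread between the groups can be read directly across the panels.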
Summary
In this tutorial, we covered a variety of ways to create and customize box plots in Matplotlib:
- Basic Box Plot to display distribution.
- Customizing Box Plot Appearance by setting colors and styles.
- Horizontal Box Plot for better readability.
- Box Plot for Multiple Datasets for comparative analysis.
- Grouped Box Plot to visualize multiple categories.
- Adding Individual Data Points to show distribution within each category.
- Box Plot with Notches for comparing medians.
- Displaying Outliers with customized symbols.
- Customizing Each Box Component for a detailed look.
- Horizontal Box Plot with Multiple Datasets to handle many categories.
- Overlaying Box Plot on Violin Plot to show both density and summary statistics.
- Box Plot with Subplots to compare different groups side by side.
These examples demonstrate the flexibility of Matplotlib’s boxplot function, allowing you to create visually informative and highly customizable box plots suitable for various data visualization needs.