
Tutorial on Creating Box Plots in Matplotlib


Box plots (or box-and-whisker plots) are useful for displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

They provide insights into the spread, skewness, and outliers of the data, making them valuable for data analysis and exploratory data visualization.
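To make the five-number summary concrete, here is a minimal sketch that computes it with NumPy on the same kind of sample data used later in this tutorial. The variable names are chosen for illustration only.

```python
import numpy as np

# Sample data: 100 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 100)

# Five-number summary computed directly with NumPy.
minimum = sample.min()
q1, median, q3 = np.percentile(sample, [25, 50, 75])
maximum = sample.max()
iqr = q3 - q1  # interquartile range: the height of the box

print(f"min={minimum:.2f}  Q1={q1:.2f}  median={median:.2f}  "
      f"Q3={q3:.2f}  max={maximum:.2f}")
print(f"IQR={iqr:.2f}")
```

Note that Matplotlib's default whiskers do not necessarily reach the minimum and maximum: they stop at the most extreme data points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier.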

In this tutorial, we’ll explore how to create and customize box plots in Matplotlib, covering the basics, horizontal and grouped box plots, adding custom colors, displaying individual data points, handling multiple datasets, and more.

1. Basic Box Plot

A basic box plot can be created using the boxplot function in Matplotlib, which takes a list or array of data.

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Create a basic box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data)
plt.ylabel("Values")
plt.title("Basic Box Plot")
plt.show()

In this example:

  • data contains 100 random numbers generated from a normal distribution.
  • plt.boxplot(data) creates a box plot of the data.
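boxplot also returns a dictionary of the artists it draws, which is handy for inspecting or restyling the plot afterwards. The sketch below prints the component names and checks that the median line sits at the value NumPy reports; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
artists = ax.boxplot(data)

# The returned dictionary maps component names to lists of artists.
for name, items in artists.items():
    print(f"{name}: {len(items)} artist(s)")

# The median line's y-coordinate matches NumPy's median.
median_y = artists["medians"][0].get_ydata()[0]
print("median from plot:", median_y, "| np.median:", np.median(data))
```

Call plt.show() afterwards to display the figure.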

2. Customizing Box Plot Appearance

You can customize the appearance of a box plot by modifying its colors and adding labels.

# Customized box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor='lightblue', color='blue'), 
            whiskerprops=dict(color='blue'), capprops=dict(color='blue'), 
            medianprops=dict(color='red', linewidth=2))

plt.ylabel("Values")
plt.title("Customized Box Plot")
plt.show()

In this example:

  • patch_artist=True allows the box to be filled with color.
  • boxprops, whiskerprops, capprops, and medianprops are dictionaries that set the colors for the box, whiskers, caps, and median line, respectively.

3. Horizontal Box Plot

A horizontal box plot is useful when you have long category names or when you prefer to display data horizontally.

# Horizontal box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data, vert=False, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
plt.xlabel("Values")
plt.title("Horizontal Box Plot")
plt.show()

In this example:

  • vert=False displays the box plot horizontally.
  • boxprops=dict(facecolor='lightgreen') sets the color of the box.
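Recent Matplotlib releases are moving from vert=False to an orientation parameter. The sketch below assumes that orientation="horizontal" is accepted from version 3.10 onward and falls back to vert=False on older releases; check the documentation for your installed version before relying on either spelling.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
try:
    # Assumed to be available in Matplotlib 3.10 and later.
    artists = ax.boxplot(data, orientation="horizontal")
except TypeError:
    # Older releases only know the vert=False spelling.
    artists = ax.boxplot(data, vert=False)

ax.set_xlabel("Values")
ax.set_title("Horizontal Box Plot")
```

Call plt.show() afterwards to display the figure.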

4. Box Plot for Multiple Datasets

You can visualize multiple datasets side by side by passing a list of arrays to boxplot.

# Generate multiple datasets
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(2, 1.5, 100)
data3 = np.random.normal(-2, 1, 100)

# Create box plots for multiple datasets
plt.figure(figsize=(8, 6))
plt.boxplot([data1, data2, data3], patch_artist=True)
plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Box Plot for Multiple Datasets")
plt.show()

In this example:

  • [data1, data2, data3] passes multiple datasets to plt.boxplot.
  • plt.xticks() labels each box for clarity.
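When several boxes share one plot, giving each its own fill color can make them easier to tell apart. One way to do this, sketched below, is to loop over the "boxes" entry of the dictionary that boxplot returns. The color choices are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
datasets = [
    np.random.normal(0, 1, 100),
    np.random.normal(2, 1.5, 100),
    np.random.normal(-2, 1, 100),
]
colors = ["lightblue", "lightgreen", "salmon"]  # arbitrary illustrative colors

fig, ax = plt.subplots(figsize=(8, 6))
artists = ax.boxplot(datasets, patch_artist=True)

# With patch_artist=True each box is a patch, so set_facecolor works.
for box, color in zip(artists["boxes"], colors):
    box.set_facecolor(color)

ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Dataset 1", "Dataset 2", "Dataset 3"])
ax.set_ylabel("Values")
ax.set_title("Box Plot with One Color per Dataset")
```

Call plt.show() afterwards to display the figure.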

5. Grouped Box Plot

A grouped box plot is useful for comparing several groups side by side. Here’s an example using np.random.normal to create data for each group.

# Data for grouped box plot
np.random.seed(0)
data_grouped = [np.random.normal(loc, 0.5, 100) for loc in [1, 2, 3, 4]]

# Create grouped box plot
plt.figure(figsize=(10, 6))
plt.boxplot(data_grouped, patch_artist=True)
plt.xticks([1, 2, 3, 4], ['Group 1', 'Group 2', 'Group 3', 'Group 4'])
plt.xlabel("Groups")
plt.ylabel("Values")
plt.title("Grouped Box Plot")
plt.show()

In this example:

  • data_grouped contains four arrays of random data centered at different locations (1, 2, 3, and 4).
  • plt.xticks() labels each group.
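If each group contains more than one category, the boxes can be placed next to each other with the positions and widths parameters. The following is a sketch under the assumption of two hypothetical categories, A and B, measured in three groups; the data and offsets are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
groups = ["Group 1", "Group 2", "Group 3"]
# Two hypothetical categories, A and B, measured in every group.
cat_a = [np.random.normal(loc, 0.5, 100) for loc in [1, 2, 3]]
cat_b = [np.random.normal(loc + 0.5, 0.5, 100) for loc in [1, 2, 3]]

centers = np.arange(len(groups)) + 1  # group centers at x = 1, 2, 3
offset, width = 0.2, 0.3              # shift each category off center

fig, ax = plt.subplots(figsize=(10, 6))
box_a = ax.boxplot(cat_a, positions=centers - offset, widths=width,
                   patch_artist=True, boxprops=dict(facecolor="lightblue"))
box_b = ax.boxplot(cat_b, positions=centers + offset, widths=width,
                   patch_artist=True, boxprops=dict(facecolor="salmon"))

# boxplot puts ticks at the box positions; move them to the group centers.
ax.set_xticks(centers)
ax.set_xticklabels(groups)
ax.legend([box_a["boxes"][0], box_b["boxes"][0]], ["Category A", "Category B"])
ax.set_ylabel("Values")
ax.set_title("Grouped Box Plot with Two Categories")
```

Call plt.show() afterwards to display the figure.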

6. Adding Individual Data Points to Box Plot

You can add individual data points on top of a box plot using a scatter plot. This can help visualize the distribution of individual data points.

# Create box plot with individual data points
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor='lightblue'))
plt.scatter(np.ones(data.size) + 0.1 * np.random.rand(data.size) - 0.05, data, color='red', alpha=0.6)

plt.ylabel("Values")
plt.title("Box Plot with Individual Data Points")
plt.show()

In this example:

  • plt.scatter() adds red points, slightly offset horizontally, to show individual data points.
  • np.ones(data.size) + 0.1 * np.random.rand(data.size) - 0.05 spreads the x-coordinates uniformly between 0.95 and 1.05 so that overlapping points are easier to see.
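The same jitter idea extends to several boxes. The sketch below wraps it in a small helper, jittered_x, which is a hypothetical name introduced here and not part of Matplotlib. It also passes showfliers=False so that outliers are not drawn twice.

```python
import matplotlib.pyplot as plt
import numpy as np

def jittered_x(position, n, spread=0.05, rng=None):
    """Return n x-coordinates scattered uniformly within +/- spread of position."""
    rng = np.random.default_rng() if rng is None else rng
    return position + rng.uniform(-spread, spread, size=n)

rng = np.random.default_rng(0)
datasets = [rng.normal(0, 1, 100), rng.normal(2, 1.5, 100)]

fig, ax = plt.subplots(figsize=(8, 6))
# showfliers=False avoids drawing outliers both as fliers and as scatter points.
ax.boxplot(datasets, showfliers=False)
for position, values in enumerate(datasets, start=1):
    ax.scatter(jittered_x(position, values.size, rng=rng), values,
               color="red", alpha=0.6, s=15)
ax.set_ylabel("Values")
ax.set_title("Box Plots with Individual Data Points")
```

Call plt.show() afterwards to display the figure.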

7. Adding Notches to the Box Plot

Notches in a box plot represent the confidence interval around the median. They make it easier to compare medians between boxes.

# Box plot with notches
plt.figure(figsize=(8, 6))
plt.boxplot(data, notch=True, patch_artist=True, boxprops=dict(facecolor='lightcoral'))
plt.ylabel("Values")
plt.title("Box Plot with Notches")
plt.show()

In this example:

  • notch=True adds notches to the box plot, indicating the confidence interval around the median.
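To see where the notch limits come from, the sketch below computes them by hand and compares them with the values reported by matplotlib.cbook.boxplot_stats. It assumes the default, non-bootstrapped notch, which uses median ± 1.57 × IQR / √n as an approximate 95% interval.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Manual notch limits: median +/- 1.57 * IQR / sqrt(n).
q1, med, q3 = np.percentile(data, [25, 50, 75])
half_width = 1.57 * (q3 - q1) / np.sqrt(data.size)
notch_lo, notch_hi = med - half_width, med + half_width

# boxplot_stats reports the same interval under 'cilo' and 'cihi'.
stats = cbook.boxplot_stats(data)[0]
print(f"manual notch:     [{notch_lo:.3f}, {notch_hi:.3f}]")
print(f"Matplotlib notch: [{stats['cilo']:.3f}, {stats['cihi']:.3f}]")
```

As a rule of thumb, if the notches of two boxes do not overlap, their medians are likely to differ.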

8. Displaying Outliers

By default, the whiskers extend to the most extreme data points within 1.5 × IQR of the box, and any points beyond them are displayed as outliers. You can customize the appearance of these outliers.

# Box plot with customized outlier symbols
plt.figure(figsize=(8, 6))
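plt.boxplot(data, patch_artist=True,
            flierprops=dict(marker='o', markerfacecolor='red', markeredgecolor='red', markersize=8))
plt.ylabel("Values")
plt.title("Box Plot with Customized Outliers")
plt.show()

In this example:

  • flierprops customizes the appearance of outliers, setting them as red circles with a size of 8. Outlier markers take their color from markerfacecolor and markeredgecolor; with Matplotlib's default settings, passing only color leaves them as black hollow circles.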
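To check which points count as outliers, the 1.5 × IQR rule can be applied by hand and compared with the fliers Matplotlib actually draws. This is a minimal sketch that assumes the default whis=1.5 setting.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Fences of the 1.5 * IQR rule.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
manual_outliers = data[(data < lower_fence) | (data > upper_fence)]

fig, ax = plt.subplots()
artists = ax.boxplot(data)  # default whis=1.5
plotted_outliers = artists["fliers"][0].get_ydata()

print("outliers by hand:   ", np.sort(manual_outliers))
print("outliers in the plot:", np.sort(plotted_outliers))
```

Passing a different whis value to boxplot changes how far the whiskers reach and therefore which points are flagged.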

9. Box Plot with Multiple Box Properties

You can create a box plot with different colors and line styles for each part of the box to add more style to the plot.

# Box plot with custom properties for each component
plt.figure(figsize=(8, 6))
plt.boxplot(data, patch_artist=True,
            boxprops=dict(facecolor='lightblue', color='navy'),
            whiskerprops=dict(color='darkblue', linestyle='--'),
            capprops=dict(color='blue'),
            medianprops=dict(color='red', linewidth=2),
            flierprops=dict(marker='D', markerfacecolor='red', markeredgecolor='red', markersize=6))
plt.ylabel("Values")
plt.title("Box Plot with Multiple Custom Properties")
plt.show()

In this example:

  • Different components of the box plot, including the box, whiskers, caps, median line, and outliers, are customized using various properties.

10. Horizontal Box Plot with Multiple Datasets

If you prefer a horizontal layout for multiple datasets, pass vert=False once; it applies to every box in the plot.

# Horizontal box plot with multiple datasets
plt.figure(figsize=(8, 6))
plt.boxplot([data1, data2, data3], vert=False, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
plt.yticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.xlabel("Values")
plt.title("Horizontal Box Plot for Multiple Datasets")
plt.show()

In this example:

  • vert=False displays each box horizontally.
  • plt.yticks() labels each box with the dataset name.

11. Overlaying a Box Plot on Top of a Violin Plot

A violin plot provides additional information by showing the kernel density of the data along with the box plot. This allows for a more comprehensive view of the data distribution.

import seaborn as sns

# Create violin plot with overlaid box plot
plt.figure(figsize=(8, 6))
sns.violinplot(data=[data1, data2, data3], inner=None, color='lightgrey')
# Seaborn places the violins at x = 0, 1, 2, so put the boxes at the same positions
plt.boxplot([data1, data2, data3], positions=[0, 1, 2], patch_artist=True, widths=0.2)
plt.xticks([0, 1, 2], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.xlabel("Datasets")
plt.title("Violin Plot with Overlaid Box Plot")
plt.show()

In this example:

  • sns.violinplot() from Seaborn creates a violin plot.
  • plt.boxplot() overlays a box plot on top of the violin plot to provide the summary statistics.
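If you would rather not depend on Seaborn, a similar overlay can be sketched with Matplotlib's own violinplot. Passing the same positions to both calls keeps the boxes centered on the violins. The styling choices below are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
datasets = [
    np.random.normal(0, 1, 100),
    np.random.normal(2, 1.5, 100),
    np.random.normal(-2, 1, 100),
]
positions = [1, 2, 3]

fig, ax = plt.subplots(figsize=(8, 6))
# Draw the violins first so the boxes end up on top.
violins = ax.violinplot(datasets, positions=positions, showextrema=False)
for body in violins["bodies"]:
    body.set_facecolor("lightgrey")
    body.set_alpha(0.8)

boxes = ax.boxplot(datasets, positions=positions, widths=0.2, patch_artist=True)
ax.set_xticks(positions)
ax.set_xticklabels(["Dataset 1", "Dataset 2", "Dataset 3"])
ax.set_xlabel("Datasets")
ax.set_title("Violin Plot with Overlaid Box Plot (Matplotlib only)")
```

Call plt.show() afterwards to display the figure.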

12. Box Plot with Subplots for Different Groups

You can use subplots to display multiple box plots side by side for easy comparison between different groups or datasets.

# Create data for different subplots
data_group1 = np.random.normal(0, 1, 100)
data_group2 = np.random.normal(1, 1.2, 100)
data_group3 = np.random.normal(-1, 0.8, 100)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))

# Plot each group in a separate subplot
ax1.boxplot(data_group1, patch_artist=True, boxprops=dict(facecolor='skyblue'))
ax1.set_title("Group 1")

ax2.boxplot(data_group2, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
ax2.set_title("Group 2")

ax3.boxplot(data_group3, patch_artist=True, boxprops=dict(facecolor='salmon'))
ax3.set_title("Group 3")

plt.suptitle("Box Plots for Different Groups")
plt.show()

In this example:

  • plt.subplots(1, 3) creates a figure with three side-by-side subplots, each displaying a box plot for a different dataset.
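With more than a few groups, repeating the plotting calls by hand gets tedious. The sketch below loops over the axes instead and uses sharey=True so that all subplots share one y-axis scale, which makes the boxes directly comparable. The dictionary layout is just one possible way to organize the data.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
# Map each subplot title to its data and fill color.
groups = {
    "Group 1": (np.random.normal(0, 1, 100), "skyblue"),
    "Group 2": (np.random.normal(1, 1.2, 100), "lightgreen"),
    "Group 3": (np.random.normal(-1, 0.8, 100), "salmon"),
}

fig, axes = plt.subplots(1, len(groups), figsize=(15, 5), sharey=True)
for ax, (title, (values, color)) in zip(axes, groups.items()):
    ax.boxplot(values, patch_artist=True, boxprops=dict(facecolor=color))
    ax.set_title(title)

axes[0].set_ylabel("Values")
fig.suptitle("Box Plots for Different Groups")
```

Call plt.show() afterwards to display the figure.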

Summary

In this tutorial, we covered a variety of ways to create and customize box plots in Matplotlib:

  1. Basic Box Plot to display distribution.
  2. Customizing Box Plot Appearance by setting colors and styles.
  3. Horizontal Box Plot for better readability.
  4. Box Plot for Multiple Datasets for comparative analysis.
  5. Grouped Box Plot to visualize multiple categories.
  6. Adding Individual Data Points to show distribution within each category.
  7. Box Plot with Notches for comparing medians.
  8. Displaying Outliers with customized symbols.
  9. Customizing Each Box Component for a detailed look.
  10. Horizontal Box Plot with Multiple Datasets to handle many categories.
  11. Overlaying Box Plot on Violin Plot to show both density and summary statistics.
  12. Box Plot with Subplots to compare different groups side by side.

These examples demonstrate the flexibility of Matplotlib’s boxplot function, allowing you to create visually informative and highly customizable box plots suitable for various data visualization needs.
