Tutorial on Creating Violin Plots in Matplotlib

Violin plots are a powerful tool for visualizing the distribution of data. They are similar to box plots, but instead of displaying only summary statistics, violin plots also show the kernel density of the data, providing more detail about the distribution.

Violin plots are especially useful when comparing multiple datasets, as they make it easier to see differences in the spread, skewness, and shape of the data.

Matplotlib provides the violinplot function to create violin plots. Additionally, Seaborn, a statistical data visualization library based on Matplotlib, offers more advanced violin plotting options.

In this tutorial, we’ll explore how to create and customize violin plots in Matplotlib, covering the basics, customizing appearance, adding summary statistics, comparing multiple distributions, using Seaborn for enhanced visualization, and more.
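To make the "kernel density" idea concrete, here is a minimal sketch that builds a violin shape by hand: it estimates the density with matplotlib.mlab.GaussianKDE and mirrors the curve around a center line. The grid size, the padding of 3 units, and the variable names are illustrative assumptions, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import mlab

rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 100)

# Estimate the density on an evenly spaced grid that extends past the data
kde = mlab.GaussianKDE(sample)
grid = np.linspace(sample.min() - 3, sample.max() + 3, 400)
density = kde.evaluate(grid)

# Rough area under the curve; a density should integrate to about 1
area = float(np.sum(density) * (grid[1] - grid[0]))

# Mirror the density around x = 1 to get the familiar violin outline
fig, ax = plt.subplots(figsize=(4, 6))
ax.fill_betweenx(grid, 1 - density, 1 + density, alpha=0.5)
ax.set_ylabel("Values")
ax.set_title("Hand-built violin from a KDE")
```

Call plt.show() or fig.savefig(...) to view the result. violinplot does this estimation for you, so the sketch is only meant to show where the shape comes from.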

1. Basic Violin Plot

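The violinplot function takes either a single array, which produces one violin, or a sequence of arrays, which produces one violin per array. It estimates each distribution with a Gaussian kernel density estimate and draws the result.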

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(0)
data = np.random.normal(0, 1, 100)

# Create a basic violin plot
plt.figure(figsize=(8, 6))
plt.violinplot(data)
plt.ylabel("Values")
plt.title("Basic Violin Plot")
plt.show()

In this example:

  • data is a sample of 100 random values drawn from a normal distribution.
  • plt.violinplot(data) creates a violin plot of the data.
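violinplot also returns a dictionary of the artists it created, which later customization relies on. The sketch below inspects that return value; the variable name parts is just a convention, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
parts = ax.violinplot(data)

# 'bodies' holds one filled shape per violin; the other keys hold line artists
print(sorted(parts.keys()))
print(len(parts["bodies"]))
```

Keys such as 'cmeans' and 'cmedians' only appear when the matching show... option is enabled.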

2. Customizing the Violin Plot Appearance

You can customize the appearance of a violin plot using parameters like showmeans, showmedians, and showextrema.

# Customized violin plot
plt.figure(figsize=(8, 6))
plt.violinplot(data, showmeans=True, showmedians=True, showextrema=True)
plt.ylabel("Values")
plt.title("Violin Plot with Mean, Median, and Extremes")
plt.show()

In this example:

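  • showmeans=True displays the mean as a horizontal line across the violin. By default it has the same color as the median line, so the two can be hard to tell apart.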
  • showmedians=True shows the median as a line within the violin.
  • showextrema=True displays the extreme values (min and max) with small horizontal lines.
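Because the statistic lines share one default color, it can help to style them separately. The following sketch does that through the dictionary returned by violinplot; the colors, line styles, and legend labels are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
parts = ax.violinplot(data, showmeans=True, showmedians=True, showextrema=True)

# Each statistic is a LineCollection that can be restyled after plotting
parts["cmeans"].set_color("red")
parts["cmeans"].set_linestyle("--")
parts["cmeans"].set_label("Mean")

parts["cmedians"].set_color("black")
parts["cmedians"].set_linewidth(2)
parts["cmedians"].set_label("Median")

legend = ax.legend()
ax.set_ylabel("Values")
```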

3. Horizontal Violin Plot

A horizontal violin plot is created by passing vert=False. This often reads better when there are many categories or the category labels are long. Note that Matplotlib 3.10 and later deprecate vert in favor of orientation='horizontal'.

# Horizontal violin plot
plt.figure(figsize=(8, 6))
plt.violinplot(data, vert=False, showmeans=True)
plt.xlabel("Values")
plt.title("Horizontal Violin Plot")
plt.show()

In this example:

  • vert=False displays the violin plot horizontally.
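Newer Matplotlib releases (3.10 and later, as far as I know) prefer an orientation argument over vert. One way to support both is sketched below; the try/except fallback is my own suggestion rather than an official recipe.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(8, 6))
try:
    # Newer Matplotlib: explicit orientation keyword
    parts = ax.violinplot(data, orientation="horizontal", showmeans=True)
except TypeError:
    # Older Matplotlib does not know 'orientation', so fall back to vert
    parts = ax.violinplot(data, vert=False, showmeans=True)

ax.set_xlabel("Values")
ax.set_title("Horizontal Violin Plot")
```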

4. Violin Plot for Multiple Datasets

You can visualize multiple datasets side by side by passing a list of arrays to violinplot.

# Generate multiple datasets
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(2, 1.5, 100)
data3 = np.random.normal(-2, 1, 100)

# Create violin plots for multiple datasets
plt.figure(figsize=(10, 6))
plt.violinplot([data1, data2, data3], showmedians=True)
plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Violin Plot for Multiple Datasets")
plt.show()

In this example:

  • [data1, data2, data3] passes multiple datasets to plt.violinplot.
  • plt.xticks() labels each violin plot for clarity.

5. Grouped Violin Plot

A grouped violin plot is useful for comparing several groups on a shared axis. The example below uses np.random.normal to generate one array per group, each centered at a different location, and draws one violin for each.

# Generate data for grouped violin plot
np.random.seed(0)
data_grouped = [np.random.normal(loc, 0.5, 100) for loc in [1, 2, 3, 4]]

# Create grouped violin plot
plt.figure(figsize=(10, 6))
plt.violinplot(data_grouped, showmeans=True)
plt.xticks([1, 2, 3, 4], ['Group 1', 'Group 2', 'Group 3', 'Group 4'])
plt.xlabel("Groups")
plt.ylabel("Values")
plt.title("Grouped Violin Plot")
plt.show()

In this example:

  • data_grouped contains four arrays of random data centered at different locations.
  • plt.xticks() labels each group for clarity.
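If each group contains more than one category, a common approach is to call violinplot once per category and shift the violins with the positions argument. The sketch below assumes two made-up categories, A and B, and an offset of 0.2; all names and values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

rng = np.random.default_rng(0)
groups = ["Group 1", "Group 2", "Group 3"]
centers = np.arange(1, len(groups) + 1)
offset = 0.2  # half the gap between the two violins of one group

# Hypothetical data: category B is shifted up by 0.5 in every group
cat_a = [rng.normal(loc, 0.5, 100) for loc in [1, 2, 3]]
cat_b = [rng.normal(loc + 0.5, 0.5, 100) for loc in [1, 2, 3]]

fig, ax = plt.subplots(figsize=(10, 6))
parts_a = ax.violinplot(cat_a, positions=centers - offset, widths=0.35, showmeans=True)
parts_b = ax.violinplot(cat_b, positions=centers + offset, widths=0.35, showmeans=True)

# Give each category its own fill color
for body in parts_a["bodies"]:
    body.set_facecolor("skyblue")
for body in parts_b["bodies"]:
    body.set_facecolor("salmon")

ax.set_xticks(centers)
ax.set_xticklabels(groups)
ax.legend(handles=[Patch(facecolor="skyblue", label="Category A"),
                   Patch(facecolor="salmon", label="Category B")])
ax.set_ylabel("Values")
```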

6. Customizing Violin Plot Colors

Matplotlib's violinplot has no cmap argument. To color the violins, modify the artists it returns: the 'bodies' entry of the returned dictionary holds one filled shape per violin, and each can be styled individually.

# Customize violin plot colors manually
plt.figure(figsize=(10, 6))
parts = plt.violinplot([data1, data2, data3], showmeans=True)

# Customize colors for each part of the violin plot
colors = ['skyblue', 'lightgreen', 'salmon']
for i, pc in enumerate(parts['bodies']):
    pc.set_facecolor(colors[i])
    pc.set_edgecolor('black')
    pc.set_alpha(0.7)

plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Violin Plot with Custom Colors")
plt.show()

In this example:

  • Each violin plot's color is customized by accessing parts['bodies'], where each body corresponds to one violin.
  • set_facecolor, set_edgecolor, and set_alpha customize the fill color, border color, and transparency, respectively.
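To draw the colors from a colormap instead of a hand-written list, one option is to sample the colormap at evenly spaced points and apply one color per body. The choice of viridis and the 0.2 to 0.8 sampling range are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
datasets = [rng.normal(0, 1, 100), rng.normal(2, 1.5, 100), rng.normal(-2, 1, 100)]

fig, ax = plt.subplots(figsize=(10, 6))
parts = ax.violinplot(datasets, showmeans=True)

# Sample one RGBA color per violin from the colormap
cmap = plt.get_cmap("viridis")
colors = cmap(np.linspace(0.2, 0.8, len(datasets)))

for body, color in zip(parts["bodies"], colors):
    body.set_facecolor(color)
    body.set_edgecolor("black")
    body.set_alpha(0.7)

ax.set_ylabel("Values")
```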

7. Adding Individual Data Points to Violin Plot

To show the underlying data points in the violin plot, you can use a scatter plot overlay to display each data point.

# Violin plot with individual data points
plt.figure(figsize=(10, 6))
plt.violinplot([data1, data2, data3], showmeans=True)

# Overlay data points
for i, dataset in enumerate([data1, data2, data3]):
    x = np.random.normal(i + 1, 0.04, size=len(dataset))  # jitter for x-axis
    plt.scatter(x, dataset, alpha=0.6, color='black', s=10)

plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Violin Plot with Individual Data Points")
plt.show()

In this example:

  • np.random.normal(i + 1, 0.04, size=len(dataset)) adds a slight random offset to each data point’s x-coordinate to avoid overlap, creating a jitter effect.

8. Combining Box Plot and Violin Plot

You can combine a box plot with a violin plot to add summary statistics to the distribution visualization.

# Violin plot with overlaid box plot
plt.figure(figsize=(10, 6))
plt.violinplot([data1, data2, data3], showmeans=True, showextrema=True)
plt.boxplot([data1, data2, data3], widths=0.2, positions=[1, 2, 3])

plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])
plt.ylabel("Values")
plt.title("Violin Plot with Overlaid Box Plot")
plt.show()

In this example:

  • plt.boxplot() overlays a box plot on top of the violin plot, showing summary statistics like quartiles and medians.
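A lighter alternative to a full box plot is to compute the quartiles yourself and draw them as thin bars inside each violin. The sketch below uses np.percentile with ax.vlines and ax.scatter; the line widths and marker size are arbitrary styling choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
datasets = [rng.normal(0, 1, 100), rng.normal(2, 1.5, 100), rng.normal(-2, 1, 100)]
positions = np.arange(1, len(datasets) + 1)

fig, ax = plt.subplots(figsize=(10, 6))
ax.violinplot(datasets, positions=positions, showextrema=False)

# One row per percentile, one column per dataset
q1, med, q3 = np.percentile(datasets, [25, 50, 75], axis=1)
lows = [d.min() for d in datasets]
highs = [d.max() for d in datasets]

ax.vlines(positions, lows, highs, color="black", linewidth=1)   # full range
ax.vlines(positions, q1, q3, color="black", linewidth=5)        # interquartile range
ax.scatter(positions, med, color="white", s=30, zorder=3)       # median marker

ax.set_xticks(positions)
ax.set_xticklabels(["Dataset 1", "Dataset 2", "Dataset 3"])
ax.set_ylabel("Values")
```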

9. Creating Violin Plots with Seaborn

Seaborn, a popular data visualization library built on Matplotlib, offers more advanced violin plotting options. You can use sns.violinplot for additional customization and automatic handling of categorical data.

import seaborn as sns

# Data preparation
data = np.concatenate([data1, data2, data3])
labels = ['Dataset 1'] * 100 + ['Dataset 2'] * 100 + ['Dataset 3'] * 100
df = {'Values': data, 'Category': labels}

# Create a violin plot using Seaborn
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Values', data=df, palette='Pastel1', inner='box', linewidth=1.5)
plt.title("Violin Plot with Seaborn")
plt.show()

In this example:

  • sns.violinplot() expects long-form data, with x naming the category column and y naming the value column. Here a plain dict of equal-length columns stands in for a DataFrame.
  • palette='Pastel1' sets the color palette.
  • inner='box' adds a box plot inside the violin to show quartiles.
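In practice the data usually lives in a pandas DataFrame rather than a dict. The sketch below builds the same long-form layout with pandas, assuming pandas and seaborn are installed; the column names mirror the example above, and the palette is left at its default to keep the call portable across seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
samples = [rng.normal(0, 1, 100), rng.normal(2, 1.5, 100), rng.normal(-2, 1, 100)]
names = ["Dataset 1", "Dataset 2", "Dataset 3"]

# Long form: one row per observation, with a column naming its category
df = pd.DataFrame({
    "Values": np.concatenate(samples),
    "Category": np.repeat(names, 100),
})

fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=df, x="Category", y="Values", inner="box", ax=ax)
ax.set_title("Violin Plot from a DataFrame")
```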

10. Violin Plot with Split Distribution

Seaborn's violinplot can also draw split violins, where the two levels of a hue variable become the left and right halves of a single violin. This makes it easy to compare two groups directly.

# Generate data for split violin plot
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(1, 1, 100)
labels = ['Group A'] * 100 + ['Group B'] * 100
values = np.concatenate([group1, group2])
# 'Sample' is a constant column so both groups share one x position
split_df = {'Values': values, 'Group': labels, 'Sample': ['All'] * 200}

# Split violin plot: hue decides which group goes on each half
plt.figure(figsize=(8, 6))
sns.violinplot(x='Sample', y='Values', hue='Group', data=split_df, split=True, inner='quartile', palette='Set2')
plt.title("Split Violin Plot with Seaborn")
plt.show()

In this example:

  • hue='Group' assigns the two groups to the same x position, and split=True draws each group as one half of the violin, allowing a comparison of the two groups on either side.
  • inner='quartile' shows quartiles within the split violins.

11. Violin Plot with Custom KDE Bandwidth

Seaborn’s violinplot allows for custom control over the kernel density estimation (KDE) bandwidth, which controls the smoothness of the distribution.

# Violin plot with custom KDE bandwidth
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Values', data=df, bw=0.1, palette='Set3')
plt.title("Violin Plot with Custom KDE Bandwidth")
plt.show()

In this example:

  • bw=0.1 sets the KDE bandwidth to a smaller value, making the violin plot less smooth and showing more detail in the data distribution. In seaborn 0.13 and later, bw is deprecated in favor of bw_method and bw_adjust.
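Matplotlib's own violinplot has a comparable control, the bw_method argument. The sketch below draws the same sample with a small scalar bandwidth factor and with the default rule side by side; the value 0.1 is chosen only to make the difference visible.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 100)

fig, (ax_narrow, ax_default) = plt.subplots(1, 2, figsize=(10, 6), sharey=True)

# A scalar bw_method is used as a factor applied to the sample's spread
narrow = ax_narrow.violinplot(sample, bw_method=0.1)
ax_narrow.set_title("bw_method=0.1")

# Leaving bw_method unset uses Matplotlib's default bandwidth rule
default = ax_default.violinplot(sample)
ax_default.set_title("default bandwidth")

ax_narrow.set_ylabel("Values")
```

A very small bandwidth tends to show noise from individual samples, while a large one can hide real structure such as multiple modes.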

12. Violin Plot with Subplots for Different Categories

You can use subplots to display multiple violin plots side by side for easy comparison.

fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)

# Violin plot for each dataset
sns.violinplot(y=data1, ax=axes[0], color='lightblue')
axes[0].set_title("Dataset 1")

sns.violinplot(y=data2, ax=axes[1], color='lightgreen')
axes[1].set_title("Dataset 2")

sns.violinplot(y=data3, ax=axes[2], color='salmon')
axes[2].set_title("Dataset 3")

fig.suptitle("Violin Plots for Different Categories")
plt.show()

In this example:

  • plt.subplots(1, 3) creates three side-by-side subplots.
  • Each subplot contains a violin plot for a different dataset, allowing for an easy comparison.

Summary

In this tutorial, we covered how to create and customize violin plots in Matplotlib, using both Matplotlib’s violinplot and Seaborn’s violinplot functions for enhanced visualization:

  1. Basic Violin Plot to display data distribution.
  2. Customizing Appearance to add summary statistics like mean, median, and extremes.
  3. Horizontal Violin Plot for better readability.
  4. Violin Plot for Multiple Datasets for comparative analysis.
  5. Grouped Violin Plot for visualizing multiple categories.
  6. Custom Colors for Each Violin to enhance visualization.
  7. Adding Individual Data Points to show detailed data distribution.
  8. Combining Box Plot with Violin Plot for summary and density.
  9. Creating Violin Plots with Seaborn for advanced customization.
  10. Split Distribution in Seaborn for side-by-side comparisons.
  11. Custom KDE Bandwidth for adjusting the smoothness.
  12. Violin Plot with Subplots to compare different categories.

These examples demonstrate the versatility of violin plots for visualizing data distributions, providing more detail than box plots while offering rich customization options for effective data analysis and presentation.
