
Tutorial on Creating Scatter Plots in Matplotlib

Scatter plots are a powerful way to visualize the relationship between two variables, making them a staple in data analysis and visualization.

Matplotlib provides several ways to customize scatter plots, including color mapping, marker customization, adding labels, and using multiple data series.

In this tutorial, we’ll explore how to create and customize scatter plots in Matplotlib with examples covering the basics, adding color and size, plotting multiple series, adding trend lines, and more.

1. Basic Scatter Plot

A scatter plot can be created using the scatter function in Matplotlib. It requires x and y coordinates of points as inputs.

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.random.rand(50)
y = np.random.rand(50)

# Create a basic scatter plot
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='blue')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Basic Scatter Plot")
plt.show()

In this example:

  • plt.scatter(x, y, color='blue') creates a scatter plot with blue points.
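The same plot can also be built through Matplotlib's object-oriented interface, which tends to be easier to extend once a figure holds more than one set of axes. The sketch below is one possible version; the seed and the variable names (rng, points) are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the random data is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)

# plt.subplots() returns a Figure and, by default, a single Axes
fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(x, y, color='blue')  # returns a PathCollection
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Basic Scatter Plot")
# Call plt.show() to display the figure
```

Keeping a reference to the object returned by scatter is useful later, for example when attaching a color bar.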

2. Adding Colors and Marker Styles

You can change the color and style of markers in a scatter plot by passing additional arguments such as color (or c), marker, and s.

# Scatter plot with customized markers
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='purple', marker='x', s=100)  # 's' controls size of markers
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Custom Markers")
plt.show()

In this example:

  • color='purple' sets the color of the markers.
  • marker='x' uses an “x” symbol for each point.
  • s=100 sets the size of each marker; s is the marker area in points squared, so doubling s does not double the marker's width.
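To get a feel for the marker parameter, it can help to draw a few styles side by side. The following is a small sketch; the particular selection of markers and the figure layout are just illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# A handful of built-in marker codes: circle, square, triangle, x, diamond
markers = ['o', 's', '^', 'x', 'D']
x = np.arange(len(markers))

fig, ax = plt.subplots(figsize=(8, 3))
for xi, m in zip(x, markers):
    # One scatter call per marker, since each call takes a single marker style
    ax.scatter(xi, 0, marker=m, s=200, color='purple')

ax.set_xticks(x)
ax.set_xticklabels(markers)  # label each position with its marker code
ax.set_yticks([])            # the y-axis carries no information here
ax.set_title("A Few Built-in Marker Styles")
# Call plt.show() to display the figure
```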

3. Adding Color Based on Data Values

The c parameter in scatter allows you to color points based on a third variable, adding another layer of information to the plot.

# Sample data with color based on values
values = np.random.rand(50)  # A third variable for color mapping

plt.figure(figsize=(8, 5))
scatter = plt.scatter(x, y, c=values, cmap='viridis', s=100)  # Use colormap
plt.colorbar(scatter)  # Add color bar to show value scale
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Color Mapping")
plt.show()

In this example:

  • c=values colors the points based on the values array.
  • cmap='viridis' applies a color map, and plt.colorbar() adds a color bar to the plot for reference.
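By default the colormap stretches between the smallest and largest value in the data, so two plots of different datasets may use the same color for different values. If you want comparable colors, one option is to pin the range with vmin and vmax. A minimal sketch, assuming the values are known to lie between 0 and 1:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x, y = rng.random(50), rng.random(50)
values = rng.random(50)         # third variable, assumed to lie in [0, 1]

fig, ax = plt.subplots(figsize=(8, 5))
# vmin/vmax fix the ends of the colormap regardless of the data's own range
sc = ax.scatter(x, y, c=values, cmap='viridis', vmin=0.0, vmax=1.0, s=100)
fig.colorbar(sc, ax=ax, label="Value")
ax.set_title("Scatter Plot with Fixed Color Limits")
# Call plt.show() to display the figure
```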

4. Customizing Marker Size Based on Data Values

You can use the s parameter to vary the marker size according to a data variable, making it possible to visualize an additional dimension.

# Sample data with size based on values
sizes = values * 1000  # Scale the size values

plt.figure(figsize=(8, 5))
plt.scatter(x, y, c=values, cmap='cool', s=sizes, alpha=0.6, edgecolors='black')
plt.colorbar(label='Color Intensity')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Variable Marker Sizes")
plt.show()

In this example:

  • s=sizes gives each marker its own size taken from the sizes array (interpreted as marker area in points squared).
  • alpha=0.6 makes markers slightly transparent, and edgecolors='black' adds a black border to each marker for better visibility.
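Multiplying by a constant works for values between 0 and 1, but points with values near zero become almost invisible. One alternative is to rescale the data linearly into a chosen range of marker areas. The helper below, scale_sizes, is a hypothetical function written for this tutorial (it is not part of Matplotlib), and the default range of 20 to 500 is an arbitrary choice.

```python
import numpy as np

def scale_sizes(values, min_size=20.0, max_size=500.0):
    """Linearly map values onto marker areas between min_size and max_size."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal: use one mid-range size for every point
        return np.full(values.shape, (min_size + max_size) / 2)
    unit = (values - values.min()) / span  # rescaled to [0, 1]
    return min_size + unit * (max_size - min_size)

sizes = scale_sizes([3, 7, 11])
print(sizes)  # → [ 20. 260. 500.]
```

The result can be passed directly as s=sizes in a scatter call.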

5. Plotting Multiple Series in a Scatter Plot

Scatter plots can include multiple series of data by calling scatter multiple times with different data and colors.

# Data for multiple series
x1, y1 = np.random.rand(30), np.random.rand(30)
x2, y2 = np.random.rand(30), np.random.rand(30)

plt.figure(figsize=(8, 5))
plt.scatter(x1, y1, color='blue', label='Series 1')
plt.scatter(x2, y2, color='orange', label='Series 2')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Multiple Series")
plt.legend()
plt.show()

In this example:

  • Two series (x1, y1 and x2, y2) are plotted using different colors and labels.
  • plt.legend() displays a legend to differentiate the series.
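With more than two or three series, repeating the scatter call by hand gets tedious. A possible approach is to keep the series in a dictionary and loop over it. The series names, colors, and the third series below are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Each entry maps a legend label to (x data, y data, color)
series = {
    "Series 1": (rng.random(30), rng.random(30), 'blue'),
    "Series 2": (rng.random(30), rng.random(30), 'orange'),
    "Series 3": (rng.random(30), rng.random(30), 'green'),
}

fig, ax = plt.subplots(figsize=(8, 5))
for label, (xs, ys, color) in series.items():
    ax.scatter(xs, ys, color=color, label=label)

ax.set_title("Scatter Plot with Multiple Series")
legend = ax.legend()  # one legend entry per labeled scatter call
# Call plt.show() to display the figure
```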

6. Adding Annotations to Points

Annotations help identify specific points in a scatter plot. The plt.text() function can be used to label individual points.

# Scatter plot with annotations
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='purple', s=100)

# Annotate specific points
for i in range(len(x)):
    plt.text(x[i] + 0.02, y[i] + 0.02, f'P{i+1}', fontsize=9, ha='center')

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Annotations")
plt.show()

In this example:

  • plt.text() is used to label each point with an identifier (P1, P2, etc.), offset slightly from the point for visibility.
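Labeling all 50 points quickly becomes cluttered, and an offset in data units depends on the axis scale. As a sketch of an alternative, the code below labels only a few points and uses annotate with an offset measured in points. Picking the five largest y values is just an example criterion.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
x, y = rng.random(50), rng.random(50)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x, y, color='purple', s=100)

# Indices of the five points with the largest y values
top = np.argsort(y)[-5:]

labels = []
for i in top:
    # xytext is an offset of (6, 6) points from the data point itself
    labels.append(
        ax.annotate(f'P{i + 1}', xy=(x[i], y[i]), xytext=(6, 6),
                    textcoords='offset points', fontsize=9)
    )
ax.set_title("Scatter Plot with Selected Annotations")
# Call plt.show() to display the figure
```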

7. Adding a Trend Line

A trend line can be added to show the general direction of the data. You can use np.polyfit() to fit a line and plt.plot() to add it to the scatter plot.

# Generate data with a trend
x = np.linspace(0, 10, 50)
y = 2 * x + np.random.normal(0, 2, size=x.shape)

plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='darkcyan', label='Data Points')

# Calculate and plot a trend line
z = np.polyfit(x, y, 1)  # 1st degree polynomial (linear fit)
p = np.poly1d(z)
plt.plot(x, p(x), color='red', linestyle='--', label='Trend Line')

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Trend Line")
plt.legend()
plt.show()

In this example:

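  • np.polyfit(x, y, 1) fits a straight line to the data by least squares and returns its coefficients (slope first, then intercept); np.poly1d(z) turns them into a callable polynomial.
  • plt.plot(x, p(x), color='red', linestyle='--', label='Trend Line') plots the trend line in red with a dashed style.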
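It is often useful to report how well the line fits, not just to draw it. The sketch below unpacks the slope and intercept from np.polyfit and computes the coefficient of determination (R^2) by hand. The seed and the label format are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(0, 2, size=x.shape)

# polyfit returns coefficients from the highest power down: slope, intercept
slope, intercept = np.polyfit(x, y, 1)
y_fit = slope * x + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_fit) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# A string that could be used as the trend line's legend label
label = f"y = {slope:.2f}x {intercept:+.2f} (R^2 = {r_squared:.2f})"
print(label)
```

Passing label=label to the plt.plot call for the trend line would show the fitted equation in the legend.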

8. Adding Grid and Adjusting Transparency

Adding a grid and adjusting the transparency of points can make scatter plots easier to read, especially when dealing with overlapping points.

# Scatter plot with grid and transparency
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='teal', alpha=0.6, edgecolor='black', s=100)
plt.grid(True)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Grid and Transparency")
plt.show()

In this example:

  • alpha=0.6 makes the points semi-transparent, so overlapping points in dense regions show up as darker areas instead of hiding each other.
  • plt.grid(True) adds a grid for better readability.
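Transparency matters most when many points overlap. The sketch below uses a denser, made-up dataset with a lower alpha, and also moves the grid behind the markers with set_axisbelow. The specific alpha, marker size, and grid style are just one reasonable combination.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
x = rng.normal(0, 1, 500)       # 500 points clustered around the origin
y = rng.normal(0, 1, 500)

fig, ax = plt.subplots(figsize=(8, 5))
# Low alpha and no edge lines: dense areas appear darker
points = ax.scatter(x, y, color='teal', alpha=0.3, edgecolors='none', s=40)

ax.grid(True, linestyle=':', linewidth=0.8)
ax.set_axisbelow(True)  # draw grid lines underneath the markers
ax.set_title("Dense Scatter Plot with Grid and Transparency")
# Call plt.show() to display the figure
```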

9. Scatter Plot with Logarithmic Scale

Scatter plots with logarithmic scales on the axes can be useful when data spans several orders of magnitude.

# Sample data for logarithmic scale
x = np.logspace(0, 2, 50)  # Values from 10^0 to 10^2
y = np.random.rand(50) * 100

plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='dodgerblue', s=100)
plt.xscale('log')
plt.yscale('log')
plt.xlabel("X-axis (log scale)")
plt.ylabel("Y-axis (log scale)")
plt.title("Scatter Plot with Logarithmic Scale")
plt.show()

In this example:

  • plt.xscale('log') and plt.yscale('log') set logarithmic scales on the x and y axes.
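Log-log axes are especially handy for power-law relationships, which appear as straight lines on them. As an illustration, the sketch below generates synthetic data of the form y = 5 * x^1.5 with multiplicative noise. The constants are invented for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
x = np.logspace(0, 3, 60)       # 10^0 to 10^3
# Synthetic power law with log-normal noise (always positive)
y = 5 * x ** 1.5 * rng.lognormal(0, 0.2, size=x.shape)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x, y, color='dodgerblue', s=40)
ax.set_xscale('log')  # Axes-level equivalents of plt.xscale / plt.yscale
ax.set_yscale('log')
ax.set_xlabel("X-axis (log scale)")
ax.set_ylabel("Y-axis (log scale)")
ax.set_title("Power-Law Data on Log-Log Axes")
# Call plt.show() to display the figure
```

Keep in mind that a logarithmic axis can only show positive values, so zero or negative data will not appear.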

10. Scatter Plot with Subplots

Multiple scatter plots can be shown in subplots to compare different datasets or visualizations.

# Create data for subplots
x1, y1 = np.random.rand(30), np.random.rand(30)
x2, y2 = np.random.rand(30), np.random.rand(30)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# First subplot
ax1.scatter(x1, y1, color='blue')
ax1.set_title("Scatter Plot 1")
ax1.set_xlabel("X-axis")
ax1.set_ylabel("Y-axis")

# Second subplot
ax2.scatter(x2, y2, color='green')
ax2.set_title("Scatter Plot 2")
ax2.set_xlabel("X-axis")
ax2.set_ylabel("Y-axis")

plt.suptitle("Scatter Plots in Subplots")
plt.show()

In this example:

  • plt.subplots(1, 2) creates two subplots in a row.
  • Each axis (ax1 and ax2) contains a separate scatter plot with different data and colors.
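When subplots are meant to be compared, it usually helps if they share axis limits. The sketch below uses the sharex and sharey options of plt.subplots and loops over the axes instead of repeating the setup code. The dataset titles and the scaling of the second dataset are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed

# (title, x data, y data, color) for each panel
datasets = [
    ("Scatter Plot 1", rng.random(30), rng.random(30), 'blue'),
    ("Scatter Plot 2", rng.random(30) * 2, rng.random(30) * 2, 'green'),
]

# sharex/sharey keep both panels on the same axis limits
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
for ax, (title, xs, ys, color) in zip(axes, datasets):
    ax.scatter(xs, ys, color=color)
    ax.set_title(title)
    ax.set_xlabel("X-axis")
axes[0].set_ylabel("Y-axis")  # shared y-axis, so one label is enough

fig.suptitle("Scatter Plots with Shared Axes")
# Call plt.show() to display the figure
```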

11. Adding Error Bars to a Scatter Plot

Error bars can be added to each point to show variability or uncertainty in the data.


# Sample data with error values
x = np.linspace(0, 10, 20)
y = 2 * x + np.random.normal(0, 1, size=x.shape)
x_err = np.abs(np.random.normal(0.2, 0.05, size=x.shape))  # error sizes must be non-negative
y_err = np.abs(np.random.normal(0.5, 0.1, size=y.shape))

plt.figure(figsize=(8, 5))
plt.errorbar(x, y, xerr=x_err, yerr=y_err, fmt='o', color='darkred', ecolor='gray', elinewidth=1, capsize=3)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Error Bars")
plt.show()

In this example:

  • xerr=x_err and yerr=y_err add horizontal and vertical error bars.
  • fmt='o' specifies the marker style, ecolor='gray' sets the error bar color, and capsize=3 adds caps to the error bars.
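In practice, error values usually come from the data rather than from a random generator. One common source is repeated measurements: plot the mean at each x and use the standard deviation as the error. The sketch below simulates this, assuming 8 hypothetical repeats per x value.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
x = np.linspace(0, 10, 20)

# Simulated measurements: 8 repeats (rows) for each of the 20 x values (columns)
samples = 2 * x + rng.normal(0, 1, size=(8, x.size))

y_mean = samples.mean(axis=0)        # average over the repeats
y_std = samples.std(axis=0, ddof=1)  # sample standard deviation per x

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.errorbar(x, y_mean, yerr=y_std, fmt='o', color='darkred',
                   ecolor='gray', elinewidth=1, capsize=3)
ax.set_xlabel("X-axis")
ax.set_ylabel("Mean of repeated measurements")
ax.set_title("Error Bars from Repeated Measurements")
# Call plt.show() to display the figure
```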

Summary

In this tutorial, we covered how to create and customize scatter plots in Matplotlib:

  1. Basic Scatter Plot to display x and y values.
  2. Adding Colors and Marker Styles for visual variety.
  3. Color Mapping Based on Data Values to add more dimensions.
  4. Customizing Marker Size based on data.
  5. Multiple Data Series in a single plot.
  6. Annotations to label specific points.
  7. Adding a Trend Line to show data trends.
  8. Grid and Transparency to improve readability.
  9. Logarithmic Scale for wide-ranging data.
  10. Scatter Plot in Subplots for side-by-side comparisons.
  11. Error Bars to indicate variability.

These techniques provide a comprehensive toolkit for creating effective and informative scatter plots, allowing you to customize and add depth to your data visualizations in Matplotlib.
