Scatter plots are a powerful way to visualize the relationship between two variables, making them a staple in data analysis and visualization.
Matplotlib provides several ways to customize scatter plots, including color mapping, marker customization, adding labels, and using multiple data series.
In this tutorial, we’ll explore how to create and customize scatter plots in Matplotlib with examples covering the basics, adding color and size, plotting multiple series, adding trend lines, and more.
1. Basic Scatter Plot
A scatter plot can be created using the scatter function in Matplotlib. It requires x and y coordinates of points as inputs.
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.random.rand(50)
y = np.random.rand(50)

# Create a basic scatter plot
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='blue')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Basic Scatter Plot")
plt.show()
In this example:
- plt.scatter(x, y, color='blue') creates a scatter plot with blue points.
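The same plot can also be built through Matplotlib's object-oriented interface, which tends to be easier to manage once a figure holds more than one set of axes. The sketch below is an illustration, not part of the original example; the seed and the variable names (rng, points) are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the sample data is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)

# Create the figure and axes explicitly instead of relying on pyplot state
fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(x, y, color='blue')  # returns a PathCollection
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Basic Scatter Plot")

# The returned collection stores one (x, y) pair per plotted point
print(points.get_offsets().shape)
# Call plt.show() to display the figure
```

Keeping a reference to the returned collection is useful later, for example when attaching a color bar to it.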
2. Adding Colors and Marker Styles
You can change the color, style, and size of markers in a scatter plot by passing additional arguments such as color, marker, and s.
# Scatter plot with customized markers
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='purple', marker='x', s=100)  # 's' controls size of markers
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Custom Markers")
plt.show()
In this example:
- color='purple' sets the color of the markers.
- marker='x' uses an “x” symbol for each point.
- s=100 sets the size of each marker. The value is an area in points squared, so a marker's width grows with the square root of s.
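Because s is an area in points squared rather than a width, doubling a marker's width takes four times the s value. A minimal sketch of that relationship, using made-up widths and positions:

```python
import matplotlib.pyplot as plt
import numpy as np

# Desired marker widths in points (illustrative values)
widths = np.array([5.0, 10.0, 20.0])
# s is an area, so square the width to get the matching s value
areas = widths ** 2

fig, ax = plt.subplots(figsize=(8, 5))
markers = ax.scatter([1, 2, 3], [1, 1, 1], s=areas, color='purple', marker='o')
ax.set_title("Doubling the width quadruples s")

print(markers.get_sizes())
# Call plt.show() to display the figure
```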
3. Adding Color Based on Data Values
The c parameter in scatter allows you to color points based on a third variable, adding another layer of information to the plot.
# Sample data with color based on values
values = np.random.rand(50)  # A third variable for color mapping

plt.figure(figsize=(8, 5))
scatter = plt.scatter(x, y, c=values, cmap='viridis', s=100)  # Use colormap
plt.colorbar(scatter)  # Add color bar to show value scale
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Color Mapping")
plt.show()
In this example:
- c=values colors the points based on the values array.
- cmap='viridis' applies a color map, and plt.colorbar() adds a color bar to the plot for reference.
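By default the colormap stretches between the smallest and largest entries in the data, so two plots of different data can show the same color for different values. One way to keep the scale comparable is to pin it with vmin and vmax. The sketch below assumes the values are known to lie between 0 and 1; the seed and names are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.random(50), rng.random(50)
values = rng.random(50)

fig, ax = plt.subplots(figsize=(8, 5))
# vmin/vmax pin the ends of the colormap to 0 and 1
# instead of the data's own minimum and maximum
sc = ax.scatter(x, y, c=values, cmap='viridis', vmin=0.0, vmax=1.0, s=100)
fig.colorbar(sc, ax=ax, label="Value")
# Call plt.show() to display the figure
```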
4. Customizing Marker Size Based on Data Values
You can use the s parameter to vary the marker size according to a data variable, making it possible to visualize an additional dimension.
# Sample data with size based on values
sizes = values * 1000  # Scale the size values

plt.figure(figsize=(8, 5))
plt.scatter(x, y, c=values, cmap='cool', s=sizes, alpha=0.6, edgecolors='black')
plt.colorbar(label='Color Intensity')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Variable Marker Sizes")
plt.show()
In this example:
- s=sizes scales each marker size according to sizes.
- alpha=0.6 makes markers slightly transparent, and edgecolors='black' adds a black border to each marker for better visibility.
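Multiplying by a constant leaves values near zero with markers that are almost invisible. An alternative is to map the data linearly onto a chosen size range. The helper below, scale_sizes, is a hypothetical name introduced for this sketch, and the default range of 20 to 500 is an arbitrary assumption.

```python
import numpy as np

def scale_sizes(values, min_size=20.0, max_size=500.0):
    """Map values linearly onto [min_size, max_size] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values identical: fall back to a mid-range size
        return np.full(values.shape, (min_size + max_size) / 2)
    return np.interp(values, (lo, hi), (min_size, max_size))

sizes = scale_sizes([0.0, 0.5, 1.0])
print(sizes)
```

The result can be passed directly as the s argument of scatter.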
5. Plotting Multiple Series in a Scatter Plot
Scatter plots can include multiple series of data by calling scatter multiple times with different data and colors.
# Data for multiple series
x1, y1 = np.random.rand(30), np.random.rand(30)
x2, y2 = np.random.rand(30), np.random.rand(30)

plt.figure(figsize=(8, 5))
plt.scatter(x1, y1, color='blue', label='Series 1')
plt.scatter(x2, y2, color='orange', label='Series 2')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Multiple Series")
plt.legend()
plt.show()
In this example:
- Two series (x1, y1 and x2, y2) are plotted using different colors and labels.
- plt.legend() displays a legend to differentiate the series.
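With more than two or three series, repeating the scatter call by hand gets tedious. One option is to keep the series in a dictionary and loop over it. The structure below (name mapped to x values, y values, and a color) is an assumption made for this sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical series: name -> (x values, y values, color)
series = {
    "Series 1": (rng.random(30), rng.random(30), "blue"),
    "Series 2": (rng.random(30), rng.random(30), "orange"),
    "Series 3": (rng.random(30), rng.random(30), "green"),
}

fig, ax = plt.subplots(figsize=(8, 5))
for name, (xs, ys, color) in series.items():
    # One scatter call per series; the label feeds the legend
    ax.scatter(xs, ys, color=color, label=name)
ax.legend()

handles, labels = ax.get_legend_handles_labels()
print(labels)
# Call plt.show() to display the figure
```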
6. Adding Annotations to Points
Annotations help identify specific points in a scatter plot. The plt.text() function can be used to label individual points.
# Scatter plot with annotations
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='purple', s=100)

# Annotate specific points
for i in range(len(x)):
    plt.text(x[i] + 0.02, y[i] + 0.02, f'P{i+1}', fontsize=9, ha='center')

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Annotations")
plt.show()
In this example:
- plt.text() is used to label each point with an identifier (P1, P2, etc.), offset slightly from the point for visibility.
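Labelling every point can clutter a plot, and an offset given in data units changes visually when the axis range changes. A possible alternative is annotate with an offset measured in points, applied only to selected points. The choice of the three largest y values below is purely illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.random(50), rng.random(50)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x, y, color='purple', s=100)

# Label only the three points with the largest y values
top = np.argsort(y)[-3:]
point_labels = []
for i in top:
    note = ax.annotate(
        f'P{i + 1}',
        xy=(x[i], y[i]),             # the point being labelled, in data coordinates
        xytext=(6, 6),               # label offset from that point
        textcoords='offset points',  # offset is measured in points, not data units
        fontsize=9,
    )
    point_labels.append(note)
# Call plt.show() to display the figure
```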
7. Adding a Trend Line
A trend line can be added to show the general direction of the data. You can use np.polyfit() to fit a line and plt.plot() to add it to the scatter plot.
# Generate data with a trend
x = np.linspace(0, 10, 50)
y = 2 * x + np.random.normal(0, 2, size=x.shape)

plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='darkcyan', label='Data Points')

# Calculate and plot a trend line
z = np.polyfit(x, y, 1)  # 1st degree polynomial (linear fit)
p = np.poly1d(z)
plt.plot(x, p(x), color='red', linestyle='--', label='Trend Line')

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Trend Line")
plt.legend()
plt.show()
In this example:
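- np.polyfit(x, y, 1) fits a straight line to the data by least squares and returns its slope and intercept; np.poly1d(z) turns those coefficients into a function that can be evaluated at any x.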
- plt.plot(x, p(x), color='red', linestyle='--') plots the trend line in red with a dashed style.
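It can help to read the fitted coefficients directly and check how well the line describes the data. The sketch below unpacks the slope and intercept and computes the coefficient of determination by hand; the R² step is an addition for illustration, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(0, 2, size=x.shape)

# For deg=1, polyfit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination: share of the variance in y explained by the line
residuals = y - (slope * x + intercept)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

Since the data was generated with a true slope of 2, the fitted slope should land close to that value.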
8. Adding Grid and Adjusting Transparency
Adding a grid and adjusting the transparency of points can make scatter plots easier to read, especially when dealing with overlapping points.
# Scatter plot with grid and transparency
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='teal', alpha=0.6, edgecolor='black', s=100)
plt.grid(True)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Grid and Transparency")
plt.show()
In this example:
- alpha=0.6 makes the points semi-transparent, so overlapping points in dense regions remain distinguishable.
- plt.grid(True) adds a grid for better readability.
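For very dense data, a lower alpha combined with smaller, borderless markers lets the density itself show through, and drawing the grid beneath the markers keeps it from cutting across them. The settings below are illustrative values, not recommendations from the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
# One dense cluster of synthetic points
x = rng.normal(0, 1, 2000)
y = rng.normal(0, 1, 2000)

fig, ax = plt.subplots(figsize=(8, 5))
dense = ax.scatter(x, y, color='teal', alpha=0.15, s=20, edgecolors='none')
ax.grid(True, linestyle=':', linewidth=0.8)
ax.set_axisbelow(True)  # draw grid lines underneath the markers
# Call plt.show() to display the figure
```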
9. Scatter Plot with Logarithmic Scale
Scatter plots with logarithmic scales on the axes can be useful when data spans several orders of magnitude.
# Sample data for logarithmic scale
x = np.logspace(0, 2, 50)  # Values from 10^0 to 10^2
y = np.random.rand(50) * 100

plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='dodgerblue', s=100)
plt.xscale('log')
plt.yscale('log')
plt.xlabel("X-axis (log scale)")
plt.ylabel("Y-axis (log scale)")
plt.title("Scatter Plot with Logarithmic Scale")
plt.show()
In this example:
- plt.xscale('log') and plt.yscale('log') set logarithmic scales on the x and y axes.
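A logarithmic axis can only place strictly positive values, so zeros and negatives are worth handling before plotting. The sketch below filters them out with a boolean mask; the data is made up so that two of the five points are dropped.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])
y = np.array([5.0, -2.0, 30.0, 400.0, 5000.0])

# Keep only points where both coordinates are strictly positive
positive = (x > 0) & (y > 0)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x[positive], y[positive], color='dodgerblue', s=100)
ax.set_xscale('log')
ax.set_yscale('log')

print(f"Dropped {np.count_nonzero(~positive)} of {x.size} points")
# Call plt.show() to display the figure
```

Whether dropping such points is acceptable depends on the data; reporting how many were removed at least makes the choice visible.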
10. Scatter Plot with Subplots
Multiple scatter plots can be shown in subplots to compare different datasets or visualizations.
# Create data for subplots
x1, y1 = np.random.rand(30), np.random.rand(30)
x2, y2 = np.random.rand(30), np.random.rand(30)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# First subplot
ax1.scatter(x1, y1, color='blue')
ax1.set_title("Scatter Plot 1")
ax1.set_xlabel("X-axis")
ax1.set_ylabel("Y-axis")

# Second subplot
ax2.scatter(x2, y2, color='green')
ax2.set_title("Scatter Plot 2")
ax2.set_xlabel("X-axis")
ax2.set_ylabel("Y-axis")

plt.suptitle("Scatter Plots in Subplots")
plt.show()
In this example:
- plt.subplots(1, 2) creates two subplots in a row.
- Each axis (ax1 and ax2) contains a separate scatter plot with different data and colors.
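When the point of the subplots is comparison, independent axis ranges can mislead, because each panel scales to its own data. Passing sharex=True and sharey=True gives both panels the same limits. In the sketch below the second dataset is deliberately given a wider range to make the effect visible.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
x1, y1 = rng.random(30), rng.random(30)
x2, y2 = rng.random(30) * 3, rng.random(30) * 3  # deliberately wider range

# Shared axes force both panels onto the same x and y limits
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
ax1.scatter(x1, y1, color='blue')
ax2.scatter(x2, y2, color='green')
ax1.set_title("Scatter Plot 1")
ax2.set_title("Scatter Plot 2")
fig.suptitle("Scatter Plots with Shared Axes")
# Call plt.show() to display the figure
```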
11. Adding Error Bars to a Scatter Plot
Error bars can be added to each point to show variability or uncertainty in the data.
# Sample data with error values
x = np.linspace(0, 10, 20)
y = 2 * x + np.random.normal(0, 1, size=x.shape)
# Error sizes must be non-negative, so take the absolute value of the random draws
x_err = np.abs(np.random.normal(0.2, 0.05, size=x.shape))
y_err = np.abs(np.random.normal(0.5, 0.1, size=y.shape))

plt.figure(figsize=(8, 5))
plt.errorbar(x, y, xerr=x_err, yerr=y_err, fmt='o', color='darkred',
             ecolor='gray', elinewidth=1, capsize=3)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Error Bars")
plt.show()
In this example:
- xerr=x_err and yerr=y_err add horizontal and vertical error bars.
- fmt='o' specifies the marker style, ecolor='gray' sets the error bar color, and capsize=3 adds caps to the error bars.
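Error bars do not have to be symmetric. errorbar also accepts an array of shape (2, N), where the first row holds the lower errors and the second row the upper errors. The sketch below uses randomly generated, made-up error values and only vertical bars.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 20)
y = 2 * x + rng.normal(0, 1, size=x.shape)

# Row 0 holds the lower errors, row 1 the upper errors; both must be non-negative
y_err = np.abs(rng.normal(0.5, 0.1, size=(2, x.size)))

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.errorbar(x, y, yerr=y_err, fmt='o', color='darkred',
                   ecolor='gray', elinewidth=1, capsize=3)
ax.set_title("Scatter Plot with Asymmetric Error Bars")
# Call plt.show() to display the figure
```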
Summary
In this tutorial, we covered how to create and customize scatter plots in Matplotlib:
- Basic Scatter Plot to display x and y values.
- Adding Colors and Marker Styles for visual variety.
- Color Mapping Based on Data Values to add more dimensions.
- Customizing Marker Size based on data.
- Multiple Data Series in a single plot.
- Annotations to label specific points.
- Adding a Trend Line to show data trends.
- Grid and Transparency to improve readability.
- Logarithmic Scale for wide-ranging data.
- Scatter Plot in Subplots for side-by-side comparisons.
- Error Bars to indicate variability.
These techniques provide a comprehensive toolkit for creating effective and informative scatter plots, allowing you to customize and add depth to your data visualizations in Matplotlib.