NumPy statistical functions with code examples

NumPy provides a wide range of statistical functions for analyzing data.

These functions operate on arrays and can compute statistics like mean, median, variance, standard deviation, minimum, maximum, and more.

They’re essential for data analysis and numerical computations.

Importing NumPy and Creating Arrays

First, let’s import NumPy and create a sample array to work with:

import numpy as np

# Example array
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

1. np.mean – Mean (Average)

The mean of an array is the sum of all elements divided by the number of elements.


mean_value = np.mean(data)
print("Mean:", mean_value)


Mean: 5.5
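As a quick check of the definition above, the mean can also be computed by hand. This is only a minimal sketch; the name manual_mean is illustrative and not part of NumPy.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Sum of all elements divided by the number of elements
manual_mean = np.sum(data) / data.size

print("Manual mean:", manual_mean)
print("Matches np.mean:", manual_mean == np.mean(data))
```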

2. np.median – Median

The median is the middle value in a sorted array. If the array length is even, the median is the average of the two middle numbers.


median_value = np.median(data)
print("Median:", median_value)


Median: 5.5
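The odd-length and even-length cases described above can be illustrated with two small arrays. The arrays below are made up for this sketch; note that np.median does not require the input to be sorted beforehand.

```python
import numpy as np

# Hypothetical sample arrays, deliberately left unsorted
odd_length = np.array([7, 1, 5, 3, 9])   # sorted: 1, 3, 5, 7, 9
even_length = np.array([4, 1, 3, 2])     # sorted: 1, 2, 3, 4

median_odd = np.median(odd_length)    # the single middle value
median_even = np.median(even_length)  # average of the two middle values

print("Median (odd length):", median_odd)
print("Median (even length):", median_even)
```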

3. np.std – Standard Deviation

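The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that values are close to the mean, while a high standard deviation indicates that they are spread over a wider range. By default, np.std computes the population standard deviation (it divides by N); pass ddof=1 to get the sample standard deviation (which divides by N - 1).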


std_dev = np.std(data)
print("Standard Deviation:", std_dev)


Standard Deviation: 2.8722813232690143
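The result above is the population standard deviation, because np.std uses ddof=0 unless told otherwise. The sketch below compares it with the sample standard deviation and a hand-written version of the same formula; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

population_std = np.std(data)       # divides by N (ddof=0, the default)
sample_std = np.std(data, ddof=1)   # divides by N - 1

# The sample formula written out by hand for comparison
n = data.size
manual_sample_std = np.sqrt(np.sum((data - data.mean()) ** 2) / (n - 1))

print("Population std:", population_std)
print("Sample std:", sample_std)
print("Manual sample std:", manual_sample_std)
```

Which one is appropriate depends on whether the array holds the whole population or a sample drawn from it.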

4. np.var – Variance

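Variance is the average of the squared differences from the mean, which makes it the square of the standard deviation. It’s a measure of how spread out the values are. Like np.std, np.var divides by N by default and accepts ddof=1 for the sample variance.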


variance = np.var(data)
print("Variance:", variance)


Variance: 8.25
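To connect the definition to the function, the variance can be rebuilt step by step. This is a small sketch with illustrative variable names; it also checks that the variance equals the squared standard deviation.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Squared difference of each element from the mean
squared_diffs = (data - np.mean(data)) ** 2

# Variance is the average of those squared differences
manual_variance = np.mean(squared_diffs)

print("Manual variance:", manual_variance)
print("Equals std squared:", np.isclose(np.std(data) ** 2, np.var(data)))
```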

5. np.min and np.max – Minimum and Maximum

These functions return the minimum and maximum values in an array.


min_value = np.min(data)
max_value = np.max(data)
print("Minimum:", min_value)
print("Maximum:", max_value)


Minimum: 1
Maximum: 10

6. np.percentile – Percentile

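The percentile function finds the value below which a given percentage of the observations fall. For example, the 25th percentile is the value below which 25% of the observations fall. When the requested percentile lies between two data points, np.percentile interpolates linearly between them by default, which is why the results can be values that do not appear in the array.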


percentile_25 = np.percentile(data, 25)
percentile_50 = np.percentile(data, 50)  # Equivalent to median
percentile_75 = np.percentile(data, 75)

print("25th Percentile:", percentile_25)
print("50th Percentile (Median):", percentile_50)
print("75th Percentile:", percentile_75)


25th Percentile: 3.25
50th Percentile (Median): 5.5
75th Percentile: 7.75
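The value 3.25 does not occur in the array; it comes from linear interpolation between neighbouring elements. The sketch below requests several percentiles in one call and then reproduces the 25th percentile by hand, assuming the default linear interpolation. The helper variables are illustrative only.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Several percentiles can be requested at once by passing a sequence
quartiles = np.percentile(data, [25, 50, 75])
print("Quartiles:", quartiles)

# Reproduce the 25th percentile manually with linear interpolation
sorted_data = np.sort(data)
position = 0.25 * (sorted_data.size - 1)   # fractional index into the sorted array
lower = int(np.floor(position))            # index of the element just below
fraction = position - lower                # how far to move toward the next element
manual_p25 = sorted_data[lower] + fraction * (sorted_data[lower + 1] - sorted_data[lower])
print("Manual 25th percentile:", manual_p25)
```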

7. np.quantile – Quantile

Quantiles are similar to percentiles. While percentiles are expressed as percentages (0 to 100), quantiles are expressed as fractions (0.25, 0.5, 0.75, etc.), so np.quantile(data, q) gives the same result as np.percentile(data, 100 * q).


quantile_25 = np.quantile(data, 0.25)
quantile_50 = np.quantile(data, 0.5)   # Equivalent to median
quantile_75 = np.quantile(data, 0.75)

print("0.25 Quantile:", quantile_25)
print("0.5 Quantile (Median):", quantile_50)
print("0.75 Quantile:", quantile_75)


0.25 Quantile: 3.25
0.5 Quantile (Median): 5.5
0.75 Quantile: 7.75
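Since the two functions differ only in the scale of their second argument, their results can be compared directly. A short sketch, with illustrative variable names:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Fractions for np.quantile, percentages for np.percentile
quantiles = np.quantile(data, [0.25, 0.5, 0.75])
percentiles = np.percentile(data, [25, 50, 75])

same = np.allclose(quantiles, percentiles)
print("Quantiles:", quantiles)
print("Same as percentiles:", same)
```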

8. np.sum – Sum

The sum function calculates the sum of all elements in the array.


total_sum = np.sum(data)
print("Sum:", total_sum)


Sum: 55

9. np.prod – Product

The product function calculates the product of all elements in the array.


total_product = np.prod(data)
print("Product:", total_product)


Product: 3628800

10. np.cumsum – Cumulative Sum

The cumulative sum function returns an array where each element is the sum of all input elements up to and including that position.


cumulative_sum = np.cumsum(data)
print("Cumulative Sum:", cumulative_sum)


Cumulative Sum: [ 1  3  6 10 15 21 28 36 45 55]
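Two properties of the running total are worth checking: its last element equals the overall sum, and the differences between consecutive elements give back the original values. The sketch below uses np.diff with its prepend argument for the second check; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

running_total = np.cumsum(data)

# The final running total is the same as the plain sum
print("Last element equals sum:", running_total[-1] == np.sum(data))

# Differencing the running total (starting from 0) recovers the input
recovered = np.diff(running_total, prepend=0)
print("Recovered input:", recovered)
```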

11. np.cumprod – Cumulative Product

The cumulative product function returns an array where each element is the product of all input elements up to and including that position.


cumulative_product = np.cumprod(data)
print("Cumulative Product:", cumulative_product)


Cumulative Product: [      1       2       6      24     120     720    5040   40320  362880 3628800]

12. np.ptp – Peak-to-Peak (Range)

The peak-to-peak function calculates the range of values (maximum – minimum) in the array.


range_value = np.ptp(data)
print("Range (Peak-to-Peak):", range_value)


Range (Peak-to-Peak): 9
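The same range can be obtained by subtracting the minimum from the maximum, and np.ptp also accepts an axis argument. A brief sketch, using a small 2D array made up for illustration:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Peak-to-peak is simply maximum minus minimum
manual_range = np.max(data) - np.min(data)
print("Manual range:", manual_range)

# On a 2D array, axis selects the direction along which the range is taken
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
range_per_column = np.ptp(grid, axis=0)
range_per_row = np.ptp(grid, axis=1)
print("Range of each column:", range_per_column)
print("Range of each row:", range_per_row)
```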

13. np.mean, np.median, np.var on Multidimensional Arrays

These statistical functions can also be applied to multidimensional arrays. By specifying the axis parameter, you can calculate statistics along specific dimensions.


# Create a 2D array
data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Mean of each column (axis=0 collapses the rows)
mean_cols = np.mean(data_2d, axis=0)
print("Mean of each column:", mean_cols)

# Mean of each row (axis=1 collapses the columns)
mean_rows = np.mean(data_2d, axis=1)
print("Mean of each row:", mean_rows)

# Variance of each column
variance_cols = np.var(data_2d, axis=0)
print("Variance of each column:", variance_cols)

# Median of each row
median_rows = np.median(data_2d, axis=1)
print("Median of each row:", median_rows)


Mean of each column: [4. 5. 6.]
Mean of each row: [2. 5. 8.]
Variance of each column: [6. 6. 6.]
Median of each row: [2. 5. 8.]
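The axis argument names the dimension that is collapsed, so axis=0 yields one value per column and axis=1 one value per row. The sketch below also uses keepdims=True, which keeps the reduced axis with length 1 so the result can be subtracted from the original array; centering each row is just one possible use.

```python
import numpy as np

data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Without axis, the statistic is computed over all elements
overall_mean = np.mean(data_2d)
print("Overall mean:", overall_mean)

# keepdims=True keeps the result two-dimensional: shape (3, 1) instead of (3,)
row_means = np.mean(data_2d, axis=1, keepdims=True)
print("Row means shape:", row_means.shape)

# The (3, 1) result broadcasts against the (3, 3) input
centered = data_2d - row_means
print("Rows centered on their own mean:\n", centered)
```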

14. np.corrcoef – Correlation Coefficient

The correlation coefficient function calculates the Pearson correlation matrix, which measures the strength of the linear relationship between variables. Each input array (or each row of a 2D array) is treated as one variable.


# Create two sample arrays
data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([5, 4, 3, 2, 1])

# Calculate correlation coefficient matrix
correlation = np.corrcoef(data_x, data_y)
print("Correlation Coefficient Matrix:\n", correlation)


Correlation Coefficient Matrix:
[[ 1. -1.]
 [-1.  1.]]


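  • The diagonal entries are always 1, because each variable is perfectly correlated with itself.
  • The off-diagonal entries hold the correlation between data_x and data_y. A value of 1 or -1 indicates a perfect linear relationship, with -1 showing an inverse correlation.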
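The off-diagonal entry can be reproduced from the definition of the Pearson coefficient, the covariance divided by the product of the two standard deviations. This is a sketch for checking the result, not how NumPy implements it internally; the variable names are illustrative.

```python
import numpy as np

data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([5, 4, 3, 2, 1])

# Covariance: average product of the deviations from each mean
covariance = np.mean((data_x - data_x.mean()) * (data_y - data_y.mean()))

# Pearson r = cov(x, y) / (std(x) * std(y))
manual_r = covariance / (np.std(data_x) * np.std(data_y))

# The same value sits off the diagonal of the correlation matrix
library_r = np.corrcoef(data_x, data_y)[0, 1]

print("Manual r:", manual_r)
print("np.corrcoef r:", library_r)
```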

15. np.histogram – Histogram

The histogram function computes the histogram of the input data, providing bin counts and edges, which is useful for data distribution analysis.


# Generate random data
data_random = np.random.randint(0, 100, size=50)

# Compute histogram
counts, bin_edges = np.histogram(data_random, bins=5)
print("Bin counts:", counts)
print("Bin edges:", bin_edges)


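  • np.histogram splits the data into bins and counts the number of elements in each bin.
  • bins=5 specifies that the range of the data should be divided into 5 equal-width intervals, so bin_edges contains 6 values.
  • Because the data is random, the counts and edges differ from run to run.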
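For repeatable results, the random data can be drawn from a seeded generator. The sketch below uses np.random.default_rng with an arbitrary seed (42 is just an example) and also passes range=(0, 100) so that the bin edges fall on round numbers rather than on the observed minimum and maximum.

```python
import numpy as np

# A seeded generator produces the same data on every run
rng = np.random.default_rng(seed=42)
data_random = rng.integers(0, 100, size=50)   # integers from 0 to 99

# Default: the edges span the observed minimum and maximum
counts, bin_edges = np.histogram(data_random, bins=5)
print("Counts add up to the sample size:", counts.sum() == data_random.size)
print("Number of edges:", len(bin_edges))

# With an explicit range, the edges are fixed at 0, 20, 40, 60, 80, 100
fixed_counts, fixed_edges = np.histogram(data_random, bins=5, range=(0, 100))
print("Fixed bin edges:", fixed_edges)
print("Fixed bin counts:", fixed_counts)
```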

Summary of Common NumPy Statistical Functions

Function          Description
np.mean           Calculates the mean (average) of elements
np.median         Finds the median of elements
np.std            Calculates the standard deviation
np.var            Calculates the variance
np.min, np.max    Finds the minimum and maximum
np.percentile     Calculates specified percentiles
np.quantile       Calculates specified quantiles
np.sum            Calculates the sum of elements
np.prod           Calculates the product of elements
np.cumsum         Computes the cumulative sum
np.cumprod        Computes the cumulative product
np.ptp            Calculates the peak-to-peak range
np.corrcoef       Calculates the correlation coefficient
np.histogram      Computes the histogram

NumPy’s statistical functions provide efficient ways to analyze data, making NumPy a powerful tool for data science, analytics, and scientific computing.

Because most of them accept an axis argument, the same functions work on multidimensional arrays as well as flat ones.

