NumPy statistical functions with code examples

NumPy provides a wide range of statistical functions for analyzing data.

These functions operate on arrays and can compute statistics like mean, median, variance, standard deviation, minimum, maximum, and more.

They’re essential for data analysis and numerical computations.

Importing NumPy and Creating Arrays

First, let’s import NumPy and create a sample array to work with:

import numpy as np

# Example array
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

1. np.mean – Mean (Average)

The mean of an array is the sum of all elements divided by the number of elements.

Example

mean_value = np.mean(data)
print("Mean:", mean_value)

Output

Mean: 5.5
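
As a quick check of the definition above, the sketch below computes the mean by hand and compares it with np.mean. The name manual_mean is introduced here only for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Mean by definition: sum of the elements divided by the element count
manual_mean = data.sum() / data.size

print("Manual mean:", manual_mean)
print("Matches np.mean:", np.isclose(manual_mean, np.mean(data)))
```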

2. np.median – Median

The median is the middle value in a sorted array. If the array length is even, the median is the average of the two middle numbers.

Example

median_value = np.median(data)
print("Median:", median_value)

Output

Median: 5.5
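
To illustrate the odd and even cases described above, here is a small sketch with two unsorted arrays. The arrays odd and even are made up for this example; np.median handles the sorting internally.

```python
import numpy as np

odd = np.array([7, 1, 5])       # odd length, unsorted
even = np.array([7, 1, 5, 3])   # even length, unsorted

# Sorted, odd is [1, 5, 7], so the median is the middle value
median_odd = np.median(odd)

# Sorted, even is [1, 3, 5, 7], so the median is the average of 3 and 5
median_even = np.median(even)

print("Median (odd length):", median_odd)
print("Median (even length):", median_even)
```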

3. np.std – Standard Deviation

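The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that values are close to the mean, while a high standard deviation indicates that they are spread over a wide range. By default, np.std computes the population standard deviation, dividing by the number of elements N.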

Example

std_dev = np.std(data)
print("Standard Deviation:", std_dev)

Output

Standard Deviation: 2.8722813232690143
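
If your array is a sample drawn from a larger population, you may want the sample standard deviation instead, which divides by N - 1. A minimal sketch using the ddof ("delta degrees of freedom") parameter of np.std follows; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# ddof=0 (the default): divide the squared deviations by N
population_std = np.std(data)

# ddof=1: divide by N - 1, the usual choice for a sample
sample_std = np.std(data, ddof=1)

print("Population std (ddof=0):", population_std)
print("Sample std (ddof=1):", sample_std)
```

The same ddof parameter is accepted by np.var.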

4. np.var – Variance

Variance is the average of the squared differences from the mean. It’s a measure of how spread out the values are.

Example

variance = np.var(data)
print("Variance:", variance)

Output

Variance: 8.25
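
The definition above can be spelled out step by step. The sketch below computes the variance by hand and also checks that the standard deviation is its square root; deviations and manual_variance are names chosen for this example.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Step 1: difference of each element from the mean
deviations = data - data.mean()

# Step 2: average of the squared differences
manual_variance = np.mean(deviations ** 2)

print("Manual variance:", manual_variance)
print("Matches np.var:", np.isclose(manual_variance, np.var(data)))
print("sqrt(variance) matches np.std:",
      np.isclose(np.sqrt(manual_variance), np.std(data)))
```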

5. np.min and np.max – Minimum and Maximum

These functions return the minimum and maximum values in an array.

Example

min_value = np.min(data)
max_value = np.max(data)
print("Minimum:", min_value)
print("Maximum:", max_value)

Output

Minimum: 1
Maximum: 10

6. np.percentile – Percentile

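The percentile function finds the value below which a given percentage of the observations fall. For example, the 25th percentile is the value below which 25% of the observations fall. When the requested percentile lies between two data points, NumPy interpolates linearly between them by default, which is why the results below are not elements of the array.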

Example

percentile_25 = np.percentile(data, 25)
percentile_50 = np.percentile(data, 50)  # Equivalent to median
percentile_75 = np.percentile(data, 75)

print("25th Percentile:", percentile_25)
print("50th Percentile (Median):", percentile_50)
print("75th Percentile:", percentile_75)

Output

25th Percentile: 3.25
50th Percentile (Median): 5.5
75th Percentile: 7.75
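
To see where a value such as 3.25 comes from, here is a rough sketch of linear interpolation between neighbouring sorted values. The helper percentile_linear is hypothetical and written only for illustration; it is not part of NumPy, and it assumes the default interpolation behaviour of np.percentile.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile via linear interpolation."""
    ordered = np.sort(values)
    # Fractional index into the sorted array (0 for q=0, n-1 for q=100)
    position = (q / 100) * (ordered.size - 1)
    lower = int(np.floor(position))
    upper = min(lower + 1, ordered.size - 1)
    fraction = position - lower
    # Move the fractional distance from the lower to the upper neighbour
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

manual_25 = percentile_linear(data, 25)
manual_75 = percentile_linear(data, 75)

print("Manual 25th percentile:", manual_25)
print("Manual 75th percentile:", manual_75)
```

For the 25th percentile of ten values, the fractional index is 0.25 * 9 = 2.25, so the result lies a quarter of the way from the third sorted value (3) to the fourth (4).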

7. np.quantile – Quantile

Quantiles are similar to percentiles. While percentiles are expressed as percentages, quantiles are expressed as fractions (0.25, 0.5, 0.75, etc.).

Example

quantile_25 = np.quantile(data, 0.25)
quantile_50 = np.quantile(data, 0.5)   # Equivalent to median
quantile_75 = np.quantile(data, 0.75)

print("0.25 Quantile:", quantile_25)
print("0.5 Quantile (Median):", quantile_50)
print("0.75 Quantile:", quantile_75)

Output

0.25 Quantile: 3.25
0.5 Quantile (Median): 5.5
0.75 Quantile: 7.75
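
Both functions also accept a sequence of positions, which avoids calling them once per value. The sketch below requests all three quartiles at once and checks that np.quantile with a fraction q agrees with np.percentile at 100 * q; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Request several quantiles in a single call
quartiles = np.quantile(data, [0.25, 0.5, 0.75])

# The same positions expressed as percentages
same_as_percentile = np.percentile(data, [25, 50, 75])

print("Quartiles:", quartiles)
print("Agree with np.percentile:", np.allclose(quartiles, same_as_percentile))
```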

8. np.sum – Sum

The sum function calculates the sum of all elements in the array.

Example

total_sum = np.sum(data)
print("Sum:", total_sum)

Output

Sum: 55

9. np.prod – Product

The product function calculates the product of all elements in the array.

Example

total_product = np.prod(data)
print("Product:", total_product)

Output

Product: 3628800
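
One caveat worth knowing: products grow quickly, and fixed-width integer arrays wrap around silently when the result no longer fits. The sketch below uses the numbers 1 to 21, whose product exceeds the int64 range, and compares np.prod with Python's arbitrary-precision math.prod. The array and variable names are made up for this example.

```python
import math
import numpy as np

values = np.arange(1, 22, dtype=np.int64)   # 1, 2, ..., 21

# int64 accumulation: the true product does not fit, so the result wraps
wrapped = np.prod(values)

# Python integers have arbitrary precision, so this is exact
exact = math.prod(values.tolist())

# Accumulating in float64 avoids wraparound at the cost of rounding
approx = np.prod(values, dtype=np.float64)

print("np.prod (int64):  ", wrapped)
print("math.prod (exact):", exact)
print("np.prod (float64):", approx)
```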

10. np.cumsum – Cumulative Sum

The cumulative sum function returns an array in which each element is the sum of all input elements up to and including that position.

Example

cumulative_sum = np.cumsum(data)
print("Cumulative Sum:", cumulative_sum)

Output

Cumulative Sum: [ 1  3  6 10 15 21 28 36 45 55]
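
Two properties follow from this definition and can be checked directly: the last entry of the cumulative sum equals np.sum, and taking successive differences recovers the original array. The sketch below assumes NumPy 1.16 or later for the prepend argument of np.diff.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

running_total = np.cumsum(data)

# The final running total is the sum of the whole array
print("Last entry equals np.sum:", running_total[-1] == np.sum(data))

# Differences between consecutive totals give back the original values;
# prepend=0 supplies the implicit starting total of zero
recovered = np.diff(running_total, prepend=0)
print("Recovered original:", np.array_equal(recovered, data))
```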

11. np.cumprod – Cumulative Product

The cumulative product function returns an array in which each element is the product of all input elements up to and including that position.

Example

cumulative_product = np.cumprod(data)
print("Cumulative Product:", cumulative_product)

Output

Cumulative Product: [      1       2       6      24     120     720    5040   40320  362880 3628800]

12. np.ptp – Peak-to-Peak (Range)

The peak-to-peak function calculates the range of values (maximum – minimum) in the array.

Example

range_value = np.ptp(data)
print("Range (Peak-to-Peak):", range_value)

Output

Range (Peak-to-Peak): 9

13. np.mean, np.median, np.var on Multidimensional Arrays

These statistical functions can also be applied to multidimensional arrays. By specifying the axis parameter, you can calculate statistics along specific dimensions.

Example

# Create a 2D array (3 rows, 3 columns)
data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Mean of each column (axis=0 collapses the rows)
mean_cols = np.mean(data_2d, axis=0)
print("Mean across columns:", mean_cols)

# Mean of each row (axis=1 collapses the columns)
mean_rows = np.mean(data_2d, axis=1)
print("Mean across rows:", mean_rows)

# Variance of each column
variance_cols = np.var(data_2d, axis=0)
print("Variance across columns:", variance_cols)

# Median of each row
median_rows = np.median(data_2d, axis=1)
print("Median across rows:", median_rows)

Output

Mean across columns: [4. 5. 6.]
Mean across rows: [2. 5. 8.]
Variance across columns: [6. 6. 6.]
Median across rows: [2. 5. 8.]
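
A common follow-up is to use a per-row or per-column statistic in arithmetic with the original array. The keepdims parameter helps here: it keeps the reduced axis with length 1, so the result broadcasts against the input. The sketch below centers each row around its own mean; row_means and centered are names chosen for this example.

```python
import numpy as np

data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# keepdims=True gives shape (3, 1) instead of (3,)
row_means = np.mean(data_2d, axis=1, keepdims=True)
print("Shape of row means:", row_means.shape)

# The (3, 1) means broadcast across the 3 columns of each row
centered = data_2d - row_means
print("Row-centered data:\n", centered)
```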

14. np.corrcoef – Correlation Coefficient

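The correlation coefficient function calculates the matrix of Pearson correlation coefficients, which measure the strength of the linear relationship between variables. You can pass the variables as separate 1D arrays, as below, or as the rows of a single 2D array.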

Example

# Create two sample arrays
data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([5, 4, 3, 2, 1])

# Calculate correlation coefficient matrix
correlation = np.corrcoef(data_x, data_y)
print("Correlation Coefficient Matrix:\n", correlation)

Output

Correlation Coefficient Matrix:
[[ 1. -1.]
 [-1.  1.]]

Explanation

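  • Each entry ranges from -1 to 1. A value of 1 or -1 indicates a perfect linear relationship, with -1 showing an inverse correlation.
  • The diagonal entries are always 1 because each variable is perfectly correlated with itself.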
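
For reference, the off-diagonal entry can be reproduced from the Pearson formula: the sum of the products of the deviations, divided by the square root of the product of the summed squared deviations. The sketch below is a hand computation for illustration only; dx, dy, and r are names introduced here.

```python
import numpy as np

data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([5, 4, 3, 2, 1])

# Deviations of each variable from its own mean
dx = data_x - data_x.mean()
dy = data_y - data_y.mean()

# Pearson correlation coefficient
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print("Manual r:", r)
print("Matches np.corrcoef:", np.isclose(r, np.corrcoef(data_x, data_y)[0, 1]))
```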

15. np.histogram – Histogram

The histogram function computes the histogram of the input data, providing bin counts and edges, which is useful for data distribution analysis.

Example

# Generate random data
data_random = np.random.randint(0, 100, size=50)

# Compute histogram
counts, bin_edges = np.histogram(data_random, bins=5)
print("Bin counts:", counts)
print("Bin edges:", bin_edges)

Explanation

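  • np.histogram splits the data into bins and counts the number of elements in each bin.
  • bins=5 specifies that the data should be divided into 5 equal-width intervals spanning the minimum to the maximum of the data, so bin_edges holds 6 values.
  • Because the data is random, the counts and edges differ from run to run.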
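
If you want the same histogram on every run, one option is to draw the data from a seeded generator. The sketch below uses np.random.default_rng, NumPy's newer random API, in place of np.random.randint; the seed value 0 is an arbitrary choice for this example. It also checks two structural properties of the result.

```python
import numpy as np

# Seeded generator: the same data is produced on every run
rng = np.random.default_rng(seed=0)
data_random = rng.integers(0, 100, size=50)   # integers in [0, 100)

counts, bin_edges = np.histogram(data_random, bins=5)

print("Bin counts:", counts)
print("Bin edges:", bin_edges)

# Every element lands in exactly one bin
print("Counts add up to the data size:", counts.sum() == data_random.size)

# n bins are delimited by n + 1 edges
print("Number of edges:", len(bin_edges))
```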

Summary of Common NumPy Statistical Functions

| Function | Description |
| --- | --- |
| np.mean | Calculates the mean (average) of elements |
| np.median | Finds the median of elements |
| np.std | Calculates the standard deviation |
| np.var | Calculates the variance |
| np.min, np.max | Finds the minimum and maximum |
| np.percentile | Calculates specified percentiles |
| np.quantile | Calculates specified quantiles |
| np.sum | Calculates the sum of elements |
| np.prod | Calculates the product of elements |
| np.cumsum | Computes the cumulative sum |
| np.cumprod | Computes the cumulative product |
| np.ptp | Calculates the peak-to-peak range |
| np.corrcoef | Calculates the correlation coefficient |
| np.histogram | Computes the histogram |

NumPy’s statistical functions provide efficient ways to analyze data, which makes the library a powerful tool for data science, analytics, and scientific computing.

These functions operate on arrays and can be applied across different axes, making them flexible for multidimensional data analysis.

 
