NumPy provides a wide range of statistical functions for analyzing data.
These functions operate on arrays and can compute statistics like mean, median, variance, standard deviation, minimum, maximum, and more.
They’re essential for data analysis and numerical computations.
Importing NumPy and Creating Arrays
First, let’s import NumPy and create a sample array to work with:
import numpy as np

# Example array
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
1. np.mean – Mean (Average)
The mean of an array is the sum of all elements divided by the number of elements.
Example
mean_value = np.mean(data)
print("Mean:", mean_value)
Output
Mean: 5.5
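To see what np.mean computes, the sketch below reproduces it by hand as the sum divided by the element count. It redefines the same sample array so that it runs on its own.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# The mean is the sum of the elements divided by how many there are.
manual_mean = data.sum() / data.size
numpy_mean = np.mean(data)

print("Manual mean:", manual_mean)
print("np.mean result:", numpy_mean)
```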
2. np.median – Median
The median is the middle value in a sorted array. If the array length is even, the median is the average of the two middle numbers.
Example
median_value = np.median(data)
print("Median:", median_value)
Output
Median: 5.5
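Because the median depends on whether the array has an odd or even number of elements, here is a small sketch with two made-up arrays, one of each length. np.median does not require the input to be sorted beforehand.

```python
import numpy as np

odd = np.array([7, 1, 3])        # sorted: [1, 3, 7]
even = np.array([7, 1, 3, 10])   # sorted: [1, 3, 7, 10]

# Odd length: the single middle element of the sorted values.
median_odd = np.median(odd)
# Even length: the average of the two middle elements, (3 + 7) / 2.
median_even = np.median(even)

print("Median of odd-length array:", median_odd)    # 3.0
print("Median of even-length array:", median_even)  # 5.0
```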
3. np.std – Standard Deviation
The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that values are close to the mean, while a high standard deviation indicates a wide range of values.
Example
std_dev = np.std(data)
print("Standard Deviation:", std_dev)
Output
Standard Deviation: 2.8722813232690143
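By default, np.std divides by the number of elements N (ddof=0), which gives the population standard deviation shown above. Passing ddof=1 divides by N - 1 instead, the usual estimate for a sample. The sketch below compares both and checks that each one is the square root of the matching np.var result.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Default ddof=0: divide by N (population standard deviation).
population_std = np.std(data)
# ddof=1: divide by N - 1 (sample standard deviation).
sample_std = np.std(data, ddof=1)

# The standard deviation is the square root of the variance with the same ddof.
assert np.isclose(population_std, np.sqrt(np.var(data)))
assert np.isclose(sample_std, np.sqrt(np.var(data, ddof=1)))

print("Population std (ddof=0):", population_std)
print("Sample std (ddof=1):", sample_std)
```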
4. np.var – Variance
Variance is the average of the squared differences from the mean. It’s a measure of how spread out the values are.
Example
variance = np.var(data)
print("Variance:", variance)
Output
Variance: 8.25
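The definition above can be checked step by step. This sketch computes the deviations from the mean, squares them, and averages them, which matches np.var's default of dividing by N.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Step 1: deviation of each value from the mean (5.5).
deviations = data - data.mean()
# Step 2: average of the squared deviations (divides by N, like np.var's default).
manual_variance = np.mean(deviations ** 2)

print("Manual variance:", manual_variance)
print("np.var result:", np.var(data))
```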
5. np.min and np.max – Minimum and Maximum
These functions return the minimum and maximum values in an array.
Example
min_value = np.min(data)
max_value = np.max(data)
print("Minimum:", min_value)
print("Maximum:", max_value)
Output
Minimum: 1
Maximum: 10
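A closely related pair, np.argmin and np.argmax, returns the position of the smallest and largest element instead of the value itself. The unsorted array below is a made-up example for illustration.

```python
import numpy as np

values = np.array([4, 9, 1, 7, 9])

# np.min / np.max return the extreme values themselves.
smallest = np.min(values)
largest = np.max(values)

# np.argmin / np.argmax return their indices; with ties, the first occurrence wins.
idx_min = np.argmin(values)
idx_max = np.argmax(values)

print("Min", smallest, "at index", idx_min)  # Min 1 at index 2
print("Max", largest, "at index", idx_max)   # Max 9 at index 1
```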
6. np.percentile – Percentile
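The percentile function returns the value below which a given percentage of the observations falls. For example, the 25th percentile is the value below which 25% of the observations fall. When that point lies between two data values, NumPy interpolates linearly between them by default, so a percentile need not be an element of the array.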
Example
percentile_25 = np.percentile(data, 25)
percentile_50 = np.percentile(data, 50)  # Equivalent to median
percentile_75 = np.percentile(data, 75)
print("25th Percentile:", percentile_25)
print("50th Percentile (Median):", percentile_50)
print("75th Percentile:", percentile_75)
Output
25th Percentile: 3.25
50th Percentile (Median): 5.5
75th Percentile: 7.75
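To make the interpolation concrete, the sketch below reimplements it. linear_percentile is a hypothetical helper written only for this illustration; it is intended to mirror np.percentile's default linear method and is compared against NumPy with a floating-point tolerance rather than exact equality.

```python
import numpy as np

def linear_percentile(values, p):
    """Hypothetical helper: percentile by linear interpolation between
    the two nearest sorted values (assumed to match NumPy's default)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    position = (ordered.size - 1) * p / 100.0  # fractional index into sorted data
    lower = int(np.floor(position))
    upper = int(np.ceil(position))
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# For p=25 the position is 9 * 0.25 = 2.25, i.e. a quarter of the way from 3 to 4.
for p in (25, 50, 75):
    print(p, linear_percentile(data, p), np.percentile(data, p))
```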
7. np.quantile – Quantile
Quantiles are similar to percentiles. While percentiles are expressed as percentages, quantiles are expressed as fractions (0.25, 0.5, 0.75, etc.).
Example
quantile_25 = np.quantile(data, 0.25)
quantile_50 = np.quantile(data, 0.5)  # Equivalent to median
quantile_75 = np.quantile(data, 0.75)
print("0.25 Quantile:", quantile_25)
print("0.5 Quantile (Median):", quantile_50)
print("0.75 Quantile:", quantile_75)
Output
0.25 Quantile: 3.25
0.5 Quantile (Median): 5.5
0.75 Quantile: 7.75
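Since a quantile q corresponds to the percentile 100 * q, np.quantile and np.percentile give the same cut points. Both also accept a sequence, so several values can be computed in one call, as this sketch shows.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Request the three quartiles at once by passing a list.
quartiles = np.quantile(data, [0.25, 0.5, 0.75])
# The same cut points expressed as percentiles (q * 100).
as_percentiles = np.percentile(data, [25, 50, 75])

print("Quartiles via np.quantile:", quartiles)
print("Quartiles via np.percentile:", as_percentiles)
```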
8. np.sum – Sum
The sum function calculates the sum of all elements in the array.
Example
total_sum = np.sum(data)
print("Sum:", total_sum)
Output
Sum: 55
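np.sum is also handy for counting: a comparison such as data > 5 produces an array of True/False values, and summing it counts the True entries (each True counts as 1). A brief sketch:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Count how many elements satisfy the condition (6, 7, 8, 9, 10).
count_above_5 = np.sum(data > 5)
# Sum only the elements selected by the boolean mask.
sum_above_5 = np.sum(data[data > 5])

print("Values greater than 5:", count_above_5)       # 5
print("Sum of values greater than 5:", sum_above_5)  # 40
```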
9. np.prod – Product
The product function calculates the product of all elements in the array.
Example
total_product = np.prod(data)
print("Product:", total_product)
Output
Product: 3628800
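One caveat worth knowing: for integer arrays, np.prod multiplies in fixed-width integers, so a large product can overflow and silently wrap around instead of raising an error. The sketch below forces 64-bit integers (dtype=np.int64, because the default integer width has varied by platform and NumPy version) and compares the result with Python's exact math.factorial. Requesting dtype=np.float64 avoids the wraparound at the cost of floating-point precision.

```python
import math
import numpy as np

factors = np.arange(1, 22, dtype=np.int64)  # 1, 2, ..., 21

exact = math.factorial(21)                          # Python ints never overflow
int_product = np.prod(factors)                      # exceeds int64 and wraps around
float_product = np.prod(factors, dtype=np.float64)  # approximate, but no wraparound

print("Exact 21!:", exact)
print("np.prod with int64:", int_product)
print("np.prod with float64:", float_product)
```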
10. np.cumsum – Cumulative Sum
The cumulative sum function returns an array of running totals: each element is the sum of all input elements up to and including that position.
Example
cumulative_sum = np.cumsum(data)
print("Cumulative Sum:", cumulative_sum)
Output
Cumulative Sum: [ 1 3 6 10 15 21 28 36 45 55]
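Two quick properties follow from the definition: the last running total equals np.sum, and np.diff reverses np.cumsum. The sketch below checks both; it relies on np.diff's prepend argument (available since NumPy 1.16) to recover the first element.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
running_total = np.cumsum(data)

# The final running total is the sum of the whole array.
print("Last cumulative value:", running_total[-1], "| np.sum:", np.sum(data))

# Differences of consecutive running totals give back the original values;
# prepend=0 makes the first difference equal to the first element.
recovered = np.diff(running_total, prepend=0)
print("Recovered data:", recovered)
```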
11. np.cumprod – Cumulative Product
The cumulative product function returns an array of running products: each element is the product of all input elements up to and including that position.
Example
cumulative_product = np.cumprod(data)
print("Cumulative Product:", cumulative_product)
Output
Cumulative Product: [ 1 2 6 24 120 720 5040 40320 362880 3628800]
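Because data holds the integers 1 through 10, its cumulative product is the sequence of factorials 1!, 2!, ..., 10!. This sketch compares it with Python's math.factorial:

```python
import math
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
running_product = np.cumprod(data)

# Element k of the running product is 1 * 2 * ... * (k + 1), i.e. (k + 1)!.
factorials = [math.factorial(n) for n in range(1, 11)]

print("np.cumprod:", running_product)
print("factorials:", factorials)
```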
12. np.ptp – Peak-to-Peak (Range)
The peak-to-peak function calculates the range of values (maximum – minimum) in the array.
Example
range_value = np.ptp(data)
print("Range (Peak-to-Peak):", range_value)
Output
Range (Peak-to-Peak): 9
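np.ptp is simply the maximum minus the minimum, and like the other reductions it accepts an axis argument. The small grid below is a made-up example.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Peak-to-peak is shorthand for max minus min.
assert np.ptp(data) == np.max(data) - np.min(data)

grid = np.array([[1, 5, 3],
                 [8, 2, 6]])
col_range = np.ptp(grid, axis=0)  # one range per column
row_range = np.ptp(grid, axis=1)  # one range per row

print("Range of each column:", col_range)  # [7 3 3]
print("Range of each row:", row_range)     # [4 6]
```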
13. np.mean, np.median, np.var on Multidimensional Arrays
These statistical functions can also be applied to multidimensional arrays. By specifying the axis parameter, you can calculate statistics along specific dimensions.
Example
# Create a 2D array
data_2d = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Mean of each column (axis=0 reduces over the rows)
mean_cols = np.mean(data_2d, axis=0)
print("Mean of each column:", mean_cols)

# Mean of each row (axis=1 reduces over the columns)
mean_rows = np.mean(data_2d, axis=1)
print("Mean of each row:", mean_rows)

# Variance of each column
variance_cols = np.var(data_2d, axis=0)
print("Variance of each column:", variance_cols)

# Median of each row
median_rows = np.median(data_2d, axis=1)
print("Median of each row:", median_rows)
Output
Mean of each column: [4. 5. 6.]
Mean of each row: [2. 5. 8.]
Variance of each column: [6. 6. 6.]
Median of each row: [2. 5. 8.]
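The axis argument pairs well with keepdims=True, which keeps the reduced axis with length 1 so the result broadcasts back against the original array. As a sketch, subtracting each row's mean centers every row around zero; without keepdims, the row means would have shape (3,) and would broadcast along the wrong axis.

```python
import numpy as np

data_2d = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# keepdims=True gives shape (3, 1) instead of (3,), so each row mean
# is subtracted from every element of its own row.
row_means = np.mean(data_2d, axis=1, keepdims=True)
centered = data_2d - row_means

print("Row means shape:", row_means.shape)  # (3, 1)
print("Centered data:\n", centered)
print("Row means after centering:", centered.mean(axis=1))
```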
14. np.corrcoef – Correlation Coefficient
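The correlation coefficient function computes a matrix of Pearson correlation coefficients, which measure the linear relationship between variables. It accepts either two 1D arrays, as in the example below, or a single 2D array whose rows are treated as variables.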
Example
# Create two sample arrays
data_x = np.array([1, 2, 3, 4, 5])
data_y = np.array([5, 4, 3, 2, 1])

# Calculate the correlation coefficient matrix
correlation = np.corrcoef(data_x, data_y)
print("Correlation Coefficient Matrix:\n", correlation)
Output
Correlation Coefficient Matrix:
 [[ 1. -1.]
 [-1.  1.]]
Explanation
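- The diagonal entries compare each variable with itself and are always 1; the off-diagonal entries hold the correlation between data_x and data_y.
- A value of 1 or -1 indicates a perfect linear relationship, with -1 meaning that one variable decreases as the other increases. A value near 0 indicates little or no linear relationship.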
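To make the definition concrete, the sketch below computes Pearson's r directly from its formula (the sum of products of deviations, divided by the square root of the product of the summed squared deviations) and compares it with the off-diagonal entry of np.corrcoef. The x and y values are made up for illustration and are not perfectly correlated, so the result lies strictly between -1 and 1.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Deviations from each variable's mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# np.corrcoef returns the full 2x2 matrix; r is the off-diagonal entry.
r_numpy = np.corrcoef(x, y)[0, 1]

print("Manual r:", r_manual)
print("np.corrcoef r:", r_numpy)
```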
15. np.histogram – Histogram
The histogram function computes the histogram of the input data, providing bin counts and edges, which is useful for data distribution analysis.
Example
# Generate random data (the values, and therefore the output, differ on every run)
data_random = np.random.randint(0, 100, size=50)

# Compute the histogram
counts, bin_edges = np.histogram(data_random, bins=5)
print("Bin counts:", counts)
print("Bin edges:", bin_edges)
Explanation
- np.histogram splits the data into bins and counts the number of elements in each bin.
- bins=5 divides the range from the smallest to the largest value into 5 equal-width intervals, so bin_edges contains 6 values.
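Because the example above uses random data, its counts change on every run. Running np.histogram on the fixed array 1 to 10 instead gives a reproducible result and shows the bin layout directly: equal-width bins spanning the minimum to the maximum, with the last bin also including its right edge.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Five equal-width bins over [1, 10]; each bin is (10 - 1) / 5 = 1.8 wide.
counts, bin_edges = np.histogram(data, bins=5)

print("Bin counts:", counts)    # [2 2 2 2 2]
print("Bin edges:", bin_edges)  # 6 edges for 5 bins

# Every value lands in exactly one bin.
assert counts.sum() == data.size
assert len(bin_edges) == len(counts) + 1
```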
Summary of Common NumPy Statistical Functions
Function | Description |
---|---|
np.mean | Calculates the mean (average) of elements |
np.median | Finds the median of elements |
np.std | Calculates the standard deviation |
np.var | Calculates the variance |
np.min, np.max | Finds the minimum and maximum |
np.percentile | Calculates specified percentiles |
np.quantile | Calculates specified quantiles |
np.sum | Calculates the sum of elements |
np.prod | Calculates the product of elements |
np.cumsum | Computes the cumulative sum |
np.cumprod | Computes the cumulative product |
np.ptp | Calculates the peak-to-peak range |
np.corrcoef | Calculates the Pearson correlation coefficient matrix |
np.histogram | Computes the histogram |
NumPy’s statistical functions provide efficient, vectorized ways to analyze data, which makes NumPy a core tool for data science, analytics, and scientific computing.
Most of them, such as np.mean, np.median, np.var, and np.sum, also accept an axis argument, which makes them flexible for multidimensional data analysis.