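NumPy structured arrays are arrays whose elements are made up of named fields, each with its own data type and, optionally, its own shape. They are closely related to record arrays (np.recarray), which add attribute-style access to the same fields.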
Structured arrays allow for heterogeneous data, making them useful for working with data records (like rows in a spreadsheet) where each field can have a different data type.
Because every record shares one fixed layout, structured arrays keep mixed-type data organized in a single array. Let’s go through how to create, access, and manipulate them.
1. Importing NumPy
    import numpy as np
2. Creating Structured Arrays
Structured arrays are created by defining a dtype with field names, data types, and optionally, the shape of each field. Here are some examples of creating structured arrays.
2.1 Basic Example with Named Fields
    # Define data types for each field: name, age, and weight
    data_type = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]

    # Creating an array of structured data
    data = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cathy', 22, 70.1)], dtype=data_type)
    print("Structured Array:\n", data)
Here:
- U10: Unicode string of up to 10 characters; longer values are silently truncated.
- i4: 32-bit integer.
- f4: 32-bit float.
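To see how these type codes translate into a record layout, the dtype itself can be inspected. The following is a minimal sketch; the variable name person_dtype is an assumption made for illustration and mirrors the fields used above.

```python
import numpy as np

# Same field layout as the example above (person_dtype is an illustrative name)
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

print(person_dtype.names)     # field names, in declaration order
print(person_dtype.itemsize)  # bytes per record: 40 (U10) + 4 (i4) + 4 (f4)

# Each field reports its own dtype and its byte offset within a record
for field_name in person_dtype.names:
    field_dtype, offset = person_dtype.fields[field_name][:2]
    print(field_name, field_dtype, offset)
```

The U10 field takes 40 bytes because NumPy stores each Unicode character in 4 bytes.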
2.2 Defining and Creating Arrays with Multiple Fields and Data Types
You can define more complex structured arrays by specifying shapes for the fields.
    # Define data types with shapes
    data_type = [('name', 'U10'), ('grades', 'f4', (3,))]

    # Creating the array
    data = np.array([('Alice', [90.5, 85.0, 92.3]), ('Bob', [88.0, 79.5, 95.1])], dtype=data_type)
    print("Structured Array with Grades:\n", data)
In this example, each entry in grades has a shape of (3,), so each person has three grades.
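One consequence of a shaped field is that selecting it yields an ordinary two-dimensional array, with one row per record. A small sketch, assuming the same two students (the names students and per_student_mean are illustrative):

```python
import numpy as np

student_dtype = [('name', 'U10'), ('grades', 'f4', (3,))]
students = np.array([('Alice', [90.5, 85.0, 92.3]),
                     ('Bob', [88.0, 79.5, 95.1])], dtype=student_dtype)

grades = students['grades']
print(grades.shape)  # (2, 3): two records, three grades each

# Average across the three grades of each student
per_student_mean = grades.mean(axis=1)
print(per_student_mean)
```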
3. Accessing Structured Array Data
3.1 Accessing Individual Fields
You can access individual fields by their name.
    # Accessing the 'name' field
    names = data['name']
    print("Names:\n", names)

    # Accessing the 'grades' field
    grades = data['grades']
    print("Grades:\n", grades)
3.2 Accessing Individual Records
You can also access individual records (rows).
    # Accessing the first record
    first_record = data[0]
    print("First Record:\n", first_record)

    # Accessing specific field in the first record
    first_name = data[0]['name']
    print("Name in First Record:\n", first_name)
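It may help to see what kind of object a single record is, and how a record array differs from a plain structured array. The sketch below uses a small illustrative array (people) rather than the grades data; viewing it as np.recarray is one way to get attribute-style field access.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# A single record of a structured array is a numpy.void scalar
record = people[0]
print(type(record))

# Record-then-field and field-then-record reach the same element
same_element = people[0]['age'] == people['age'][0]

# Viewing the same data as np.recarray adds attribute access to fields
people_rec = people.view(np.recarray)
print(people_rec.age)
```

The view shares memory with the original array, so no data is copied.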
4. Adding and Modifying Records
4.1 Modifying Field Values
You can modify the data within each field.
    # Change Alice's grades
    data[0]['grades'] = [95.0, 90.0, 96.5]
    print("Modified Grades:\n", data)
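Field updates can also be applied to many records at once, but the order of indexing matters when a boolean mask is involved. The following sketch, using an illustrative people array, shows an update that sticks and one that is silently lost because boolean indexing returns a copy.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30), ('Cathy', 22)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# Vectorized update: add one year to every age
people['age'] += 1

under_30 = people['age'] < 30

# Field first, then mask: writes into the original array
people['age'][under_30] = 0

# Mask first, then field: people[under_30] is a copy, so this change is lost
people[under_30]['age'] = 99

print(people['age'])
```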
4.2 Adding New Records
NumPy structured arrays don’t support direct appending, so to add a new record, you’ll need to create a new array and combine them using np.concatenate().
    # Create a new record
    new_record = np.array([('David', [85.0, 88.0, 90.0])], dtype=data_type)

    # Concatenate to add the new record
    data = np.concatenate([data, new_record])
    print("Array after Adding New Record:\n", data)
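Since np.concatenate() builds a whole new array on every call, an alternative when the number of records is known in advance is to preallocate and fill the array field by field. This is a sketch under that assumption; names and grade_rows are illustrative inputs.

```python
import numpy as np

student_dtype = [('name', 'U10'), ('grades', 'f4', (3,))]

# Illustrative column-wise inputs
names = ['Alice', 'Bob', 'David']
grade_rows = [[90.5, 85.0, 92.3], [88.0, 79.5, 95.1], [85.0, 88.0, 90.0]]

# Allocate all records once, then assign whole columns
students = np.zeros(len(names), dtype=student_dtype)
students['name'] = names
students['grades'] = grade_rows

print(students)
```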
5. Sorting and Filtering Structured Arrays
5.1 Sorting by Field
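To sort a structured array by a specific scalar field, use np.sort() with the order argument. For a sub-array field such as grades, sort on an explicit column with np.argsort() and index the array with the result.
    # Sort data by 'name'
    sorted_by_name = np.sort(data, order='name')
    print("Sorted by Name:\n", sorted_by_name)

    # Sort data by the first grade of each record
    first_grade_order = np.argsort(data['grades'][:, 0])
    sorted_by_grades = data[first_grade_order]
    print("Sorted by Grades (first element):\n", sorted_by_grades)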
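Two common variations are sorting on several fields and sorting in descending order. The sketch below uses an illustrative staff array; order accepts a list of field names, and a descending sort can be obtained by reversing the result of np.argsort().

```python
import numpy as np

staff = np.array([('Bob', 'Eng', 35), ('Alice', 'HR', 30),
                  ('David', 'Eng', 29), ('Cathy', 'HR', 32)],
                 dtype=[('name', 'U10'), ('dept', 'U10'), ('age', 'i4')])

# Sort by department first, then by age within each department
by_dept_then_age = np.sort(staff, order=['dept', 'age'])
print(by_dept_then_age['name'])

# Descending sort by age: reverse the ascending argsort
oldest_first = staff[np.argsort(staff['age'])[::-1]]
print(oldest_first['name'])
```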
5.2 Filtering Based on Conditions
You can filter structured arrays by field values using conditions.
    # Filter for entries where first grade is greater than 90
    high_grades = data[data['grades'][:, 0] > 90]
    print("Records with First Grade > 90:\n", high_grades)
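Conditions on different fields can be combined with the element-wise operators & (and) and | (or). Each comparison needs its own parentheses because these operators bind more tightly than the comparisons. A short sketch with an illustrative people array:

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cathy', 22, 70.1)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Both conditions must hold
adult_and_light = people[(people['age'] >= 25) & (people['weight'] < 80.0)]

# Either condition may hold
older_or_heavier = people[(people['age'] > 28) | (people['weight'] > 60.0)]

print(adult_and_light['name'])
print(older_or_heavier['name'])
```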
6. Advanced Structured Array Manipulation
6.1 Accessing Multiple Fields at Once
Use a list of field names to access multiple fields at once.
    # Access both 'name' and 'grades' fields
    subset = data[['name', 'grades']]
    print("Subset with Name and Grades:\n", subset)
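In recent NumPy versions (1.16 and later, as far as I am aware), indexing with a list of fields returns a view that keeps the original record layout, including the bytes of the fields left out. If a compact copy is wanted, numpy.lib.recfunctions.repack_fields can produce one. A sketch with an illustrative people array:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Multi-field selection; the skipped 'age' bytes may remain as padding
subset = people[['name', 'weight']]
print(subset.dtype.itemsize)

# Repack into a contiguous layout holding only the selected fields
packed = rfn.repack_fields(subset)
print(packed.dtype.itemsize)  # 40 (U10) + 4 (f4)
```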
6.2 Masking and Conditional Selection
NumPy structured arrays allow masking based on conditions, which is useful for data filtering.
    # Create a mask for people with the first grade above 85
    mask = data['grades'][:, 0] > 85

    # Apply the mask to get a filtered array
    filtered_data = data[mask]
    print("Filtered Array (First Grade > 85):\n", filtered_data)
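Masks are not limited to numeric comparisons. As a sketch, np.isin() can build a membership mask over a string field, and summing the mask counts the matches; the people array and the wanted list are illustrative.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30), ('Cathy', 22)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# Names to look for; 'Zed' is not present and simply matches nothing
wanted = ['Alice', 'Cathy', 'Zed']
mask = np.isin(people['name'], wanted)

matches = people[mask]
n_matches = int(mask.sum())  # True counts as 1

print(matches, n_matches)
```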
7. Using Structured Arrays with NumPy Functions
NumPy's numerical functions operate on individual fields rather than on whole records, so select a field first and then apply the function to it.
7.1 Calculating Mean, Min, and Max
Use functions like np.mean(), np.min(), and np.max() on specific fields.
    # Calculate mean of the first grade
    mean_first_grade = np.mean(data['grades'][:, 0])
    print("Mean of First Grade:", mean_first_grade)

    # Calculate max and min of the first grade
    max_first_grade = np.max(data['grades'][:, 0])
    min_first_grade = np.min(data['grades'][:, 0])
    print("Max of First Grade:", max_first_grade)
    print("Min of First Grade:", min_first_grade)
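Reductions on a field can also be used to locate whole records. The sketch below, on an illustrative students array, uses np.argmax() to fetch the record with the highest first grade and axis=0 to get the best score per test.

```python
import numpy as np

students = np.array([('Alice', [90.5, 85.0, 92.3]),
                     ('Bob', [88.0, 79.5, 95.1]),
                     ('David', [85.0, 88.0, 90.0])],
                    dtype=[('name', 'U10'), ('grades', 'f4', (3,))])

# Index of the highest first grade, then the full record at that index
top_index = np.argmax(students['grades'][:, 0])
top_student = students[top_index]
print(top_student['name'])

# Best score on each of the three tests, across all students
best_per_test = students['grades'].max(axis=0)
print(best_per_test)
```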
7.2 Using np.where with Structured Arrays
Use np.where to create new arrays based on conditions.
    # Replace all grades for students whose first grade is below 90
    below_90 = data['grades'][:, 0] < 90
    # below_90[:, None] has shape (n, 1), so the choice is made per record
    updated_grades = np.where(below_90[:, None], [90.0, 92.0, 94.0], data['grades'])
    data['grades'] = updated_grades
    print("Updated Grades (for first grade < 90):\n", data)
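On a scalar field, np.where() needs no reshaping. As a sketch with an illustrative people array, the three-argument form derives a label per record, while the one-argument form returns the indices of the matching records.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30), ('Cathy', 22)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# Three-argument form: pick one of two values per record
labels = np.where(people['age'] >= 30, 'senior', 'junior')
print(labels)

# One-argument form: indices where the condition holds
senior_indices = np.where(people['age'] >= 30)[0]
print(senior_indices)
```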
8. Saving and Loading Structured Arrays
Structured arrays can be saved to disk using np.save and loaded back with np.load.
    # Save the structured array
    np.save('students_data.npy', data)

    # Load the structured array (no object fields, so allow_pickle is not needed)
    loaded_data = np.load('students_data.npy')
    print("Loaded Data:\n", loaded_data)
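A quick way to confirm that the .npy format preserves both the field layout and the values is a round trip through a temporary file. This is a sketch; the file name people.npy and the people array are illustrative.

```python
import os
import tempfile

import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.npy')
    np.save(path, people)
    loaded = np.load(path)  # fully read into memory before the directory is removed

# The dtype (field names, types) and the raw record bytes both survive
same_dtype = loaded.dtype == people.dtype
same_bytes = loaded.tobytes() == people.tobytes()
print(same_dtype, same_bytes)
```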
9. Practical Example: Employee Data Analysis
Let’s say you have employee data with fields: name, age, salary, and department.
    # Define dtype ('Engineering' has 11 characters, so department needs more than U10)
    employee_dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8'), ('department', 'U20')]

    # Create an array of employees
    employees = np.array([
        ('Alice', 30, 70000.0, 'HR'),
        ('Bob', 35, 85000.0, 'Engineering'),
        ('Cathy', 32, 90000.0, 'Marketing'),
        ('David', 29, 75000.0, 'Engineering')
    ], dtype=employee_dtype)

    # Print the employee array
    print("Employee Data:\n", employees)
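String fields have a fixed width, and values longer than that width are truncated without any warning, which can make later equality filters match nothing. The sketch below illustrates this with hypothetical one-record arrays (narrow and wide) and shows how to check the width NumPy would infer for a set of values.

```python
import numpy as np

# A 10-character field cannot hold the 11-character 'Engineering'
narrow = np.array([('Bob', 'Engineering')],
                  dtype=[('name', 'U10'), ('department', 'U10')])
print(narrow['department'][0])  # stored value has lost its last character

# The truncated value no longer equals the full string
n_found = int((narrow['department'] == 'Engineering').sum())

# A wider field keeps the value intact
wide = np.array([('Bob', 'Engineering')],
                dtype=[('name', 'U10'), ('department', 'U20')])

# Width NumPy infers for the longest value in a list of strings
inferred_width = np.array(['HR', 'Engineering', 'Marketing']).dtype.itemsize // 4
print(n_found, wide['department'][0], inferred_width)
```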
9.1 Filter Employees by Department
    # Filter employees in Engineering
    engineering_employees = employees[employees['department'] == 'Engineering']
    print("Engineering Employees:\n", engineering_employees)
9.2 Calculate Average Salary
    # Calculate average salary
    average_salary = np.mean(employees['salary'])
    print("Average Salary:", average_salary)
9.3 Get Employees with Salary Above a Threshold
    # Employees with salary above 80000
    high_salary_employees = employees[employees['salary'] > 80000]
    print("High Salary Employees:\n", high_salary_employees)
9.4 Sort Employees by Age
    # Sort employees by age
    sorted_by_age = np.sort(employees, order='age')
    print("Employees Sorted by Age:\n", sorted_by_age)
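The filtering and averaging steps above can be combined into a simple per-department summary. This is one possible sketch using np.unique() and a loop; the dictionary name mean_salary_by_dept is illustrative, and the department field is given a width of 20 so that every name fits.

```python
import numpy as np

employees = np.array([
    ('Alice', 30, 70000.0, 'HR'),
    ('Bob', 35, 85000.0, 'Engineering'),
    ('Cathy', 32, 90000.0, 'Marketing'),
    ('David', 29, 75000.0, 'Engineering'),
], dtype=[('name', 'U10'), ('age', 'i4'), ('salary', 'f8'), ('department', 'U20')])

# Average salary for each distinct department
mean_salary_by_dept = {}
for dept in np.unique(employees['department']):
    in_dept = employees['department'] == dept
    mean_salary_by_dept[str(dept)] = float(employees['salary'][in_dept].mean())

print(mean_salary_by_dept)
```

For larger datasets or more involved grouping, a dedicated tabular library may be more convenient, but the loop keeps everything within NumPy.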
Summary of Structured Array Features
| Operation | Code Example |
|---|---|
| Define structured dtype | dtype = [('name', 'U10'), ('age', 'i4')] |
| Create array | np.array([('Alice', 25)], dtype=dtype) |
| Access field | data['name'] |
| Access record | data[0] |
| Sort by field | np.sort(data, order='age') |
| Filter by condition | data[data['age'] > 30] |
| Save to file | np.save('data.npy', data) |
| Load from file | np.load('data.npy') |
Structured arrays in NumPy provide a powerful way to work with data that contains multiple fields and mixed data types, ideal for applications like data processing and analysis in scientific or business contexts.