NumPy Structured Arrays Tutorial

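NumPy structured arrays are arrays whose elements are records made up of named fields, where each field has a name, a data type, and optionally a shape. They are closely related to record arrays (np.recarray), a subclass that also allows fields to be accessed as attributes.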

Structured arrays allow for heterogeneous data, making them useful for working with data records (like rows in a spreadsheet) where each field can have a different data type.

 

Because every record shares one dtype, structured arrays keep mixed-type data in a single, compact array. Let's go through how to create, access, and manipulate them.

1. Importing NumPy

import numpy as np

2. Creating Structured Arrays

Structured arrays are created by defining a dtype with field names, data types, and optionally, the shape of each field. Here are some examples of creating structured arrays.

2.1 Basic Example with Named Fields

# Define data types for each field: name, age, and weight
data_type = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]

# Creating an array of structured data
data = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cathy', 22, 70.1)], dtype=data_type)
print("Structured Array:\n", data)

Here:

  • U10: Unicode string of at most 10 characters (longer values are silently truncated).
  • i4: 32-bit integer.
  • f4: 32-bit float.
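
To make the field layout concrete, the following sketch inspects the dtype from the example above. It assumes the default packed (unaligned) layout, so the offsets shown apply only to that case.

```python
import numpy as np

# Same field layout as the example above
data_type = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

print(data_type.names)     # field names, in declaration order
print(data_type.itemsize)  # bytes per record: 10*4 (U10) + 4 (i4) + 4 (f4)

# Each entry in .fields starts with (field dtype, byte offset within the record)
for field in data_type.names:
    field_dtype, offset = data_type.fields[field][:2]
    print(field, field_dtype, offset)
```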

2.2 Defining and Creating Arrays with Multiple Fields and Data Types

A field can also hold a fixed-size subarray: add a shape as the third element of the field's tuple.

# Define data types with shapes
data_type = [('name', 'U10'), ('grades', 'f4', (3,))]

# Creating the array
data = np.array([('Alice', [90.5, 85.0, 92.3]), ('Bob', [88.0, 79.5, 95.1])], dtype=data_type)
print("Structured Array with Grades:\n", data)

In this example, each entry in grades has a shape of (3,), so each person has three grades.
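
As a quick check of how the subarray shape combines with the array shape, this sketch rebuilds the same two records and prints the resulting shapes.

```python
import numpy as np

data_type = [('name', 'U10'), ('grades', 'f4', (3,))]
data = np.array([('Alice', [90.5, 85.0, 92.3]), ('Bob', [88.0, 79.5, 95.1])], dtype=data_type)

print(data.shape)            # one entry per record
print(data['grades'].shape)  # (number of records, grades per record)
print(data['grades'][1])     # Bob's three grades
```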

3. Accessing Structured Array Data

3.1 Accessing Individual Fields

You can access individual fields by their name.

# Accessing the 'name' field
names = data['name']
print("Names:\n", names)

# Accessing the 'grades' field
grades = data['grades']
print("Grades:\n", grades)
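
One detail worth knowing is that indexing a single field returns a view, not a copy. The sketch below uses a small hypothetical name/age array to show that writing through the view changes the original, while an explicit .copy() does not.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30)], dtype=[('name', 'U10'), ('age', 'i4')])

ages = people['age']          # a view onto the 'age' field
ages[0] = 26                  # writes through to the original array
print(people[0])

ages_copy = people['age'].copy()  # an independent copy
ages_copy[1] = 99                 # the original is unaffected
print(people['age'])
```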

3.2 Accessing Individual Records

You can also access individual records (rows).

# Accessing the first record
first_record = data[0]
print("First Record:\n", first_record)

# Accessing specific field in the first record
first_name = data[0]['name']
print("Name in First Record:\n", first_name)
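
If attribute-style access is preferred, a structured array can be viewed as a record array. This is a minimal sketch on a hypothetical name/age array; the view shares memory with the original.

```python
import numpy as np

people = np.array([('Alice', 25), ('Bob', 30)], dtype=[('name', 'U10'), ('age', 'i4')])

# View the same memory as a record array to get attribute access
rec = people.view(np.recarray)
print(rec.name)     # same values as people['name']
print(rec[0].age)   # same value as people[0]['age']
```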

4. Adding and Modifying Records

4.1 Modifying Field Values

You can modify the data within each field.

# Change Alice's grades
data[0]['grades'] = [95.0, 90.0, 96.5]
print("Modified Grades:\n", data)
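
Modification is not limited to one value at a time. The sketch below, on a hypothetical name/age/weight array, updates a whole field at once, replaces a full record with a tuple, and shows that an over-long string is truncated to the field width.

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Update every value in one field
people['age'] += 1

# Replace a whole record by assigning a tuple with one value per field
people[1] = ('Robert', 40, 86.0)

# Strings longer than the field width ('U10') are cut to 10 characters
people['name'][0] = 'Alexandria-Maria'
print(people)
```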

4.2 Adding New Records

NumPy arrays have a fixed size, so there is no in-place append. To add a record, create a new array with the same dtype and combine the two using np.concatenate(), which returns a new array.

# Create a new record
new_record = np.array([('David', [85.0, 88.0, 90.0])], dtype=data_type)

# Concatenate to add the new record
data = np.concatenate([data, new_record])
print("Array after Adding New Record:\n", data)
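
Each call to np.concatenate() copies the whole array, so when the number of records is known up front it can be cheaper to allocate once and fill the fields. This is one possible approach, not the only one; the names and grades lists are illustrative.

```python
import numpy as np

data_type = [('name', 'U10'), ('grades', 'f4', (3,))]

names = ['Alice', 'Bob', 'David']
grades = [[90.5, 85.0, 92.3], [88.0, 79.5, 95.1], [85.0, 88.0, 90.0]]

# Allocate all records once, then fill each field in a single assignment
students = np.zeros(len(names), dtype=data_type)
students['name'] = names
students['grades'] = grades
print(students)
```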

5. Sorting and Filtering Structured Arrays

5.1 Sorting by Field

To sort a structured array by a specific field, pass the field name to the order argument of np.sort().

# Sort data by 'name'
sorted_by_name = np.sort(data, order='name')
print("Sorted by Name:\n", sorted_by_name)

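# Sort data by the first grade
# (order='grades' refers to the whole 3-element subarray field rather than its
# first element, so sort on that column explicitly with np.argsort)
sorted_by_grades = data[np.argsort(data['grades'][:, 0])]
print("Sorted by Grades (first element):\n", sorted_by_grades)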
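
The order argument also accepts a list of fields, which are used as successive tie-breakers. The sketch below uses a small hypothetical name/age array and also shows one way to get a descending order, by reversing the result of np.argsort().

```python
import numpy as np

people = np.array([('Cathy', 30), ('Alice', 30), ('Bob', 25)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# Sort by age first, then by name where ages are equal
by_age_then_name = np.sort(people, order=['age', 'name'])
print(by_age_then_name)

# Descending by age: reverse the ascending index order
oldest_first = people[np.argsort(people['age'])[::-1]]
print(oldest_first)
```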

5.2 Filtering Based on Conditions

You can filter structured arrays by field values using conditions.

# Filter for entries where first grade is greater than 90
high_grades = data[data['grades'][:, 0] > 90]
print("Records with First Grade > 90:\n", high_grades)

6. Advanced Structured Array Manipulation

6.1 Accessing Multiple Fields at Once

Use a list of field names to access multiple fields at once.

# Access both 'name' and 'grades' fields
subset = data[['name', 'grades']]
print("Subset with Name and Grades:\n", subset)
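
In recent NumPy versions (1.16 and later, as far as the documentation describes) a multi-field selection is a view that keeps the original record layout. The helpers in numpy.lib.recfunctions can turn it into a compact copy or a plain 2-D array. The sketch uses a hypothetical name/age/weight array so that the selection leaves a field out.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Select two of the three fields
subset = people[['age', 'weight']]
print(subset.dtype.names)

# Copy into a dtype that contains only the selected fields, without gaps
packed = rfn.repack_fields(subset)
print(packed.dtype.itemsize)

# Convert the numeric fields to an ordinary 2-D array (one column per field)
plain = rfn.structured_to_unstructured(subset)
print(plain.shape)
```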

6.2 Masking and Conditional Selection

NumPy structured arrays allow masking based on conditions, which is useful for data filtering.

# Create a mask for people with the first grade above 85
mask = data['grades'][:, 0] > 85

# Apply the mask to get a filtered array
filtered_data = data[mask]
print("Filtered Array (First Grade > 85):\n", filtered_data)
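
Masks can be combined with the element-wise operators & (and), | (or), and ~ (not); each comparison needs its own parentheses. This sketch rebuilds the three-student array so that it runs on its own.

```python
import numpy as np

data_type = [('name', 'U10'), ('grades', 'f4', (3,))]
students = np.array([('Alice', [90.5, 85.0, 92.3]),
                     ('Bob', [88.0, 79.5, 95.1]),
                     ('David', [85.0, 88.0, 90.0])], dtype=data_type)

first_grade = students['grades'][:, 0]
lowest_grade = students['grades'].min(axis=1)

# First grade above 85 AND no grade below 80
mask = (first_grade > 85) & (lowest_grade >= 80)
print(students[mask]['name'])

# Everyone who does not satisfy the combined condition
print(students[~mask]['name'])
```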

7. Using Structured Arrays with NumPy Functions

Once a field is selected, it behaves like an ordinary NumPy array, so the usual numeric functions can be applied to it.

7.1 Calculating Mean, Min, and Max

Use functions like np.mean(), np.min(), and np.max() on specific fields.

# Calculate mean of the first grade
mean_first_grade = np.mean(data['grades'][:, 0])
print("Mean of First Grade:", mean_first_grade)

# Calculate max and min of the first grade
max_first_grade = np.max(data['grades'][:, 0])
min_first_grade = np.min(data['grades'][:, 0])
print("Max of First Grade:", max_first_grade)
print("Min of First Grade:", min_first_grade)
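
Because a subarray field is a 2-D array, the axis argument selects whether a statistic is computed per record or per column. The grades below are illustrative round numbers.

```python
import numpy as np

data_type = [('name', 'U10'), ('grades', 'f4', (3,))]
students = np.array([('Alice', [90.0, 80.0, 100.0]),
                     ('Bob', [70.0, 80.0, 90.0])], dtype=data_type)

grades = students['grades']

per_student = grades.mean(axis=1)  # one mean per record
per_exam = grades.mean(axis=0)     # one mean per grade column
print(per_student, per_exam)

# Name of the student with the highest average
best = students['name'][np.argmax(per_student)]
print(best)
```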

7.2 Using np.where with Structured Arrays

Use np.where to create new arrays based on conditions.

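# Update grades for students whose first grade is below 90
# The condition is kept as a column of shape (N, 1) so that it broadcasts
# across each record's three grades
updated_grades = np.where(data['grades'][:, :1] < 90, [90.0, 92.0, 94.0], data['grades'])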
data['grades'] = updated_grades
print("Updated Grades (for first grade < 90):\n", data)
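
np.where broadcasts its three arguments against each other, so a per-record condition of shape (N,) needs a trailing axis before it can select whole rows of an (N, 3) field. The sketch below shows that form next to an equivalent in-place boolean-mask assignment; the replacement values are the ones used above.

```python
import numpy as np

data_type = [('name', 'U10'), ('grades', 'f4', (3,))]
students = np.array([('Alice', [95.0, 90.0, 96.5]),
                     ('Bob', [88.0, 79.5, 95.1])], dtype=data_type)

low = students['grades'][:, 0] < 90            # shape (2,), one flag per record
replacement = np.array([90.0, 92.0, 94.0], dtype='f4')

# Option 1: np.where with the condition reshaped to (2, 1)
via_where = np.where(low[:, np.newaxis], replacement, students['grades'])

# Option 2: assign directly to the selected rows of the field
students['grades'][low] = replacement
print(students)
```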

8. Saving and Loading Structured Arrays

Structured arrays can be saved to disk using np.save and loaded back with np.load.

# Save the structured array
np.save('students_data.npy', data)

# Load the structured array
# (allow_pickle is not needed: the dtype contains no Python object fields)
loaded_data = np.load('students_data.npy')
print("Loaded Data:\n", loaded_data)
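
To confirm that nothing is lost on the way to disk, a round trip can be checked by comparing dtypes and values. This sketch writes to a temporary directory rather than the working directory, and uses a hypothetical name/age/weight array.

```python
import os
import tempfile

import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.npy')
    np.save(path, people)
    loaded = np.load(path)  # fully read into memory before the file is removed

print(loaded.dtype == people.dtype)  # field names and types are preserved
print((loaded == people).all())      # every record matches
```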

9. Practical Example: Employee Data Analysis

Let’s say you have employee data with fields: name, age, salary, and department.

# Define dtype
# 'department' uses U15: 'Engineering' has 11 characters and would be truncated by U10
employee_dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8'), ('department', 'U15')]

# Create an array of employees
employees = np.array([
    ('Alice', 30, 70000.0, 'HR'),
    ('Bob', 35, 85000.0, 'Engineering'),
    ('Cathy', 32, 90000.0, 'Marketing'),
    ('David', 29, 75000.0, 'Engineering')
], dtype=employee_dtype)

# Print the employee array
print("Employee Data:\n", employees)

9.1 Filter Employees by Department

# Filter employees in Engineering
engineering_employees = employees[employees['department'] == 'Engineering']
print("Engineering Employees:\n", engineering_employees)

9.2 Calculate Average Salary

# Calculate average salary
average_salary = np.mean(employees['salary'])
print("Average Salary:", average_salary)

9.3 Get Employees with Salary Above a Threshold

# Employees with salary above 80000
high_salary_employees = employees[employees['salary'] > 80000]
print("High Salary Employees:\n", high_salary_employees)

9.4 Sort Employees by Age

# Sort employees by age
sorted_by_age = np.sort(employees, order='age')
print("Employees Sorted by Age:\n", sorted_by_age)
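
The filtering and averaging steps can be combined into a simple per-department summary. This is a minimal sketch using np.unique and boolean masks; the avg_by_dept dictionary is an illustrative name, and the dtype uses 'U15' for department so that 'Engineering' (11 characters) is stored in full.

```python
import numpy as np

employee_dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8'), ('department', 'U15')]
employees = np.array([
    ('Alice', 30, 70000.0, 'HR'),
    ('Bob', 35, 85000.0, 'Engineering'),
    ('Cathy', 32, 90000.0, 'Marketing'),
    ('David', 29, 75000.0, 'Engineering')
], dtype=employee_dtype)

# Average salary for each distinct department
avg_by_dept = {}
for dept in np.unique(employees['department']):
    in_dept = employees['department'] == dept
    avg_by_dept[str(dept)] = float(employees['salary'][in_dept].mean())

print(avg_by_dept)
```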

Summary of Structured Array Features

Operation                  Code Example
Define structured dtype    dtype = [('name', 'U10'), ('age', 'i4')]
Create array               np.array([('Alice', 25)], dtype=dtype)
Access field               data['name']
Access record              data[0]
Sort by field              np.sort(data, order='age')
Filter by condition        data[data['age'] > 30]
Save to file               np.save('data.npy', data)
Load from file             np.load('data.npy')

Structured arrays in NumPy provide a powerful way to work with data that contains multiple fields and mixed data types, ideal for applications like data processing and analysis in scientific or business contexts.
