NumPy String Functions Tutorial

NumPy provides a set of string functions in the np.char module.

These functions perform operations such as concatenation, splitting, stripping, searching, and replacing on every string in an array.

Because each operation is applied element-wise, you can process a whole array of strings with a single call instead of writing an explicit loop.

This guide walks through the main NumPy string functions with examples.

First, let’s ensure we import NumPy.

import numpy as np

1. Creating String Arrays

You can create arrays of strings by passing a list of strings to np.array(). NumPy infers a fixed-width Unicode data type from the longest string in the list.

# Creating a string array
string_array = np.array(['apple', 'banana', 'cherry', 'date'])
print("String Array:\n", string_array)
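To see which data type NumPy actually picked, it can help to inspect the array's dtype. The sketch below reuses the same four fruit names; the variable names fruits and wide are illustrative only. It also shows a common pitfall: a fixed-width string array silently truncates a longer value assigned later.

```python
import numpy as np

# NumPy infers a fixed-width Unicode dtype from the longest string (6 characters here).
fruits = np.array(['apple', 'banana', 'cherry', 'date'])
print(fruits.dtype.kind, fruits.dtype.itemsize // 4)  # kind and width in characters

# Assigning a longer string is silently truncated to the array's width.
fruits[0] = 'pineapple'
print(fruits[0])

# Requesting a wider dtype up front leaves room for longer values.
wide = np.array(['apple', 'banana'], dtype='U20')
wide[0] = 'pineapple'
print(wide[0])
```

If the strings in an array are likely to grow, choosing a wider dtype at creation time avoids this truncation.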

2. Basic String Operations

2.1 Converting to Upper and Lower Case

  • np.char.upper(): Converts all characters to uppercase.
  • np.char.lower(): Converts all characters to lowercase.
# Convert to uppercase
upper_case = np.char.upper(string_array)
print("Upper Case:\n", upper_case)

# Convert to lowercase
lower_case = np.char.lower(string_array)
print("Lower Case:\n", lower_case)

2.2 Title Case

  • np.char.title(): Capitalizes the first letter of each word.
# Convert to title case
title_case = np.char.title(string_array)
print("Title Case:\n", title_case)

3. Concatenating Strings

  • np.char.add(): Concatenates two arrays element-wise.
  • np.char.join(): Joins the characters of each string using a separator.
# Define another string array
string_array2 = np.array([' pie', ' split', ' tart', ' bar'])

# Concatenate two string arrays
concatenated = np.char.add(string_array, string_array2)
print("Concatenated Array:\n", concatenated)

# Join the characters of each string with a separator ('apple' becomes 'a-p-p-l-e')
joined = np.char.join('-', string_array)
print("Joined with '-':\n", joined)
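The difference between np.char.join() and Python's own str.join() is easy to miss, so a short side-by-side sketch may help. The array and variable names here are illustrative; the '.txt' suffix is just an example of a scalar string being broadcast by np.char.add().

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry'])

# np.char.join() inserts the separator between the characters of each element.
per_element = np.char.join('-', fruits)
print(per_element)

# Python's str.join() merges all elements into one string.
merged = '-'.join(fruits)
print(merged)

# np.char.add() broadcasts a scalar string across the whole array.
with_suffix = np.char.add(fruits, '.txt')
print(with_suffix)
```

Use str.join() when you want a single combined string, and np.char.join() only when you want to separate the characters inside each element.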

4. String Stripping

  • np.char.strip(): Removes leading and trailing characters (whitespace by default).
  • np.char.lstrip(): Removes leading characters (whitespace by default).
  • np.char.rstrip(): Removes trailing characters (whitespace by default).
# Define an array with extra spaces
string_array_with_spaces = np.array(['  apple  ', ' banana ', ' cherry', 'date  '])

# Strip leading and trailing spaces
stripped = np.char.strip(string_array_with_spaces)
print("Stripped Array:\n", stripped)

# Strip leading spaces
lstrip_result = np.char.lstrip(string_array_with_spaces)
print("Left Stripped:\n", lstrip_result)

# Strip trailing spaces
rstrip_result = np.char.rstrip(string_array_with_spaces)
print("Right Stripped:\n", rstrip_result)
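The strip functions also accept an optional set of characters to remove instead of whitespace. The following is a small sketch using made-up padded values; the names padded, cleaned, and left_only are illustrative.

```python
import numpy as np

# Hypothetical values padded with '*' and '-' instead of spaces
padded = np.array(['**apple**', '--banana--', '*-cherry-*'])

# The second argument is a set of characters to strip, not a literal prefix or suffix.
cleaned = np.char.strip(padded, '*-')
print(cleaned)

# lstrip() removes those characters from the left side only.
left_only = np.char.lstrip(padded, '*-')
print(left_only)
```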

5. String Splitting and Joining

5.1 Splitting Strings

  • np.char.split(): Splits each string by a specified separator.
  • np.char.splitlines(): Splits at newlines.
# Splitting strings by space
split_result = np.char.split(np.array(['apple pie', 'banana split', 'cherry tart']))
print("Split by Space:\n", split_result)

# Splitting strings by a specific character
split_by_char = np.char.split(np.array(['apple,banana', 'cherry,date']), sep=',')
print("Split by Comma:\n", split_by_char)
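The result of np.char.split() is an object array whose elements are Python lists, which affects how you work with it afterwards. The sketch below, with illustrative sample data, shows that and adds a np.char.splitlines() call for completeness.

```python
import numpy as np

phrases = np.array(['apple pie', 'banana split', 'cherry tart'])

# Each element of the result is a Python list of the pieces.
parts = np.char.split(phrases)
print(parts.dtype)

# Pull the first word out of every list with a comprehension.
first_words = np.array([words[0] for words in parts])
print(first_words)

# splitlines() breaks each element at its newline characters.
multiline = np.array(['line one\nline two', 'single line'])
lines = np.char.splitlines(multiline)
print(lines)
```

Because the lists can have different lengths, the result cannot be treated as a regular two-dimensional string array.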

5.2 Joining Strings

  • np.char.join(): Joins the characters of each string with a specified separator.
# Joining each character with a hyphen
joined_chars = np.char.join('-', string_array)
print("Joined Characters with Hyphen:\n", joined_chars)

6. Finding and Counting Substrings

6.1 Finding Substrings

  • np.char.find(): Returns the lowest index in each string where the substring is found, or -1 if it is not found.
  • np.char.index(): Similar to find() but raises a ValueError if the substring is missing from any element.
# Find the position of substring 'a' in each string
find_result = np.char.find(string_array, 'a')
print("Find 'a' in each string:\n", find_result)

# Using index (raises ValueError here because 'banana' does not contain 'e')
try:
    index_result = np.char.index(string_array, 'e')
    print("Index of 'e' in each string:\n", index_result)
except ValueError as err:
    print("index() failed:", err)
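Since find() reports a missing substring as -1 rather than raising, its result can double as a filter. Here is a minimal sketch of that idea using the same fruit names; the variable names are illustrative.

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry', 'date'])

# find() returns -1 for elements that do not contain the substring.
positions = np.char.find(fruits, 'e')
print(positions)

# Turn the positions into a boolean mask and keep only the matching elements.
contains_e = positions >= 0
matches = fruits[contains_e]
print(matches)
```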

6.2 Counting Substrings

  • np.char.count(): Counts the occurrences of a substring in each string.
# Count occurrences of 'a'
count_result = np.char.count(string_array, 'a')
print("Count of 'a' in each string:\n", count_result)
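Two details of count() are worth a quick sketch: matches are counted without overlap, and an optional start position limits where the search begins. The example below uses the same fruit names with illustrative variable names.

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry', 'date'])

# Occurrences are non-overlapping: 'banana' contains 'ana' only once by this rule.
ana_counts = np.char.count(fruits, 'ana')
print(ana_counts)

# The optional start argument skips the beginning of each string.
a_after_first = np.char.count(fruits, 'a', start=1)
print(a_after_first)
```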

7. Replacing Substrings

  • np.char.replace(): Replaces occurrences of a substring with a new substring.
# Replace 'a' with '@' in each string
replace_result = np.char.replace(string_array, 'a', '@')
print("Replace 'a' with '@':\n", replace_result)
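replace() also takes an optional count that limits how many occurrences are replaced in each element. The sketch below illustrates this with the same fruit names; note that the substring is matched literally, not as a pattern.

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry', 'date'])

# Replace only the first 'a' in each element.
first_only = np.char.replace(fruits, 'a', '@', count=1)
print(first_only)

# The replacement may be longer than the original; the result is widened to fit.
doubled = np.char.replace(fruits, 'a', 'AA')
print(doubled)
```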

8. Checking Conditions

8.1 Checking if Each Element Starts or Ends with a Substring

  • np.char.startswith(): Checks if each string starts with a specified substring.
  • np.char.endswith(): Checks if each string ends with a specified substring.
# Check if each string starts with 'a'
startswith_result = np.char.startswith(string_array, 'a')
print("Starts with 'a':\n", startswith_result)

# Check if each string ends with 'e'
endswith_result = np.char.endswith(string_array, 'e')
print("Ends with 'e':\n", endswith_result)
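The boolean arrays returned by startswith() and endswith() are most useful as masks for filtering. A brief sketch, with illustrative variable names, of how the two checks can be combined:

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry', 'date'])

starts_with_a = np.char.startswith(fruits, 'a')
ends_with_e = np.char.endswith(fruits, 'e')

# Keep elements that satisfy both conditions.
both = fruits[starts_with_a & ends_with_e]
print(both)

# Keep elements that satisfy at least one condition.
either = fruits[starts_with_a | ends_with_e]
print(either)
```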

9. Changing String Case

Beyond upper(), lower(), and title(), NumPy provides two more functions for changing string case.

9.1 Swapping Case

  • np.char.swapcase(): Swaps the case of each character in each string.
# Swapping case
swapcase_result = np.char.swapcase(np.array(['Apple', 'BaNaNa', 'Cherry']))
print("Swap Case:\n", swapcase_result)

9.2 Capitalizing the First Letter

  • np.char.capitalize(): Capitalizes the first character of each string and makes the rest lowercase.
# Capitalize first letter
capitalize_result = np.char.capitalize(np.array(['apple', 'banana', 'cherry']))
print("Capitalized:\n", capitalize_result)
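With single-word strings, capitalize() and title() give the same result, so the difference only shows up on multi-word input. The following sketch uses made-up phrases to compare the two.

```python
import numpy as np

phrases = np.array(['apple pie', 'BANANA split'])

# title() capitalizes the first letter of every word.
titled = np.char.title(phrases)
print(titled)

# capitalize() capitalizes only the first character of the whole string.
capitalized = np.char.capitalize(phrases)
print(capitalized)
```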

10. Practical Examples

10.1 Email Address Standardization

Suppose we have a list of email addresses that need to be standardized to lowercase.

# Array of email addresses
emails = np.array(['John.Doe@example.com', 'JANE.DOE@EXAMPLE.COM', 'jill.doe@example.com'])

# Convert to lowercase
standardized_emails = np.char.lower(emails)
print("Standardized Emails:\n", standardized_emails)
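In practice, standardizing addresses often goes together with trimming stray whitespace and removing duplicates. The sketch below is one possible pipeline, assuming hypothetical input data with padding and a repeated address; np.unique() is a general NumPy function, not part of np.char.

```python
import numpy as np

# Hypothetical raw input with stray spaces and a duplicate in different case
raw_emails = np.array(['  John.Doe@example.com', 'JOHN.DOE@EXAMPLE.COM ', 'jill.doe@example.com'])

# Strip whitespace first, then lowercase.
cleaned = np.char.lower(np.char.strip(raw_emails))
print(cleaned)

# np.unique() returns the distinct values in sorted order.
unique_emails = np.unique(cleaned)
print(unique_emails)
```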

10.2 Masking Sensitive Information

Imagine we have a list of phone numbers, and we want to mask all but the last four digits.

# Array of phone numbers
phone_numbers = np.array(['123-456-7890', '987-654-3210', '555-867-5309'])

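# Mask all but the last four digits.
# np.char.replace() matches literal text, not regular expressions,
# so take the part after the last '-' and prepend a fixed mask instead.
last_four = np.char.rpartition(phone_numbers, '-')[:, 2]
masked_numbers = np.char.add('***-***-', last_four)
print("Masked Phone Numbers:\n", masked_numbers)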
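Note that np.char.replace() treats its search string as literal text, so a pattern such as \d{3}-\d{3} is not interpreted as a regular expression. If pattern matching is needed, one option is to fall back on Python's re module with a list comprehension, as in this sketch (the names pattern and masked_with_regex are illustrative):

```python
import re
import numpy as np

phone_numbers = np.array(['123-456-7890', '987-654-3210', '555-867-5309'])

# Compile the pattern once: three digits, a hyphen, three digits.
pattern = re.compile(r'\d{3}-\d{3}')

# Apply the substitution to each element in a plain Python loop.
masked_with_regex = np.array([pattern.sub('***-***', number, count=1) for number in phone_numbers])
print(masked_with_regex)
```

This runs at Python speed rather than inside NumPy, which is usually acceptable for modest array sizes.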

10.3 Creating Username Suggestions

Let’s create username suggestions based on first and last names.

# Define arrays for first and last names
first_names = np.array(['john', 'jane', 'jill'])
last_names = np.array(['doe', 'smith', 'doe'])

# Concatenate first and last names to create usernames
usernames = np.char.add(first_names, last_names)
print("Username Suggestions:\n", usernames)

# Add a numeric suffix to make them unique
usernames_with_suffix = np.char.add(usernames, np.array(['1', '2', '3']))
print("Username Suggestions with Suffix:\n", usernames_with_suffix)
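Hard-coding the suffixes only works for a fixed number of names. A possible variation, sketched below, generates the numbers with np.arange() and converts them to strings; the dotted username format is an illustrative choice, not part of the original example.

```python
import numpy as np

first_names = np.array(['john', 'jane', 'jill'])
last_names = np.array(['doe', 'smith', 'doe'])

# Build 'first.last' by chaining two element-wise concatenations.
dotted = np.char.add(np.char.add(first_names, '.'), last_names)
print(dotted)

# Generate one numeric suffix per name and convert the integers to strings.
suffixes = np.arange(1, len(first_names) + 1).astype(str)
numbered = np.char.add(np.char.add(first_names, last_names), suffixes)
print(numbered)
```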

Summary of NumPy String Functions

Function                Description
np.char.upper()         Converts to uppercase
np.char.lower()         Converts to lowercase
np.char.title()         Converts to title case
np.char.add()           Concatenates two arrays element-wise
np.char.join()          Joins the characters of each element with a separator
np.char.strip()         Removes leading/trailing characters
np.char.split()         Splits each element by a separator
np.char.find()          Finds the index of a substring (-1 if not found)
np.char.count()         Counts occurrences of a substring
np.char.replace()       Replaces occurrences of a substring
np.char.startswith()    Checks if each element starts with a substring
np.char.endswith()      Checks if each element ends with a substring
np.char.swapcase()      Swaps case of each character
np.char.capitalize()    Capitalizes the first character and lowers the rest

This tutorial covers a wide range of NumPy string functions for data processing and manipulation, enabling efficient handling of text data within arrays.
