NumPy provides a suite of functions to perform vectorized operations on strings, allowing efficient manipulation of string arrays.
These functions are contained in the numpy.char module and are useful for tasks like case conversion, string searching, and other common operations on arrays of text data.
Importing NumPy and Creating String Arrays
First, let’s import NumPy and create a sample string array to work with:
import numpy as np

# Example string array
names = np.array(["Alice", "Bob", "Charlie", "David"])
Common NumPy String Functions
The functions in numpy.char operate element-wise on arrays. The most common ones are shown below with examples.
1. numpy.char.lower and numpy.char.upper
These functions convert the case of characters in each string element of the array.
Example
# Convert all names to lowercase
lowercase_names = np.char.lower(names)
print("Lowercase:", lowercase_names)

# Convert all names to uppercase
uppercase_names = np.char.upper(names)
print("Uppercase:", uppercase_names)
Output
Lowercase: ['alice' 'bob' 'charlie' 'david']
Uppercase: ['ALICE' 'BOB' 'CHARLIE' 'DAVID']
2. numpy.char.capitalize and numpy.char.title
- capitalize: Converts the first character of each string to uppercase, and all other characters to lowercase.
- title: Converts each word in the string to title case.
Example
# Capitalize each name
capitalized_names = np.char.capitalize(names)
print("Capitalized:", capitalized_names)

# Convert to title case
title_names = np.char.title(names)
print("Title case:", title_names)
Output
Capitalized: ['Alice' 'Bob' 'Charlie' 'David']
Title case: ['Alice' 'Bob' 'Charlie' 'David']
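The sample names are single words that are already capitalized, so both functions return the same result. The difference only shows up on multi-word strings. Here is a minimal sketch, assuming a hypothetical full_names array that is not part of the examples above:

```python
import numpy as np

# Hypothetical multi-word sample data (not from the examples above)
full_names = np.array(["alice smith", "BOB JONES"])

# capitalize: uppercase the first character of the string, lowercase the rest
capitalized = np.char.capitalize(full_names)

# title: uppercase the first character of every word, lowercase the rest
titled = np.char.title(full_names)

print("Capitalized:", capitalized)
print("Title case:", titled)
```

Here capitalize gives 'Alice smith' and 'Bob jones', while title gives 'Alice Smith' and 'Bob Jones'.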
3. numpy.char.strip, numpy.char.lstrip, and numpy.char.rstrip
These functions remove whitespace or specified characters from the beginning, end, or both ends of each string.
Example
# Example with whitespace padding
padded_names = np.array([" Alice ", " Bob ", "Charlie ", " David"])
stripped_names = np.char.strip(padded_names)
print("Stripped:", stripped_names)

# Left strip
lstripped_names = np.char.lstrip(padded_names)
print("Left stripped:", lstripped_names)

# Right strip
rstripped_names = np.char.rstrip(padded_names)
print("Right stripped:", rstripped_names)
Output
Stripped: ['Alice' 'Bob' 'Charlie' 'David']
Left stripped: ['Alice ' 'Bob ' 'Charlie ' 'David']
Right stripped: [' Alice' ' Bob' 'Charlie' ' David']
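The example above only strips whitespace. To remove specific characters instead, pass them as the second argument. The sketch below assumes a hypothetical tagged array padded with "-" and "_":

```python
import numpy as np

# Hypothetical sample data padded with dashes and underscores
tagged = np.array(["--Alice--", "__Bob", "Charlie-_"])

# The second argument is a set of characters to remove, not a prefix or suffix
cleaned = np.char.strip(tagged, "-_")

# lstrip only touches the left side, so trailing characters remain
left_only = np.char.lstrip(tagged, "-_")

print("Cleaned:", cleaned)
print("Left only:", left_only)
```

As with Python's str.strip, the characters are treated as a set: any combination of them is removed from the ends until a character outside the set is reached.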
4. numpy.char.split and numpy.char.join
- split: Splits each string element by a specified separator.
- join: Inserts a specified separator between the characters of each string element.
Example
# Split names by a character (e.g., "a" in "Charlie")
split_names = np.char.split(names, sep="a")
print("Split by 'a':", split_names)

# Insert a separator (e.g., "-") between the characters of each name
joined_names = np.char.join("-", names)
print("Joined with '-':", joined_names)
Output
Split by 'a': [list(['Alice']) list(['Bob']) list(['Ch', 'rlie']) list(['D', 'vid'])]
Joined with '-': ['A-l-i-c-e' 'B-o-b' 'C-h-a-r-l-i-e' 'D-a-v-i-d']
Note: split returns an object array in which each element is a Python list of the pieces. The split is case-sensitive, so "Alice" is left whole.
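Because split returns an object array of Python lists, the pieces usually need one more step before they can be used as regular string arrays. The following is a rough sketch, assuming a hypothetical full_names array of "First Last" strings:

```python
import numpy as np

# Hypothetical "First Last" sample data
full_names = np.array(["Alice Smith", "Bob Jones", "Charlie Brown"])

# Each element of parts is a Python list such as ['Alice', 'Smith']
parts = np.char.split(full_names, sep=" ")

# Pull the pieces back out into ordinary string arrays
first_names = np.array([p[0] for p in parts])
last_names = np.array([p[-1] for p in parts])

print("dtype of split result:", parts.dtype)
print("First names:", first_names)
print("Last names:", last_names)
```

The list comprehension runs in plain Python, so this step is not vectorized. For large arrays that may matter.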
5. numpy.char.replace
Replaces occurrences of a specified substring with another substring in each string element.
Example
# Replace "a" with "@"
replaced_names = np.char.replace(names, "a", "@")
print("Replace 'a' with '@':", replaced_names)
Output
Replace 'a' with '@': ['Alice' 'Bob' 'Ch@rlie' 'D@vid']
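np.char.replace also accepts an optional count argument that limits how many occurrences are replaced in each string. A small sketch, using a hypothetical fruits array chosen because its strings contain several "a" characters:

```python
import numpy as np

# Hypothetical sample data with repeated letters
fruits = np.array(["banana", "papaya"])

# Replace every occurrence of "a"
all_replaced = np.char.replace(fruits, "a", "@")

# Replace only the first occurrence in each string
first_replaced = np.char.replace(fruits, "a", "@", count=1)

print("All:", all_replaced)
print("First only:", first_replaced)
```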
6. numpy.char.find (NumPy has no numpy.char.contains)
To test whether each string contains a substring, use np.char.find, which returns the index of the first occurrence of the substring or -1 if it is absent. The search is case-sensitive.
Example
# Check if each name contains the lowercase letter "a"
contains_a = np.char.find(names, "a") >= 0
print("Contains 'a':", contains_a)
Output
Contains 'a': [False False  True  True]
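np.char.find is case-sensitive, so a lowercase "a" does not match the capital "A" in "Alice". One way to get a case-insensitive check is to lowercase the array first, as sketched below. np.char.count is shown alongside as a related function that counts occurrences instead of locating them:

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Lowercase first so that "A" and "a" are treated the same
lowered = np.char.lower(names)
contains_a_any_case = np.char.find(lowered, "a") >= 0

# Count how many times "a" appears in each (lowercased) name
a_counts = np.char.count(lowered, "a")

print("Contains 'a' (any case):", contains_a_any_case)
print("Number of 'a' characters:", a_counts)
```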
7. numpy.char.startswith and numpy.char.endswith
Checks if each string element starts or ends with a specified substring.
Example
# Check if each name starts with "A"
starts_with_a = np.char.startswith(names, "A")
print("Starts with 'A':", starts_with_a)

# Check if each name ends with "e"
ends_with_e = np.char.endswith(names, "e")
print("Ends with 'e':", ends_with_e)
Output
Starts with 'A': [ True False False False]
Ends with 'e': [ True False  True False]
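The boolean arrays returned by these functions can be used directly as masks to filter the original array. A brief sketch of that pattern:

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Combine two element-wise checks with the boolean OR operator
mask = np.char.startswith(names, "A") | np.char.endswith(names, "e")

# Boolean indexing keeps only the elements where the mask is True
selected = names[mask]

print("Selected:", selected)
```

Only 'Alice' and 'Charlie' satisfy either condition, so those are the two names kept.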
8. numpy.char.add
Concatenates two arrays of strings element-wise.
Example
# Array of greetings
greetings = np.array(["Hi", "Hello", "Hey", "Greetings"])

# Concatenate greeting with names
combined = np.char.add(greetings, names)
print("Combined:", combined)
Output
Combined: ['HiAlice' 'HelloBob' 'HeyCharlie' 'GreetingsDavid']
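The result above has no separator between the greeting and the name. Since np.char.add broadcasts its arguments, a single string can be added to every element, which makes it easy to insert one. A minimal sketch:

```python
import numpy as np

greetings = np.array(["Hi", "Hello", "Hey", "Greetings"])
names = np.array(["Alice", "Bob", "Charlie", "David"])

# The scalar ", " is broadcast against every element of greetings
with_separator = np.char.add(greetings, ", ")

# Then concatenate element-wise with the names
combined = np.char.add(with_separator, names)

print("Combined:", combined)
```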
9. numpy.char.multiply
Repeats each string in the array a specified number of times.
Example
# Repeat each name 3 times
multiplied_names = np.char.multiply(names, 3)
print("Multiplied names:", multiplied_names)
Output
Multiplied names: ['AliceAliceAlice' 'BobBobBob' 'CharlieCharlieCharlie' 'DavidDavidDavid']
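The repeat count does not have to be a single number. It can also be an integer array, in which case each string gets its own count. A short sketch, with hypothetical tokens and counts arrays:

```python
import numpy as np

# Hypothetical sample data: one repeat count per string
tokens = np.array(["ab", "-", "x"])
counts = np.array([1, 3, 2])

# Each string is repeated by the count at the same position
repeated = np.char.multiply(tokens, counts)

print("Repeated:", repeated)
```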
10. numpy.char.center
Centers each string in the array within a specified width, padding with a specified character (default is space).
Example
# Center each name within a width of 10, using "*"
centered_names = np.char.center(names, 10, "*")
print("Centered names:", centered_names)
Output
Centered names: ['**Alice***' '***Bob****' '*Charlie**' '**David***']
11. numpy.char.isnumeric, numpy.char.isalpha, and numpy.char.isdigit
These functions check whether each element in the array consists of only numeric, alphabetic, or digit characters.
Example
mixed = np.array(["Alice123", "Bob", "1234", "David"])

# Check if each element is alphabetic
is_alpha = np.char.isalpha(mixed)
print("Is alphabetic:", is_alpha)

# Check if each element is numeric
is_numeric = np.char.isnumeric(mixed)
print("Is numeric:", is_numeric)

# Check if each element is digit-only
is_digit = np.char.isdigit(mixed)
print("Is digit-only:", is_digit)
Output
Is alphabetic: [False  True False  True]
Is numeric: [False False  True False]
Is digit-only: [False False  True False]
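In the example above, isnumeric and isdigit return identical results, which can make them look interchangeable. They follow the same rules as Python's str.isnumeric and str.isdigit, and differ on some Unicode characters. The sketch below uses a hypothetical samples array to show one such case:

```python
import numpy as np

# Hypothetical sample data:
#   "\u00b2" is the superscript two character
#   "\u00bd" is the vulgar fraction one-half character
samples = np.array(["123", "\u00b2", "\u00bd", "12.5"])

is_digit = np.char.isdigit(samples)
is_numeric = np.char.isnumeric(samples)

print("Is digit-only:", is_digit)
print("Is numeric:", is_numeric)
```

The fraction character counts as numeric but not as a digit, and "12.5" fails both checks because of the decimal point.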
12. numpy.char.encode and numpy.char.decode
These functions allow encoding and decoding of string arrays, useful for converting strings into bytes using a specified encoding format (like UTF-8).
Example
# Encode names to bytes using UTF-8
encoded_names = np.char.encode(names, "utf-8")
print("Encoded:", encoded_names)

# Decode back to strings
decoded_names = np.char.decode(encoded_names, "utf-8")
print("Decoded:", decoded_names)
Output
Encoded: [b'Alice' b'Bob' b'Charlie' b'David']
Decoded: ['Alice' 'Bob' 'Charlie' 'David']
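With ASCII-only names, the encoded bytes look the same as the original text. The effect of the encoding is easier to see with non-ASCII characters. A small sketch, assuming a hypothetical intl_names array:

```python
import numpy as np

# Hypothetical sample data with accented characters
# ("\u00eb" is e with diaeresis, "\u00e9" is e with acute accent)
intl_names = np.array(["Zo\u00eb", "Jos\u00e9"])

# Encoding produces a byte-string array (dtype kind "S")
encoded = np.char.encode(intl_names, "utf-8")

# Decoding returns a Unicode string array (dtype kind "U")
decoded = np.char.decode(encoded, "utf-8")

print("Encoded:", encoded, "kind:", encoded.dtype.kind)
print("Decoded:", decoded, "kind:", decoded.dtype.kind)
```

Each accented character takes two bytes in UTF-8, so the encoded elements are longer than the original strings.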
Summary of Common NumPy String Functions
Function | Description |
---|---|
np.char.lower, upper | Converts to lowercase or uppercase |
np.char.capitalize, title | Capitalizes the first letter or each word |
np.char.strip, lstrip, rstrip | Removes whitespace or specified characters |
np.char.split, join | Splits strings on a separator, or inserts a separator between the characters of each string |
np.char.replace | Replaces a substring within each string |
np.char.find | Finds a substring and returns index, or -1 |
np.char.startswith, endswith | Checks for start or end substring |
np.char.add, multiply | Concatenates or repeats strings |
np.char.center | Centers each string with padding |
np.char.isnumeric, isalpha, isdigit | Checks content type of each string |
np.char.encode, decode | Encodes and decodes strings |
NumPy’s char module enables efficient string manipulation for arrays, making it ideal for data preprocessing tasks.
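As an illustration of such a preprocessing task, several of the functions above can be chained. The sketch below assumes a hypothetical raw array of inconsistently formatted names and uses np.char.str_len, which is not covered above, to drop empty entries:

```python
import numpy as np

# Hypothetical raw input with stray whitespace, mixed case, and an empty entry
raw = np.array(["  alice ", "BOB", " charlie", ""])

# Step 1: remove surrounding whitespace
stripped = np.char.strip(raw)

# Step 2: normalize the casing
normalized = np.char.title(stripped)

# Step 3: keep only the non-empty strings
cleaned = normalized[np.char.str_len(normalized) > 0]

print("Cleaned:", cleaned)
```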