NumPy String Functions Tutorial in Python

NumPy provides a suite of functions that apply string operations element-wise to arrays of strings, so a single call processes every element of an array.

These functions are contained in the numpy.char module and are useful for tasks like case conversion, string searching, and other common operations on arrays of text data.

Importing NumPy and Creating String Arrays

First, let’s import NumPy and create a sample string array to work with:

import numpy as np

# Example string array
names = np.array(["Alice", "Bob", "Charlie", "David"])
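NumPy stores an array like this with a fixed-width Unicode dtype sized to its longest element. The sketch below, assuming the same names array, shows that dtype and one consequence of it: assigning a longer string into an existing element is silently truncated. The name `edited` is illustrative only.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Fixed-width Unicode dtype sized to the longest string ("Charlie" = 7 chars)
print(names.dtype)  # <U7 on little-endian machines

# Assigning a longer string into a copy truncates it to the 7-character width
edited = names.copy()
edited[1] = "Bartholomew"
print(edited[1])  # Barthol
```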

Common NumPy String Functions

All of the functions below live in the numpy.char module and operate element-wise, applying the equivalent Python string method to each element of the array. Each one is shown with an example.

1. numpy.char.lower and numpy.char.upper

These functions convert the case of characters in each string element of the array.

Example

# Convert all names to lowercase
lowercase_names = np.char.lower(names)
print("Lowercase:", lowercase_names)

# Convert all names to uppercase
uppercase_names = np.char.upper(names)
print("Uppercase:", uppercase_names)

Output

Lowercase: ['alice' 'bob' 'charlie' 'david']
Uppercase: ['ALICE' 'BOB' 'CHARLIE' 'DAVID']
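A common use of lower is case-insensitive matching. In this sketch, `query` is a hypothetical user input; both sides are lowercased before comparing, and the resulting boolean array selects the matching names.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Hypothetical search term typed in a different case
query = "ALICE"

# Normalize both sides to lowercase so the comparison ignores case
matches = np.char.lower(names) == query.lower()
print(matches)         # [ True False False False]
print(names[matches])  # ['Alice']
```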

2. numpy.char.capitalize and numpy.char.title

  • capitalize: Converts the first character of each string to uppercase, and all other characters to lowercase.
  • title: Uppercases the first character of each word and lowercases the remaining characters.

Example

# Capitalize each name
capitalized_names = np.char.capitalize(names)
print("Capitalized:", capitalized_names)

# Convert to title case
title_names = np.char.title(names)
print("Title case:", title_names)

Output

Capitalized: ['Alice' 'Bob' 'Charlie' 'David']
Title case: ['Alice' 'Bob' 'Charlie' 'David']
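With single-word names the two functions give the same result. The difference shows up with multi-word strings; the phrases below are made-up sample data.

```python
import numpy as np

phrases = np.array(["hello world", "nUMPY string FUNCS"])

# capitalize: only the first character of the whole string is uppercased
print(np.char.capitalize(phrases))  # ['Hello world' 'Numpy string funcs']

# title: the first character of every word is uppercased
print(np.char.title(phrases))       # ['Hello World' 'Numpy String Funcs']
```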

3. numpy.char.strip, numpy.char.lstrip, and numpy.char.rstrip

These functions remove whitespace or specified characters from the beginning, end, or both ends of each string.

Example

# Example with whitespace padding
padded_names = np.array(["  Alice  ", " Bob ", "Charlie   ", "  David"])
stripped_names = np.char.strip(padded_names)
print("Stripped:", stripped_names)

# Left strip
lstripped_names = np.char.lstrip(padded_names)
print("Left stripped:", lstripped_names)

# Right strip
rstripped_names = np.char.rstrip(padded_names)
print("Right stripped:", rstripped_names)

Output

Stripped: ['Alice' 'Bob' 'Charlie' 'David']
Left stripped: ['Alice  ' 'Bob ' 'Charlie   ' 'David']
Right stripped: ['  Alice' ' Bob' 'Charlie' '  David']
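The strip functions also accept the characters to remove. This sketch, using hypothetical decorated strings, shows that the argument is treated as a set of characters rather than an exact prefix or suffix, so any mix of them is removed.

```python
import numpy as np

decorated = np.array(["--Alice--", "*Bob*", "-*Charlie*-"])

# Any combination of '-' and '*' is removed from the ends
print(np.char.strip(decorated, "-*"))   # ['Alice' 'Bob' 'Charlie']
print(np.char.lstrip(decorated, "-*"))  # ['Alice--' 'Bob*' 'Charlie*-']
```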

4. numpy.char.split and numpy.char.join

  • split: Splits each string element on a specified separator (whitespace by default).
  • join: Inserts the separator between the characters of each string, like Python's sep.join(s).

Example

# Split names by a character (e.g., "a" in "Charlie")
split_names = np.char.split(names, sep="a")
print("Split by 'a':", split_names)

# Join with a separator (e.g., "-")
joined_names = np.char.join("-", names)
print("Joined with '-':", joined_names)

Output

Split by 'a': [list(['Alice']) list(['Bob']) list(['Ch', 'rlie']) list(['D', 'vid'])]
Joined with '-': ['A-l-i-c-e' 'B-o-b' 'C-h-a-r-l-i-e' 'D-a-v-i-d']

Note: split returns an object array in which each element is a Python list of the pieces.
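Because split returns Python lists, pulling out a particular field takes a small Python step. The sketch below uses hypothetical full names, splits them on whitespace (the default), and extracts first and last names; it also shows that join broadcasts, so each string can get its own separator.

```python
import numpy as np

full_names = np.array(["Alice Smith", "Bob  Jones", "Charlie Brown"])

# With the default separator, runs of whitespace count as one separator
parts = np.char.split(full_names)

# Each element is a Python list, so extract fields with a comprehension
first = np.array([p[0] for p in parts])
last = np.array([p[-1] for p in parts])
print(first)  # ['Alice' 'Bob' 'Charlie']
print(last)   # ['Smith' 'Jones' 'Brown']

# join broadcasts: each string here is joined with its own separator
print(np.char.join(["-", "."], ["abc", "xyz"]))  # ['a-b-c' 'x.y.z']
```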

5. numpy.char.replace

Replaces occurrences of a specified substring with another substring in each string element.

Example

# Replace "a" with "@"
replaced_names = np.char.replace(names, "a", "@")
print("Replace 'a' with '@':", replaced_names)

Output

Replace 'a' with '@': ['Alice' 'Bob' 'Ch@rlie' 'D@vid']
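Note that "Alice" is unchanged because the match is case-sensitive. replace also takes an optional count that limits how many occurrences are replaced in each string; the words below are sample data.

```python
import numpy as np

words = np.array(["banana", "papaya"])

# Replace every occurrence (the default) versus only the first one
print(np.char.replace(words, "a", "@"))           # ['b@n@n@' 'p@p@y@']
print(np.char.replace(words, "a", "@", count=1))  # ['b@nana' 'p@paya']
```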

6. Substring search with numpy.char.find (NumPy has no numpy.char.contains)

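To test whether each string contains a substring, use np.char.find, which returns the lowest index at which the substring occurs, or -1 if it does not occur. Comparing the result with 0 gives a boolean array. The search is case-sensitive, so "Alice" (capital A) does not match "a".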

Example

# Check if each name contains the letter "a"
contains_a = np.char.find(names, "a") >= 0
print("Contains 'a':", contains_a)

Output

Contains 'a': [False False  True  True]
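If the check should ignore case, one option is to lowercase the array first. The sketch also uses np.char.count, which returns how many non-overlapping occurrences each string contains rather than a position.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Lowercase first so "A" and "a" both match
contains_a_any_case = np.char.find(np.char.lower(names), "a") >= 0
print(contains_a_any_case)  # [ True False  True  True]

# Number of (case-sensitive) occurrences of "a" in each name
print(np.char.count(names, "a"))  # [0 0 1 1]
```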

7. numpy.char.startswith and numpy.char.endswith

These functions check whether each string element starts or ends with a specified substring and return a boolean array.

Example

# Check if each name starts with "A"
starts_with_a = np.char.startswith(names, "A")
print("Starts with 'A':", starts_with_a)

# Check if each name ends with "e"
ends_with_e = np.char.endswith(names, "e")
print("Ends with 'e':", ends_with_e)

Output

Starts with 'A': [ True False False False]
Ends with 'e': [ True False  True False]
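Since the results are boolean arrays, they can be used directly as masks to select elements, and combined with the | (or) and & (and) operators. A short sketch with the same names:

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# Keep only names ending in "e"
print(names[np.char.endswith(names, "e")])  # ['Alice' 'Charlie']

# Names starting with "A" or "D"
a_or_d = np.char.startswith(names, "A") | np.char.startswith(names, "D")
print(names[a_or_d])  # ['Alice' 'David']
```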

8. numpy.char.add

Concatenates two arrays of strings element-wise.

Example

# Array of greetings
greetings = np.array(["Hi", "Hello", "Hey", "Greetings"])

# Concatenate greeting with names
combined = np.char.add(greetings, names)
print("Combined:", combined)

Output

Combined: ['HiAlice' 'HelloBob' 'HeyCharlie' 'GreetingsDavid']
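add does not insert any separator. One way to get readable greetings, sketched here, is to add a plain string first; it broadcasts against the array like any other NumPy argument. The name `with_comma` is illustrative.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])
greetings = np.array(["Hi", "Hello", "Hey", "Greetings"])

# The scalar ", " is broadcast to every element
with_comma = np.char.add(greetings, ", ")
combined = np.char.add(with_comma, names)
print(combined)  # ['Hi, Alice' 'Hello, Bob' 'Hey, Charlie' 'Greetings, David']
```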

9. numpy.char.multiply

Repeats each string in the array a specified number of times.

Example

# Repeat each name 3 times
multiplied_names = np.char.multiply(names, 3)
print("Multiplied names:", multiplied_names)

Output

Multiplied names: ['AliceAliceAlice' 'BobBobBob' 'CharlieCharlieCharlie' 'DavidDavidDavid']
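The repetition count also broadcasts, so each element can be repeated a different number of times; a count of 0 produces an empty string. The counts below are arbitrary sample values.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])

# One repetition count per element
repeated = np.char.multiply(names, [1, 2, 0, 1])
print(repeated)  # ['Alice' 'BobBob' '' 'David']
```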

10. numpy.char.center

Centers each string in the array within a specified width, padding with a specified character (default is space).

Example

# Center each name within a width of 10, using "*"
centered_names = np.char.center(names, 10, "*")
print("Centered names:", centered_names)

Output

Centered names: ['**Alice***' '***Bob****' '*Charlie**' '**David***']
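numpy.char also has ljust and rjust, which pad on only one side. This sketch, using a hypothetical scores array, left-aligns the names with "." padding and right-aligns the scores with spaces to print a simple aligned listing.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])
scores = np.array(["9", "87", "100", "42"])  # hypothetical sample data

# ljust pads on the right (left-aligns); rjust pads on the left (right-aligns)
left = np.char.ljust(names, 8, ".")
right = np.char.rjust(scores, 4)

rows = np.char.add(left, right)
for row in rows:
    print(row)
# Alice...   9
# Bob.....  87
# Charlie. 100
# David...  42
```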

11. numpy.char.isnumeric, numpy.char.isalpha, and numpy.char.isdigit

These functions check whether each element in the array consists of only numeric, alphabetic, or digit characters.

Example

mixed = np.array(["Alice123", "Bob", "1234", "David"])

# Check if each element is alphabetic
is_alpha = np.char.isalpha(mixed)
print("Is alphabetic:", is_alpha)

# Check if each element is numeric
is_numeric = np.char.isnumeric(mixed)
print("Is numeric:", is_numeric)

# Check if each element is digit-only
is_digit = np.char.isdigit(mixed)
print("Is digit-only:", is_digit)

Output

Is alphabetic: [False  True False  True]
Is numeric: [False False  True False]
Is digit-only: [False False  True False]
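For plain ASCII digits, isnumeric and isdigit agree, as above. They differ on other Unicode characters. The sketch below compares them (and np.char.isdecimal) on a digit, a superscript, and a fraction character.

```python
import numpy as np

samples = np.array(["7", "²", "½"])

# isdecimal: only characters used to write base-10 numbers
print(np.char.isdecimal(samples))  # [ True False False]
# isdigit: also accepts digits such as superscripts
print(np.char.isdigit(samples))    # [ True  True False]
# isnumeric: also accepts other numeric characters such as fractions
print(np.char.isnumeric(samples))  # [ True  True  True]
```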

12. numpy.char.encode and numpy.char.decode

encode converts each string to bytes using a specified encoding (such as UTF-8), and decode converts an array of bytes back to strings.

Example

# Encode names to bytes using UTF-8
encoded_names = np.char.encode(names, "utf-8")
print("Encoded:", encoded_names)

# Decode back to strings
decoded_names = np.char.decode(encoded_names, "utf-8")
print("Decoded:", decoded_names)

Output

Encoded: [b'Alice' b'Bob' b'Charlie' b'David']
Decoded: ['Alice' 'Bob' 'Charlie' 'David']
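With non-ASCII text, the encoded byte length can differ from the character count. This sketch, using made-up sample words, compares the two with np.char.str_len and then round-trips the bytes back to strings.

```python
import numpy as np

words = np.array(["José", "Zoë"])

# Accented characters take two bytes each in UTF-8
encoded = np.char.encode(words, "utf-8")
print(np.char.str_len(words))    # [4 3] (characters)
print(np.char.str_len(encoded))  # [5 4] (bytes)

# Decoding restores the original strings
print(np.char.decode(encoded, "utf-8"))  # ['José' 'Zoë']
```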

Summary of Common NumPy String Functions

| Function | Description |
| --- | --- |
| np.char.lower, upper | Convert to lowercase or uppercase |
| np.char.capitalize, title | Capitalize the first letter of each string, or of each word |
| np.char.strip, lstrip, rstrip | Remove whitespace or specified characters from both ends, the left, or the right |
| np.char.split, join | Split each string on a separator; insert a separator between the characters of each string |
| np.char.replace | Replace a substring within each string |
| np.char.find | Return the lowest index of a substring, or -1 if it is absent |
| np.char.startswith, endswith | Check for a given prefix or suffix |
| np.char.add, multiply | Concatenate or repeat strings |
| np.char.center | Center each string within a given width, with padding |
| np.char.isnumeric, isalpha, isdigit | Check the character content of each string |
| np.char.encode, decode | Convert between strings and bytes |

NumPy’s char module applies string operations to whole arrays with a single call, which makes it convenient for cleaning and preprocessing text data.
