Pandas String Operations

This cheat sheet is a quick reference for working with text data in pandas, using the vectorized string methods available through the .str accessor of a Series.

Example Data Setup

Throughout this cheat sheet, we’ll use two pandas Series named suits and rock_paper_scissors. Two of the rock_paper_scissors values deliberately contain stray whitespace.

import pandas as pd
 
# Create example series
suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])
rock_paper_scissors = pd.Series(["rock ", " paper", "scissors"])

String Lengths and Substrings

Get String Length

suits.str.len()  # Returns 5 8 6 6

Get Substrings by Position

suits.str[2:5]  # Returns "ubs" "amo" "art" "ade"
suits.str[:-3]  # Returns "cl" "Diamo" "hea" "Spa"

Strip Whitespace

rock_paper_scissors.str.strip()  # Returns "rock" "paper" "scissors"
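
If only one side needs cleaning, lstrip() and rstrip() are the one-sided counterparts of strip(). A minimal sketch, where the variable names left_stripped and right_stripped are illustrative only:

```python
import pandas as pd

rock_paper_scissors = pd.Series(["rock ", " paper", "scissors"])

# lstrip() removes leading whitespace only
left_stripped = rock_paper_scissors.str.lstrip()   # "rock " "paper" "scissors"

# rstrip() removes trailing whitespace only
right_stripped = rock_paper_scissors.str.rstrip()  # "rock" " paper" "scissors"
```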

Pad Strings

suits.str.pad(8, fillchar="_")  # Returns "___clubs" "Diamonds" "__hearts" "__Spades"
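
By default pad() adds the fill characters on the left. The side argument selects where the padding goes; the sketch below pads on the right instead (the name right_padded is illustrative):

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# side="right" appends the fill character until each string is 8 characters long
right_padded = suits.str.pad(8, side="right", fillchar="_")
# "clubs___" "Diamonds" "hearts__" "Spades__"
```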

Changing Case

Lowercase

suits.str.lower()  # Returns "clubs" "diamonds" "hearts" "spades"

Uppercase

suits.str.upper()  # Returns "CLUBS" "DIAMONDS" "HEARTS" "SPADES"

Title Case

pd.Series("hello, world!").str.title()  # Returns "Hello, World!"

Sentence Case

pd.Series("hello, world!").str.capitalize()  # Returns "Hello, world!"

Splitting Strings

Split into Characters

suits.str.split(pat="")  # Splits each string into a list of characters; the lists may also begin and end with an empty string
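
Splitting on an empty pattern can leave empty strings at the ends of each list, and the exact behavior may depend on the pandas version and string backend. One alternative sketch, assuming plain Python lists of characters are what is wanted, converts each string with list() (the name chars is illustrative):

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# list("clubs") gives ["c", "l", "u", "b", "s"]; apply() runs it on every element
chars = suits.apply(list)
```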

Split by Separator

suits.str.split(pat="a")  # Returns ["clubs"], ["Di", "monds"], ["he", "rts"], ["Sp", "des"]

Split into DataFrame

suits.str.split(pat="a", expand=True)  # Returns DataFrame with columns 0 and 1; "clubs" has no "a", so its column 1 is missing
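
To make the shape of the expanded result concrete, the sketch below inspects it column by column (the name parts is illustrative). The missing entry may show up as None or NaN depending on the pandas version, so the example checks it with isna() rather than printing it:

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

parts = suits.str.split(pat="a", expand=True)

# Column 0 holds the text before the first "a" (or the whole string if there is none)
first_piece = parts[0]            # "clubs" "Di" "he" "Sp"

# Column 1 holds the text after the "a"; it is missing for "clubs"
has_no_second = parts[1].isna()   # True False False False
```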

Extracting Matches

Find All Matches

suits.str.findall(".[ae]")  # Returns [] ["ia"] ["he"] ["pa", "de"]

Extract Capture Groups

suits.str.extractall("([ae])(.)")  # Returns DataFrame with one row per match and one column per capture group
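
The result of extractall() is indexed by the original position plus a match number, which is easier to see on the example data. The sketch below also shows extract(), which keeps only the first match per string (the names matches and first_match are illustrative):

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# One row per match; the index is (original index, match number)
matches = suits.str.extractall("([ae])(.)")
# (1, 0): "a", "m"
# (2, 0): "e", "a"
# (3, 0): "a", "d"
# (3, 1): "e", "s"
# "clubs" contains no match, so index 0 does not appear at all

# extract() returns one row per input string, holding the first match only
first_match = suits.str.extract("([ae])(.)")
```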

Filter by Pattern

suits[suits.str.contains("d")]  # Returns "Diamonds" "Spades" (the match is case-sensitive)
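
Matching with contains() is case-sensitive by default, and missing values need a decision of their own. A short sketch of the case and na arguments, assuming a hypothetical Series with_missing that is not part of the example data:

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# "sp" does not match "Spades" unless case is ignored
sensitive = suits[suits.str.contains("sp")]                 # empty result
insensitive = suits[suits.str.contains("sp", case=False)]   # "Spades"

# na=False treats missing values as non-matches, giving a clean boolean mask
with_missing = pd.Series(["clubs", None])
mask = with_missing.str.contains("c", na=False)             # True False
```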

Replacing Matches

Replace Pattern

suits.str.replace("a", "4")  # Returns "clubs" "Di4monds" "he4rts" "Sp4des"
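
Whether the pattern is read literally or as a regular expression depends on the regex argument, and its default has differed between pandas versions, so passing it explicitly is a reasonable habit. A minimal sketch (the names vowels_masked and literal_dot are illustrative):

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# regex=True: "[ae]" is a character class matching either "a" or "e"
vowels_masked = suits.str.replace("[ae]", "_", regex=True)
# "clubs" "Di_monds" "h__rts" "Sp_d_s"

# regex=False: "." is a literal dot, which none of the strings contain
literal_dot = suits.str.replace(".", "4", regex=False)
# "clubs" "Diamonds" "hearts" "Spades"
```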

Remove Suffix

suits.str.removesuffix("s")  # Returns "club" "Diamond" "heart" "Spade"

Replace Substring

rhymes = pd.Series(["vein", "gain", "deign"])
rhymes.str.slice_replace(0, 1, "r")  # Returns "rein" "rain" "reign"

Joining and Concatenating

Combine with String

suits + "5"  # Returns "clubs5" "Diamonds5" "hearts5" "Spades5"

Join Series Elements

suits.str.cat(sep=", ")  # Returns "clubs, Diamonds, hearts, Spades"
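
Called without other data, cat() collapses the whole Series into one string. Given a second list-like of the same length, it joins element by element instead. A sketch with a hypothetical ranks list that is not part of the example data:

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# Hypothetical second set of values, one per suit
ranks = ["1", "2", "3", "4"]

# Each suit is joined to the value at the same position, separated by "-"
labelled = suits.str.cat(ranks, sep="-")
# "clubs-1" "Diamonds-2" "hearts-3" "Spades-4"
```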

Duplicate Strings

suits * 2  # Returns "clubsclubs" "DiamondsDiamonds" "heartshearts" "SpadesSpades"

Formatting Settings

Format DataFrame Values

df = pd.DataFrame({"x": [0.123, 4.567, 8.901]})
df.style.format(precision=1)  # Returns a Styler that displays numbers with 1 decimal place; df itself is unchanged
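
Styler formatting affects only how the DataFrame is rendered (and relies on the optional jinja2 dependency). When the values themselves should change, two common alternatives are rounding the numbers or converting them to formatted strings. A sketch, with rounded and as_text as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"x": [0.123, 4.567, 8.901]})

# Round the underlying numbers; the result is still numeric
rounded = df["x"].round(1)                  # 0.1 4.6 8.9

# Convert each number to a string with exactly one decimal place
as_text = df["x"].map("{:.1f}".format)      # "0.1" "4.6" "8.9"
```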
