HomeMachine LearningStop Writing Loops in Pandas: 7 Faster Alternatives to Try

Stop Writing Loops in Pandas: 7 Faster Alternatives to Try

Introduction

Row-by-row iteration is a notable performance bottleneck in pandas code. While it may go unnoticed with small datasets, handling large datasets this way can significantly impact performance.

Pandas, built on NumPy, is designed to perform operations on entire arrays simultaneously using compiled C code. By looping over rows in Python, you essentially bypass these optimizations, forcing each operation back into the Python interpreter—one row at a time.

This article explores seven alternatives to loops in pandas, each tailored to different transformation types. By the end, you’ll have a clearer understanding of which tool to employ depending on the problem’s nature.

You can get the Colab notebook on GitHub.

Configuring the Sample Dataset

We’ll use a realistic ecommerce order dataset throughout this article:


import pandas as pd
import numpy as np

np.random.seed(42)
n = 100_000
categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports', 'Books']
regions = ['North', 'South', 'East', 'West']

df = pd.DataFrame({
'order_id': range(1, n + 1),
'customer_age': np.random.randint(18, 70, n),
'product_category': np.random.choice(categories, n),
'region': np.random.choice(regions, n),
'price': np.round(np.random.uniform(5.0, 500.0, n), 2),
'quantity': np.random.randint(1, 10, n),
'days_to_ship': np.random.randint(1, 14, n),
})
display(df.head())

To go out:

Configuring the sample dataset
We now have a dataset of 100,000 rows to work with.

Using Vectorized Operations for Arithmetic

For any arithmetic or comparison on a column, vectorized operations should be your first instinct.

What we want to do: Calculate total revenue per order.


df['revenue'] = df['price'] * df['quantity']
display(df[['price', 'quantity', 'revenue']].head())

To go out:

Using vectorized operations for arithmetic

Applying a Function for Conditional Logic

When your transformation involves logic that cannot be expressed in simple arithmetic form, .apply() allows you to pass a function across a column or row.

What we want to do: Assign a shipping priority label based on shipping days.


def shipping_label(days):
if days <= 2:
return 'Express'
elif days <= 5:
return 'Standard'
else:
return 'Economy'

df['shipping_tier'] = df['days_to_ship'].apply(shipping_label)
display(df[['days_to_ship', 'shipping_tier']].head())

To go out:

Applying a function for conditional logic
Using .apply() is clean, readable, and much easier to debug than a loop. Use it when your logic is conditional and np.where() or np.select() seems too nested.

Using np.where() for Binary Conditions

When you have a binary condition—one result if true, another if false—np.where() is the clean and fast choice.

What we want to do: Report orders for which the customer is entitled to a senior discount.


df['senior_discount'] = np.where(df['customer_age'] >= 60, True, False)
display(df[['customer_age', 'senior_discount']].head())

To go out:

Using np.where() for binary conditions
np.where() is fully vectorized and significantly faster than .apply() for simple true or false conditions. Think of it as a vectorized ternary operator.

Selecting Among Multiple Conditions with np.select()

When you have more than two conditions, np.select() allows you to define a list of conditions and corresponding values without the need for nested if/elif strings.

What we want to do: Assign a tax rate based on region.


conditions = [
df['region'] == 'North',
df['region'] == 'South',
df['region'] == 'East',
df['region'] == 'West',
]

tax_rates = [0.08, 0.06, 0.07, 0.09]

df['tax_rate'] = np.select(conditions, tax_rates, default=0.07)
df['tax_amount'] = df['price'] * df['tax_rate']
display(df[['region', 'price', 'tax_rate', 'tax_amount']].head())

To go out:

Selecting under multiple conditions with np.select()
np.select() evaluates all conditions in order and selects the first match. The default setting handles anything that doesn’t match, which is useful as a safety net.

Mapping Values with a Dictionary Lookup

When you need to translate values in a column, such as mapping category names to numeric codes or replacing keys with labels, .map() with a dictionary is clean and fast.

What we want to do: Map product categories to internal service codes.


category_codes = {
'Electronics': 'ELEC',
'Clothing': 'CLTH',
'Home & Kitchen': 'HOME',
'Sports': 'SPRT',
'Books': 'BOOK',
}

df['dept_code'] = df['product_category'].map(category_codes)
display(df[['product_category', 'dept_code']].head())

To go out:

Mapping values with a dictionary lookup
.map() works like a lookup table. This is one of the most underused tools in pandas—we often look for .apply(lambda x: dict[x]) when .map(dict) does the same thing faster.

Manipulating Strings with the .str Accessor

String manipulation is where people most often default to loops or .apply(). The .str accessor allows you to perform string operations on an entire column without either.

What we want to do: Extract the first word from the product_category column and convert it to lowercase.


df['category_slug'] = df['product_category'].str.split().str[0].str.lower()
display(df[['product_category', 'category_slug']].head())

To go out:

Manipulating strings with the .str accessor
You can chain .str methods like regular Python string methods. It also supports .str.contains(), .str.replace(), .str.extract() for regular expressions, and more.

Aggregating Groups with .groupby()

A common loop pattern involves iterating through subsets of data to calculate group-level statistics. .groupby() handles this natively.

What we want to do: Calculate total revenue and average shipping days by product category.


summary = (
df.groupby('product_category')
.agg(
total_revenue=('revenue', 'sum'),
avg_ship_days=('days_to_ship', 'mean'),
order_count=('order_id', 'count')
)
.round(2)
.reset_index()
)
summary

To go out:

Aggregating groups with .groupby()

Choose the Right Tool

Most transformations that you would write a loop for fit nicely into one of these models:

Operation / MethodUse case/description
Arithmetic on columnsPerform vectorized math operations like addition, subtraction, multiplication, and division directly on DataFrame columns.
Vectorized operations (*, +, etc.)Efficiently apply element-wise operations on entire columns without explicit loops.
Simple true/false conditionEvaluate Boolean conditions to filter or create conditional columns.
np.where()Apply conditional logic (if-else) in a vectorized manner for DataFrame tables and columns.
Multiple conditions, multiple resultsHandle complex conditional logic with multiple rules and outputs.
np.select()Select values based on multiple conditions and return corresponding outputs.
Value Substitution via SearchReplace values using mapping dictionaries for quick transformations.
.map(dict)Map values into a series using a dictionary or substitution function.
.apply()Apply custom functions per row or per column for flexible transformations.
String manipulationUse vectorized string operations via the .str accessor to clean and transform text data.
.groupby() + .agg()Group data and calculate aggregate statistics like sum, average, count, etc.

Once you start thinking in columns rather than rows, you’ll find that the pandas API starts to feel less like a workaround and more like the actual way of working.

Girl Priya C is an Indian developer and technical writer. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She loves reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

For further reading, visit: Here

“`

Must Read
Related News

LEAVE A REPLY

Please enter your comment!
Please enter your name here