Stop Writing Loops in Pandas: 7 Faster Alternatives to Try

Introduction

Row-by-row iteration is a notable performance bottleneck in pandas code. While it may go unnoticed with small datasets, handling large datasets this way can significantly impact performance.

Pandas, built on NumPy, is designed to perform operations on entire arrays simultaneously using compiled C code. By looping over rows in Python, you essentially bypass these optimizations, forcing each operation back into the Python interpreter—one row at a time.

This article explores seven alternatives to loops in pandas, each tailored to different transformation types. By the end, you’ll have a clearer understanding of which tool to employ depending on the problem’s nature.

You can get the Colab notebook on GitHub.

Configuring the Sample Dataset

We’ll use a realistic ecommerce order dataset throughout this article:



import pandas as pd 

import numpy as np 



np.random.seed(42) 

n = 100_000 

categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports', 'Books']

regions = ['North', 'South', 'East', 'West']



df = pd.DataFrame({

    'order_id': range(1, n + 1),

    'customer_age': np.random.randint(18, 70, n),

    'product_category': np.random.choice(categories, n),

    'region': np.random.choice(regions, n),

    'price': np.round(np.random.uniform(5.0, 500.0, n), 2),

    'quantity': np.random.randint(1, 10, n),

    'days_to_ship': np.random.randint(1, 14, n),

})

display(df.head())

To go out:

Configuring the sample dataset
We now have a dataset of 100,000 rows to work with.

Using Vectorized Operations for Arithmetic

For any arithmetic or comparison on a column, vectorized operations should be your first instinct.

What we want to do: Calculate total revenue per order.



df['revenue'] = df['price'] * df['quantity']

display(df[['price', 'quantity', 'revenue']].head())

To go out:

Using vectorized operations for arithmetic

Applying a Function for Conditional Logic

When your transformation involves logic that cannot be expressed in simple arithmetic form, .apply() allows you to pass a function across a column or row.

What we want to do: Assign a shipping priority label based on shipping days.



def shipping_label(days): 

    if days <= 2: 

        return 'Express' 

    elif days <= 5: 

        return 'Standard' 

    else: 

        return 'Economy'



df['shipping_tier'] = df['days_to_ship'].apply(shipping_label)

display(df[['days_to_ship', 'shipping_tier']].head())

To go out:

Applying a function for conditional logic
Using .apply() is clean, readable, and much easier to debug than a loop. Use it when your logic is conditional and np.where() or np.select() seems too nested.

Using np.where() for Binary Conditions

When you have a binary condition—one result if true, another if false—np.where() is the clean and fast choice.

What we want to do: Report orders for which the customer is entitled to a senior discount.



df['senior_discount'] = np.where(df['customer_age'] >= 60, True, False)

display(df[['customer_age', 'senior_discount']].head())

To go out:

Using np.where() for binary conditions
np.where() is fully vectorized and significantly faster than .apply() for simple true or false conditions. Think of it as a vectorized ternary operator.

Selecting Among Multiple Conditions with np.select()

When you have more than two conditions, np.select() allows you to define a list of conditions and corresponding values without the need for nested if/elif strings.

What we want to do: Assign a tax rate based on region.



conditions = [

    df['region'] == 'North', 

    df['region'] == 'South', 

    df['region'] == 'East', 

    df['region'] == 'West',

]



tax_rates = [0.08, 0.06, 0.07, 0.09]



df['tax_rate'] = np.select(conditions, tax_rates, default=0.07)

df['tax_amount'] = df['price'] * df['tax_rate']

display(df[['region', 'price', 'tax_rate', 'tax_amount']].head())

To go out:

Selecting under multiple conditions with np.select()
np.select() evaluates all conditions in order and selects the first match. The default setting handles anything that doesn’t match, which is useful as a safety net.

Mapping Values with a Dictionary Lookup

When you need to translate values in a column, such as mapping category names to numeric codes or replacing keys with labels, .map() with a dictionary is clean and fast.

What we want to do: Map product categories to internal service codes.



category_codes = { 

    'Electronics': 'ELEC', 

    'Clothing': 'CLTH', 

    'Home & Kitchen': 'HOME', 

    'Sports': 'SPRT', 

    'Books': 'BOOK',

}



df['dept_code'] = df['product_category'].map(category_codes)

display(df[['product_category', 'dept_code']].head())

To go out:

Mapping values with a dictionary lookup
.map() works like a lookup table. This is one of the most underused tools in pandas—we often look for .apply(lambda x: dict[x]) when .map(dict) does the same thing faster.

Manipulating Strings with the .str Accessor

String manipulation is where people most often default to loops or .apply(). The .str accessor allows you to perform string operations on an entire column without either.

What we want to do: Extract the first word from the product_category column and convert it to lowercase.



df['category_slug'] = df['product_category'].str.split().str[0].str.lower()

display(df[['product_category', 'category_slug']].head())

To go out:

Manipulating strings with the .str accessor
You can chain .str methods like regular Python string methods. It also supports .str.contains(), .str.replace(), .str.extract() for regular expressions, and more.

Aggregating Groups with .groupby()

A common loop pattern involves iterating through subsets of data to calculate group-level statistics. .groupby() handles this natively.

What we want to do: Calculate total revenue and average shipping days by product category.



summary = (

    df.groupby('product_category')

    .agg(

        total_revenue=('revenue', 'sum'),

        avg_ship_days=('days_to_ship', 'mean'),

        order_count=('order_id', 'count')

    )

    .round(2)

    .reset_index()

)

summary

To go out:

Aggregating groups with .groupby()

Choose the Right Tool

Most transformations that you would write a loop for fit nicely into one of these models:

Operation / Method	Use case/description
Arithmetic on columns	Perform vectorized math operations like addition, subtraction, multiplication, and division directly on DataFrame columns.
Vectorized operations (*, +, etc.)	Efficiently apply element-wise operations on entire columns without explicit loops.
Simple true/false condition	Evaluate Boolean conditions to filter or create conditional columns.
np.where()	Apply conditional logic (if-else) in a vectorized manner for DataFrame tables and columns.
Multiple conditions, multiple results	Handle complex conditional logic with multiple rules and outputs.
np.select()	Select values based on multiple conditions and return corresponding outputs.
Value Substitution via Search	Replace values using mapping dictionaries for quick transformations.
.map(dict)	Map values into a series using a dictionary or substitution function.
.apply()	Apply custom functions per row or per column for flexible transformations.
String manipulation	Use vectorized string operations via the .str accessor to clean and transform text data.
.groupby() + .agg()	Group data and calculate aggregate statistics like sum, average, count, etc.

Once you start thinking in columns rather than rows, you’ll find that the pandas API starts to feel less like a workaround and more like the actual way of working.

Girl Priya C is an Indian developer and technical writer. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She loves reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

For further reading, visit: Here

“`

CMS tightens oversight of accreditation institutions and limits paid advice

Unlocking UK housing with AI-accelerated planning

We tend to think of learning as an activity for the young, but a study of 416 Canadians over the age of sixty found...

Lenovo’s Snapdragon X2 Windows laptops are worth your attention amid RAMageddon

Stop Writing Loops in Pandas: 7 Faster Alternatives to Try

Introduction

Configuring the Sample Dataset

Using Vectorized Operations for Arithmetic

Applying a Function for Conditional Logic

Using np.where() for Binary Conditions

Selecting Among Multiple Conditions with np.select()

Mapping Values with a Dictionary Lookup

Manipulating Strings with the .str Accessor

Aggregating Groups with .groupby()

Choose the Right Tool

CMS tightens oversight of accreditation institutions and limits paid advice

Unlocking UK housing with AI-accelerated planning

We tend to think of learning as an activity for the young, but a study of 416 Canadians over the age of sixty found...

Lenovo’s Snapdragon X2 Windows laptops are worth your attention amid RAMageddon

Plaud says its software business surpassed $100 million in ARR after shipping more than 2 million AI note takers

Health Information Mining: Estimating Advanced Gait Metrics with Smartwatches

Small Models, Big Results: Achieving Superior Intent Extraction with Decomposition

What really makes cars polluting? An in-depth analysis of CO₂ emissions through data science

Towards a science of scaling agent systems: when and why agent systems work

TOON: Beyond JSON for LLMs

LEAVE A REPLY Cancel reply

Useful Links

Latest News

Unlocking UK housing with AI-accelerated planning

We tend to think of learning as an activity for the young, but a study of 416 Canadians over the age of sixty found...

Lenovo’s Snapdragon X2 Windows laptops are worth your attention amid RAMageddon

Our Newsletter