10 Easy Ways to Concatenate DataFrames in Pandas

May 11, 2025 By Alison Perry

Combining DataFrames is a regular task when working with pandas, especially when your data is split across files, tables, or sources. Whether you're preparing reports, aligning metrics, or piecing together fragmented records, pandas offers several reliable ways to bring your DataFrames into a single structure.

Different situations call for different methods, and how you use them can affect the structure of your final result, from index behavior to column alignment and beyond. Here's a rundown of the main techniques, each suited for a specific type of merge.

How To Concatenate Two or More Pandas DataFrames?

Concatenating Vertically with pd.concat() (Default Behavior)

To stack two or more DataFrames on top of each other, the most direct way is to use pd.concat() without changing the axis. This stacks them row-wise.

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

result = pd.concat([df1, df2])

Here, both DataFrames have the same columns, so they merge without issues. If there are columns that exist in one DataFrame but not the other, pandas will still perform the concat and insert NaN in places where data is missing. This behavior ensures that nothing gets dropped just because it's not in every source.
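
To see the NaN-filling behavior in action, here is a minimal sketch. The frame names, column names, and values are made up for illustration; the only assumption is that one column ('City') exists in just one of the inputs.

```python
import pandas as pd

# Hypothetical frames: 'Age' exists only in the first, 'City' only in the second
left = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
right = pd.DataFrame({'Name': ['Charlie'], 'City': ['Austin']})

stacked = pd.concat([left, right])

# The result carries the union of the columns; missing cells become NaN
print(stacked.columns.tolist())
print(stacked.isna().sum())
```

Note that a numeric column such as 'Age' is typically upcast to float once it contains NaN.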

Concatenating Horizontally by Column (Setting axis=1)

If your goal is to attach DataFrames side by side, set axis=1. Rows are then matched by index label, not by row position or content.

df1 = pd.DataFrame({'A': [1, 2]})

df2 = pd.DataFrame({'B': [3, 4]})

result = pd.concat([df1, df2], axis=1)

This keeps the structure of each DataFrame intact but binds them by index. If the index values don’t align, the result will still go through — pandas fills the gaps with NaN, but it doesn’t assume how you want mismatches handled. So when combining horizontally, make sure the indexes of the source DataFrames are either aligned or reset beforehand if alignment isn't your focus.
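
The difference between label alignment and positional pairing is easier to see with deliberately mismatched indexes. The sketch below uses made-up frames whose indexes overlap only at label 1, then shows one way to pair rows by position instead, using reset_index(drop=True).

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

# Aligned on index labels: labels 0 and 2 each exist in only one frame,
# so those rows get NaN in the other frame's column
by_label = pd.concat([left, right], axis=1)

# Resetting both indexes first pairs the rows purely by position
by_position = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)

print(by_label)
print(by_position)
```

Which of the two is right depends on whether the index labels carry meaning in your data.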

Adding Keys for Hierarchical Indexing

If you’re joining multiple DataFrames and want to retain some trace of their origin, keys are helpful. When passed as a list to the keys argument in pd.concat(), pandas assigns each DataFrame its own identifier in a new level of the index.

df1 = pd.DataFrame({'Score': [80, 85]})

df2 = pd.DataFrame({'Score': [90, 95]})

result = pd.concat([df1, df2], keys=['Set1', 'Set2'])

The result uses a MultiIndex, where each row shows which group it came from. This is especially useful when you’ll later split the combined data or want to preserve grouping information without modifying the actual content.
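
As a rough illustration of how the keys can be used afterwards, the sketch below rebuilds a keyed result and then pulls one group back out with .loc. The variable names are placeholders.

```python
import pandas as pd

set1 = pd.DataFrame({'Score': [80, 85]})
set2 = pd.DataFrame({'Score': [90, 95]})

combined = pd.concat([set1, set2], keys=['Set1', 'Set2'])

# The outer index level holds the key, the inner level the original index
print(combined.index.tolist())

# Selecting on the outer level recovers the rows that came from one source
only_set2 = combined.loc['Set2']
print(only_set2)
```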

Resetting Index with ignore_index=True

Often, you're bringing together DataFrames that have overlapping or inconsistent index values. If the individual index values don’t matter anymore and you want a clean, continuous index in the result, use the ignore_index parameter.

df1 = pd.DataFrame({'City': ['Austin', 'Denver']})

df2 = pd.DataFrame({'City': ['Seattle', 'Boston']})

result = pd.concat([df1, df2], ignore_index=True)

The index now runs from 0 to one less than the total number of rows (0 through 3 here), which is useful when combining data after filtering or shuffling, where the original index no longer serves a purpose.
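
A side-by-side comparison makes the effect of ignore_index clearer. This is a small sketch with placeholder variable names, contrasting the default behavior with the renumbered result.

```python
import pandas as pd

first = pd.DataFrame({'City': ['Austin', 'Denver']})
second = pd.DataFrame({'City': ['Seattle', 'Boston']})

# Default: each frame keeps its own labels, so 0 and 1 both appear twice
kept = pd.concat([first, second])

# ignore_index=True: the labels are replaced by 0..n-1
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

If overlapping labels should be treated as an error rather than renumbered, pd.concat() also accepts verify_integrity=True, which raises a ValueError when the combined index contains duplicates.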

Controlling Which Columns Stay with the join Argument

Sometimes you only care about the columns that appear in all DataFrames. In such cases, instead of accepting all columns with missing data filled in, you can filter them out by setting join='inner'.

df1 = pd.DataFrame({'A': [1, 2], 'Common': [5, 6]})

df2 = pd.DataFrame({'Common': [7, 8], 'B': [9, 10]})

result = pd.concat([df1, df2], join='inner')

This filters the result down to columns both DataFrames share — in this case, only Common. The remaining columns (A and B) are dropped automatically. It’s a cleaner approach when working with data sources that may include extra columns you’re not concerned with.
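
To compare the two join modes directly, the sketch below runs the same inputs through the default (join='outer') and through join='inner'. The frame names are placeholders.

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 2], 'Common': [5, 6]})
df_b = pd.DataFrame({'Common': [7, 8], 'B': [9, 10]})

# Default join='outer': every column is kept, gaps are filled with NaN
outer = pd.concat([df_a, df_b])

# join='inner': only columns present in every input survive
inner = pd.concat([df_a, df_b], join='inner')

print(outer.columns.tolist())
print(inner.columns.tolist())
```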

Working with Different Index Types

Sometimes, the DataFrames you're combining might use entirely different index types — like integers in one and strings in another. Pandas will still let you concatenate them, but the resulting index can look inconsistent.

df1 = pd.DataFrame({'X': [100, 200]}, index=[0, 1])

df2 = pd.DataFrame({'X': [300, 400]}, index=['a', 'b'])

result = pd.concat([df1, df2])

The merged DataFrame keeps the mixed index without raising errors. While functional, this kind of result might complicate things like sorting or filtering later, so it's worth planning for, especially if you'll apply index-based operations downstream.
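
If a mixed index is likely to cause trouble later, one possible remedy is to normalize the labels right after concatenating. The sketch below casts every label to a string; this is only one option (passing ignore_index=True is another), and the variable names are illustrative.

```python
import pandas as pd

ints = pd.DataFrame({'X': [100, 200]}, index=[0, 1])
strs = pd.DataFrame({'X': [300, 400]}, index=['a', 'b'])

mixed = pd.concat([ints, strs])
print(mixed.index.tolist())  # integers and strings side by side

# One option: cast every label to a string so the index is uniform
uniform = mixed.copy()
uniform.index = uniform.index.map(str)
print(uniform.index.tolist())
```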

Avoiding Performance Issues When Concatenating in Loops

If you’re working with a collection of DataFrames generated inside a loop — for example, reading several files or creating batches dynamically — don’t concatenate them one-by-one using chained operations. That can slow things down and use more memory than needed.

Instead, collect all the DataFrames in a list first and concatenate them once, after the loop finishes:

dataframes = []

for i in range(3):
    df = pd.DataFrame({'Value': [i, i+1]})
    dataframes.append(df)

result = pd.concat(dataframes, ignore_index=True)

This approach is both faster and cleaner. Every call to pd.concat() copies its inputs into a new DataFrame, so concatenating inside the loop re-copies everything accumulated so far on each iteration. Concatenating once at the end copies each row only once.
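
The same collect-then-concatenate pattern applies to the file-reading case mentioned earlier. The sketch below stands in for real files with in-memory io.StringIO buffers (an assumption made so the example runs anywhere); in practice you would pass file paths to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical in-memory "files"; real code would use file paths here
sources = [
    io.StringIO("Value\n1\n2\n"),
    io.StringIO("Value\n3\n4\n"),
    io.StringIO("Value\n5\n6\n"),
]

# Build the list of frames first, then concatenate exactly once
frames = [pd.read_csv(src) for src in sources]
combined = pd.concat(frames, ignore_index=True)

print(combined)
```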

Using itertools.chain() to Concatenate Large Collections

If your DataFrames are not in a list but generated or gathered in a stream-like structure, you can use chain() from Python’s itertools module. This lets you pass an iterable of DataFrames directly to pd.concat() without building a list.

from itertools import chain

df1 = pd.DataFrame({'Z': [1]})

df2 = pd.DataFrame({'Z': [2]})

df3 = pd.DataFrame({'Z': [3]})

result = pd.concat(chain([df1], [df2], [df3]), ignore_index=True)

This saves you from building the intermediate list yourself, which is convenient when reading from a generator or chunked process. The memory savings are modest, though: pd.concat() still needs every DataFrame in memory at once to build the result.
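
chain() is most useful when the DataFrames come from several separate iterables rather than from individual variables. The sketch below assumes a hypothetical generator function, make_batches, that yields small DataFrames, and joins two such streams before handing them to pd.concat().

```python
from itertools import chain

import pandas as pd


def make_batches(start, count):
    """Hypothetical generator that yields one small DataFrame per batch."""
    for i in range(start, start + count):
        yield pd.DataFrame({'Z': [i]})


# Two independent streams of DataFrames, joined lazily by chain()
stream = chain(make_batches(0, 2), make_batches(10, 2))

result = pd.concat(stream, ignore_index=True)
print(result)
```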

Older Methods like append()

You might come across older code that uses .append() to combine DataFrames. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so code like this only runs on older versions.

df1 = pd.DataFrame({'ID': [1, 2]})

df2 = pd.DataFrame({'ID': [3, 4]})

# Works only on pandas versions before 2.0; raises AttributeError on 2.0 and later

result = df1.append(df2)

In the versions that supported it, append() called pd.concat() internally, but it was slower in loops and less flexible. For cleaner and more future-proof code, stick with pd.concat() and use lists when combining multiple objects.
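
When migrating old code, the usual replacement looks roughly like the sketch below. It covers two common cases: appending one DataFrame to another, and appending a single row given as a dict. The row values are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'ID': [3, 4]})

# Replacement for df1.append(df2, ignore_index=True)
result = pd.concat([df1, df2], ignore_index=True)

# Replacement for appending a single row given as a dict:
# wrap the dict in a one-row DataFrame first
new_row = pd.DataFrame([{'ID': 5}])
result = pd.concat([result, new_row], ignore_index=True)

print(result)
```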

Final Thoughts

There’s no single “correct” way to concatenate DataFrames — it depends on how your data is structured and what shape you want in the result. Whether you’re stacking rows, attaching columns, or combining mismatched sets, pandas gives you the flexibility to control the outcome without much hassle. Just make sure to plan ahead for how the index and column structure will behave, and you’ll avoid most of the common surprises.
