Combining DataFrames is a regular task when working with pandas, especially when your data is split across files, tables, or sources. Whether you're preparing reports, aligning metrics, or piecing together fragmented records, pandas offers several reliable ways to bring your DataFrames into a single structure.
Different situations call for different methods, and how you use them can affect the structure of your final result, from index behavior to column alignment and beyond. Here's a rundown of the main techniques, each suited for a specific type of merge.
To stack two or more DataFrames on top of each other, the most direct way is to use pd.concat() without changing the axis. This stacks them row-wise.
import pandas as pd
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})
result = pd.concat([df1, df2])
Here, both DataFrames have the same columns, so they merge without issues. If there are columns that exist in one DataFrame but not the other, pandas will still perform the concat and insert NaN in places where data is missing. This behavior ensures that nothing gets dropped just because it's not in every source.
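As a minimal sketch of the mismatched-column case, assume two small frames where Age and City (illustrative column names) each appear in only one source:

```python
import pandas as pd

# Illustrative frames: 'Age' exists only in df1, 'City' only in df2
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'City': ['Boston']})

result = pd.concat([df1, df2])

# The result carries every column from both frames;
# cells with no source value are NaN
print(result)
print(result.isna().sum())
```

Note that a column which gains NaN values may change dtype (integers typically become floats), so it can be worth checking result.dtypes afterwards.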
If your goal is to attach DataFrames side by side, set axis=1. This merges them by index, not by row order or content.
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
result = pd.concat([df1, df2], axis=1)
This keeps the structure of each DataFrame intact but binds them by index. If the index values don’t align, the result will still go through — pandas fills the gaps with NaN, but it doesn’t assume how you want mismatches handled. So when combining horizontally, make sure the indexes of the source DataFrames are either aligned or reset beforehand if alignment isn't your focus.
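The difference between aligning by index and pairing rows by position can be sketched as follows. The frames and their index values are made up for illustration:

```python
import pandas as pd

# Illustrative frames whose indexes only partly overlap
left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

# Aligned by index label: labels 0 and 2 exist in only one frame,
# so each of those rows gets one NaN
aligned = pd.concat([left, right], axis=1)

# Positional pairing: drop the old labels first so both frames use 0..n-1
positional = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)

print(aligned)
print(positional)
```

Resetting the index only makes sense when the row order in both frames already corresponds; otherwise you pair unrelated rows.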
If you’re joining multiple DataFrames and want to retain some trace of their origin, keys are helpful. When passed as a list to the keys argument in pd.concat(), pandas assigns each DataFrame its own identifier in a new level of the index.
df1 = pd.DataFrame({'Score': [80, 85]})
df2 = pd.DataFrame({'Score': [90, 95]})
result = pd.concat([df1, df2], keys=['Set1', 'Set2'])
The result uses a MultiIndex, where each row shows which group it came from. This is especially useful when you’ll later split the combined data or want to preserve grouping information without modifying the actual content.
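Once the keys are in place, pulling a group back out is straightforward. The sketch below also uses the names argument to label the index levels; the level names Source and Row are illustrative choices, not anything pandas requires:

```python
import pandas as pd

df1 = pd.DataFrame({'Score': [80, 85]})
df2 = pd.DataFrame({'Score': [90, 95]})

# names= labels the two index levels ('Source' and 'Row' are illustrative)
result = pd.concat([df1, df2], keys=['Set1', 'Set2'], names=['Source', 'Row'])

# Select every row that came from the second frame
set2 = result.loc['Set2']

# Or turn the origin label into an ordinary column
flat = result.reset_index(level='Source')

print(set2)
print(flat)
```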
Often, you're bringing together DataFrames that have overlapping or inconsistent index values. If the individual index values don’t matter anymore and you want a clean, continuous index in the result, use the ignore_index parameter.
df1 = pd.DataFrame({'City': ['Austin', 'Denver']})
df2 = pd.DataFrame({'City': ['Seattle', 'Boston']})
result = pd.concat([df1, df2], ignore_index=True)
The index now runs from 0 to one less than the total number of rows (0 through 3 here), which is useful when combining data after filtering or shuffling, where the original index no longer serves a purpose.
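To see why this matters after filtering, here is a small sketch with made-up population figures. The filtered pieces keep their original labels unless you ask pandas to discard them:

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({'City': ['Austin', 'Denver', 'Seattle', 'Boston'],
                   'Pop': [1, 2, 3, 4]})

large = df[df['Pop'] > 2]    # keeps labels 2 and 3
small = df[df['Pop'] == 1]   # keeps label 0

kept = pd.concat([large, small])                      # labels: 2, 3, 0
clean = pd.concat([large, small], ignore_index=True)  # labels: 0, 1, 2

print(kept.index.tolist())
print(clean.index.tolist())
```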
Sometimes you only care about the columns that appear in all DataFrames. In such cases, instead of accepting all columns with missing data filled in, you can filter them out by setting join='inner'.
df1 = pd.DataFrame({'A': [1, 2], 'Common': [5, 6]})
df2 = pd.DataFrame({'Common': [7, 8], 'B': [9, 10]})
result = pd.concat([df1, df2], join='inner')
This filters the result down to columns both DataFrames share — in this case, only Common. The remaining columns (A and B) are dropped automatically. It’s a cleaner approach when working with data sources that may include extra columns you’re not concerned with.
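The join parameter always applies to the axis you are not concatenating along. As a brief sketch with illustrative frames, combining side by side with join='inner' keeps only the index labels that both frames share:

```python
import pandas as pd

# Illustrative frames: label 'x' exists only in the left one
left = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
right = pd.DataFrame({'B': [4, 5]}, index=['y', 'z'])

# Side by side, keeping only index labels present in both frames
result = pd.concat([left, right], axis=1, join='inner')

print(result)
```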
Sometimes, the DataFrames you're combining might use entirely different index types — like integers in one and strings in another. Pandas will still let you concatenate them, but the resulting index can look inconsistent.
df1 = pd.DataFrame({'X': [100, 200]}, index=[0, 1])
df2 = pd.DataFrame({'X': [300, 400]}, index=['a', 'b'])
result = pd.concat([df1, df2])
The merged DataFrame keeps the mixed index without raising errors. While functional, this kind of result might complicate things like sorting or filtering later, so it's worth planning for, especially if you'll apply index-based operations downstream.
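Two possible ways to tidy a mixed index are sketched below. Which one fits depends on whether the labels still carry meaning. Sorting an index that mixes integers and strings can raise a TypeError, so the first option casts every label to a string before sorting:

```python
import pandas as pd

df1 = pd.DataFrame({'X': [100, 200]}, index=[0, 1])
df2 = pd.DataFrame({'X': [300, 400]}, index=['a', 'b'])
result = pd.concat([df1, df2])

# Option 1: cast every label to a string so the index has a single type
as_text = result.copy()
as_text.index = as_text.index.astype(str)
as_text = as_text.sort_index()

# Option 2: discard the labels entirely and renumber the rows
renumbered = pd.concat([df1, df2], ignore_index=True)

print(as_text.index.tolist())
print(renumbered.index.tolist())
```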
If you’re working with a collection of DataFrames generated inside a loop — for example, reading several files or creating batches dynamically — don’t concatenate them one-by-one using chained operations. That can slow things down and use more memory than needed.
Instead, collect all the DataFrames in a list first and concatenate them once, after the loop finishes:
dataframes = []
for i in range(3):
    df = pd.DataFrame({'Value': [i, i+1]})
    dataframes.append(df)
result = pd.concat(dataframes, ignore_index=True)
This approach is both faster and cleaner. You minimize the overhead by letting pandas combine everything in a single pass, rather than copying all the accumulated rows into a new DataFrame on every iteration.
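The same pattern applies to the file-reading case. The sketch below uses in-memory io.StringIO objects as stand-ins for CSV files so that it runs on its own; in real code you would pass file paths to pd.read_csv() instead:

```python
import io
import pandas as pd

# Stand-ins for files on disk; real code would use file paths here
sources = [
    io.StringIO("Value\n1\n2\n"),
    io.StringIO("Value\n3\n4\n"),
    io.StringIO("Value\n5\n6\n"),
]

# Build the list of frames first, then concatenate once
frames = [pd.read_csv(src) for src in sources]
result = pd.concat(frames, ignore_index=True)

print(result)
```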
If your DataFrames are not in a list but generated or gathered in a stream-like structure, you can use chain() from Python’s itertools module. This lets you pass an iterable of DataFrames directly to pd.concat() without building a list.
from itertools import chain
df1 = pd.DataFrame({'Z': [1]})
df2 = pd.DataFrame({'Z': [2]})
df3 = pd.DataFrame({'Z': [3]})
result = pd.concat(chain([df1], [df2], [df3]), ignore_index=True)
This method rarely saves much memory, because pd.concat() still has to collect every DataFrame before combining them. What it does spare you is building and managing an intermediate list yourself, which is convenient when reading from a generator or chunked process.
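For the generator case, a sketch might look like the following. The make_batches function is a hypothetical helper standing in for whatever produces your chunks:

```python
import pandas as pd

def make_batches(n_batches, batch_size):
    """Yield small frames one at a time (hypothetical helper)."""
    for b in range(n_batches):
        start = b * batch_size
        yield pd.DataFrame({'Z': range(start, start + batch_size)})

# pd.concat() consumes the generator directly; no list is built by hand
result = pd.concat(make_batches(3, 2), ignore_index=True)

print(result)
```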
You might come across older code that uses .append() to combine DataFrames. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so this code raises an AttributeError on current versions.
df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'ID': [3, 4]})
# Legacy pattern: works only on pandas versions before 2.0
# result = df1.append(df2)
# Replacement that works on old and new versions alike
result = pd.concat([df1, df2])
On the versions that supported it, .append() called pd.concat() internally, and it was slow in loops because every call copied the data into a new DataFrame. For cleaner and more future-proof code, stick with pd.concat() and use lists when combining multiple objects.
There’s no single “correct” way to concatenate DataFrames — it depends on how your data is structured and what shape you want in the result. Whether you’re stacking rows, attaching columns, or combining mismatched sets, pandas gives you the flexibility to control the outcome without much hassle. Just make sure to plan ahead for how the index and column structure will behave, and you’ll avoid most of the common surprises.