Reference a Pandas DataFrame with Another DataFrame in Python: A Step-by-Step Guide for Merging Dataframes Based on Matching Keys

In this article, we will explore the concept of referencing one pandas DataFrame within another. We’ll use two DataFrames as an example: df_item and df_bill. The goal is to map the item_id column in df_bill to the corresponding item_name from df_item.

Introduction

Pandas is a powerful library for data manipulation and analysis in Python. A common task is looking up values in one DataFrame using a key column from another. This can be done by building a lookup Series from one DataFrame and passing it to the map() method of a column in the other.

Understanding the DataFrames

Let’s take a closer look at our two DataFrames:

import pandas as pd

df_item = pd.DataFrame({
    'item_id': [2, 3, 4, 5],
    'item_name': ['Noodles', 'Vegetables', 'Dairy Products', 'Ice Cream']
})

df_bill = pd.DataFrame({
    'bill_no': [201, 202, 203, 204, 205],
    'item_id': [3, 2, 4, 3, 5]
})

In df_item, each item_id value identifies exactly one row, so the column acts as a primary key. In df_bill, item_id refers back to that key, and we want to replace each of its values with the corresponding item_name from df_item.
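
Because the lookup relies on item_id identifying exactly one row, it can be worth checking that assumption before mapping. A minimal sketch, re-creating df_item so it runs on its own (the variable names ids_are_unique and duplicates are only illustrative):

```python
import pandas as pd

df_item = pd.DataFrame({
    'item_id': [2, 3, 4, 5],
    'item_name': ['Noodles', 'Vegetables', 'Dairy Products', 'Ice Cream']
})

# True when every item_id appears exactly once, i.e. it can act as a key
ids_are_unique = df_item['item_id'].is_unique
print(ids_are_unique)

# Rows whose item_id is repeated (empty for this data)
duplicates = df_item[df_item['item_id'].duplicated(keep=False)]
print(duplicates)
```

If the check fails, the duplicated rows need to be resolved before item_id can safely be used as a lookup key.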

Mapping Series

One way to achieve this is by calling map() on the df_bill['item_id'] Series and passing it a lookup Series built from df_item. The mapping has to happen while the item_id column still exists; afterwards the column can be removed, either with drop() or by popping it during the mapping.

Method 1: Removing the Column

First, we use set_index() on df_item to create a Series that maps each item ID to its corresponding name:

s = df_item.set_index('item_id')['item_name']

Next, we map the values of df_bill['item_id'] through s. This must be done before the column is dropped, otherwise df_bill['item_id'] raises a KeyError:

item_names = df_bill['item_id'].map(s)

Finally, we remove the item_id column with drop() and attach the new column with assign():

df_bill = df_bill.drop('item_id', axis=1).assign(item_name=item_names)
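
One behavior worth knowing: map() returns NaN for any item_id that has no entry in the lookup Series. The sketch below uses a hypothetical DataFrame, df_bill_extra (not part of the example above), that contains an unknown ID 9, and fills the gap with a placeholder label chosen arbitrarily for illustration.

```python
import pandas as pd

df_item = pd.DataFrame({
    'item_id': [2, 3, 4, 5],
    'item_name': ['Noodles', 'Vegetables', 'Dairy Products', 'Ice Cream']
})

# Lookup Series: item_id (index) -> item_name (values)
s = df_item.set_index('item_id')['item_name']

# Hypothetical bills: item_id 9 does not exist in df_item
df_bill_extra = pd.DataFrame({'bill_no': [301, 302], 'item_id': [3, 9]})

mapped = df_bill_extra['item_id'].map(s)
print(mapped.isna().tolist())   # [False, True]

# Replace missing names with a placeholder (the label is an arbitrary choice)
filled = mapped.fillna('Unknown')
print(filled.tolist())          # ['Vegetables', 'Unknown']
```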

Method 2: Popping Values

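Alternatively, we can use the pop() function, which removes a column from a DataFrame in place and returns it as a Series. Starting again from the original df_bill, this lets us remove item_id and map it in a single step:

s = df_item.set_index('item_id')['item_name']
df_bill['item_name'] = df_bill.pop('item_id').map(s)

After this line, df_bill no longer has an item_id column, and the new item_name column holds the name looked up for each bill.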
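
To see what DataFrame.pop() does on its own, here is a minimal sketch: it removes the named column from the DataFrame in place and returns it as a Series. The copy (bill_copy, an illustrative name) is only there so the original df_bill stays untouched.

```python
import pandas as pd

df_bill = pd.DataFrame({
    'bill_no': [201, 202, 203, 204, 205],
    'item_id': [3, 2, 4, 3, 5]
})

bill_copy = df_bill.copy()
popped = bill_copy.pop('item_id')   # removes the column and returns it

print(list(bill_copy.columns))  # ['bill_no']
print(popped.tolist())          # [3, 2, 4, 3, 5]
print(list(df_bill.columns))    # ['bill_no', 'item_id'] (original unchanged)
```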

Putting it all Together

Now we can use either method to achieve the desired result. Here’s the complete code:

import pandas as pd

# Create the DataFrames
df_item = pd.DataFrame({
    'item_id': [2, 3, 4, 5],
    'item_name': ['Noodles', 'Vegetables', 'Dairy Products', 'Ice Cream']
})

df_bill = pd.DataFrame({
    'bill_no': [201, 202, 203, 204, 205],
    'item_id': [3, 2, 4, 3, 5]
})

# Lookup Series: item_id (index) -> item_name (values)
s = df_item.set_index('item_id')['item_name']

# Method 1: map first, then drop the original column
item_names = df_bill['item_id'].map(s)
result_drop = df_bill.drop('item_id', axis=1).assign(item_name=item_names)

# Method 2: pop the column and map it in one step
# (pop() modifies its DataFrame in place, so work on a copy)
result_pop = df_bill.copy()
result_pop['item_name'] = result_pop.pop('item_id').map(s)

# Both methods give the same DataFrame
assert result_drop.equals(result_pop)

# Print the resulting DataFrame
print(result_drop)

Output

When we run this code, we should see the following output:

   bill_no       item_name
0      201      Vegetables
1      202         Noodles
2      203  Dairy Products
3      204      Vegetables
4      205       Ice Cream

This shows that each item_id in df_bill has been replaced with the corresponding item_name from df_item.
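
The same lookup can also be written as a join on the matching key. The sketch below uses DataFrame.merge() with how='left' so that every bill row is kept; it is an alternative to the map() approach rather than the method walked through in this article, and the name merged is only illustrative.

```python
import pandas as pd

df_item = pd.DataFrame({
    'item_id': [2, 3, 4, 5],
    'item_name': ['Noodles', 'Vegetables', 'Dairy Products', 'Ice Cream']
})

df_bill = pd.DataFrame({
    'bill_no': [201, 202, 203, 204, 205],
    'item_id': [3, 2, 4, 3, 5]
})

# Left join on item_id keeps all bills, then drop the key column
merged = (
    df_bill.merge(df_item, on='item_id', how='left')
           .drop(columns='item_id')
)
print(merged)
```

A merge is often more convenient when several columns need to be brought over at once, while map() stays compact for a single lookup column.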

Conclusion

In this article, we explored how to reference one pandas DataFrame from another. We used two DataFrames as an example: df_item and df_bill. By mapping the item_id column in df_bill through s, a lookup Series built from df_item with set_index(), we replaced each item_id with its corresponding item_name.

We showed two methods for achieving this result: mapping item_id through the lookup Series and then removing the column with drop(), or using pop() to remove the column and map it in a single step. Both approaches produce the same output.

I hope this article has helped you understand how to reference one pandas DataFrame within another.


Last modified on 2025-05-04