Joining Two Excel-Based DataFrames with Python Using pandas Library

Two dataframes loaded from separate Excel files can be joined in Python with the pandas library, provided they share a common key. In this article, we will walk through such a join where the key needs a small amount of cleaning before the two sides match.

Problem Statement

We have two dataframes, df1 and df2, each loaded from a separate Excel file and holding different data. They are related through the ceremony number: df1 stores it as an ordinal string in its ceremony_number column (for example '1st'), while df2 uses the integer ceremony number as its index. Our goal is to add the matching date from df2 to each row of df1.

Step 1: Importing Libraries

The only library we need is pandas, which handles both the data manipulation and the date parsing.

import pandas as pd

Step 2: Creating Sample DataFrames

We will create sample dataframes that mimic the structure of df1 and df2. Note that df2 has no ceremony_number column; its index (1, 2, 3) plays that role.

# Create df1
data1 = {'birthdate': {0: '9/30/1895', 1: '7/23/1884', 2: '3/29/1889',
                       3: '4/10/1868', 4: '4/8/1892'},
         'ceremony_number': {0: '1st', 1: '1st', 2: '2nd', 3: '3rd', 4: '2nd'}}
df1 = pd.DataFrame(data1)

# Create df2
data2 = {'dates': {1: '1929-05-16', 2: '1930-04-03', 3: '1930-11-05'}}
df2 = pd.DataFrame(data2)

Step 3: Converting Data Types

Both date columns are stored as strings, so we first convert them to proper datetime values. We will use pd.to_datetime with an explicit format for the birthdate column in df1 and the dates column in df2.

# Convert data types
df1['birthdate'] = pd.to_datetime(df1['birthdate'], format='%m/%d/%Y')
df2['dates'] = pd.to_datetime(df2['dates'], format='%Y-%m-%d')
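
Real Excel exports often contain a few cells that do not follow the expected date format. The sketch below shows one way to handle that, assuming you would rather get a missing value than an exception; the sample values, including the malformed 'unknown' entry, are made up for illustration.

```python
import pandas as pd

# Hypothetical column with one malformed entry (illustration only)
raw = pd.Series(['9/30/1895', 'unknown', '4/8/1892'])

# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

# The rows that failed to parse can then be inspected or dropped
bad_rows = raw[parsed.isna()]
print(parsed)
print(bad_rows)
```

Whether to coerce or to let the conversion fail depends on how much you trust the source file; failing loudly is usually safer when the data is supposed to be clean.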

Step 4: Extracting Digits from Ceremony Number

The ceremony_number column in df1 holds ordinal strings such as '1st', while the index of df2 holds plain integers. To make the two comparable, we extract the leading digits from ceremony_number with str.extract and convert them to integers.

# Extract digits from ceremony number
num = df1['ceremony_number'].str.extract(r'^(\d+)', expand=False).astype(int)
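
The conversion with astype(int) raises an error if any label has no leading digits, because str.extract returns a missing value for such rows. A possible workaround, sketched here with made-up labels, is to go through pd.to_numeric and the nullable Int64 dtype so that the gap is kept as a missing value.

```python
import pandas as pd

# Hypothetical labels, including one with no leading digits
labels = pd.Series(['1st', '2nd', '10th', 'n/a'])

# Rows that do not match the pattern come back as missing values
digits = labels.str.extract(r'^(\d+)', expand=False)

# to_numeric parses the digit strings; 'Int64' keeps the gap as <NA>
num = pd.to_numeric(digits).astype('Int64')
print(num)
```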

Step 5: Mapping Ceremony Number to Date

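Now that we have the digits extracted, we can map them to the corresponding dates in df2. When Series.map is given another Series, it looks up each value in that Series' index and returns the matching entry, which is exactly the lookup we need.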

# Map ceremony number to date
df1['date_oscar'] = num.map(df2['dates'])
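
The same lookup can also be written as an explicit left merge, which may be preferable when df2 has several columns to bring across. The sketch below compares the two approaches on small made-up frames; the column name ceremony_int and the unmatched '4th' row are assumptions added for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ceremony_number': ['1st', '1st', '2nd', '3rd', '4th']})
df2 = pd.DataFrame(
    {'dates': pd.to_datetime(['1929-05-16', '1930-04-03', '1930-11-05'])},
    index=[1, 2, 3],
)

num = df1['ceremony_number'].str.extract(r'^(\d+)', expand=False).astype(int)

# Approach from the text: look up each number in df2's index
mapped = num.map(df2['dates'])

# Alternative: a left merge on a temporary integer key column
merged = (df1.assign(ceremony_int=num)
             .merge(df2, left_on='ceremony_int', right_index=True, how='left'))

print(mapped)
print(merged)
```

In both cases a ceremony number that is absent from df2 (here the 4th) produces a missing date rather than an error.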

Step 6: Combining Code into a Single Function

With all the steps in place, we can combine the code into a single function. It takes no arguments because it builds the sample data itself; it then performs the operations above and returns the combined dataframe.

def join_dataframes():
    # Create df1
    data1 = {'birthdate': {0: '9/30/1895', 1: '7/23/1884', 2: '3/29/1889',
                           3: '4/10/1868', 4: '4/8/1892'},
             'ceremony_number': {0: '1st', 1: '1st', 2: '2nd', 3: '3rd', 4: '2nd'}}
    df1 = pd.DataFrame(data1)

    # Create df2
    data2 = {'dates': {1: '1929-05-16', 2: '1930-04-03', 3: '1930-11-05'}}
    df2 = pd.DataFrame(data2)

    # Convert data types
    df1['birthdate'] = pd.to_datetime(df1['birthdate'], format='%m/%d/%Y')
    df2['dates'] = pd.to_datetime(df2['dates'], format='%Y-%m-%d')

    # Extract digits from ceremony number
    num = df1['ceremony_number'].str.extract(r'^(\d+)', expand=False).astype(int)

    # Map ceremony number to date
    df1['date_oscar'] = num.map(df2['dates'])

    return df1

# Call the function and print the result
result = join_dataframes()
print(result)
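
For data that really comes from Excel files, a function that accepts the two dataframes as arguments is easier to reuse. The following is a minimal sketch of that idea, not a definitive implementation; the function name add_ceremony_dates and its parameter names are hypothetical.

```python
import pandas as pd

def add_ceremony_dates(people, ceremonies, key='ceremony_number',
                       date_col='dates', out_col='date_oscar'):
    """Return a copy of people with a date looked up from ceremonies.

    Assumes people[key] holds strings with leading digits ('1st', '2nd')
    and that the index of ceremonies holds the integer ceremony numbers.
    """
    num = people[key].str.extract(r'^(\d+)', expand=False).astype(int)
    result = people.copy()  # leave the caller's dataframe untouched
    result[out_col] = num.map(ceremonies[date_col])
    return result

# Usage with small in-memory frames standing in for the Excel data
people = pd.DataFrame({'ceremony_number': ['1st', '2nd', '3rd']})
ceremonies = pd.DataFrame(
    {'dates': pd.to_datetime(['1929-05-16', '1930-04-03', '1930-11-05'])},
    index=[1, 2, 3],
)
combined = add_ceremony_dates(people, ceremonies)
print(combined)
```

In practice the two inputs could be loaded with pd.read_excel before being passed in.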

Step 7: Running the Function

Finally, we can call the function in an interactive Python session and print the combined dataframe.

>>> result = join_dataframes()
>>> print(result)
   birthdate ceremony_number date_oscar
0 1895-09-30             1st 1929-05-16
1 1884-07-23             1st 1929-05-16
2 1889-03-29             2nd 1930-04-03
3 1868-04-10             3rd 1930-11-05
4 1892-04-08             2nd 1930-04-03

This is the final output of the function: df1 with a new date_oscar column holding the date from df2 that matches each row's ceremony_number.


Last modified on 2023-07-28