Splitting Large Workbooks into Separate Excel Files Using Python Pandas

Splitting a Workbook into Different Workbooks with Worksheets Using Python Pandas

In this article, we will explore how to split a large workbook into separate workbooks for each year, with worksheets for each month. We will use Python and the pandas library to achieve this.

Background

When working with large datasets, it’s often necessary to break them down into smaller, more manageable chunks. This is especially true when working with Excel files, which can become unwieldy if not properly split.

In this article, we’ll discuss how to use Python pandas to create separate workbooks for each year, with worksheets for each month. We’ll cover the basics of using pandas, including data manipulation and Excel writing.

Table of Contents

Introduction

Pandas is a powerful Python library used for data manipulation and analysis. One of its most useful features is its ability to read and write Excel files.

In this article, we’ll explore how to use pandas to split a large workbook into separate workbooks for each year, with worksheets for each month. We’ll cover the basics of using pandas, including data manipulation and Excel writing.

Using Pandas to Split a Workbook

Loading the Data

Before we can start splitting our workbook, we need to load it into pandas. This is done by calling pd.read_excel(), which takes the path to your Excel file as an argument.

import pandas as pd

# Location of the file you want to work on
file1 = '.\money.xlsx'

# Make it a dataframe
df1 = pd.read_excel(file1)

Splitting the Date Column

Next, we need to split our date column into separate columns for year, month, and day. This is done by calling str.split(), which takes a string as an argument.

# Create 3 columns from splitting the date column
df1[["month", "day", "year"]] = df1["Date"].str.split("/", expand=True)

Creating Separate Workbooks for Each Year

Now that we have our data split, we can start creating separate workbooks for each year. This is done by iterating over unique values in the year column and using ExcelWriter.

for value in df1['year'].unique():
    df00 = df1[df1['year'] == value]
    output_file_name = str(value) + 'money.xlsx'
    writer = pd.ExcelWriter(output_file_name, engine='xlsxwriter')
    
    # Get unique months for this year
    each_month = df00['month'].unique()
    
    for month in each_month:
        df00[month].to_excel(writer, sheet_name=str(month), index=False)
    
    writer.save()

print('DataFrame written to Excel File successfully.')

This code creates a new workbook for each unique year and writes the data for that year to separate worksheets.

Example Code

Here’s an example of how you can use this code:

import pandas as pd

# Load the data
file1 = '.\money.xlsx'
df1 = pd.read_excel(file1)

# Split the date column
df1[["month", "day", "year"]] = df1["Date"].str.split("/", expand=True)

# Create separate workbooks for each year
for value in df1['year'].unique():
    df00 = df1[df1['year'] == value]
    output_file_name = str(value) + 'money.xlsx'
    
    # Get unique months for this year
    each_month = df00['month'].unique()
    
    writer = pd.ExcelWriter(output_file_name, engine='xlsxwriter')
    
    for month in each_month:
        df00[month].to_excel(writer, sheet_name=str(month), index=False)
    
    writer.save()

print('DataFrame written to Excel File successfully.')

This code assumes that your data is stored in a file called money.xlsx in the same directory as your Python script.

Conclusion

Splitting a large workbook into separate workbooks for each year, with worksheets for each month, can be a daunting task. But with the help of pandas and Excel writing, it’s easier than you think.

In this article, we covered the basics of using pandas to split a workbook and create separate workbooks for each year. We also provided an example code snippet that demonstrates how to use this technique.

We hope this helps you in your data manipulation and analysis tasks!


Last modified on 2023-09-16