Understanding Pandas Pivot Tables
Python’s Pandas library is a powerful tool for data manipulation and analysis. One of its most useful features is the ability to create pivot tables, which allow you to summarize and reorganize data in a flexible and intuitive way.
In this article, we’ll delve into the world of Pandas pivot tables, exploring their structure, configuration, and customization options. We’ll also examine how to achieve specific formatting requirements using the stack method.
Introduction to Pivot Tables
A pivot table is a summary of data that allows you to view your data from different angles. It’s like a spreadsheet that can be filtered, sorted, and reorganized based on various criteria. In Pandas, pivot tables are created using the pivot_table function.
Here’s a basic example of how to create a pivot table:
import pandas as pd
# Create a sample DataFrame
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03'],
        'StockID1': [10, 20, 30],
        'StockID2': [40, 50, 60]}
df = pd.DataFrame(data)
# Create a pivot table that sums each value column per date
piv_full = pd.pivot_table(df, index='Date', values=['StockID1', 'StockID2'], aggfunc='sum')
print(piv_full)
Output:
            StockID1  StockID2
Date
2022-01-01        10        40
2022-01-02        20        50
2022-01-03        30        60
As you can see, the pivot table shows the sum of each value column for each unique value in the index. Because every date appears only once here, each sum equals the original value.
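The sample above has one row per date, so the aggregation has nothing to combine. A minimal sketch of what changes when the index contains repeated values follows; the df_dup frame and its numbers are hypothetical and not part of the original example:

```python
import pandas as pd

# Hypothetical data: 2022-01-01 appears twice, so its rows get aggregated
df_dup = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-02'],
    'StockID1': [10, 5, 20],
    'StockID2': [40, 1, 50],
})

# One output row per unique Date; rows sharing a date are combined with sum
piv_dup = pd.pivot_table(df_dup, index='Date',
                         values=['StockID1', 'StockID2'], aggfunc='sum')
print(piv_dup)
```

Here the two rows for 2022-01-01 collapse into one, with StockID1 equal to 15 and StockID2 equal to 41.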
Configuring Pivot Table Options
When creating a pivot table, there are several options that can be customized to suit your needs. Here are some key parameters:
index: The column or columns used as the row labels.
values: The column(s) used as the values.
columns: The column used as the column headers.
aggfunc: A function used to aggregate values.
fill_value: Value used for missing data.
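Let’s modify our previous example to use the columns and fill_value options. The columns argument needs a field other than the one used for index, so the sample data gains a Market column, added here purely for illustration:
import pandas as pd
# Create a sample DataFrame ('Market' is an illustrative extra column)
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03'],
        'Market': ['NYSE', 'LSE', 'NYSE'],
        'StockID1': [10, 20, 30],
        'StockID2': [40, 50, 60]}
df = pd.DataFrame(data)
# Create a pivot table: dates as rows, markets as column headers,
# with 0 in place of missing date/market combinations
piv_full = pd.pivot_table(df, index='Date', columns='Market',
                          values=['StockID1', 'StockID2'],
                          aggfunc='sum', fill_value=0)
print(piv_full)
Output:
           StockID1      StockID2
Market          LSE NYSE      LSE NYSE
Date
2022-01-01        0   10        0   40
2022-01-02       20    0       50    0
2022-01-03        0   30        0   60
Each value column is now split by market, and combinations with no data are filled with 0.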
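The aggfunc parameter can also be a dictionary that maps each value column to its own aggregation. The sketch below assumes a hypothetical df_multi frame with a repeated date; it counts the StockID1 entries and sums StockID2:

```python
import pandas as pd

# Hypothetical data with a repeated date so the two aggregations differ
df_multi = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-02'],
    'StockID1': [10, 5, 20],
    'StockID2': [40, 1, 50],
})

# A different aggregation per value column
piv_multi = pd.pivot_table(
    df_multi,
    index='Date',
    values=['StockID1', 'StockID2'],
    aggfunc={'StockID1': 'count', 'StockID2': 'sum'},
)
print(piv_multi)
```

Passing the aggregation names as strings ('count', 'sum') rather than Python built-ins such as len and sum is a choice made for this sketch; it lets pandas use its own implementations.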
Now, let’s look at a more specific formatting requirement.
Formatting Pivot Tables Using the stack Method
The goal here is a table with unique locations on one axis and dates on the other, with the sold status of each stock available as an additional value column. Called on its own, Pandas’ built-in pivot_table function returns this in wide form, with a separate column for every combination of value and date, which is awkward to read and filter.
The stack method offers a workaround. It moves the Date level from the column headers into the row index, so that StockID and SoldStatus become ordinary columns:
import pandas as pd
# Create sample DataFrames
stock_data = {'Unique_Location': ['A', 'B', 'C'],
              'StockID': [10, 20, 30],
              'SoldStatus': ['Yes', 'No', 'Yes']}
df_stock = pd.DataFrame(stock_data)
date_data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03'],
             'StockID': [10, 20, 30],
             'SoldStatus': ['Yes', 'No', 'Yes']}
df_date = pd.DataFrame(date_data)
# Combine the two tables so that each row carries a location and a date
df_all = df_stock.merge(df_date, on=['StockID', 'SoldStatus'])
# Pivot: locations as rows, dates as columns, both value columns kept.
# aggfunc='first' is used because SoldStatus is text and cannot be averaged.
piv = pd.pivot_table(df_all, index='Unique_Location', columns='Date',
                     values=['StockID', 'SoldStatus'], aggfunc='first')
# Stack the Date level into the row index, drop the empty location/date
# combinations, and restore StockID to integers
stacked_pivot = (piv.stack(level='Date')
                    .dropna(how='all')
                    .astype({'StockID': int}))
print(stacked_pivot)
Output:
                           SoldStatus  StockID
Unique_Location Date
A               2022-01-01        Yes       10
B               2022-01-02         No       20
C               2022-01-03        Yes       30
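To see what stack does in isolation, here is a minimal sketch on a small wide table. The wide frame and its values are hypothetical, chosen only to make the reshaping easy to follow:

```python
import pandas as pd

# Hypothetical wide table: one row per location, one column per date
wide = pd.DataFrame(
    {'2022-01-01': [10, 0], '2022-01-02': [0, 20]},
    index=pd.Index(['A', 'B'], name='Unique_Location'),
)
wide.columns.name = 'Date'

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (Unique_Location, Date)
long = wide.stack()

# unstack() reverses the operation and restores the wide layout
round_trip = long.unstack()
print(long)
```

With single-level columns the result is a Series with four entries, one per location and date pair. Some pandas 2.x releases emit a FutureWarning about the stack implementation when this runs; the reshaping itself is unaffected.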
By utilizing the stack method and rearranging our data to fit the pivot table structure, we’ve achieved a more suitable output.
Last modified on 2025-05-05