MultiIndex/Reshaping differences between Pandas versions
Introduction to Pandas and MultiIndex
The pandas library is a powerful data analysis tool in Python, widely used for handling structured data, including tabular data such as spreadsheets and SQL tables. One of the key features of pandas is its support for multi-level indexing (MultiIndex), which allows users to assign multiple levels of labels to rows and columns.
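A minimal sketch of a two-level column index, assuming a current pandas release. The tickers, fields and prices are borrowed from the example later in this article and trimmed to two of each. Selecting an outer label drops that level, while xs() takes a cross-section on an inner level:

```python
import pandas as pd

# Outer column level "ticker", inner column level "field" (names chosen for illustration).
columns = pd.MultiIndex.from_product(
    [['HSBA LN Equity', 'UCG IM Equity'], ['LAST PRICE', 'HIGH']],
    names=['ticker', 'field'],
)
df = pd.DataFrame([[663.8, 672.5, 15.97, 16.02]], columns=columns)

print(df.columns.nlevels)  # 2

# Selecting an outer label returns a frame whose columns are the inner "field" level.
print(df['UCG IM Equity'])

# Cross-section on the inner level: one column per ticker.
print(df.xs('HIGH', axis=1, level='field'))
```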
In this article, we explore how changes between pandas versions can affect MultiIndex reshaping. Specifically, we compare pandas 0.19.2 and 0.20.1 when reshaping a DataFrame whose columns are a MultiIndex.
Setting Up the Environment
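To demonstrate the difference, we first build a small DataFrame of daily prices for three equities with a two-level (ticker, field) column index. The code below uses the pandas 0.19.2 API, which 0.20.1 also accepts: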
import pandas as pd
import numpy as np
index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08', '2017-05-09',
                          '2017-05-10'], dtype='datetime64[ns]', name='date', freq='B')
# Outer column level: tickers (note: not in alphabetical order).
# Inner column level: price fields.
# `labels=` is the keyword used by pandas 0.19/0.20; pandas 0.24+ spells it `codes=`.
columns = pd.MultiIndex(levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'],
                                ['LAST PRICE', 'HIGH', 'LOW']],
                        labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
# One row per business day; columns follow the (ticker, field) order above.
data = np.array([[663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.76,  2.768, 2.694],
                 [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.77 ],
                 [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
                 [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
                 [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.85,  2.888, 2.802]])
df = pd.DataFrame(data, columns=columns, index=index)
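The `labels=` keyword above was deprecated in pandas 0.24 in favour of `codes=`. The sketch below, assuming a current pandas release, rebuilds the same frame with `codes=` and inspects the ticker level. The point to notice is that `levels[0]` keeps the declared order, with 'UCG IM Equity' ahead of 'ISP IM Equity', so the ticker level is not sorted alphabetically:

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08',
                          '2017-05-09', '2017-05-10'], name='date', freq='B')
# Same levels and integer positions as above, spelled `codes=` (pandas >= 0.24).
columns = pd.MultiIndex(
    levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'],
            ['LAST PRICE', 'HIGH', 'LOW']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
)
data = np.array([
    [663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.760, 2.768, 2.694],
    [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.770],
    [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
    [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
    [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.850, 2.888, 2.802],
])
df = pd.DataFrame(data, columns=columns, index=index)

# The constructor keeps the level values in the order they were given.
tickers = list(df.columns.levels[0])
print(tickers)
print(tickers == sorted(tickers))  # False: 'UCG IM Equity' sorts after 'ISP IM Equity'
```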
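The stack() Method
One of the most useful reshaping tools in pandas is stack(), which moves one level of the column labels into the row index. Calling df.stack(0) on the frame above should move the outer column level (the tickers) into the rows, producing a two-level (date, ticker) row index with the remaining level (LAST PRICE, HIGH, LOW) as the columns: 15 rows (5 dates for each of 3 tickers) and 3 columns.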
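A minimal sketch of what stack(0) does, on a toy frame with hypothetical tickers 'A' and 'B' and fields 'hi' and 'lo', assuming a current pandas release. Lookups use .loc labels rather than positions, so the check does not depend on the row order of the result:

```python
import pandas as pd

# Hypothetical two-ticker frame: outer column level = ticker, inner = field.
columns = pd.MultiIndex.from_product([['A', 'B'], ['hi', 'lo']])
toy = pd.DataFrame([[10.0, 9.0, 20.0, 19.0],
                    [11.0, 8.0, 21.0, 18.0]],
                   index=pd.Index(['d1', 'd2'], name='date'),
                   columns=columns)

# Move column level 0 (the tickers) into the row index.
stacked = toy.stack(0)

print(stacked)
print(stacked.index.nlevels, stacked.columns.nlevels)  # 2 1
print(stacked.loc[('d2', 'B'), 'lo'])                  # 18.0
```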
Under pandas 0.19.2, the call is:
df_stacked = df.stack(0)
The listing that follows, however, is the original df rather than df_stacked: it still has the date-only row index and the two-level (ticker, field) column index, with 5 rows and 9 columns, so it shows the input to stack(0) rather than its result:
HSBA LN Equity UCG IM Equity \
LAST PRICE HIGH LOW LAST PRICE HIGH LOW
date
2017-05-04 663.8 672.5 661.1 15.97 16.02 15.49
2017-05-05 658.6 663.9 656.0 16.22 16.48 15.77
2017-05-08 660.6 664.1 658.9 16.01 16.49 15.94
2017-05-09 664.9 669.2 662.5 15.90 16.41 15.90
2017-05-10 670.9 673.4 663.8 16.09 16.15 15.59
ISP IM Equity
LAST PRICE HIGH LOW
date
2017-05-04 2.760 2.768 2.694
2017-05-05 2.842 2.868 2.770
2017-05-08 2.852 2.898 2.826
2017-05-09 2.848 2.898 2.842
2017-05-10 2.850 2.888 2.802
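A stacked and an unstacked frame can be told apart by shape and index depth alone. The sketch below assumes a current pandas release (hence `codes=` instead of `labels=`); `describe_layout` is a hypothetical helper that compares the original frame with df.stack(0):

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08',
                          '2017-05-09', '2017-05-10'], name='date', freq='B')
columns = pd.MultiIndex(
    levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'],
            ['LAST PRICE', 'HIGH', 'LOW']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],  # `labels=` before 0.24
)
data = np.array([
    [663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.760, 2.768, 2.694],
    [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.770],
    [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
    [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
    [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.850, 2.888, 2.802],
])
df = pd.DataFrame(data, columns=columns, index=index)


def describe_layout(frame):
    """Return (rows, columns, row-index levels, column levels)."""
    return frame.shape + (frame.index.nlevels, frame.columns.nlevels)


print(describe_layout(df))           # (5, 9, 1, 2): dates in rows, (ticker, field) in columns
print(describe_layout(df.stack(0)))  # (15, 3, 2, 1): (date, ticker) in rows, fields in columns
```

The listing above matches the first tuple, the unstacked input.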
Changes in Pandas Version 0.20.1
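Running the same code under pandas 0.20.1 gives a different result:
df_stacked_2019 = df.stack(0)
This time the tickers have moved into the row index, giving a (date, ticker) MultiIndex, and the remaining field level forms the columns, now listed alphabetically (HIGH, LAST PRICE, LOW). Compared with data, however, two tickers have traded places: the rows labelled UCG IM Equity hold the ISP IM Equity prices (a LAST PRICE of 2.760 on 2017-05-04), and the rows labelled ISP IM Equity hold the UCG IM Equity prices (a LAST PRICE of 15.970). The values come out in alphabetical ticker order (HSBA, ISP, UCG) while the labels keep the declared order (HSBA, UCG, ISP), which suggests that stack() in 0.20.1 mishandles a ticker level that is not sorted: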
HIGH LAST PRICE LOW
date
2017-05-04 HSBA LN Equity 672.500 663.800 661.100
UCG IM Equity 2.768 2.760 2.694
ISP IM Equity 16.020 15.970 15.490
2017-05-05 HSBA LN Equity 663.900 658.600 656.000
UCG IM Equity 2.868 2.842 2.770
ISP IM Equity 16.480 16.220 15.770
2017-05-08 HSBA LN Equity 664.100 660.600 658.900
UCG IM Equity 2.898 2.852 2.826
ISP IM Equity 16.490 16.010 15.940
2017-05-09 HSBA LN Equity 669.200 664.900 662.500
UCG IM Equity 2.898 2.848 2.842
ISP IM Equity 16.410 15.900 15.900
2017-05-10 HSBA LN Equity 673.400 670.900 663.800
UCG IM Equity 2.888 2.850 2.802
ISP IM Equity 16.150 16.090 15.590
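One way to catch this kind of mislabelling after an upgrade is to check every cell of the stacked result against the original frame by label. The sketch below assumes a current pandas release (hence `codes=`). `stack_mismatches` is a hypothetical helper, and the label swap at the end only imitates the 0.20.1 listing on a current version; it does not run the 0.20.1 code path:

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08',
                          '2017-05-09', '2017-05-10'], name='date', freq='B')
columns = pd.MultiIndex(
    levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'],
            ['LAST PRICE', 'HIGH', 'LOW']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],  # `labels=` before 0.24
)
data = np.array([
    [663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.760, 2.768, 2.694],
    [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.770],
    [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
    [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
    [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.850, 2.888, 2.802],
])
df = pd.DataFrame(data, columns=columns, index=index)


def stack_mismatches(frame, stacked):
    """List (date, ticker, field) cells where `stacked` disagrees with `frame`.

    `stacked` must have a (date, ticker) row index and field columns,
    i.e. the layout produced by frame.stack(0).
    """
    bad = []
    for date in frame.index:
        for ticker, field in frame.columns:
            # Both sides are looked up by label, never by position.
            if stacked.loc[(date, ticker), field] != frame.loc[date, (ticker, field)]:
                bad.append((date, ticker, field))
    return bad


# On a pandas version without the mix-up, this prints 0.
print(len(stack_mismatches(df, df.stack(0))))

# Imitate the 0.20.1 listing by swapping two ticker labels (illustration only).
swapped = df.stack(0).rename(index={'UCG IM Equity': 'ISP IM Equity',
                                    'ISP IM Equity': 'UCG IM Equity'}, level=1)
print(len(stack_mismatches(df, swapped)))  # 30: 2 tickers, 5 dates, 3 fields each
```

Because the check never relies on positions, it works regardless of the row or column order a given pandas version returns from stack().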
Conclusion
In this article, we explored how changes between pandas versions can affect MultiIndex reshaping. Specifically, we compared pandas 0.19.2 and 0.20.1 when reshaping a DataFrame with MultiIndex columns using stack().
Running the same stack(0) call on the same frame, pandas 0.20.1 produces the expected (date, ticker) layout, but the prices of UCG IM Equity and ISP IM Equity end up under each other's labels; these are the two tickers whose declared level order differs from alphabetical order.
When upgrading pandas, it is worth checking a few reshaped values against the original frame before relying on stacked MultiIndex data.
Last modified on 2023-10-01