Changes in Pandas Version 0.20.1: What You Need to Know About MultiIndex Reshaping

MultiIndex/Reshaping differences between Pandas versions

Introduction to Pandas and MultiIndex

The pandas library is a powerful data analysis tool in Python, widely used for handling structured data, including tabular data such as spreadsheets and SQL tables. One of the key features of pandas is its support for multi-level indexing (MultiIndex), which allows users to assign multiple levels of labels to rows and columns.
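A minimal sketch of a two-level column index, using two of the tickers from the example below. The level names `ticker` and `field` are our own illustrative labels and do not appear in the original data:

```python
import pandas as pd

# Outer level: ticker; inner level: price field (level names are illustrative).
columns = pd.MultiIndex.from_product(
    [["HSBA LN Equity", "UCG IM Equity"], ["LAST PRICE", "HIGH"]],
    names=["ticker", "field"],
)
prices = pd.DataFrame([[663.8, 672.5, 15.97, 16.02]], columns=columns)

# Selecting an outer label returns the inner-level columns beneath it.
hsba = prices["HSBA LN Equity"]

# A full (ticker, field) tuple selects a single column as a Series.
ucg_last = prices[("UCG IM Equity", "LAST PRICE")]

print(hsba)
print(ucg_last)
```

Selecting by the outer label alone drops that level, which is why `hsba` has plain LAST PRICE and HIGH columns.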

In this article, we look at how a pandas upgrade can change the result of a MultiIndex reshape. Specifically, we compare what DataFrame.stack() returns under pandas 0.19.2 and 0.20.1 for a DataFrame whose columns are a MultiIndex.

Setting Up the Environment

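To demonstrate these differences, let's build a small example DataFrame: five business days of LAST PRICE, HIGH and LOW quotes for three equities, with a two-level column index (ticker, then price field). The code is written for pandas 0.19.2, and the same code is used for the 0.20.1 comparison later: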

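import pandas as pd
import numpy as np

# Five consecutive business days, hence freq='B'.
index = pd.DatetimeIndex(['2017-05-04', '2017-05-05', '2017-05-08', '2017-05-09',
                          '2017-05-10'], dtype='datetime64[ns]', name='date', freq='B')

# Two column levels: ticker (outer) and price field (inner).
# The ticker level is not in alphabetical order (UCG IM Equity comes before ISP IM Equity).
# pandas 0.19/0.20 name the second argument `labels`; pandas 0.24 renamed it to
# `codes`, and pandas 1.0 removed `labels`.
columns = pd.MultiIndex(levels=[['HSBA LN Equity', 'UCG IM Equity', 'ISP IM Equity'],
                                ['LAST PRICE', 'HIGH', 'LOW']],
                        labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                                [0, 1, 2, 0, 1, 2, 0, 1, 2]])

# One row per date; each block of three values is LAST PRICE, HIGH, LOW for one ticker.
data = np.array([[663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.76,  2.768, 2.694],
                 [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.77 ],
                 [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
                 [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
                 [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.85,  2.888, 2.802]])
df = pd.DataFrame(data, columns=columns, index=index)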
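The ticker level above is not in alphabetical order, which turns out to matter later. As a hedged sketch written for pandas 0.24 or later (so it spells the constructor argument `codes=`), the snippet below rebuilds the same column index, checks whether its outer level is sorted, and shows that `DataFrame.sort_index(axis=1)` reorders the columns lexicographically while each value moves with its label. The placeholder values 0.0 to 8.0 are invented purely to track column positions:

```python
import numpy as np
import pandas as pd

# Same column index as above, written for pandas >= 0.24 (codes= instead of labels=).
columns = pd.MultiIndex(
    levels=[["HSBA LN Equity", "UCG IM Equity", "ISP IM Equity"],
            ["LAST PRICE", "HIGH", "LOW"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
)

# Levels are kept exactly as given, so 'UCG IM Equity' sits before 'ISP IM Equity'.
outer_sorted = columns.levels[0].is_monotonic_increasing
print("outer level sorted:", outer_sorted)

# One row of placeholder values (0.0 ... 8.0), one per column, to see where columns move.
frame = pd.DataFrame(np.arange(9.0).reshape(1, 9), columns=columns)
sorted_frame = frame.sort_index(axis=1)
sorted_columns = sorted_frame.columns

print("columns sorted after sort_index:", sorted_columns.is_monotonic_increasing)
print(list(sorted_columns[:3]))
```

Whether sorting the columns is necessary before reshaping depends on the pandas version in use; here it is only a way to inspect the index, not a claimed fix.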

The stack() Method

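One of the most useful reshaping tools in pandas is stack(). It pivots one level of the column labels into the innermost level of the row index, turning a wide DataFrame into a longer one. When the columns are a MultiIndex, the remaining column levels stay as columns, so the result usually has a MultiIndex on its rows rather than a single-level index.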
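A minimal sketch of that pivot, using a hypothetical one-row frame with two tickers and two fields (values borrowed from the example data), shows the shape change:

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["HSBA LN Equity", "UCG IM Equity"], ["LAST PRICE", "HIGH"]]
)
wide = pd.DataFrame([[663.8, 672.5, 15.97, 16.02]], columns=columns)

# Move the outer (ticker) column level into the row index.
long = wide.stack(0)

print(wide.shape)  # (1, 4): one row, four (ticker, field) columns
print(long.shape)  # (2, 2): one row per ticker, one column per field
print(long)
```

Recent pandas 2.x releases may emit a FutureWarning here about a new stack implementation; the values are the same either way, although the order of the field columns may differ.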

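In pandas 0.19.2, stacking level 0 of the columns (the ticker level) moves the tickers into the row index:

df_stacked = df.stack(0)

The result is a long DataFrame indexed by (date, ticker), with one column per price field. Each row holds that ticker's own prices; for example, the row for (2017-05-04, UCG IM Equity) has a LAST PRICE of 15.97. For reference when checking stacked output, this is df itself before stacking, printed in its original wide layout (pandas wraps the nine columns across two blocks):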

            HSBA LN Equity               UCG IM Equity                \
            LAST PRICE   HIGH    LOW    LAST PRICE   HIGH    LOW   
date                                                                  
2017-05-04          663.8  672.5  661.1         15.97  16.02  15.49   
2017-05-05          658.6  663.9  656.0         16.22  16.48  15.77   
2017-05-08          660.6  664.1  658.9         16.01  16.49  15.94   
2017-05-09          664.9  669.2  662.5         15.90  16.41  15.90   
2017-05-10          670.9  673.4  663.8         16.09  16.15  15.59   

       ISP IM Equity                
          LAST PRICE   HIGH    LOW  
date                                    
2017-05-04         2.760  2.768  2.694  
2017-05-05         2.842  2.868  2.770  
2017-05-08         2.852  2.898  2.826  
2017-05-09         2.848  2.898  2.842  
2017-05-10         2.850  2.888  2.802  

Changes in Pandas Version 0.20.1

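Running the same code under pandas 0.20.1 does not raise an error, but the result deserves a closer look:

df_stacked = df.stack(0)

The output below is a long frame with a (date, ticker) row index and one column per price field, which is what stack(0) is meant to produce. The values, however, are attached to the wrong tickers. Compare it with the wide df: on 2017-05-04 the row labelled UCG IM Equity holds 2.760 (LAST PRICE), 2.768 (HIGH) and 2.694 (LOW), which are ISP IM Equity's prices, while the row labelled ISP IM Equity holds UCG IM Equity's 15.97, 16.02 and 15.49. The same swap occurs on every date. So 0.20.1 did not introduce a new way of reshaping; its stack() mislabels rows, apparently because the ticker level of the column MultiIndex is not in sorted order (UCG IM Equity is listed before ISP IM Equity):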

                              HIGH  LAST PRICE      LOW
date                                                   
2017-05-04 HSBA LN Equity  672.500     663.800  661.100
           UCG IM Equity     2.768       2.760    2.694
           ISP IM Equity    16.020      15.970   15.490
2017-05-05 HSBA LN Equity  663.900     658.600  656.000
           UCG IM Equity     2.868       2.842    2.770
           ISP IM Equity    16.480      16.220   15.770
2017-05-08 HSBA LN Equity  664.100     660.600  658.900
           UCG IM Equity     2.898       2.852    2.826
           ISP IM Equity    16.490      16.010   15.940
2017-05-09 HSBA LN Equity  669.200     664.900  662.500
           UCG IM Equity     2.898       2.848    2.842
           ISP IM Equity    16.410      15.900   15.900
2017-05-10 HSBA LN Equity  673.400     670.900  663.800
           UCG IM Equity     2.888       2.850    2.802
           ISP IM Equity    16.150      16.090   15.590
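Because the mislabeled frame looks perfectly plausible, a label-by-label comparison against the wide frame is a cheap safeguard. The sketch below is one way to do it, not a pandas feature: `stack_is_aligned` is a hypothetical helper, and the example data is rebuilt for pandas 0.24 or later (`codes=` instead of `labels=`). It looks up every (date, ticker, field) value in the stacked result and compares it with the same cell in the wide frame. Applied to the 0.20.1 output above, the check would fail as soon as it reaches UCG IM Equity (2.760 instead of 15.97).

```python
import numpy as np
import pandas as pd


def stack_is_aligned(wide: pd.DataFrame) -> bool:
    """Hypothetical helper: check that wide.stack(0) keeps every value under its label.

    Assumes two column levels (ticker, field) and a unique row index.
    """
    long = wide.stack(0)
    for ticker, field in wide.columns:
        for date in wide.index:
            # Stacked lookup by (row, ticker) and field vs. the original wide cell.
            if long.loc[(date, ticker), field] != wide[(ticker, field)].loc[date]:
                return False
    return True


index = pd.DatetimeIndex(
    ["2017-05-04", "2017-05-05", "2017-05-08", "2017-05-09", "2017-05-10"],
    name="date",
)
# Ticker level kept in the same non-alphabetical order as the article's example.
columns = pd.MultiIndex(
    levels=[["HSBA LN Equity", "UCG IM Equity", "ISP IM Equity"],
            ["LAST PRICE", "HIGH", "LOW"]],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
)
data = np.array([[663.8, 672.5, 661.1, 15.97, 16.02, 15.49, 2.76, 2.768, 2.694],
                 [658.6, 663.9, 656.0, 16.22, 16.48, 15.77, 2.842, 2.868, 2.77],
                 [660.6, 664.1, 658.9, 16.01, 16.49, 15.94, 2.852, 2.898, 2.826],
                 [664.9, 669.2, 662.5, 15.90, 16.41, 15.90, 2.848, 2.898, 2.842],
                 [670.9, 673.4, 663.8, 16.09, 16.15, 15.59, 2.85, 2.888, 2.802]])
df = pd.DataFrame(data, columns=columns, index=index)

print("stack(0) aligned:", stack_is_aligned(df))
```

On a current pandas release we would expect the check to return True. Because it relies only on label lookups, not on row or column order, it does not care whether a given pandas version sorts the stacked output.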

Conclusion

In this article, we looked at how a pandas upgrade can change the result of a MultiIndex reshape without any change to the calling code.

Specifically, we compared df.stack(0) under pandas 0.19.2 and 0.20.1. Both return a long DataFrame indexed by (date, ticker) with one column per price field, but under 0.20.1 the prices of UCG IM Equity and ISP IM Equity end up under each other's labels, apparently because the ticker level of the column MultiIndex is not sorted.

Because the mislabeled output looks perfectly plausible, it is worth spot-checking reshaped data against the original frame whenever you upgrade pandas.


Last modified on 2023-10-01