Multi-Indexed DataFrames in pandas: A Comprehensive Guide to Adding Levels

In this article, we explore multi-indexed dataframes in pandas and show how to add a second level to a dataframe's column index.

Introduction to Multi-Indexing


A multi-indexed dataframe is a dataframe whose row index, column index, or both are a pandas MultiIndex, that is, an index with more than one level. Each level can be thought of as a separate dimension or category, so a single label becomes a tuple with one entry per level. This allows more flexible data manipulation and analysis, especially when rows or columns fall into natural groups.
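To make the idea of levels concrete, here is a minimal sketch with a two-level row index. The school names, grades, and enrolment figures are hypothetical and are not part of the dataset used later in this article.

```python
import pandas as pd

# Hypothetical data: each row label is a (school, grade) tuple
index = pd.MultiIndex.from_tuples(
    [("North", "G1"), ("North", "G2"), ("South", "G1")],
    names=["school", "grade"],
)
df = pd.DataFrame({"enrolled": [30, 25, 18]}, index=index)

n_levels = df.index.nlevels                  # number of levels in the row index
north = df.loc["North"]                      # all rows whose outer label is "North"
one = df.loc[("South", "G1"), "enrolled"]    # a full tuple addresses a single row

print(n_levels)
print(north)
print(one)
```

Selecting by the outer label alone returns every row in that group, while a complete tuple picks out exactly one row.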

Benefits of Multi-Indexing

  • More flexible data organization
  • Easier data grouping and aggregation
  • Improved data exploration and visualization
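As an illustration of the grouping benefit, the sketch below sums the columns inside each top-level group of a column MultiIndex. The group names and numbers are made up for the example; transposing before grouping is just one way to do this, chosen here because it relies only on the ordinary row-wise groupby.

```python
import pandas as pd

# Hypothetical column labels: (group, quarter)
columns = pd.MultiIndex.from_tuples(
    [("sales", "q1"), ("sales", "q2"), ("costs", "q1"), ("costs", "q2")]
)
df = pd.DataFrame([[10, 12, 7, 8], [20, 18, 9, 11]], columns=columns)

# Transpose so the column levels become row levels, group on the outer
# level, sum within each group, then transpose back.
totals = df.T.groupby(level=0).sum().T

print(totals)
```

The result has one column per top-level group, so the quarterly figures collapse into a single total for each group.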

Creating a Multi-Indexed DataFrame from Scratch


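To create a multi-indexed dataframe from scratch, we can use the pd.MultiIndex.from_tuples method. This method takes a list of tuples, where each tuple is one complete label and each position within the tuple corresponds to one level of the index. The dataset used throughout this article starts out as an ordinary single-level dataframe; the sections that follow add a second column level to it.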
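As a small illustration of how from_tuples reads its input, the sketch below builds the same index twice: once from explicit tuples and once with pd.MultiIndex.from_product. The labels and the level names "outer" and "inner" are placeholders chosen for this example.

```python
import pandas as pd

# Four tuples -> four index entries; two positions per tuple -> two levels
tuples = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
mi_tuples = pd.MultiIndex.from_tuples(tuples, names=["outer", "inner"])

# from_product builds every combination of the given label lists
mi_product = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["outer", "inner"])

print(len(mi_tuples))                 # number of entries
print(mi_tuples.nlevels)              # number of levels
print(mi_tuples.equals(mi_product))   # both constructions give the same index
```

Note that names takes one name per level, not one per entry.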

Example Code

import pandas as pd
import numpy as np

# Create a dictionary with data
data = {'Escola': {0: 'Brasília', 1: 'Brasília', 2: 'São Luis', 3: 'PIO XII'},
        'Série': {0: 'EF2', 1: 'EF8', 2: 'BER1', 3: 'EI5'},
        'Tipo': {0: 'Rematrícula', 1: 'Rematrícula', 2: 'Nova', 3: 'Nova'},
        'Matrícula': {0: 0.0, 1: 0.0, 2: 0.0, 3: 2.0},
        'Meta': {0: 164, 1: 176, 2: 9, 3: 12},
        '%': {0: 0.0, 1: 0.0, 2: 0.0, 3: 16.7},
        'Média Histórica': {0: np.nan, 1: 13, 2: np.nan, 3: 5},
        'Matrícula Projetada': {0: 0, 1: 13, 2: 0, 3: 7},
        'Meta Total': {0: 164, 1: 176, 2: 9, 3: 23},
        '%P': {0: 0.0, 1: 7.4, 2: 0.0, 3: 30.4},
        '% Meta': {0: 0.0, 1: 7.4, 2: 0.0, 3: 30.4},
        '# Alunos que Faltam': {0: 164, 1: 163, 2: 9, 3: 16}}

# Create a DataFrame from the dictionary
dfa = pd.DataFrame.from_dict(data)

print(dfa)

Output:

     Escola  Série         Tipo  ...    %P  % Meta  # Alunos que Faltam
0  Brasília    EF2  Rematrícula  ...   0.0     0.0                  164
1  Brasília    EF8  Rematrícula  ...   7.4     7.4                  163
2  São Luis   BER1         Nova  ...   0.0     0.0                    9
3   PIO XII    EI5         Nova  ...  30.4    30.4                   16

As we can see, the dataframe has the default integer row index (0 to 3) and a flat, single-level column index.

Adding Levels to a Column Index


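To add a level to the column index, we again use the pd.MultiIndex.from_tuples method, passing one tuple per existing column. Each tuple pairs a new top-level label with the original column name; an empty string is used for the columns that do not belong to a group.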

Example Code

# Define the columns
columns = ['Escola', 'Série', 'Tipo', 'Matrícula', 'Meta', '%', 'Média Histórica',
           'Matrícula Projetada', 'Meta Total', '%P', '% Meta', '# Alunos que Faltam']

# Pair each column with a top-level label ('' for the ungrouped columns)
new_levels = list(zip([''] * 3 + ['Name1'] * 3 + ['Name2'] * 4 + ['Name3'] * 2,
                      columns))

# Build the MultiIndex from the tuples. If names= is passed, it needs one name
# per level (two here), not one per column, so it is omitted.
multi_index = pd.MultiIndex.from_tuples(new_levels)

print(multi_index)

Output:

MultiIndex([(     '',              'Escola'),
            (     '',               'Série'),
            (     '',                'Tipo'),
            ('Name1',           'Matrícula'),
            ('Name1',                'Meta'),
            ('Name1',                   '%'),
            ('Name2',     'Média Histórica'),
            ('Name2', 'Matrícula Projetada'),
            ('Name2',          'Meta Total'),
            ('Name2',                  '%P'),
            ('Name3',              '% Meta'),
            ('Name3', '# Alunos que Faltam')],
           )
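Counting repeated labels by hand is easy to get wrong, so an alternative is to state the grouping as a mapping and derive the tuples from it. The sketch below is one possible way to do that; the groups dictionary and the group_of helper are names introduced here for illustration, and the grouping mirrors the one used in this article.

```python
import pandas as pd

columns = ['Escola', 'Série', 'Tipo', 'Matrícula', 'Meta', '%', 'Média Histórica',
           'Matrícula Projetada', 'Meta Total', '%P', '% Meta', '# Alunos que Faltam']

# Top-level label -> columns under it ('' marks the ungrouped columns)
groups = {
    '': ['Escola', 'Série', 'Tipo'],
    'Name1': ['Matrícula', 'Meta', '%'],
    'Name2': ['Média Histórica', 'Matrícula Projetada', 'Meta Total', '%P'],
    'Name3': ['% Meta', '# Alunos que Faltam'],
}

# Invert the mapping to column -> label, then build the tuples in column order
group_of = {col: label for label, cols in groups.items() for col in cols}
tuples = [(group_of[col], col) for col in columns]
multi_index = pd.MultiIndex.from_tuples(tuples)

print(multi_index.get_level_values(0).tolist())
```

A column missing from the mapping raises a KeyError here, which surfaces a mismatch before the index is assigned to the dataframe.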

Assigning the New MultiIndex to the DataFrame


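Once we have created the new multi-index, we can assign it to the dataframe through its columns attribute. The multi-index must contain exactly one entry per existing column (twelve here); the assignment only relabels the columns and leaves the data untouched.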

Example Code

# Use the MultiIndex to assign levels to the column index
dfa.columns = multi_index

print(dfa)

Output (abridged; which columns are elided and the exact spacing depend on the display width):

                                 ...  Name2   Name3
     Escola  Série         Tipo  ...     %P  % Meta  # Alunos que Faltam
0  Brasília    EF2  Rematrícula  ...    0.0     0.0                  164
1  Brasília    EF8  Rematrícula  ...    7.4     7.4                  163
2  São Luis   BER1         Nova  ...    0.0     0.0                    9
3   PIO XII    EI5         Nova  ...   30.4    30.4                   16


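As we can see, the column index now has two levels: the new group labels on top and the original column names beneath them. The dataframe still has four rows and twelve columns; only the column labels have changed.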
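Once the columns carry two levels, they can be selected by group or by full tuple. The sketch below uses an abbreviated copy of the article's data (four of the twelve columns) so that it runs on its own; the variable names are chosen for this example.

```python
import pandas as pd

# Abbreviated version of the article's data: four of the twelve columns
df = pd.DataFrame({
    'Escola': ['Brasília', 'Brasília', 'São Luis', 'PIO XII'],
    'Matrícula': [0.0, 0.0, 0.0, 2.0],
    'Meta': [164, 176, 9, 12],
    '% Meta': [0.0, 7.4, 0.0, 30.4],
})
df.columns = pd.MultiIndex.from_tuples(
    [('', 'Escola'), ('Name1', 'Matrícula'), ('Name1', 'Meta'), ('Name3', '% Meta')]
)

name1 = df['Name1']               # sub-frame holding every column under 'Name1'
meta = df[('Name1', 'Meta')]      # a full tuple selects one column as a Series
flat = df.droplevel(0, axis=1)    # drop the top level to get flat columns back

print(name1)
print(meta.tolist())
print(list(flat.columns))
```

Selecting a top-level label returns a dataframe whose columns are the second-level names only, and droplevel undoes the extra level when it is no longer needed.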

Conclusion


In this article, we have explored multi-indexed dataframes in pandas and shown how to add a level to a column index: build one (group, column) tuple per column, turn the tuples into a MultiIndex with pd.MultiIndex.from_tuples, and assign the result to the dataframe's columns attribute. Multi-indexing lets related columns be selected and processed as a group, which suits more complex data analysis tasks.

Last modified on 2025-01-22