Multi-Indexed DataFrames in pandas: A Comprehensive Guide
===========================================================
In this article, we will explore multi-indexed DataFrames in pandas and show how to use them to add levels to a column index.
Introduction to Multi-Indexing
A multi-indexed DataFrame is a DataFrame whose row index, column index, or both are a pd.MultiIndex, that is, an index with more than one level. Each level can be thought of as a separate dimension or category of the labels. This allows more flexible data manipulation and analysis, especially when dealing with hierarchical or categorical data.
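As a minimal sketch of a row MultiIndex, the snippet below takes a reduced slice of the article's example data and promotes two columns to index levels with DataFrame.set_index. The variable names are illustrative.

```python
import pandas as pd

# A reduced stand-in for the article's data (three columns only)
df = pd.DataFrame({
    "Escola": ["Brasília", "Brasília", "São Luis", "PIO XII"],
    "Série": ["EF2", "EF8", "BER1", "EI5"],
    "Meta": [164, 176, 9, 12],
})

# Promote two columns to a two-level row index; sorting keeps lookups efficient
indexed = df.set_index(["Escola", "Série"]).sort_index()

print(indexed.index.nlevels)      # 2
print(list(indexed.index.names))  # ['Escola', 'Série']

# A label on the outer level selects every row beneath it
print(indexed.loc["Brasília"])
```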
Benefits of Multi-Indexing
- More flexible data organization
- Easier data grouping and aggregation
- Improved data exploration and visualization
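To make the grouping point concrete, here is a sketch on the same kind of reduced, illustrative slice of the data: groupby accepts a level name directly, so per-school totals need no extra reshaping.

```python
import pandas as pd

df = pd.DataFrame({
    "Escola": ["Brasília", "Brasília", "São Luis", "PIO XII"],
    "Série": ["EF2", "EF8", "BER1", "EI5"],
    "Meta": [164, 176, 9, 12],
}).set_index(["Escola", "Série"])

# Sum the inner level (Série) away by grouping on the outer level (Escola)
meta_by_school = df.groupby(level="Escola")["Meta"].sum()
print(meta_by_school)
```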
Creating a Multi-Indexed DataFrame from Scratch
To create a MultiIndex from scratch, we can use the pd.MultiIndex.from_tuples method. It takes a list of tuples as input, where each tuple is one complete label: its first element belongs to the first level, its second element to the second level, and so on.
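A minimal sketch of from_tuples used on its own, with two levels named Escola and Série for illustration: each tuple supplies one value per level, and the result can serve directly as a DataFrame's row index.

```python
import pandas as pd

# Each tuple is one complete label: (value for level 0, value for level 1)
tuples = [("Brasília", "EF2"), ("Brasília", "EF8"), ("São Luis", "BER1")]
index = pd.MultiIndex.from_tuples(tuples, names=["Escola", "Série"])

# Use the MultiIndex as the row index of a new DataFrame
df = pd.DataFrame({"Meta": [164, 176, 9]}, index=index)
print(df)

# A full tuple addresses a single row
print(df.loc[("Brasília", "EF8"), "Meta"])  # 176
```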
Example Code
import pandas as pd
import numpy as np
# Start from ordinary data with single-level column names; a second column level is added in the next section
data = {'Escola': {0: 'Brasília', 1: 'Brasília', 2: 'São Luis', 3: 'PIO XII'},
'Série': {0: 'EF2', 1: 'EF8', 2: 'BER1', 3: 'EI5'},
'Tipo': {0: 'Rematrícula', 1: 'Rematrícula', 2: 'Nova', 3: 'Nova'},
'Matrícula': {0: 0.0, 1: 0.0, 2: 0.0, 3: 2.0},
'Meta': {0: 164, 1: 176, 2: 9, 3: 12},
'%': {0: 0.0, 1: 0.0, 2: 0.0, 3: 16.7},
'Média Histórica': {0: np.nan, 1: 13, 2: np.nan, 3: 5},
'Matrícula Projetada': {0: 0, 1: 13, 2: 0, 3: 7},
'Meta Total': {0: 164, 1: 176, 2: 9, 3: 23},
'%P': {0: 0.0, 1: 7.4, 2: 0.0, 3: 30.4},
'% Meta': {0: 0.0, 1: 7.4, 2: 0.0, 3: 30.4},
'# Alunos que Faltam': {0: 164, 1: 163, 2: 9, 3: 16}}
# Create a DataFrame from the dictionary
dfa = pd.DataFrame.from_dict(data)
print(dfa)
Output:
Escola Série Tipo ... %P % Meta # Alunos que Faltam
0 Brasília EF2 Rematrícula ... 0.0 0.0 164
1 Brasília EF8 Rematrícula ... 7.4 7.4 163
2 São Luis BER1 Nova ... 0.0 0.0 9
3 PIO XII EI5 Nova ... 30.4 30.4 16
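Note that this DataFrame does have an index, just not a multi-level one: pandas assigns the rows a default RangeIndex (0 to 3), and the column labels form an ordinary single-level Index.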
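Both axes of a DataFrame always carry an index, even when none is supplied. A quick way to check which kind you have is to inspect .index and .columns, sketched here on a hypothetical two-row frame named small:

```python
import pandas as pd

small = pd.DataFrame({"Escola": ["Brasília", "São Luis"], "Meta": [164, 9]})

# Rows get a default RangeIndex when no index is supplied
print(small.index)             # RangeIndex(start=0, stop=2, step=1)

# The column labels form a single-level Index
print(small.columns.nlevels)   # 1
print(isinstance(small.columns, pd.MultiIndex))  # False
```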
Adding Levels to a Column Index
One way to add a level to a column index is to pair each existing column name with a group label and pass the resulting list of tuples to pd.MultiIndex.from_tuples.
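pd.MultiIndex.from_tuples is not the only option. As an alternative sketch, pd.MultiIndex.from_arrays takes one sequence per level rather than one tuple per label, which avoids the zip step. It is shown here on the first four columns only, with illustrative variable names:

```python
import pandas as pd

columns = ["Escola", "Série", "Tipo", "Matrícula"]
groups = ["", "", "", "Name1"]

# One array per level: outer group labels first, original names second
by_arrays = pd.MultiIndex.from_arrays([groups, columns])

# The equivalent tuple-based construction
by_tuples = pd.MultiIndex.from_tuples(list(zip(groups, columns)))

print(by_arrays.equals(by_tuples))  # True
print(by_arrays[3])                 # ('Name1', 'Matrícula')
```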
Example Code
# Define the columns (the original, single-level column names)
columns = ['Escola', 'Série', 'Tipo', 'Matrícula', 'Meta', '%', 'Média Histórica',
           'Matrícula Projetada', 'Meta Total', '%P', '% Meta', '# Alunos que Faltam']
# Pair each column name with a group label: 3 + 3 + 4 + 2 = 12 labels, one per column
new_levels = list(zip([''] * 3 + ['Name1'] * 3 + ['Name2'] * 4 + ['Name3'] * 2,
                      columns))
# Build the MultiIndex from the list of tuples; `names` is omitted because it
# would need one entry per level (2 here), not one per column
multi_index = pd.MultiIndex.from_tuples(new_levels)
print(multi_index)
Output:
MultiIndex([(     '',              'Escola'),
            (     '',               'Série'),
            (     '',                'Tipo'),
            ('Name1',           'Matrícula'),
            ('Name1',                'Meta'),
            ('Name1',                   '%'),
            ('Name2',     'Média Histórica'),
            ('Name2', 'Matrícula Projetada'),
            ('Name2',          'Meta Total'),
            ('Name2',                  '%P'),
            ('Name3',              '% Meta'),
            ('Name3', '# Alunos que Faltam')],
           )
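If you want named levels, names takes one entry per level, not one per column. A sketch with the hypothetical level names Grupo and Coluna, followed by the ValueError pandas raises when the counts disagree:

```python
import pandas as pd

columns = ["Escola", "Série", "Matrícula", "Meta"]
groups = ["", "", "Name1", "Name1"]
tuples = list(zip(groups, columns))

# Two levels, so two names ("Grupo" and "Coluna" are hypothetical)
named = pd.MultiIndex.from_tuples(tuples, names=["Grupo", "Coluna"])
print(list(named.names))  # ['Grupo', 'Coluna']

# One name per column (four here) does not match the two levels
raised = False
try:
    pd.MultiIndex.from_tuples(tuples, names=columns)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```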
Assigning the New MultiIndex to the DataFrame
Once we have created the new MultiIndex, we can assign it to the DataFrame through its columns attribute.
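The assigned MultiIndex must contain exactly one label per existing column. A minimal sketch of that rule on a hypothetical three-column frame named frame: a matching MultiIndex is accepted, and a shorter one raises a ValueError.

```python
import pandas as pd

frame = pd.DataFrame({"Escola": ["Brasília"], "Série": ["EF2"], "Meta": [164]})

# Three labels for three columns: accepted
frame.columns = pd.MultiIndex.from_tuples(
    [("", "Escola"), ("", "Série"), ("Name1", "Meta")]
)
print(frame.columns.nlevels)  # 2

# Two labels for three columns: rejected, and the columns stay unchanged
raised = False
try:
    frame.columns = pd.MultiIndex.from_tuples([("", "Escola"), ("Name1", "Meta")])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```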
Example Code
# Use the MultiIndex to assign levels to the column index
dfa.columns = multi_index
print(dfa)
Printing dfa now shows the same 4 rows (index 0 to 3) and the same values as before; only the header changes. It spans two rows: the upper row holds the group labels (blank for the first three columns, then Name1, Name2 and Name3), each printed once above the columns it covers, and the lower row holds the original column names. As in the first printout, pandas may elide the middle columns with ... when the frame is wider than the display.
As we can see, the column index has been successfully replaced by a two-level MultiIndex (dfa.columns.nlevels is now 2).
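Once the column index has two levels, columns can be selected by group. This sketch rebuilds a reduced, illustrative dfa (six of the twelve columns and two of the four rows from the example data) and shows three common selections:

```python
import pandas as pd

columns = ["Escola", "Série", "Matrícula", "Meta", "Meta Total", "%P"]
groups = ["", "", "Name1", "Name1", "Name2", "Name2"]

dfa = pd.DataFrame([
    ["Brasília", "EF2", 0.0, 164, 164, 0.0],
    ["PIO XII", "EI5", 2.0, 12, 23, 30.4],
])
dfa.columns = pd.MultiIndex.from_arrays([groups, columns])

# An outer label returns the sub-frame of the columns it covers
name1 = dfa["Name1"]
print(list(name1.columns))  # ['Matrícula', 'Meta']

# A full (outer, inner) tuple returns a single column as a Series
print(dfa[("Name2", "Meta Total")].tolist())  # [164, 23]

# xs selects by an inner label, whatever the outer group
print(dfa.xs("Meta", axis=1, level=1))
```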
Conclusion
In this article, we have explored multi-indexed DataFrames in pandas and how to use them to add levels to a column index, with example code for each step. By grouping related rows or columns under shared labels, multi-indexing can make complex analysis tasks easier to organize and express.
Last modified on 2025-01-22