Dataframe Manipulation in pandas: Rearranging the DataFrame
pandas is a powerful library for data manipulation and analysis in Python. One of its key features is the ability to easily manipulate dataframes, which are two-dimensional labeled data structures with columns of potentially different types.
In this article, we will explore how to rearrange a dataframe in pandas using the melt and pivot_table functions. We’ll start by discussing what each of these functions does and then provide example code that demonstrates their usage.
What is a DataFrame?
A dataframe is a two-dimensional labeled data structure with columns of potentially different types. It’s similar to an Excel spreadsheet or a table in a relational database. Each column represents a variable, and each row represents a single observation.
Creating a DataFrame
To start working with dataframes, we need to create one first. We can do this with the pd.DataFrame() constructor, which accepts, among other inputs, a dictionary that maps column names to lists of values.
import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}
df = pd.DataFrame(data)
print(df)
This will create a dataframe with the specified columns and data.
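To make the "columns of potentially different types" point concrete, here is a minimal sketch that inspects the same dataframe. It uses only the standard shape and dtypes attributes; the exact dtype names printed for the text columns can differ between pandas versions, so no output is shown.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one dtype per column: text columns differ from integer ones
```

Note that Date is stored as text here, not as a date type, because the values were given as strings.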
Melt Function
The melt function unpivots a dataframe from wide format to long format. Its two main parameters are id_vars (the columns that should remain unchanged as identifiers) and value_vars (the columns that should be melted into rows). By default, the names of the melted columns go into a new column called variable, and their values go into a new column called value.
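The output column names can also be chosen explicitly. The following is a small sketch using melt’s var_name and value_name parameters; the labels Desk and Amount are hypothetical names picked for illustration and do not come from the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday'],
    'GS': [11, 22],
    'MS': [22, 11],
})

# var_name and value_name replace the default 'variable' and 'value' headers.
long_df = pd.melt(
    df,
    id_vars=['Day'],
    value_vars=['GS', 'MS'],
    var_name='Desk',      # hypothetical label for the melted column names
    value_name='Amount',  # hypothetical label for the melted values
)
print(long_df)
```

Naming these columns up front avoids a separate rename step later, which can help when the long dataframe is passed on to other code.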
Let’s take the dataframe df created above as an example. To melt it into a long format (i.e., with one row per combination of Day, Date, and melted column), we can use the melt function as follows:
import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}
df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])
print(melted_df)
This will output:
         Day        Date variable  value
0     Monday  20/02/2017       GS     11
1    Tuesday  21/02/2017       GS     22
2  Wednesday  22/02/2017       GS     33
3     Monday  20/02/2017       MS     22
4    Tuesday  21/02/2017       MS     11
5  Wednesday  22/02/2017       MS     22
6     Monday  20/02/2017       ES     33
7    Tuesday  21/02/2017       ES     33
8  Wednesday  22/02/2017       ES     11
As we can see, the melt function has transformed our dataframe from a wide format to a long format: each of the three original rows now appears three times, once per melted column.
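As a quick check of that row count, the sketch below also shows that value_vars is optional: when it is omitted, melt unpivots every column not listed in id_vars. The variable names melted_all and explicit are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# With value_vars omitted, every column not listed in id_vars is melted.
melted_all = pd.melt(df, id_vars=['Day', 'Date'])
explicit = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])

# 3 original rows x 3 melted columns = 9 long rows.
print(len(melted_all))
print(melted_all.equals(explicit))
```

Listing value_vars explicitly is still useful when only some of the remaining columns should be melted.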
Pivot Table Function
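The pivot_table function creates a spreadsheet-style pivot table from a dataframe. Its main parameters are index (the columns whose values become row labels), columns (the column whose values become column headers), values (the column to aggregate), and aggfunc (how to combine multiple values that land in the same cell; the default is the mean).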
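To illustrate why aggfunc matters, here is a minimal sketch with made-up values in which the same Day and variable pair occurs twice. It contrasts pivot_table with DataFrame.pivot, a related method not otherwise covered in this article, which performs no aggregation and therefore rejects duplicates.

```python
import pandas as pd

# Long-format data with a repeated ('Monday', 'GS') pair (made-up values).
long_df = pd.DataFrame({
    'Day': ['Monday', 'Monday', 'Tuesday'],
    'variable': ['GS', 'GS', 'GS'],
    'value': [10, 20, 30],
})

# pivot_table combines the duplicates using aggfunc (the mean here).
table = pd.pivot_table(long_df, index='Day', columns='variable',
                       values='value', aggfunc='mean')
print(table)

# DataFrame.pivot does no aggregation, so it rejects the duplicate pair.
pivot_raised = False
try:
    long_df.pivot(index='Day', columns='variable', values='value')
except ValueError as exc:
    pivot_raised = True
    print('pivot failed:', exc)
```

When every index and column combination is unique, both approaches give the same table, and pivot_table simply aggregates a single value per cell.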
Let’s take the melted dataframe melted_df from the previous section as an example. To pivot it back into a wide format (i.e., with one row per Day and Date combination), we can use the pivot_table function as follows:
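import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}
df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])
# values='value' selects the column to aggregate and keeps the column header single-level
pivoted_df = pd.pivot_table(melted_df, index=['Day', 'Date'], columns='variable', values='value', aggfunc='mean')
print(pivoted_df)
This will output a table indexed by Day and Date, with the variable columns sorted alphabetically. Depending on the pandas version, the values may instead be displayed as floats (11.0, 22.0, and so on), because aggfunc='mean' is applied:
variable              ES  GS  MS
Day       Date                  
Monday    20/02/2017  33  11  22
Tuesday   21/02/2017  33  22  11
Wednesday 22/02/2017  11  33  22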
As we can see, the pivot_table function has transformed our dataframe from a long format back to a wide format.
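One way to confirm that the two operations undo each other is a round-trip check. The sketch below is an illustration under a few assumptions: it passes values='value' to pivot_table, casts the numbers back to integers (the mean may yield floats), and relies on pivot_table sorting the Day labels alphabetically, which happens to match the original row order here. The name restored is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])
pivoted_df = pd.pivot_table(melted_df, index=['Day', 'Date'],
                            columns='variable', values='value', aggfunc='mean')

# Turn the index levels back into columns and drop the 'variable' axis label.
restored = pivoted_df.reset_index().rename_axis(columns=None)
# Restore the original column order and integer dtype.
restored = restored[df.columns].astype({'GS': 'int64', 'MS': 'int64', 'ES': 'int64'})

print(restored.equals(df))
```

With data whose row order is not alphabetical, the restored rows would come back sorted, so a comparison would need to sort both dataframes first.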
Example Use Case
Suppose we have a dataset of exam scores for a group of students. We want to organize the scores by subject and compute the average score for each student in each subject. We can use the melt and pivot_table functions to achieve this.
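import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 80, 70],
    'Science': [85, 95, 75]
}
df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Student'], value_vars=['Math', 'Science'])
# values='value' selects the score column to average
pivoted_df = pd.pivot_table(melted_df, index='Student', columns='variable', values='value', aggfunc='mean')
print(pivoted_df)
Because each student has exactly one score per subject, each average equals the original score. This will output (the values may be displayed as floats such as 90.0, depending on the pandas version):
variable  Math  Science
Student                
Alice       90       85
Bob         80       95
Charlie     70       75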
As we can see, the melt and pivot_table functions have transformed our dataset into a format that allows us to easily analyze the scores by subject.
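The long format also makes it straightforward to compute one average per student across subjects. The following sketch uses groupby, which is an addition for illustration and not part of the melt and pivot_table workflow described above; student_avg is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 80, 70],
    'Science': [85, 95, 75],
})

melted_df = pd.melt(df, id_vars=['Student'], value_vars=['Math', 'Science'])

# One row per (student, subject) makes per-student aggregation a plain groupby.
student_avg = melted_df.groupby('Student')['value'].mean()
print(student_avg)
```

Here Alice and Bob both average 87.5 and Charlie averages 72.5, since each student has two scores.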
Conclusion
In this article, we’ve explored how to rearrange a dataframe in pandas using the melt and pivot_table functions. We’ve discussed what each of these functions does, provided example code for their usage, and demonstrated an example use case for analyzing exam scores.
Last modified on 2023-12-09