Rearrange Your Data: Mastering pandas' Melt and Pivot Table Functions

Dataframe Manipulation in pandas: Rearranging the DataFrame

pandas is a powerful library for data manipulation and analysis in Python. One of its key features is the ability to easily manipulate dataframes, which are two-dimensional labeled data structures with columns of potentially different types.

In this article, we will explore how to rearrange a dataframe in pandas using the melt and pivot_table functions. We’ll start by discussing what each of these functions does and then provide example code that demonstrates their usage.

What is a DataFrame?

A dataframe is similar to an Excel spreadsheet or a table in a relational database: each column represents a variable, and each row represents a single observation. Columns are labeled, and different columns can hold different types of data.

Creating a DataFrame

To start working with dataframes, we need to create one first. We can do this with the pd.DataFrame() constructor, which accepts, among other inputs, a dictionary that maps column names to lists of values.

import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}

df = pd.DataFrame(data)
print(df)

This will create a dataframe with the specified columns and data.
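Because each column can have its own type, it is worth checking what pandas inferred. The following is a minimal sketch: it prints the column dtypes and then parses the Date strings into timestamps. The day/month/year format is an assumption based on the sample values, and the name typed_df is only illustrative.

```python
import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}
df = pd.DataFrame(data)

# Show the dtype pandas inferred for each column (text vs. integer columns)
print(df.dtypes)

# Parse the date strings into timestamps; the '%d/%m/%Y' format is an
# assumption based on the sample values. assign() returns a new dataframe.
typed_df = df.assign(Date=pd.to_datetime(df['Date'], format='%d/%m/%Y'))
print(typed_df.dtypes)
```

Parsing the dates is optional for the reshaping steps that follow, but it makes later sorting and filtering by date behave chronologically rather than alphabetically.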

Melt Function

The melt function is used to unpivot a dataframe from wide format to long format. Its two most commonly used parameters are id_vars (the columns to keep as identifiers) and value_vars (the columns to unpivot into rows). The optional var_name and value_name parameters set the names of the two resulting columns, which default to variable and value.

If we want to melt this dataframe into long format (i.e., with one row per combination of Day, Date, and melted column), we can use the melt function as follows:

import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}

df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])
print(melted_df)

This will output:

         Day        Date variable  value
0     Monday  20/02/2017       GS     11
1    Tuesday  21/02/2017       GS     22
2  Wednesday  22/02/2017       GS     33
3     Monday  20/02/2017       MS     22
4    Tuesday  21/02/2017       MS     11
5  Wednesday  22/02/2017       MS     22
6     Monday  20/02/2017       ES     33
7    Tuesday  21/02/2017       ES     33
8  Wednesday  22/02/2017       ES     11

As we can see, the melt function has successfully transformed our dataframe from a wide format to a long format.
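By default, melt names the two new columns variable and value. The sketch below shows how the var_name and value_name parameters replace those defaults. The labels 'Code' and 'Amount' are hypothetical, chosen only for illustration, since the article does not say what GS, MS, and ES represent.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
})

long_df = pd.melt(
    df,
    id_vars=['Day', 'Date'],
    value_vars=['GS', 'MS', 'ES'],
    var_name='Code',      # hypothetical label for the former column names
    value_name='Amount',  # hypothetical label for the cell values
)
print(long_df)
```

Descriptive names make later steps such as grouping or pivoting easier to read than the generic defaults.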

Pivot Table Function

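The pivot_table function is used to create a pivot table from a dataframe. It takes several parameters, including index (the columns whose values become row labels), columns (the column whose values become column labels), values (the column to aggregate), and aggfunc (how to combine entries that fall into the same cell).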
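To illustrate what aggfunc does, here is a minimal sketch using a small hypothetical long-format table (not the article's data) in which one Day/variable pair appears twice. pivot_table combines the duplicate entries with the given aggregation function.

```python
import pandas as pd

# Hypothetical data: Monday/GS appears twice, Tuesday/GS once
long_df = pd.DataFrame({
    'Day': ['Monday', 'Monday', 'Tuesday'],
    'variable': ['GS', 'GS', 'GS'],
    'value': [10, 20, 30],
})

# The two Monday/GS entries are averaged into a single cell
table = pd.pivot_table(
    long_df,
    index='Day',
    columns='variable',
    values='value',
    aggfunc='mean',
)
print(table)
```

When every index/column pair occurs exactly once, as in the rest of this article, the mean of a single value is just that value, so the aggregation only changes the result when duplicates exist.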

If we want to pivot the melted dataframe back into a wide format (i.e., with one row per Day and Date pair and one column per variable), we can use the pivot_table function as follows:

import pandas as pd

data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
}

df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])
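pivoted_df = pd.pivot_table(melted_df, index=['Day', 'Date'], columns='variable', values='value', aggfunc='mean')
print(pivoted_df)

This will output:

variable                ES    GS    MS
Day       Date                        
Monday    20/02/2017  33.0  11.0  22.0
Tuesday   21/02/2017  33.0  22.0  11.0
Wednesday 22/02/2017  11.0  33.0  22.0

As we can see, the pivot_table function has transformed our dataframe from long format back to wide format. Here values='value' selects the column to aggregate. Note that Day and Date now form the index, the columns are sorted alphabetically, and the means are returned as floats.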
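The pivoted result is not laid out exactly like the dataframe we started with. The following sketch shows one way to recover the original layout after a melt and pivot_table round trip. The name wide_df is illustrative, and casting back to integers assumes the values were whole numbers to begin with, as they are in this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11]
})
melted_df = pd.melt(df, id_vars=['Day', 'Date'], value_vars=['GS', 'MS', 'ES'])

wide_df = pd.pivot_table(
    melted_df, index=['Day', 'Date'], columns='variable',
    values='value', aggfunc='mean'
)
wide_df = wide_df.reset_index()        # Day and Date become ordinary columns again
wide_df.columns.name = None            # drop the leftover 'variable' axis label
wide_df = wide_df.reindex(columns=df.columns)  # restore the original column order
# The mean is returned as float; cast back to integers (assumes whole numbers)
wide_df = wide_df.astype({'GS': 'int64', 'MS': 'int64', 'ES': 'int64'})
print(wide_df)
```

pivot_table sorts the row labels, so the rows only come back in their original order here because Monday, Tuesday, and Wednesday happen to sort alphabetically in that order.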

Example Use Case

Suppose we have a dataset of exam scores for a group of students. We want to analyze the scores by subject and find the average score for each student. We can use the melt and pivot_table functions to achieve this.

import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 80, 70],
    'Science': [85, 95, 75]
}

df = pd.DataFrame(data)
melted_df = pd.melt(df, id_vars=['Student'], value_vars=['Math', 'Science'])
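pivoted_df = pd.pivot_table(melted_df, index='Student', columns='variable', values='value', aggfunc='mean')
print(pivoted_df)

# Average score per student across all subjects
student_avg = melted_df.groupby('Student')['value'].mean()
print(student_avg)

This will output:

variable  Math  Science
Student                
Alice     90.0     85.0
Bob       80.0     95.0
Charlie   70.0     75.0
Student
Alice      87.5
Bob        87.5
Charlie    72.5
Name: value, dtype: float64

As we can see, the pivoted table lays out the scores by subject, while grouping the long-format data by Student gives the average score for each student.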
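pivot_table can also append averages to the table itself through its margins parameter. The sketch below is one possible way to get per-student and per-subject averages in a single table. The label 'Average' passed to margins_name is an illustrative choice (the pandas default is 'All').

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 80, 70],
    'Science': [85, 95, 75]
})
melted_df = pd.melt(df, id_vars=['Student'], value_vars=['Math', 'Science'])

summary = pd.pivot_table(
    melted_df,
    index='Student',
    columns='variable',
    values='value',
    aggfunc='mean',
    margins=True,            # add an extra row and column of aggregates
    margins_name='Average',  # illustrative label; the default is 'All'
)
print(summary)
```

The extra column holds each student's mean across subjects, and the extra row holds each subject's mean across students.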

Conclusion

In this article, we’ve explored how to rearrange a dataframe in pandas using the melt and pivot_table functions. We’ve discussed what each of these functions does, provided example code for their usage, and demonstrated an example use case for analyzing exam scores.


Last modified on 2023-12-09