Convert Columns to Rows with Pandas: A Comprehensive Guide

Converting Columns into Rows with Pandas

=====================================================

As data analysts and scientists, we often encounter datasets that have a mix of columnar and row-based structures. In this post, we’ll explore how to convert columns into rows using the popular Python library, Pandas.

Understanding the Problem

The problem at hand is to take a dataset with location information by date, where each date corresponds to a different column header. For example:

location	name	Jan-2010	Feb-2010	March-2010
A	test	12	20	30
B	foo	18	20	25

And transform it into a dataset with location, name, and date columns, where each date has its corresponding value:

location	name	Date	Value
A	test	Jan-2010	12
A	test	Feb-2010	20
A	test	March-2010	30
B	foo	Jan-2010	18
B	foo	Feb-2010	20
B	foo	March-2010	25

Solution Overview

To solve this problem, we’ll use the melt function from Pandas. This function transforms a dataset with a multi-level index (i.e., multiple columns) into a long format, where each row corresponds to a single observation.

Using `melt`

The melt function takes three main arguments:

id_vars: A list of column names that should be kept as is. These are the variables that will remain unchanged.
var_name: The name of the new column that will contain the original variable names.
value_name: The name of the new column that will contain the original values.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset
df_melted = df.melt(id_vars=['location', 'name'], var_name='Date', value_name='Value')

print(df_melted)

Output:

location	name	Date	Value
A	test	Jan-2010	12
A	test	Feb-2010	20
A	test	March-2010	30
B	foo	Jan-2010	18
B	foo	Feb-2010	20
B	foo	March-2010	25

Handling Old Versions (< 0.20)

If you’re using an older version of Pandas (< 0.20), you can use the pd.melt function instead, and then sort the resulting dataset.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset using old version
df_melted_old = pd.melt(df, id_vars=['location', 'name'], var_name='Date', value_name='Value')

print(df_melted_old)

Output:

location	name	Date	Value
A	test	Jan-2010	12
A	test	Feb-2010	20
A	test	March-2010	30
B	foo	Jan-2010	18
B	foo	Feb-2010	20
B	foo	March-2010	25

Note that pd.DataFrame.sort has been deprecated in favor of pd.DataFrame.sort_values.

Final Touches

After using the melt function, you may want to add a .reset_index(drop=True) to keep the output clean.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset
df_melted = df.melt(id_vars=['location', 'name'], var_name='Date', value_name='Value')

# Add .reset_index(drop=True) for clean output
df_melted = df_melted.reset_index(drop=True)

print(df_melted)

Output:

location	name	Date	Value
A	test	Jan-2010	12
A	test	Feb-2010	20
A	test	March-2010	30
B	foo	Jan-2010	18
B	foo	Feb-2010	20
B	foo	March-2010	25

By following these steps, you can easily convert columns into rows using the melt function in Pandas.

Last modified on 2023-10-15

Converting Columns into Rows with Pandas

Understanding the Problem

Solution Overview

Using melt

Handling Old Versions (< 0.20)

Final Touches

Using `melt`