Convert Columns to Rows with Pandas: A Comprehensive Guide

Converting Columns into Rows with Pandas

=====================================================

As data analysts and scientists, we often encounter datasets that have a mix of columnar and row-based structures. In this post, we’ll explore how to convert columns into rows using the popular Python library, Pandas.

Understanding the Problem


The problem at hand is to take a dataset with location information by date, where each date corresponds to a different column header. For example:

locationnameJan-2010Feb-2010March-2010
Atest122030
Bfoo182025

And transform it into a dataset with location, name, and date columns, where each date has its corresponding value:

locationnameDateValue
AtestJan-201012
AtestFeb-201020
AtestMarch-201030
BfooJan-201018
BfooFeb-201020
BfooMarch-201025

Solution Overview


To solve this problem, we’ll use the melt function from Pandas. This function transforms a dataset with a multi-level index (i.e., multiple columns) into a long format, where each row corresponds to a single observation.

Using melt


The melt function takes three main arguments:

  • id_vars: A list of column names that should be kept as is. These are the variables that will remain unchanged.
  • var_name: The name of the new column that will contain the original variable names.
  • value_name: The name of the new column that will contain the original values.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset
df_melted = df.melt(id_vars=['location', 'name'], var_name='Date', value_name='Value')

print(df_melted)

Output:

locationnameDateValue
AtestJan-201012
AtestFeb-201020
AtestMarch-201030
BfooJan-201018
BfooFeb-201020
BfooMarch-201025

Handling Old Versions (< 0.20)


If you’re using an older version of Pandas (< 0.20), you can use the pd.melt function instead, and then sort the resulting dataset.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset using old version
df_melted_old = pd.melt(df, id_vars=['location', 'name'], var_name='Date', value_name='Value')

print(df_melted_old)

Output:

locationnameDateValue
AtestJan-201012
AtestFeb-201020
AtestMarch-201030
BfooJan-201018
BfooFeb-201020
BfooMarch-201025

Note that pd.DataFrame.sort has been deprecated in favor of pd.DataFrame.sort_values.

Final Touches


After using the melt function, you may want to add a .reset_index(drop=True) to keep the output clean.

Here’s an example:

import pandas as pd

# Create a sample dataset
data = {
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
}
df = pd.DataFrame(data)

# Melt the dataset
df_melted = df.melt(id_vars=['location', 'name'], var_name='Date', value_name='Value')

# Add .reset_index(drop=True) for clean output
df_melted = df_melted.reset_index(drop=True)

print(df_melted)

Output:

locationnameDateValue
AtestJan-201012
AtestFeb-201020
AtestMarch-201030
BfooJan-201018
BfooFeb-201020
BfooMarch-201025

By following these steps, you can easily convert columns into rows using the melt function in Pandas.


Last modified on 2023-10-15