Converting Columns into Rows with Pandas: A Flexible Solution for Data Transformation

Converting Columns into Rows with Pandas

As data analysts and scientists, we often find ourselves working with datasets that have undergone transformations or have unique structures. One such problem is when a dataset has multiple columns that are actually different dates, but you want to convert them into separate rows for each date. In this post, we’ll explore how to achieve this using Pandas, a popular Python library for data manipulation and analysis.

Understanding the Problem

Consider a dataset with information by location for n dates. The problem is that each date is actually a different column header, as shown in the following example CSV:

location    name    Jan-2010    Feb-2010    March-2010
A           "test"  12          20          30
B           "foo"   18          20          25

The goal is to convert this into the following format:

location    name    Date        Value
A           "test"  Jan-2010    12       
A           "test"  Feb-2010    20
A           "test"  March-2010  30
B           "foo"   Jan-2010    18       
B           "foo"   Feb-2010    20
B           "foo"   March-2010  25

Using the Melt Function

The most straightforward way to achieve this transformation is by using the melt function from Pandas. The melt function takes a DataFrame and converts it from wide format (where each column represents a variable) into long format (where each row represents an observation).

Here’s how you can use melt to convert your date columns:

import pandas as pd

# Create the sample DataFrame
df = pd.DataFrame({
    'location': ['A', 'B'],
    'name': ['test', 'foo'],
    'Jan-2010': [12, 18],
    'Feb-2010': [20, 20],
    'March-2010': [30, 25]
})

# Melt the DataFrame
df_melted = df.melt(id_vars=['location', 'name'], 
                   var_name='Date', value_name='Value')

print(df_melted)

This will output:

 location    name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25

As you can see, the melt function has successfully converted each date column into separate rows.

Old Versions (<0.20)

If you’re using an older version of Pandas that doesn’t support the melt function directly, you can achieve similar results by first sorting the DataFrame and then using the apply function to melt it manually.

df_sorted = df.sort_values(by=['location', 'name'])
df_melted_old = df_sorted.apply(lambda row: pd.Series([row['location'], row['name']] + [row[column] for column in ['Jan-2010', 'Feb-2010', 'March-2010']], index=['location', 'name', 'Date', 'Value']), axis=1)

However, this approach can be cumbersome and may not provide the same level of flexibility as the melt function.

Deprecation Notice

It’s worth noting that pd.DataFrame.sort has been deprecated in favor of pd.DataFrame.sort_values. To avoid any issues with deprecation warnings, make sure to use sort_values instead:

df_sorted = df.sort_values(by=['location', 'name'])

Conclusion

Converting columns into rows using Pandas can be a useful technique for data analysis and manipulation. By leveraging the melt function, you can easily transform your dataset from wide format to long format. While older versions of Pandas may require additional steps or workarounds, the melt function provides a convenient and efficient solution for most use cases.


Last modified on 2025-04-04