Converting Columns into Rows with Pandas
As data analysts and scientists, we often find ourselves working with datasets that have undergone transformations or have unique structures. One such problem is when a dataset has multiple columns that are actually different dates, but you want to convert them into separate rows for each date. In this post, we’ll explore how to achieve this using Pandas, a popular Python library for data manipulation and analysis.
Understanding the Problem
Consider a dataset with information by location for n dates. The problem is that each date is actually a different column header, as shown in the following example CSV:
location name Jan-2010 Feb-2010 March-2010
A "test" 12 20 30
B "foo" 18 20 25
The goal is to convert this into the following format:
location name Date Value
A "test" Jan-2010 12
A "test" Feb-2010 20
A "test" March-2010 30
B "foo" Jan-2010 18
B "foo" Feb-2010 20
B "foo" March-2010 25
Using the Melt Function
The most straightforward way to achieve this transformation is by using the melt
function from Pandas. The melt
function takes a DataFrame and converts it from wide format (where each column represents a variable) into long format (where each row represents an observation).
Here’s how you can use melt
to convert your date columns:
import pandas as pd
# Create the sample DataFrame
df = pd.DataFrame({
'location': ['A', 'B'],
'name': ['test', 'foo'],
'Jan-2010': [12, 18],
'Feb-2010': [20, 20],
'March-2010': [30, 25]
})
# Melt the DataFrame
df_melted = df.melt(id_vars=['location', 'name'],
var_name='Date', value_name='Value')
print(df_melted)
This will output:
location name Date Value
0 A test Jan-2010 12
1 B foo Jan-2010 18
2 A test Feb-2010 20
3 B foo Feb-2010 20
4 A test March-2010 30
5 B foo March-2010 25
As you can see, the melt
function has successfully converted each date column into separate rows.
Old Versions (<0.20)
If you’re using an older version of Pandas that doesn’t support the melt
function directly, you can achieve similar results by first sorting the DataFrame and then using the apply
function to melt it manually.
df_sorted = df.sort_values(by=['location', 'name'])
df_melted_old = df_sorted.apply(lambda row: pd.Series([row['location'], row['name']] + [row[column] for column in ['Jan-2010', 'Feb-2010', 'March-2010']], index=['location', 'name', 'Date', 'Value']), axis=1)
However, this approach can be cumbersome and may not provide the same level of flexibility as the melt
function.
Deprecation Notice
It’s worth noting that pd.DataFrame.sort
has been deprecated in favor of pd.DataFrame.sort_values
. To avoid any issues with deprecation warnings, make sure to use sort_values
instead:
df_sorted = df.sort_values(by=['location', 'name'])
Conclusion
Converting columns into rows using Pandas can be a useful technique for data analysis and manipulation. By leveraging the melt
function, you can easily transform your dataset from wide format to long format. While older versions of Pandas may require additional steps or workarounds, the melt
function provides a convenient and efficient solution for most use cases.
Last modified on 2025-04-04