Creating Hierarchical Columns from Unique Values in a Pandas DataFrame
In this article, we’ll explore how to create hierarchical columns based on unique values in specific columns of a pandas DataFrame. This is particularly useful when working with data that has multiple categories or subcategories.
Problem Statement
Suppose you have a pandas DataFrame with three columns: S.No
, Name1
, and Name2
. The Name1
and Name2
columns contain unique values, and you want to create hierarchical columns based on these unique values. This will result in a new DataFrame where the hierarchical column is created based on the unique values in Name1
and Name2
.
Solution Overview
To solve this problem, we’ll use the following steps:
- Create a helper DataFrame with all possible combinations of unique values from
S.No
,Name1
, andName2
. - Merge the original DataFrame with the helper DataFrame using the
merge
function. - Pivot the resulting DataFrame to create hierarchical columns based on the unique values in
Name1
andName2
.
Step 1: Creating the Helper DataFrame
The first step is to create a helper DataFrame that contains all possible combinations of unique values from S.No
, Name1
, and Name2
. We can use the itertools.product
function to achieve this.
from io import StringIO
from itertools import product
import pandas as pd
# Create a sample DataFrame
txt = '''S.No Name1 Name2 Size
1 ABC XYZ 12
2 BCA XCZ 15
3 DAB ZXM 20'''
df = pd.read_table(StringIO(txt), sep="\s+")
# Get unique values from S.No, Name1, and Name2
unique_sno = df['S.No'].unique()
unique_name1 = df['Name1'].unique()
unique_name2 = df['Name2'].unique()
# Create a helper DataFrame with all possible combinations of unique values
fill_df = pd.DataFrame(list(product(unique_sno, unique_name1, unique_name2)),
columns=['S.No', 'Name1', 'Name2'])
Step 2: Merging the DataFrames
Next, we’ll merge the original DataFrame with the helper DataFrame using the merge
function. We’ll use a right join to ensure that all rows from the helper DataFrame are included in the resulting DataFrame.
# Merge the original DataFrame with the helper DataFrame
df = df.merge(fill_df, on=['S.No', 'Name1', 'Name2'], how='right')
Step 3: Creating Hierarchical Columns
Finally, we’ll use the pivot_table
function to create hierarchical columns based on the unique values in Name1
and Name2
. We’ll set the index of the DataFrame to S.No
, and specify Name1
and Name2
as the columns to be pivoted.
# Create a pivot table to create hierarchical columns
pvtdf = df.pivot_table(index='S.No', columns=['Name1', 'Name2'],
values='Size', aggfunc='max', dropna=False)\
.rename_axis([None, None], axis="columns")
Example Use Case
Here’s an example of how to use this solution:
Suppose you have a DataFrame with the following data:
| S.No | Name1 | Name2 | Size |
| --- | --- | --- | --- |
| 1 | ABC | XYZ | 12 |
| 2 | BCA | XCZ | 15 |
| 3 | DAB | ZXM | 20 |
You can use the solution above to create a hierarchical column based on the unique values in Name1
and Name2
. The resulting DataFrame would look like this:
| S.No | ABC | BCA | DAB | XCZ | XYZ | ZXM |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 12.0 | NaN | NaN | NaN | 12.0 | NaN |
| 2 | NaN | 15.0 | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | 20.0 | NaN | NaN | NaN |
As you can see, the hierarchical column is created based on the unique values in Name1
and Name2
. The index of the DataFrame remains S.No
, which allows for easy filtering and sorting.
Conclusion
In this article, we explored how to create hierarchical columns based on unique values in specific columns of a pandas DataFrame. We used the itertools.product
function to create a helper DataFrame with all possible combinations of unique values from S.No
, Name1
, and Name2
. We then merged the original DataFrame with the helper DataFrame using the merge
function, and created hierarchical columns based on the unique values in Name1
and Name2
using the pivot_table
function. This solution can be used to solve a variety of problems involving hierarchical data structures.
Last modified on 2025-02-20