Identifying Consecutive Weeks Without Missing Values in Pandas DataFrames
Understanding the Problem
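The problem involves a pandas DataFrame of weekly orders for each country and product. A group may have no row at all for some weeks, and once the data is reshaped to one column per week, those weeks show up as missing values (NaN). The task is to count, for each country/product group, how many consecutive weeks have no missing values, counting back from the most recent week.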
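For a single group, the quantity being counted can be sketched in plain Python. This assumes the group's values are listed oldest week first, with `None` standing in for a week that has no row. The helper name `trailing_non_null` is hypothetical and not part of the original code.

```python
def trailing_non_null(values):
    """Count consecutive non-None values at the end of a list (newest week last)."""
    count = 0
    # Walk backwards from the most recent week and stop at the first gap
    for v in reversed(values):
        if v is None:
            break
        count += 1
    return count

# Weeks 202001..202008 for product B in the sample data: rows exist only for 01, 06, 07, 08
product_b = [0, None, None, None, None, 0, 0, 0]
print(trailing_non_null(product_b))  # 3
```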
Step 1: Importing Libraries and Creating Sample Data
# Import necessary libraries
import pandas as pd
import numpy as np
# Create a sample DataFrame
raw_data = {'Country': ['UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','US','US','UK','UK'],
'Product':['A','A','A','A','A','A','A','A','B','B','B','B','C','C','D','D'],
'Week': [202001,202002,202003,202004,202005,202006,202007,202008,202001,202006,202007,202008,202006,202008,202007,202008],
'Orders': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
df = pd.DataFrame(raw_data, columns = ['Country','Product','Week','Orders'])
print("Original DataFrame:")
print(df)
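Note that the sample data contains no null values at all: a "missing" week is a week for which a group simply has no row. The quick check below, a sketch that reuses the Step 1 data, confirms this and lists the weeks present for each group, which is why a reshape is needed before any NaN can appear.

```python
import pandas as pd

# Same sample data as Step 1
raw_data = {'Country': ['UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','US','US','UK','UK'],
            'Product': ['A','A','A','A','A','A','A','A','B','B','B','B','C','C','D','D'],
            'Week': [202001,202002,202003,202004,202005,202006,202007,202008,
                     202001,202006,202007,202008,202006,202008,202007,202008],
            'Orders': [0] * 16}
df = pd.DataFrame(raw_data)

# No Orders value is null: the gaps are weeks that have no row
missing_orders = int(df['Orders'].isna().sum())
print(missing_orders)  # 0

# Weeks that do have a row, per (Country, Product) group
weeks_per_group = df.groupby(['Country', 'Product'])['Week'].apply(list)
print(weeks_per_group)
```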
Step 2: Reshaping the Data
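# Pivot to one row per (Country, Product) and one column per week;
# a week with no row for a group becomes NaN.
# Country and Product stay in the index (no reset_index here), so they are not
# counted as always-present data columns in Step 3; Step 7 moves them back.
df2 = df.pivot_table(index=['Country','Product'], columns='Week', values='Orders', aggfunc='size')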
print("\nReshaped DataFrame:")
print(df2)
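Because `aggfunc='size'` counts rows, every cell of the pivot is either a count or NaN, and only the NaN pattern matters for the later steps. As a hedged alternative (not the article's method), `set_index(...).unstack('Week')` produces the same NaN pattern while keeping the `Orders` values, provided each (Country, Product, Week) combination occurs at most once; `unstack` raises a `ValueError` on duplicate entries. The sketch compares the two on the sample data.

```python
import pandas as pd

raw_data = {'Country': ['UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','US','US','UK','UK'],
            'Product': ['A','A','A','A','A','A','A','A','B','B','B','B','C','C','D','D'],
            'Week': [202001,202002,202003,202004,202005,202006,202007,202008,
                     202001,202006,202007,202008,202006,202008,202007,202008],
            'Orders': [0] * 16}
df = pd.DataFrame(raw_data)

# The article's reshape: row counts per cell, NaN where a group has no row for that week
pivoted = df.pivot_table(index=['Country', 'Product'], columns='Week',
                         values='Orders', aggfunc='size')

# Alternative: move all keys into the index, then unstack Week into columns.
# Keeps the Orders values, but needs unique (Country, Product, Week) keys.
unstacked = df.set_index(['Country', 'Product', 'Week'])['Orders'].unstack('Week')

# Compare the missing-value patterns of the two reshapes
print(pivoted.isna().equals(unstacked.isna()))
```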
Step 3: Flagging Non-NaN Values, Latest Week First
# Flag non-NaN cells, then reverse the column order so the most recent week comes first
a = df2.notna().iloc[:, ::-1]
print("\nNon-NaN flags, latest week first:")
print(a)
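The slice `iloc[:, ::-1]` keeps every row and reverses the column order. A minimal illustration on a hypothetical one-row frame of presence flags:

```python
import pandas as pd

# Hypothetical presence flags for one group, columns in chronological order
flags = pd.DataFrame([[True, False, True, True]],
                     columns=[202005, 202006, 202007, 202008])

# Keep all rows (:) and step through the columns backwards (::-1)
newest_first = flags.iloc[:, ::-1]
print(list(newest_first.columns))     # [202008, 202007, 202006, 202005]
print(newest_first.iloc[0].tolist())  # [True, True, False, True]
```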
Step 4: Calculating Running Sum
# Running count of non-NaN weeks, moving back from the latest week
b = a.cumsum(axis=1)
print("\nRunning sum:")
print(b)
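On a boolean row, `cumsum(axis=1)` treats `True` as 1 and `False` as 0, so each position holds the number of non-NaN weeks seen so far. The hypothetical row below mirrors product B from the sample data (present in weeks 202008, 202007, 202006 and 202001), listed latest week first:

```python
import pandas as pd

# Product B's flags, latest week (202008) first, oldest (202001) last
a = pd.DataFrame([[True, True, True, False, False, False, False, True]])

# Running count of True values along each row
b = a.cumsum(axis=1)
print(b.iloc[0].tolist())  # [1, 2, 3, 3, 3, 3, 3, 4]
```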
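Step 5: Counting Consecutive Non-NaN Weeks
# Length of the current run of non-NaN weeks at each position: the running count
# minus its value at the most recent NaN (0 if no NaN has been seen yet).
# Named run rather than df so the original DataFrame is not overwritten.
run = b - b.mask(a).ffill(axis=1).fillna(0).astype(int)
# Set every position from the first NaN onwards to 0, so only the run
# that starts at the latest week remains
val = run.mask(run.eq(0).cumsum(axis=1).ne(0), 0)
print("\nOnly the leading run of non-NaN weeks kept:")
print(val)
# The row maximum is the length of that leading run; convert to str for Step 6
max_val = val.max(axis=1).astype(str)
print("\nNumber of most recent consecutive non-NaN weeks:")
print(max_val)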
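To see what Step 5 does, the sketch below runs the same expressions on a single hypothetical row (product B's flags, latest week first) and prints each intermediate result. It also checks a shorter formulation that is a swapped-in alternative, not the article's method: multiplying the 0/1 flags cumulatively keeps the value at 1 only until the first NaN, so the row sum equals the leading run length.

```python
import pandas as pd

# Product B's flags, latest week first (hypothetical single-row frame)
a = pd.DataFrame([[True, True, True, False, False, False, False, True]])
b = a.cumsum(axis=1)

# Step 5 as in the article: length of the current non-NaN run at each position
run = b - b.mask(a).ffill(axis=1).fillna(0).astype(int)
print(run.iloc[0].tolist())         # [1, 2, 3, 0, 0, 0, 0, 1]

# Zero out everything from the first NaN onwards, keeping only the leading run
leading = run.mask(run.eq(0).cumsum(axis=1).ne(0), 0)
print(leading.iloc[0].tolist())     # [1, 2, 3, 0, 0, 0, 0, 0]
print(leading.max(axis=1).iloc[0])  # 3

# Alternative (not from the article): cumulative product of 0/1 flags,
# which drops to 0 at the first NaN and stays there
shortcut = a.astype(int).cumprod(axis=1).sum(axis=1)
print(shortcut.iloc[0])             # 3
```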
Step 6: Creating a Text Column
# Create a text column based on the max value
df2['Text'] = np.where(max_val != '1',
                       'Last ' + max_val + ' consecutive weeks are not null',
                       'Last ' + max_val + ' week is not null')
print("\nDataFrame with text column:")
print(df2)
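The `np.where` call picks the singular wording only when the count is exactly `'1'`. Applied to a stand-in `max_val` Series holding the counts the sample data yields (8, 3, 2 and 1 for UK/A, UK/B, UK/D and US/C), it produces:

```python
import numpy as np
import pandas as pd

# Stand-in for max_val: leading-run lengths as strings, as produced by astype(str)
max_val = pd.Series([8, 3, 2, 1]).astype(str)

# Plural wording unless the run is exactly one week long
text = np.where(max_val != '1',
                'Last ' + max_val + ' consecutive weeks are not null',
                'Last ' + max_val + ' week is not null')
print(text.tolist())
```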
Step 7: Converting MultiIndex to Columns
# Convert multi-index to columns
df2 = df2.reset_index()
print("\nFinal DataFrame with converted indices:")
print(df2)
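`reset_index()` moves each level of the row MultiIndex into an ordinary column and replaces the index with a default `RangeIndex`. A small sketch on a hypothetical two-row frame shaped like `df2` after Step 6:

```python
import pandas as pd

# Hypothetical frame indexed by (Country, Product), like df2 after Step 6
idx = pd.MultiIndex.from_tuples([('UK', 'A'), ('US', 'C')], names=['Country', 'Product'])
summary = pd.DataFrame({'Text': ['Last 8 consecutive weeks are not null',
                                 'Last 1 week is not null']}, index=idx)

# Each index level becomes a column; the new index is 0, 1, ...
flat = summary.reset_index()
print(flat.columns.tolist())  # ['Country', 'Product', 'Text']
```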
These steps count, for each country and product, how many consecutive weeks with no missing (null) values there are, counting back from the most recent week, and add a Text column that reports that count in words. A count of 0 means the most recent week itself is missing for that group.
Last modified on 2025-04-08