Plot Background Shape Based on Variable

In this tutorial, we will explore how to create a plot with a background shape based on the value of a variable. We will use Python’s popular data analysis library, pandas, and its integration with matplotlib for creating high-quality plots.

Introduction

When working with real-world data, it is often useful to visualize trends or patterns. A quantity that changes over time, such as temperature or a stock price, is often recorded together with a discrete state variable, and a line plot alone does not show where that state changes. Shading the plot background according to the state makes those transitions visible at a glance.

Background Shape with Non-Consecutive Integers

In our case, we are given a pandas DataFrame with several columns containing data and one column that encodes the state of the process of interest (non-consecutive integers). We want to use this variable to shade the background of the plot. Here is an example that, for simplicity, uses a single data column and only two states, 0 and 1:

import pandas as pd

df = pd.DataFrame(
{
    "y": [x * x / 100 for x in range(10)],
    "state": [0 if x < 5 else 1 for x in range(10)],
})

Understanding the Code

The code first creates a pandas DataFrame df with two columns: y and state. The values in the y column are generated using the formula x * x / 100, where x is an integer ranging from 0 to 9. The state column is created by applying a simple condition to each value of x: if x is less than 5, the corresponding state is 0; otherwise, it’s 1.

Next, we create a matplotlib figure and axes, fix the y-axis range with ax.set_ylim(0,1), and draw the y column with the DataFrame's plot() method. On its own, this line plot gives no information about how the state variable changes.
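The plotting step described above can be sketched as follows. This is a minimal version that assumes the example DataFrame from the previous section; the Agg backend is an assumption made here so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not part of the original code
import matplotlib.pyplot as plt
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "y": [x * x / 100 for x in range(10)],
    "state": [0 if x < 5 else 1 for x in range(10)],
})

# Create the figure and axes, then fix the y-axis range
fig, ax = plt.subplots()
ax.set_ylim(0, 1)

# Draw only the y column; the state column is not shown yet
df[["y"]].plot(ax=ax)
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and plt.show() called at the end instead.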

Finding Blocks of Constant State

To solve this problem, we need to identify blocks where the state is constant and fill these areas with different colors. Rather than looping over every row, we can compare the state column with a copy of itself shifted by one row: the rows where the two differ mark the start of each block.

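We start by building a small DataFrame x that holds one row per block. Selecting the rows where the state changes (i.e., where df['state'] != df['state'].shift(1)) and calling reset_index() gives two columns: index, the row at which the block starts, and state, the state of that block. We then add a third column, next_index, which holds the start of the following block. The last block has no successor, so its next_index falls back to the largest index in df.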

x = df.loc[df['state'] != df['state'].shift(1), 'state'].reset_index()
x['next_index'] = x['index'].shift(-1).fillna(df.index.max())
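To see what these two lines produce, it can help to run them on a column with more than two states. The sketch below uses made-up state values (3, 7, and 12) as a stand-in for non-consecutive integers; the column contents and the names starts and blocks are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical state column with non-consecutive integer codes
df = pd.DataFrame({"state": [3, 3, 7, 7, 7, 3, 12, 12]})

# True on every row whose state differs from the previous row
# (the first row compares against NaN, so it always counts as a start)
starts = df["state"] != df["state"].shift(1)

# One row per block: where it starts and which state it has
blocks = df.loc[starts, "state"].reset_index()

# Each block ends where the next one starts; the last block
# ends at the largest index of the original DataFrame
blocks["next_index"] = blocks["index"].shift(-1).fillna(df.index.max())

print(blocks)
```

Here the blocks start at rows 0, 2, 5, and 6, and state 3 appears twice because it occurs in two separate runs. Note that if the final block consisted of a single row, its index and next_index would be equal, so it would be drawn as a zero-width span.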

Filling Blocks with Colors

With the block table x in hand, we can now fill the blocks with different colors. We loop over the rows of x and use matplotlib’s axvspan() function to create a shaded vertical band from index to next_index for each block.

The color of the shaded area is determined by the state of that block: if the state is 1, the area is filled with blue; otherwise, it is filled with red.

for i in x.index:
    c = 'blue' if (x.at[i, 'state']==1) else 'red'
    xa = x.at[i, 'index']
    xb = x.at[i, 'next_index']
    ax.axvspan(xa, xb, alpha=0.15, color=c)
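The loop above hard-codes two colors, which is enough for states 0 and 1. For a state column with more values, one possible extension is to look the color up in a dictionary. The sketch below assumes made-up state codes (3, 7, 12) and a hypothetical state_colors mapping; the Agg backend is likewise an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: three non-consecutive state codes
df = pd.DataFrame({
    "y": [x * x / 100 for x in range(10)],
    "state": [3, 3, 3, 7, 7, 7, 12, 12, 3, 3],
})

# Hypothetical mapping from state code to background color
state_colors = {3: "red", 7: "blue", 12: "green"}

# One row per block of constant state, as in the previous section
x = df.loc[df["state"] != df["state"].shift(1), "state"].reset_index()
x["next_index"] = x["index"].shift(-1).fillna(df.index.max())

fig, ax = plt.subplots()
ax.set_ylim(0, 1)
df[["y"]].plot(ax=ax)

# Shade each block with the color assigned to its state
for state, start, end in zip(x["state"], x["index"], x["next_index"]):
    ax.axvspan(start, end, alpha=0.15, color=state_colors[state])
```

A dictionary lookup raises a KeyError for a state that has no assigned color, which makes a missing entry easy to spot.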

Combining the Code

Here is the complete code that generates our desired plot:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame with some data and a state column
df = pd.DataFrame(
{
    "y": [x * x / 100 for x in range(10)],
    "state": [0 if x < 5 else 1 for x in range(10)],
})

# Create a figure and axis object
fig, ax = plt.subplots(1)

# Set the y-axis limits
ax.set_ylim(0,1)

# Plot the data in the y column
df[['y']].plot(ax=ax)

# Find blocks of constant state
x = df.loc[df['state'] != df['state'].shift(1), 'state'].reset_index()
x['next_index'] = x['index'].shift(-1).fillna(df.index.max())

# Fill these areas with different colors
for i in x.index:
    c = 'blue' if (x.at[i, 'state']==1) else 'red'
    xa = x.at[i, 'index']
    xb = x.at[i, 'next_index']
    ax.axvspan(xa, xb, alpha=0.15, color=c)

# Show the plot
plt.show()

Last modified on 2024-05-10