Plotting Multiple Plots on the Same Row Using Pandas and Matplotlib for Scatter Matrix Analysis

Plotting Multiple Plots on the Same Row with Pandas and Matplotlib

In this article, we will explore how to plot multiple plots on the same row using pandas and matplotlib libraries in Python. We will focus on creating a compact scatter matrix plot that displays multiple feature columns against the target variable, while also displaying correlation between each feature and the target.

Introduction

The kaggle house price dataset is a classic example of a multivariate dataset, where we have multiple feature columns and a single target column. In this article, we will use pandas and matplotlib to create a scatter matrix plot that displays multiple plots on the same row, which can be useful for visualizing correlation between features.

Requirements

To follow along with this tutorial, you will need:

  • Python 3.x
  • Pandas library (pip install pandas)
  • Matplotlib library (pip install matplotlib)

Importing Libraries and Loading Data

First, we import the necessary libraries:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Next, we load the kaggle house price dataset:

df = pd.read_csv('kc_house_data.csv')

Plotting Multiple Plots on the Same Row

To plot multiple plots on the same row, we can use the subplots function from matplotlib. We create one subplot with multiple columns and put each plot on its own axis.

fig, axes = plt.subplots(1, len(cols), figsize=(12,8), squeeze=False)

In this code:

  • We create a figure object fig and a set of subplots axes.
  • The first argument to subplots is the number of rows. In this case, we have 1 row.
  • The second argument is the number of columns. We use the length of the cols list as the number of columns.
  • The third argument is the figure size in inches.
  • The fourth argument is a boolean value indicating whether to squeeze the subplots into a single row (in this case, we set it to False).
  • The resulting figure and axes object are stored in the fig and axes variables.

We then iterate over each column and plot the scatter matrix:

for i, col in enumerate(cols):
    df.plot(kind='scatter', x=col, y='price', ax=axes[0,i], s=10, alpha=0.5)

In this code:

  • We iterate over each column col using a for loop.
  • For each column, we plot the scatter matrix using the df.plot() function.
  • We pass the following arguments to plot(): kind='scatter', x=col, and y='price'.
  • We also pass the axis object ax to plot on.
  • The resulting scatter plot is displayed.

Displaying Correlation Between Features

To display correlation between each feature and the target, we can use the corr() function from pandas:

correlation_matrix = df[cols].corrwith(df['price'])

In this code:

  • We create a correlation matrix by selecting the columns in cols and the ‘price’ column using square brackets [].
  • We pass the corrwith() function to calculate the correlation between each feature and the target.

Final Code

Here is the complete final code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv('kc_house_data.csv')

# Define feature columns
cols = [i for i in list(df.columns) if i not in ['id','price']]

# Create a scatter matrix plot with multiple plots on the same row
fig, axes = plt.subplots(1, len(cols), figsize=(12,8), squeeze=False)

for i, col in enumerate(cols):
    df.plot(kind='scatter', x=col, y='price', ax=axes[0,i], s=10, alpha=0.5)

# Display correlation between features
correlation_matrix = df[cols].corrwith(df['price'])
print(correlation_matrix)

Conclusion

In this article, we explored how to plot multiple plots on the same row using pandas and matplotlib libraries in Python. We created a compact scatter matrix plot that displayed multiple feature columns against the target variable, while also displaying correlation between each feature and the target. The final code is provided above for reference.

Example Use Case

The example use case is to analyze the relationship between different features of house prices using scatter matrices. For instance, one might want to see how price changes with respect to number of bedrooms or square footage. By plotting multiple plots on the same row, we can easily compare and contrast these relationships.


Last modified on 2023-12-20