Plotting Multiple Plots on the Same Row with Pandas and Matplotlib
In this article, we will explore how to plot multiple plots on the same row using the pandas and matplotlib libraries in Python. We will focus on creating a compact row of scatter plots, each showing one feature column against the target variable, and on computing the correlation between each feature and the target.
Introduction
The Kaggle house price dataset is a classic example of a multivariate dataset, where we have multiple feature columns and a single target column. In this article, we will use pandas and matplotlib to draw one scatter plot per feature, all on the same row, which is useful for visualizing how each feature relates to the target.
Requirements
To follow along with this tutorial, you will need:
- Python 3.x
- Pandas library (pip install pandas)
- Matplotlib library (pip install matplotlib)
Importing Libraries and Loading Data
First, we import the necessary libraries:
import pandas as pd
import matplotlib.pyplot as plt
Next, we load the Kaggle house price dataset:
df = pd.read_csv('kc_house_data.csv')
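The snippets that follow refer to a list named cols, which is only defined in the final code. As a minimal sketch of one way to build such a list, the example below uses a small made-up DataFrame in place of the CSV (the column values are illustrative assumptions, not real data) and keeps only numeric columns other than the identifier and the target:

```python
import pandas as pd

# Hypothetical stand-in for kc_house_data.csv; the values are made up.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["20141013T000000", "20141209T000000",
             "20150225T000000", "20141209T000000"],
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "bedrooms": [3, 3, 2, 4],
    "sqft_living": [1180, 2570, 770, 1960],
})

# Keep numeric columns only, then drop the identifier and the target.
numeric_cols = df.select_dtypes(include="number").columns
cols = [c for c in numeric_cols if c not in ("id", "price")]
print(cols)  # ['bedrooms', 'sqft_living']
```

Filtering on dtype first means text columns such as date never reach the plotting or correlation steps.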
Plotting Multiple Plots on the Same Row
To plot multiple plots on the same row, we can use the subplots function from matplotlib. We create a single row of subplots with one column per feature and draw each plot on its own axis.
fig, axes = plt.subplots(1, len(cols), figsize=(12,8), squeeze=False)
In this code:
- We create a figure object fig and an array of subplot axes, axes.
- The first argument to subplots is the number of rows. In this case, we have 1 row.
- The second argument is the number of columns. We use the length of the cols list (the feature column names, defined in the final code below) as the number of columns.
- The figsize keyword argument is the figure size in inches, given as (width, height).
- The squeeze keyword argument controls the shape of the returned axes. With squeeze=False, axes is always a 2D array, even for a single row, which is why we index it as axes[0, i] later.
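To make the effect of squeeze concrete, here is a small sketch that compares the shape of the returned axes with and without it. The variable names are chosen for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# With squeeze=False the axes come back as a 2D array, even for one row.
fig_2d, axes_2d = plt.subplots(1, 3, squeeze=False)
print(axes_2d.shape)  # (1, 3)

# With the default squeeze=True, a single row collapses to a 1D array.
fig_1d, axes_1d = plt.subplots(1, 3)
print(axes_1d.shape)  # (3,)

# A single subplot collapses further, to a bare Axes object with no indexing.
fig_one, ax_one = plt.subplots(1, 1)
print(isinstance(ax_one, matplotlib.axes.Axes))  # True
```

Passing squeeze=False keeps the indexing axes[0, i] valid however many feature columns there are, including the case of a single column.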
We then iterate over each column and draw a scatter plot of that feature against price:
for i, col in enumerate(cols):
    df.plot(kind='scatter', x=col, y='price', ax=axes[0, i], s=10, alpha=0.5)
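In this code:
- We iterate over each column col using a for loop; enumerate also gives us its position i.
- For each column, we draw a scatter plot using the df.plot() method.
- We pass the following arguments to plot(): kind='scatter', x=col, and y='price'.
- We also pass the axis object to plot on via ax=axes[0, i], so each feature gets its own subplot.
- The s=10 and alpha=0.5 arguments set the marker size and transparency, which keeps dense regions readable.
- The scatter plots appear once the figure is rendered, for example by calling plt.show() in a script.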
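Because every subplot has price on the y-axis, the row can be made more compact by sharing that axis. The following is a minimal sketch under stated assumptions: the data is synthetic (generated here, not taken from the dataset), and sharey=True plus the label cleanup is one possible layout choice rather than part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to view
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the house data (assumption: made-up values).
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 4000, n)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": rng.integers(1, 6, n),
    "price": 150 * sqft + rng.normal(0, 50_000, n),
})
cols = ["sqft_living", "bedrooms"]

# sharey=True gives every subplot the same price scale.
fig, axes = plt.subplots(1, len(cols), figsize=(4 * len(cols), 4),
                         sharey=True, squeeze=False)
for i, col in enumerate(cols):
    df.plot(kind="scatter", x=col, y="price", ax=axes[0, i], s=10, alpha=0.5)
    if i > 0:
        axes[0, i].set_ylabel("")  # keep the y label on the leftmost plot only
fig.tight_layout()
```

Scaling the figure width with the number of columns keeps each subplot at a readable size as more features are added.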
Displaying Correlation Between Features
To compute the correlation between each feature and the target, we can use the corrwith() method from pandas:
correlation_matrix = df[cols].corrwith(df['price'])
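In this code:
- We select the feature columns with df[cols] and the target column with df['price'], using square brackets [].
- We call corrwith() to compute the correlation (Pearson by default) between each feature column and the target.
- The result is a Series with one value per feature rather than a full matrix, despite the variable name.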
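The correlation values can also be shown on the plots themselves, for instance as subplot titles. This is a sketch of one way to do that, assuming synthetic data generated in the snippet and a title format chosen purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to view
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the house data (assumption: made-up values).
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 4000, n)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": rng.integers(1, 6, n),
    "price": 150 * sqft + rng.normal(0, 50_000, n),
})
cols = ["sqft_living", "bedrooms"]

# One Pearson correlation per feature, indexed by column name.
corr = df[cols].corrwith(df["price"])

fig, axes = plt.subplots(1, len(cols), figsize=(4 * len(cols), 4), squeeze=False)
for i, col in enumerate(cols):
    df.plot(kind="scatter", x=col, y="price", ax=axes[0, i], s=10, alpha=0.5)
    # Label each subplot with its feature name and correlation to price.
    axes[0, i].set_title(f"{col}: r = {corr[col]:.2f}")
fig.tight_layout()
```

Since corrwith() returns a Series indexed by column name, corr[col] looks up the value that belongs to the subplot being drawn.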
Final Code
Here is the complete final code:
import pandas as pd
import matplotlib.pyplot as plt
# Load data
df = pd.read_csv('kc_house_data.csv')
# Define feature columns: numeric columns only (for example, 'date' is read as text),
# excluding the identifier and the target
cols = [c for c in df.select_dtypes(include='number').columns if c not in ['id', 'price']]
# Create one row of scatter plots, one subplot per feature
fig, axes = plt.subplots(1, len(cols), figsize=(12, 8), squeeze=False)
for i, col in enumerate(cols):
    df.plot(kind='scatter', x=col, y='price', ax=axes[0, i], s=10, alpha=0.5)
# Compute and print the correlation between each feature and the target
correlation_matrix = df[cols].corrwith(df['price'])
print(correlation_matrix)
# Render the figure
plt.show()
Conclusion
In this article, we explored how to plot multiple plots on the same row using the pandas and matplotlib libraries in Python. We created a compact row of scatter plots showing each feature column against the target variable, and computed the correlation between each feature and the target. The final code is provided above for reference.
Example Use Case
A typical use case is analyzing how house prices relate to individual features using a row of scatter plots. For instance, one might want to see how price changes with the number of bedrooms or the square footage. With price on the y-axis of every plot in the row, these relationships are easy to compare side by side.
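When a dataset has many feature columns, a single row quickly becomes cramped. One option, sketched here under stated assumptions, is to rank the features by absolute correlation with price and plot only the strongest few. The data is synthetic, and both the cutoff k and the name top_cols are hypothetical choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to view
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the house data (assumption: made-up values).
rng = np.random.default_rng(0)
n = 500
sqft = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": bedrooms,
    "yr_built": rng.integers(1900, 2016, n),
    "price": 150 * sqft + 20_000 * bedrooms + rng.normal(0, 50_000, n),
})
cols = ["sqft_living", "bedrooms", "yr_built"]

k = 2  # hypothetical cutoff: how many features to show on the row
corr = df[cols].corrwith(df["price"])
# Rank by strength of the relationship, regardless of its sign.
top_cols = corr.abs().sort_values(ascending=False).index[:k].tolist()

fig, axes = plt.subplots(1, k, figsize=(4 * k, 4), squeeze=False)
for i, col in enumerate(top_cols):
    df.plot(kind="scatter", x=col, y="price", ax=axes[0, i], s=10, alpha=0.5)
fig.tight_layout()
```

Ranking on the absolute value keeps strongly negative relationships in the selection as well as strongly positive ones.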
Last modified on 2023-12-20