Understanding Grouped Bar Charts with Matplotlib
Overview of the Problem
The question from Stack Overflow presents a situation where creating a grouped bar chart using matplotlib in Python results in an undesirable output. The goal is to understand which part of the code is causing the issue and provide a solution.
Importing Libraries and Creating the DataFrame
Corrected Code for Importing Libraries and Creating DataFrame
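import pandas as pd
import matplotlib.pyplot as plt
file = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/coursera/Topic_Survey_Assignment.csv"
# index_col=0: use the first CSV column as the row index (these labels become the x-axis groups)
df = pd.read_csv(file, index_col=0)
# convert the counts to percentages of the 2233 respondents; df.plot() does not rescale values itself
df = (df / 2233 * 100).round(2)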
In the original code, pandas was imported with import pandas as pandas, so any later reference to pd fails. Import it as import pandas as pd instead, then load the file with pd.read_csv(), passing index_col=0 so that the first column becomes the row index rather than an ordinary data column.
Explanation
- Importing Libraries: pandas (imported as pd) is used for data manipulation and analysis; matplotlib.pyplot (imported as plt) is used for plotting.
- Creating the DataFrame: pd.read_csv() reads a CSV file into a pandas DataFrame, where each column represents a variable and each row an observation. With index_col=0, the first column becomes the index, and pandas uses the index labels as the x-axis tick labels of a bar plot.
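To see what index_col=0 and the percentage conversion do without downloading the survey file, here is a minimal sketch that parses a small in-memory CSV through io.StringIO. The topic names and counts are invented placeholders, not the real survey data; only the 2233 divisor comes from this example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey CSV: the first (unnamed) column holds
# row labels, the remaining columns hold respondent counts (invented values).
csv_text = """,Very interested,Somewhat interested,Not interested
Topic A,1000,900,100
Topic B,500,1200,300
"""

# index_col=0 turns the first column into the row index instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.index.tolist())    # ['Topic A', 'Topic B']
print(df.columns.tolist())  # ['Very interested', 'Somewhat interested', 'Not interested']

# Convert counts to percentages of 2233 respondents, rounded to 2 decimals.
pct = (df / 2233 * 100).round(2)
print(pct.loc["Topic A", "Very interested"])  # 44.78
```

Only the three count columns remain as data; the topic labels live in the index, which is what the bar plot later uses for its x-axis groups.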
Using Matplotlib for Plotting
Corrected Code for Plotting with Annotations
# your colors: one per DataFrame column (bar series)
colors = ['#5cb85c', '#5bc0de', '#d9534f']
# plot the grouped bars; the ylabel argument requires pandas >= 1.1
ax = df.plot(kind='bar', color=colors, figsize=(20, 8), rot=0, ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")
# label every bar with its height; ax.bar_label requires matplotlib >= 3.4.0
for c in ax.containers:
    ax.bar_label(c, fmt='%.2f', label_type='edge')
Explanation
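- Plotting the DataFrame: df.plot(kind='bar') draws one group of bars per index row and one bar series per column, and returns the matplotlib Axes it drew on. It plots the values exactly as stored in df and does not rescale them, so the conversion to percentages (dividing by 2233 and multiplying by 100) must be applied to df before plotting. rot=0 keeps the x tick labels horizontal.
- Adding Annotations: ax.containers holds one BarContainer per plotted column. ax.bar_label() adds a text label to every bar in a container; label_type='edge' places it at the end of the bar, and the fmt parameter controls the number format ('%.2f' gives two decimal places).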
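The container-by-container labelling can be checked on a tiny DataFrame. This is a sketch with invented percentages; matplotlib.use("Agg") is an assumption added only so it runs without a display, and the labels are collected from the Text objects that bar_label() returns.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages (invented values): 2 groups x 3 series.
df = pd.DataFrame(
    {"Very interested": [44.78, 22.39],
     "Somewhat interested": [40.30, 53.74],
     "Not interested": [4.48, 13.43]},
    index=["Topic A", "Topic B"],
)

colors = ["#5cb85c", "#5bc0de", "#d9534f"]
ax = df.plot(kind="bar", color=colors, rot=0, ylabel="Percentage")

# One BarContainer per DataFrame column; each holds one bar per index row.
print(len(ax.containers))  # 3

labels = []
for c in ax.containers:
    # bar_label returns the list of Text objects it created for this container
    labels.extend(t.get_text() for t in ax.bar_label(c, fmt="%.2f", label_type="edge"))

print(labels)  # ['44.78', '22.39', '40.30', '53.74', '4.48', '13.43']
plt.close(ax.figure)
```

Because the containers follow the column order, the labels come out series by series rather than group by group.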
Using Matplotlib before Version 3.4.0
Corrected Code for Plotting with Manual Annotations
# your colors: one per DataFrame column (bar series)
colors = ['#5cb85c', '#5bc0de', '#d9534f']
# plot the grouped bars; the ylabel argument requires pandas >= 1.1
ax = df.plot.bar(color=colors, figsize=(20, 8), ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")
# pandas rotates bar-chart tick labels by 90 degrees by default; reset them to horizontal
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
# annotate each bar (Rectangle patch) with its height, 10 points above its top edge
for p in ax.patches:
    ax.annotate(f'{p.get_height():0.2f}', (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
Explanation
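- Plotting the DataFrame: The plotting call matches the previous version, but the labels are added manually because ax.bar_label() does not exist before Matplotlib 3.4.0. With Matplotlib 3.4.0 or later, the approach in “Corrected Code for Plotting with Annotations” is simpler.
- Annotating Bars: ax.patches contains one rectangle per bar. For each one, ax.annotate() places the formatted height at the top-center of the bar, shifted 10 points upward by xytext=(0, 10) with textcoords='offset points'.
- Rotating Labels: ax.set_xticklabels(..., rotation=0) keeps the x tick labels horizontal, since pandas rotates them by 90 degrees by default in vertical bar charts.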
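The manual loop can be tried on a small, self-contained example. The DataFrame values are invented, and matplotlib.use("Agg") is an added assumption so the sketch runs without a display; the loop itself follows the code above and relies only on ax.patches and ax.annotate(), so it does not need bar_label().

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages (invented values) for 2 groups x 3 series.
df = pd.DataFrame(
    {"Very interested": [44.78, 22.39],
     "Somewhat interested": [40.30, 53.74],
     "Not interested": [4.48, 13.43]},
    index=["Topic A", "Topic B"],
)

ax = df.plot.bar(color=["#5cb85c", "#5bc0de", "#d9534f"], ylabel="Percentage")
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)  # keep group names horizontal

# ax.patches holds one Rectangle per bar: 2 rows x 3 columns = 6 bars.
print(len(ax.patches))  # 6

annotations = []
for p in ax.patches:
    # Anchor at the top-center of the bar, then shift the text 10 points upward.
    x = p.get_x() + p.get_width() / 2
    annotations.append(
        ax.annotate(f"{p.get_height():0.2f}", (x, p.get_height()),
                    ha="center", va="center",
                    xytext=(0, 10), textcoords="offset points")
    )

print(sorted(a.get_text() for a in annotations))
plt.close(ax.figure)
```

Iterating over patches labels every bar regardless of how many columns the DataFrame has, which is why the same loop works for any grouped bar chart drawn by pandas.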
Additional Advice
Hiding Labels for Small Values
If some bars are too short for their labels to be readable, label only the values above a chosen threshold. Instead of fmt, pass ax.bar_label() a labels list that holds an empty string for every value you want to hide. For more detail, see “How to add value labels on a bar chart” on Stack Overflow and the documentation for matplotlib.pyplot.bar_label() and pandas.DataFrame.plot().
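A minimal sketch of the threshold idea, assuming Matplotlib 3.4.0 or later (for bar_label's labels parameter and BarContainer.datavalues). The threshold of 5 and the DataFrame values are hypothetical, and matplotlib.use("Agg") is only there so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages (invented values) for 2 groups x 3 series.
df = pd.DataFrame(
    {"Very interested": [44.78, 22.39],
     "Somewhat interested": [40.30, 53.74],
     "Not interested": [4.48, 13.43]},
    index=["Topic A", "Topic B"],
)

ax = df.plot(kind="bar", rot=0, ylabel="Percentage")

threshold = 5  # hypothetical cut-off: bars below this value get no label
shown = []
for c in ax.containers:
    # c.datavalues holds the bar heights of this series
    labels = [f"{v:.2f}" if v >= threshold else "" for v in c.datavalues]
    ax.bar_label(c, labels=labels, label_type="edge")
    shown.extend(labels)

print(shown)  # ['44.78', '22.39', '40.30', '53.74', '', '13.43']
plt.close(ax.figure)
```

Passing an empty string keeps one label per bar, so the list still lines up with the bars in each container.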
Last modified on 2024-05-06