Pivoting Data for Bar and Column Plots with Multiple Columns in R

In this article, we will explore how to pivot data from a wide format to a long format, perform calculations on the pivoted data, and then create bar and column plots using ggplot2. We’ll focus on stacked bar plots in which each segment is labelled with its percentage of the bar’s total.

Introduction

Data visualization is an essential part of data analysis. When working with datasets that have multiple columns, it’s often useful to transform the data into a long format for easier manipulation and plotting. This article will guide you through pivoting your data, calculating proportions, and creating bar and column plots using ggplot2.

Prerequisites

  • Familiarity with the R programming language
  • Basic knowledge of the ggplot2, dplyr, and tidyr packages
  • Basic understanding of data visualization concepts

Section 1: Importing Necessary Libraries and Loading Sample Data

To start, we load the necessary libraries and create a small sample dataset.

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)  # provides pivot_longer()

# Create sample data
df <- data.frame(genre = c("Thriller", "Horror", "Action"), 
                 europe = c(195, 210, 300), 
                 asia = c(130, 90, 150), 
                 america = c(325, 300, 150))

Section 2: Understanding the Data Structure

Before we proceed with pivoting and plotting, it’s essential to understand the current structure of our data.

# View the original wide format dataset
print(df)

Output:

     genre europe asia america
1 Thriller    195  130     325
2   Horror    210   90     300
3   Action    300  150     150
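
Beyond printing the table, a couple of base R helpers give a compact summary of its shape and column types. The following is a minimal sketch that rebuilds the same sample data so it can run on its own.

```r
# Rebuild the sample data so this snippet is self-contained
df <- data.frame(genre = c("Thriller", "Horror", "Action"),
                 europe = c(195, 210, 300),
                 asia = c(130, 90, 150),
                 america = c(325, 300, 150))

# Number of rows and columns
dim(df)

# Column names, types, and the first few values of each column
str(df)

# Class of every column: one label column and three numeric columns
col_classes <- sapply(df, class)
col_classes
```

One row per genre and one numeric column per continent is what makes this a wide table: the continent is encoded in the column names rather than in a column of its own.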

Section 3: Pivoting Data for Long Format

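To create a long-format dataset in which each row represents one combination of genre and continent, we pivot the data with the pivot_longer function from the tidyr package. The continent names go into a column called name and the numbers into a column called value.

# Pivot the data into long format
# ("name" and "value" are also the pivot_longer() defaults; they are spelled out for clarity)
df_pivot <- df %>% 
  pivot_longer(cols = c(europe, asia, america), names_to = "name", values_to = "value")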

print(df_pivot)

Output:

# A tibble: 9 × 3
  genre    name    value
  <chr>    <chr>   <dbl>
1 Thriller europe    195
2 Thriller asia      130
3 Thriller america   325
4 Horror   europe    210
5 Horror   asia       90
6 Horror   america   300
7 Action   europe    300
8 Action   asia      150
9 Action   america   150
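
A quick way to confirm that the reshape lost nothing is to compare the long table against the wide one. The sketch below is self-contained and uses the object names df_long, long_totals, and wide_totals purely for illustration; it relies on the default pivot_longer output columns name and value.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data so this snippet is self-contained
df <- data.frame(genre = c("Thriller", "Horror", "Action"),
                 europe = c(195, 210, 300),
                 asia = c(130, 90, 150),
                 america = c(325, 300, 150))

# Pivot to long format; the output columns default to "name" and "value"
df_long <- df %>%
  pivot_longer(cols = c(europe, asia, america))

# 3 genres x 3 continents should give 9 rows
nrow(df_long)

# Per-continent totals computed from the long table
long_totals <- df_long %>%
  group_by(name) %>%
  summarise(total = sum(value))

# The same totals computed from the wide table
wide_totals <- colSums(df[, c("europe", "asia", "america")])

long_totals
wide_totals
```

If the two sets of totals agree, every cell of the wide table has been carried over into exactly one row of the long table.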

Section 4: Calculating Proportions

To calculate each genre's share of its continent's total, we group by name and divide each value by the sum of the values in that group. The result is assigned back to df_pivot so that the new p column is available for plotting.

# Calculate each genre's proportion within its continent
df_pivot <- df_pivot %>% 
  group_by(name) %>% 
  mutate(p = value / sum(value)) %>% 
  ungroup()
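
As a worked check of the grouped calculation: the europe values are 195, 210, and 300, which sum to 705, so the Thriller share of europe is 195 / 705, or roughly 27.7%. The sketch below is self-contained and verifies both this value and that the shares within every continent add up to 1. The names shares, check, and thriller_europe are illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data so this snippet is self-contained
df <- data.frame(genre = c("Thriller", "Horror", "Action"),
                 europe = c(195, 210, 300),
                 asia = c(130, 90, 150),
                 america = c(325, 300, 150))

# Pivot, then compute each genre's share within its continent
shares <- df %>%
  pivot_longer(cols = c(europe, asia, america)) %>%
  group_by(name) %>%
  mutate(p = value / sum(value)) %>%
  ungroup()

# The shares within each continent should add up to 1
check <- shares %>%
  group_by(name) %>%
  summarise(total_share = sum(p))
check

# Thriller's share of europe: 195 / (195 + 210 + 300)
thriller_europe <- shares %>%
  filter(genre == "Thriller", name == "europe") %>%
  pull(p)
round(thriller_europe * 100, 1)
```

Grouping by name before dividing is what makes the percentages relative to each bar in the later plots; dropping the group_by call would instead give each cell's share of the grand total.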

Section 5: Creating Stacked Bar Plots with ggplot2

Now that we have the pivoted data with calculated proportions, let’s create stacked bar plots using ggplot2.

# Create a stacked bar plot for europe and asia (one bar per continent, stacked by genre)
ggplot(df_pivot %>% filter(name %in% c("europe", "asia")),
       aes(x = name, y = value, fill = genre)) + 
  geom_col() + 
  geom_text(aes(label = paste0(round(p * 100, 1), "% (", value, ")")),
            position = position_stack(vjust = .5))

# Create a stacked bar plot for america
ggplot(df_pivot %>% filter(name == "america"),
       aes(x = name, y = value, fill = genre)) + 
  geom_col() + 
  geom_text(aes(label = paste0(round(p * 100, 1), "% (", value, ")")),
            position = position_stack(vjust = .5))

Output:

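Two separate plots: one with a stacked bar each for europe and asia, and one with a single stacked bar for america. Bar heights show the raw values, and each segment is labelled with the genre's percentage of its continent total, followed by the raw value in parentheses.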
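
If the goal is for every bar to span 0 to 100% so that continents can be compared by composition rather than by size, one option is ggplot2's fill position. The following is a minimal, self-contained sketch rather than the only way to do this; df_share and p_fill are illustrative names, and the percent axis labels are built with a small anonymous function to avoid an extra package dependency.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Rebuild the sample data so this snippet is self-contained
df <- data.frame(genre = c("Thriller", "Horror", "Action"),
                 europe = c(195, 210, 300),
                 asia = c(130, 90, 150),
                 america = c(325, 300, 150))

# Long format plus each genre's share within its continent
df_share <- df %>%
  pivot_longer(cols = c(europe, asia, america)) %>%
  group_by(name) %>%
  mutate(p = value / sum(value)) %>%
  ungroup()

# position = "fill" rescales every bar to a total height of 1
p_fill <- ggplot(df_share, aes(x = name, y = value, fill = genre)) +
  geom_col(position = "fill") +
  geom_text(aes(label = paste0(round(p * 100, 1), "%")),
            position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = function(x) paste0(x * 100, "%"))

p_fill
```

The labels use position_fill so that they are rescaled in the same way as the bars and stay centred in their segments.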

Section 6: Customizing Plot Appearance

To further customize the plot appearance, we can change the theme, outline the segments, adjust label sizes, clear the axis titles, and move the legend using ggplot2’s theme and labs functions.

# Customize plot appearance
ggplot(df_pivot %>% filter(name == "europe"), aes(x = name, y = value, fill = genre)) + 
  geom_col(color = "black") +  # black outline around each segment
  geom_text(aes(label = paste0(round(p * 100, 1), "% (", value, ")")),
            position = position_stack(vjust = .5), size = 2) +
  theme_minimal() +
  labs(x = "", y = "") +
  theme(legend.position = "bottom")

Output:

A customized plot with black segment outlines, small labels, a minimal theme, and the legend at the bottom.
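
Colors and overall font size can be controlled as well. The sketch below is self-contained and shows one possible approach: scale_fill_manual assigns a color to each genre and the base_size argument of theme_minimal scales all text. The hex colors, the axis and legend titles, and the names df_share and p_custom are arbitrary choices for illustration.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Rebuild the sample data so this snippet is self-contained
df <- data.frame(genre = c("Thriller", "Horror", "Action"),
                 europe = c(195, 210, 300),
                 asia = c(130, 90, 150),
                 america = c(325, 300, 150))

df_share <- df %>%
  pivot_longer(cols = c(europe, asia, america)) %>%
  group_by(name) %>%
  mutate(p = value / sum(value)) %>%
  ungroup()

p_custom <- ggplot(df_share %>% filter(name == "europe"),
                   aes(x = name, y = value, fill = genre)) +
  geom_col(color = "black") +
  # One hand-picked color per genre (arbitrary example palette)
  scale_fill_manual(values = c(Action = "#1b9e77",
                               Horror = "#d95f02",
                               Thriller = "#7570b3")) +
  labs(x = NULL, y = "Value", fill = "Genre") +
  # base_size sets the base font size for all text in the theme
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")

p_custom
```

Naming the entries of the values vector ties each color to a genre explicitly, so the mapping does not change if the order of the rows changes.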

Conclusion

By following these steps, we’ve transformed our wide-format dataset into a long format, calculated each genre’s share within each continent, and created stacked bar plots using ggplot2. These plots make it easy to see how the values are distributed across genres within each continent.


Last modified on 2024-01-08