Introduction to R-squared in ggplot
=====================================================
In this article, we will explore how to add the R-squared value to a ggplot2 plot. We’ll discuss the basics of R-squared and its importance in regression analysis, then go through the steps to display it using ggplot2.
What is R-squared?
R-squared (R²) is a statistical measure of the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. It gives a single, easy-to-read summary of how well a linear regression model fits the data it was estimated on.
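Mathematically, R-squared is calculated as:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$

Where:
- $SS_{\text{res}} = \sum_{i} (y_i - \hat{y}_i)^2$ is the residual sum of squares (observed values minus fitted values)
- $SS_{\text{tot}} = \sum_{i} (y_i - \bar{y})^2$ is the total sum of squares (observed values minus their mean)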
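As a quick check of the formula, the sketch below computes the residual and total sums of squares by hand for a simple linear model and compares the result with the `r.squared` value reported by `summary()`. It uses simulated data like that in the later examples; the variable names are only illustrative.

```r
# Simulated data (same setup as the later examples)
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
y <- 2 + 0.5 * x + rnorm(100, mean = 0, sd = 15)

model <- lm(y ~ x)

# Residual sum of squares: squared distances between observed and fitted values
ss_res <- sum((y - fitted(model))^2)
# Total sum of squares: squared distances between observed values and their mean
ss_tot <- sum((y - mean(y))^2)

r2_manual <- 1 - ss_res / ss_tot
r2_lm <- summary(model)$r.squared

# The two values agree up to floating-point error
all.equal(r2_manual, r2_lm)
```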
Importance of R-squared
R-squared has several important implications in regression analysis:
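- Model fit: A high R-squared value indicates that the model explains much of the variation in the response, while a low value means that much of the variation is left unexplained. A low R-squared does not by itself show that the model is wrong (the data may simply be noisy), and a high R-squared does not prove that a linear model is appropriate.
- Predictive power: An R-squared value close to 1 means that the fitted values track the observed response closely, whereas a value close to 0 means that the predictors explain little of its variance. Because R-squared is computed on the same data used to fit the model, it can overstate how well the model will predict new data.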
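R-squared reflects how much noise surrounds a relationship, not only whether the model form is right. The sketch below fits the same, correctly specified linear model to data generated with the same slope but two different noise levels; the noise standard deviations (1 and 30) are arbitrary choices for illustration.

```r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
# Same true relationship (intercept 2, slope 0.5), different amounts of noise
y_low_noise  <- 2 + 0.5 * x + rnorm(100, mean = 0, sd = 1)
y_high_noise <- 2 + 0.5 * x + rnorm(100, mean = 0, sd = 30)

r2_low  <- summary(lm(y_low_noise ~ x))$r.squared
r2_high <- summary(lm(y_high_noise ~ x))$r.squared

# The model form is correct in both cases, yet R-squared differs sharply
round(c(low_noise = r2_low, high_noise = r2_high), 3)
```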
Calculating R-squared in ggplot
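ggplot2 does not report R-squared itself: `geom_smooth(method = "lm")` draws the fitted line and its confidence band, but it does not expose the fit statistics. The usual way to get R-squared is to fit the same model with `lm()` and read the value from `summary()`. This works, but the model is fitted separately from the plot, so the value has to be added to the plot by hand (for example in a subtitle), and the `lm()` formula has to match the one used by `geom_smooth()`.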
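A minimal sketch of that approach: fit the model with `lm()`, then pull `r.squared` (and, if useful, `adj.r.squared`, which penalizes additional predictors) out of the summary object. The data here are simulated and purely illustrative.

```r
set.seed(123)
sim_data <- data.frame(x = rnorm(100, mean = 50, sd = 10))
sim_data$y <- 2 + 0.5 * sim_data$x + rnorm(100, mean = 0, sd = 15)

# Fit the same model that geom_smooth(method = "lm") draws by default (y ~ x)
model <- lm(y ~ x, data = sim_data)
model_summary <- summary(model)

# summary.lm stores both statistics as plain numbers
r2     <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared
c(r.squared = r2, adj.r.squared = adj_r2)
```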
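Using the ggpmisc package
One popular alternative is the `ggpmisc` package, which extends ggplot2 with statistics and geoms for annotating plots with information from fitted models. Its `stat_poly_eq()` function, for example, fits a polynomial (including a straight line) and can label the plot with the fitted equation or R².
Here is an example that installs and loads ggpmisc, computes R-squared with `lm()` and `summary()`, and adds the rounded value to the plot's subtitle. Note that this particular snippet does all of the R-squared work in base R and ggplot2; none of its calls come from ggpmisc: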
```r
# Install ggpmisc once if it is not already available
if (!requireNamespace("ggpmisc", quietly = TRUE)) install.packages("ggpmisc")
# Load necessary libraries
library(ggplot2)
library(ggpmisc)
# Simulated data
set.seed(123)
n <- 100
x <- rnorm(n, mean = 50, sd = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 15)
sim_data <- data.frame(x = x, y = y)
# R-squared from base R's lm(), handy for checking the value shown on the plot
R2 <- summary(lm(y ~ x, data = sim_data))$r.squared
# Plotting function that fits the model and shows R-squared in the subtitle
sim_plot <- function(data) {
  R2_value <- round(summary(lm(y ~ x, data = data))$r.squared, 2)
  ggplot(data, aes(x = x, y = y)) +
    geom_point(color = "steelblue", alpha = 0.7, size = 3) +
    # linewidth replaces size for lines in ggplot2 >= 3.4.0
    geom_smooth(method = "lm", formula = y ~ x, color = "darkred", linewidth = 1) +
    theme_minimal() +
    labs(
      title = "Scatterplot with Regression Line and Confidence Interval",
      subtitle = paste("R^2 =", R2_value),
      x = "X Variable",
      y = "Y Variable"
    )
}
# Call the function to draw the plot with R-squared in the subtitle
sim_plot(sim_data)
```
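For comparison, here is a sketch of letting ggpmisc compute R-squared inside the plot itself. It assumes ggpmisc is installed in a version recent enough to work with ggplot2's `after_stat()`. `stat_poly_eq()` fits its own model and exposes the R² label as the computed variable `rr.label`, which is mapped to the text label here. Because that model is fitted separately from `geom_smooth()`, the two `formula` arguments should match.

```r
library(ggplot2)
library(ggpmisc)

set.seed(123)
sim_data <- data.frame(x = rnorm(100, mean = 50, sd = 10))
sim_data$y <- 2 + 0.5 * sim_data$x + rnorm(100, mean = 0, sd = 15)

p <- ggplot(sim_data, aes(x = x, y = y)) +
  geom_point(color = "steelblue", alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, color = "darkred", linewidth = 1) +
  # stat_poly_eq() fits y ~ x itself and provides rr.label (an R^2 label);
  # parse = TRUE renders it as a plotmath expression with a superscript
  stat_poly_eq(aes(label = after_stat(rr.label)), formula = y ~ x, parse = TRUE) +
  theme_minimal() +
  labs(x = "X Variable", y = "Y Variable")

p
```

Unlike the subtitle approach, the label is computed by the stat for each panel, so it keeps working when the plot is faceted.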
Custom Function to Calculate R-squared
Alternatively, we can create our own custom function to calculate R-squared:
```r
library(ggplot2)
# Define a custom function that fits y ~ x and returns R-squared as a label
r_squared <- function(data) {
  # Perform linear regression
  model <- lm(y ~ x, data = data)
  # Extract R-squared value from summary object
  r2_value <- summary(model)$r.squared
  # Return the result as a formatted string
  paste("R^2 =", round(r2_value, 2))
}
# Simulated data
set.seed(123)
n <- 100
x <- rnorm(n, mean = 50, sd = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 15)
sim_data <- data.frame(x = x, y = y)
# Calculate R-squared using custom function
R2_value <- r_squared(sim_data)
# Create a ggplot with R-squared subtitle
ggplot(sim_data, aes(x = x, y = y)) +
  geom_point(color = "steelblue", alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, color = "darkred", linewidth = 1) +
  theme_minimal() +
  labs(
    title = "Scatterplot with Regression Line and Confidence Interval",
    subtitle = R2_value,
    x = "X Variable",
    y = "Y Variable"
  )
```
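If the model is not always `y ~ x`, one option is to pass the formula in and reuse it for both `lm()` and `geom_smooth()`, so the label and the drawn line always come from the same model. The function name `r2_plot()` and its arguments are hypothetical, and this version places the label inside the panel with `annotate()` rather than in the subtitle; treat it as a sketch, not a finished utility.

```r
library(ggplot2)

# Hypothetical helper: fits `formula` once with lm() for the label and passes the
# same formula to geom_smooth(). Assumes `data` has numeric columns named x and y,
# because geom_smooth() formulas refer to the x and y aesthetics.
r2_plot <- function(data, formula = y ~ x, digits = 2) {
  r2 <- summary(lm(formula, data = data))$r.squared
  label <- paste("R^2 =", format(round(r2, digits), nsmall = digits))
  ggplot(data, aes(x = x, y = y)) +
    geom_point(color = "steelblue", alpha = 0.7) +
    geom_smooth(method = "lm", formula = formula, color = "darkred") +
    # Put the label in the top-left corner of the panel
    annotate("text", x = -Inf, y = Inf, label = label, hjust = -0.1, vjust = 1.5) +
    theme_minimal()
}

set.seed(123)
sim_data <- data.frame(x = rnorm(100, mean = 50, sd = 10))
sim_data$y <- 2 + 0.5 * sim_data$x + rnorm(100, mean = 0, sd = 15)

# Straight line, then a quadratic fit on the same data
r2_plot(sim_data)
r2_plot(sim_data, formula = y ~ poly(x, 2))
```

Using `format(..., nsmall = digits)` keeps trailing zeros (0.20 rather than 0.2), which `round()` alone drops.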
Conclusion
In this article, we discussed the basics of R-squared and its importance in regression analysis. We also explored how to add the R-squared value to a ggplot2 plot using the ggpmisc package and a custom function.
While the ggpmisc package can compute R-squared and draw it as a layer of the plot, a custom function gives full control over how the value is calculated and formatted before it is passed to the ggplot2 code.
By following these examples, you should now be able to calculate and display R-squared in your ggplot2 plots.
Last modified on 2023-06-23