Comparing Continuous Distributions using ggplot
In this article, we will explore how to compare two continuous distributions and their 95th percentiles (0.95 quantiles). We will also show how to use a different distribution, such as the Exponential distribution, in place of the Normal distribution.
Background
When dealing with continuous distributions, it’s often necessary to compare the characteristics of several of them. One way to do this is to visualize the distribution shapes. In R, the ggplot2 package provides a powerful framework for creating such plots.
Here, we will use the ggdist package, which extends ggplot2 with stats and geoms for visualizing distributions, to plot two continuous distributions in one figure, one row per distribution. We’ll also customize the plot to highlight specific features of the distributions, such as the 95th percentile.
Distribution Comparison
Let’s start by loading the necessary packages and creating a data frame that describes the distributions to compare:
library(dplyr)    # tribble(), mutate(), bind_rows() and the %>% pipe
library(ggdist)   # stat_dist_halfeye()
library(ggplot2)
# Each row names a distribution and lists its parameters
dists <- tribble(
~ dist, ~ args,
"norm", list(0, 1),
"student_t", list(3, 0, 1)
)
dists
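Printing dists shows a two-row tibble: a character column dist with the distribution names and a list column args with each distribution’s parameters (mean and standard deviation for "norm"; degrees of freedom, location and scale for "student_t").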
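In this string form, dist acts as the suffix of a family of functions: "norm" refers to dnorm(), pnorm() and qnorm(). The following base R sketch imitates that convention, assuming the entries of args are passed positionally after the first argument. eval_dist() is a hypothetical helper, not part of ggdist, and "student_t" is left out because dstudent_t() comes from ggdist rather than base R.

```r
# Hypothetical helper (not part of ggdist): build a function name from a
# prefix ("d" = density, "p" = CDF, "q" = quantile) and a distribution
# suffix such as "norm", then call it with x followed by the parameters.
eval_dist <- function(prefix, dist, args, x) {
  f <- match.fun(paste0(prefix, dist))
  do.call(f, c(list(x), args))
}

# Density of Normal(0, 1) at 0, equivalent to dnorm(0, 0, 1)
dens0 <- eval_dist("d", "norm", list(0, 1), 0)

# 95th percentile of Normal(0, 2), equivalent to qnorm(0.95, 0, 2)
q95_norm <- eval_dist("q", "norm", list(0, 2), 0.95)

# CDF of Exponential(rate = 1) at 1, equivalent to pexp(1, 1)
p_exp <- eval_dist("p", "exp", list(1), 1)

print(c(dens0 = dens0, q95_norm = q95_norm, p_exp = p_exp))
```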
Now, let’s create a plot that compares the two distributions using ggdist:
# One half-eye (density slab plus interval) per distribution;
# shade the part of each slab where |x| < 1.5
dists %>%
  ggplot(aes(y = dist, dist = dist, args = args)) +
  stat_dist_halfeye(aes(fill = after_stat(abs(x) < 1.5)))
This code draws one row per distribution: the y-axis lists the distributions, the x-axis shows the value of the random variable, and the height of each slab shows its density. The fill aesthetic colors the part of each slab where |x| < 1.5 differently from the tails.
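A quick numerical check shows why a fixed boundary is not comparable across distributions: the same band |x| < 1.5 encloses a different amount of probability in each. This sketch uses base R's pnorm() and pt(); pt() matches the student_t row only because that row uses location 0 and scale 1.

```r
# Probability enclosed by the fixed band |x| < 1.5 for Normal(0, 1)
# and for Student's t with 3 degrees of freedom (location 0, scale 1)
band <- 1.5
covered <- c(
  norm      = pnorm(band) - pnorm(-band),
  student_t = pt(band, df = 3) - pt(-band, df = 3)
)
print(round(covered, 3))
```

The band covers roughly 87% of the Normal distribution's probability but only about 77% of the heavier-tailed t distribution's.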
However, in this example the boundary is the same fixed value (1.5) for both distributions, so it encloses a different share of probability in each. To make the boundary depend on each distribution, we can add columns to dists that describe its quantile function:
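# Add columns naming each distribution's quantile function and its arguments
dists <- tribble(
  ~ dist,      ~ args,        ~ qfun,  ~ qfunargs,
  "norm",      list(0, 2),    "qnorm", list(p = 0.95, mean = 0, sd = 2),
  "student_t", list(3, 0, 1), "qt",    list(p = 0.95, df = 3)
)
# Evaluate each quantile function with its arguments: one 95th percentile per row
dists <- dists %>%
  mutate(q95 = unlist(Map(do.call, qfun, qfunargs)))
# Shade each slab up to its own 95th percentile; ggdist provides the
# slab's cumulative distribution function as the computed variable cdf
dists %>%
  ggplot(aes(y = dist, dist = dist, args = args)) +
  stat_dist_halfeye(aes(fill = after_stat(cdf < 0.95))) +
  labs(fill = "Below 95th percentile")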
This updated plot shades each distribution up to its own 95th percentile: about 3.29 for the Normal distribution with standard deviation 2 and about 2.35 for Student’s t with 3 degrees of freedom.
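A continuous distribution's CDF evaluated at its own 95th percentile returns 0.95. A boundary computed from a quantile function and one defined by the condition CDF < 0.95 therefore describe the same point. The following base R sketch checks that round trip, using the quantile functions and arguments from the qfun and qfunargs columns:

```r
# Quantile functions and arguments copied from the qfun and qfunargs columns
qfun <- c("qnorm", "qt")
qfunargs <- list(
  list(p = 0.95, mean = 0, sd = 2),
  list(p = 0.95, df = 3)
)

# Evaluate each quantile function with its own arguments
q95 <- unlist(Map(do.call, qfun, qfunargs))

# Apply the matching CDFs; each should give back p = 0.95
p_back <- c(
  pnorm(q95[[1]], mean = 0, sd = 2),
  pt(q95[[2]], df = 3)
)
print(q95)
print(p_back)
```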
Using Exponential Distribution
To use an Exponential distribution in place of (or alongside) the Normal, we add a row whose dist column is "exp", together with the matching quantile function qexp:
# Create a new data frame with an Exponential distribution (rate = 1);
# it needs the same quantile columns as dists
exps <- tribble(
  ~ dist, ~ args,   ~ qfun,  ~ qfunargs,
  "exp",  list(1),  "qexp",  list(p = 0.95, rate = 1)
)
# Combine both data frames and recompute the 95th percentile for every row
dists <- bind_rows(dists, exps) %>%
  mutate(q95 = unlist(Map(do.call, qfun, qfunargs)))
# Create a ggplot object with the combined distribution data
dists %>%
  ggplot(aes(y = dist, dist = dist, args = args)) +
  stat_dist_halfeye(aes(fill = after_stat(cdf < 0.95))) +
  labs(fill = "Below 95th percentile")
Note that the dist column holds "exp", from which ggdist looks up dexp(), pexp() and qexp(), and the qfun column must name qexp. Neither rexp (which draws random samples) nor exp (the exponential function e^x) returns quantiles. The Exponential row’s shaded region ends at its 95th percentile, about 3.00 for rate 1.
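For the Exponential distribution the quantile function has a closed form, Q(p) = -ln(1 - p) / rate, so the 95th percentile with rate 1 is -ln(0.05). The following base R sketch compares the formula with qexp() and contrasts it with rexp(), which returns random draws rather than quantiles:

```r
rate <- 1
p <- 0.95

# Closed-form Exponential quantile: solve 1 - exp(-rate * q) = p for q
q_formula <- -log(1 - p) / rate

# Base R's quantile function for the same distribution
q_qexp <- qexp(p, rate = rate)

# rexp() draws random samples instead (made reproducible here by set.seed)
set.seed(1)
draws <- rexp(5, rate = rate)

print(c(q_formula = q_formula, q_qexp = q_qexp))
print(draws)
```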
Conclusion
In this article, we explored how to compare continuous distributions and their 95th percentiles using ggplot2 and ggdist, and how to customize the plot to highlight that percentile for each distribution.
By adding quantile-function columns to dists, we made the highlighted boundary specific to each distribution. We also showed how to add an Exponential distribution by setting the dist column to "exp" and using qexp as its quantile function.
I hope this article has been helpful in understanding how to compare continuous distributions using ggplot!
Last modified on 2024-09-09