Understanding Fisher’s Exact Test and How to Try Different Effect Sizes
Fisher’s exact test is a statistical method used to determine whether two categorical variables, such as group membership and a binary outcome, are associated in a contingency table. In this article, we’ll explore how to apply Fisher’s exact test in R and discuss ways to try different effect sizes.
Introduction to Fisher’s Exact Test
Fisher’s exact test is based on the hypergeometric distribution and is most often used when the sample size is small, because its p-value is exact rather than a large-sample approximation. With the row and column totals of the table held fixed, it calculates the probability of each possible table (e.g., each possible number of successes in one group) under the null hypothesis that group and outcome are independent. If the observed table and all tables at least as extreme have a small total probability under the null hypothesis, the difference between groups is considered statistically significant.
Background on Hypergeometric Distribution
The hypergeometric distribution models the probability of getting r successes in n draws without replacement from a population of size N that contains K successes. In the context of Fisher’s exact test, we’re interested in the probability of observing r successes in an experiment where:
- N is the population size (the total number of observations)
- n is the sample size (number of draws)
- K is the number of successes in the population
- r is the observed number of successes
The probability mass function (PMF) of the hypergeometric distribution gives the probability of getting exactly r successes in n draws.
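As a minimal sketch of the PMF, the snippet below evaluates it with R’s built-in dhyper() and checks the result against the closed form choose(K, r) * choose(N - K, n - r) / choose(N, n). The values of N, K, and n are hypothetical and chosen only for illustration. Note that dhyper() takes the number of successes and the number of failures in the population as separate arguments rather than the population size.

```r
# Hypothetical population: N = 20 items, K = 8 of them successes, n = 5 draws
N <- 20
K <- 8
n <- 5
r <- 0:min(n, K)  # every possible number of successes in the sample

# dhyper(x, m, n, k): x = successes drawn, m = successes in the population,
# n = failures in the population, k = number of draws
pmf <- dhyper(r, m = K, n = N - K, k = n)

# The same probabilities from the closed-form expression
pmf_manual <- choose(K, r) * choose(N - K, n - r) / choose(N, n)

print(data.frame(r = r, pmf = pmf, pmf_manual = pmf_manual))
```

The probabilities over all possible values of r sum to 1, which is a quick sanity check on the parameterization.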
Fisher’s Exact Test Formula
Given a 2x2 matrix of counts for two groups, Fisher’s exact test calculates the p-value as follows:
- fisher.matrix is a 2x2 matrix with rows representing groups and columns representing observed outcomes
- With the row and column totals held fixed, the hypergeometric probability mass function gives the probability of each possible table
- The p-value is the sum of these probabilities over all tables that are at least as extreme as the observed one. For a two-sided test in R, these are the tables whose probability does not exceed that of the observed table.
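The summation can be sketched by hand and compared with fisher.test(). The counts in tab are hypothetical, and the small relative tolerance used to treat near-equal probabilities as ties is an assumption that mirrors the two-sided rule R applies; treat this as an illustration rather than a replacement for the built-in function.

```r
# Hypothetical 2x2 table: rows are groups, columns are outcomes (no, yes)
tab <- rbind(control = c(no = 14, yes = 2),
             treat   = c(no = 9,  yes = 7))

row1 <- sum(tab[1, ])  # size of the first group
row2 <- sum(tab[2, ])  # size of the second group
col1 <- sum(tab[, 1])  # total count in the first column

# Values the top-left cell can take when the margins are fixed
support <- max(0, col1 - row2):min(row1, col1)

# Hypergeometric probability of each possible table under independence
probs <- dhyper(support, m = row1, n = row2, k = col1)
p_obs <- dhyper(tab[1, 1], m = row1, n = row2, k = col1)

# Two-sided p-value: sum over tables no more probable than the observed one
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_builtin <- fisher.test(tab)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

A one-sided p-value would instead sum the probabilities in one tail, for example probs[support >= tab[1, 1]].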
How to Implement Fisher’s Exact Test in R
In this article, we’ll focus on implementing Fisher’s exact test using the fisher.test() function from R. However, it’s essential to understand how the underlying hypergeometric distribution works.
Here’s an example implementation:
# Set seed for reproducibility
set.seed(1234)
# Generate random data: 16 uniform draws on (0, 1) per group
# (assumes 0.05 and 0.30 below are meant as response probabilities)
control.years <- runif(16)
treat.years <- runif(16)
# A draw at or below the threshold counts as a response
response.control <- ifelse(control.years <= 0.05, 1, 0)
response.treat <- ifelse(treat.years <= 0.30, 1, 0)
# Calculate counts
control.no <- sum(response.control == 0)
control.yes <- sum(response.control == 1)
treat.no <- sum(response.treat == 0)
treat.yes <- sum(response.treat == 1)
# Create Fisher's exact test matrix: rows are groups, columns are outcomes
fisher.matrix <- rbind(control = c(no = control.no, yes = control.yes),
                       treat = c(no = treat.no, yes = treat.yes))
# Perform the Fisher's exact test
# "greater" tests whether the odds of a response are higher in the treatment group
result <- fisher.test(fisher.matrix, alternative = "greater")
print(result)
Trying Different Effect Sizes
Effect size measures how strongly group membership is associated with the outcome, which is a separate question from whether that association is statistically significant. For a 2x2 table, there are two primary scales for the effect size: the odds ratio and the log odds ratio.
Odds Ratios
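The odds ratio is the odds of an event in one group divided by the odds of the same event in the other group. A value of 1 means the odds are equal, and values further from 1 indicate a larger effect. For our example, the odds ratio can be written in either direction:
- odds_ratio_control = (control.yes / control.no) / (treat.yes / treat.no)
- odds_ratio_treat = (treat.yes / treat.no) / (control.yes / control.no)
The two are reciprocals of each other. The fisher.test() function reports an odds ratio estimate alongside the p-value, and its or argument sets the odds ratio assumed under the null hypothesis (1 by default), so we can calculate a p-value for each effect size we want to try.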
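As a small illustration, the sketch below computes the sample odds ratio from a hypothetical table and compares it with the estimate returned by fisher.test(). The two generally differ, because fisher.test() reports the conditional maximum likelihood estimate rather than the sample odds ratio. The counts are made up for the example.

```r
# Hypothetical 2x2 table: rows are groups, columns are outcomes (no, yes)
tab <- rbind(control = c(no = 14, yes = 2),
             treat   = c(no = 9,  yes = 7))

# Odds of a "yes" within each group
odds_control <- tab["control", "yes"] / tab["control", "no"]
odds_treat   <- tab["treat", "yes"] / tab["treat", "no"]

# Sample odds ratio, treatment relative to control
or_sample <- odds_treat / odds_control

# Conditional maximum likelihood estimate and exact confidence interval
fit <- fisher.test(tab)
or_cmle <- unname(fit$estimate)

print(c(sample = or_sample, conditional_mle = or_cmle))
print(fit$conf.int)
```

With the rows ordered as control then treatment and the columns as no then yes, both quantities describe the odds of a "yes" in the treatment group relative to the control group.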
Log Odds
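The log odds of a group are the logarithm of its odds, and the log odds ratio is the difference between the log odds of the two groups. The log odds ratio is often more convenient to work with than the odds ratio itself because its sampling distribution is approximately normal in large samples, making it easier to compute confidence intervals.
The log odds can be calculated as follows:
- log_odds_control = log(control.yes / control.no)
- log_odds_treat = log(treat.yes / treat.no)
The log odds ratio is then log_odds_treat - log_odds_control. Because the logarithm is monotone, testing a hypothesized log odds ratio with Fisher’s exact test is equivalent to testing the corresponding odds ratio.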
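To make the confidence interval remark concrete, here is a minimal sketch of a Wald interval built on the log odds ratio scale, next to the exact interval from fisher.test(). The counts are hypothetical, and the Wald interval is a large-sample approximation that can be unreliable with counts this small, which is the setting where the exact interval is usually preferred.

```r
# Hypothetical counts for each group
control.no <- 14
control.yes <- 2
treat.no <- 9
treat.yes <- 7

# Log odds per group and their difference, the log odds ratio
log_odds_control <- log(control.yes / control.no)
log_odds_treat <- log(treat.yes / treat.no)
log_or <- log_odds_treat - log_odds_control

# Large-sample standard error of the log odds ratio
se_log_or <- sqrt(1 / control.no + 1 / control.yes + 1 / treat.no + 1 / treat.yes)

# 95% Wald interval, back-transformed to the odds ratio scale
wald_ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se_log_or)

# Exact interval from fisher.test() for comparison
exact_ci <- fisher.test(rbind(c(control.no, control.yes),
                              c(treat.no, treat.yes)))$conf.int

print(rbind(wald = wald_ci, exact = as.numeric(exact_ci)))
```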
Implementing Multiple Effect Sizes
To try different effect sizes, we’ll modify our code to test the same table against several hypothesized odds ratios and collect the corresponding p-values. The or argument of fisher.test() sets the odds ratio assumed under the null hypothesis. Here’s an example:
# Load necessary libraries
library(ggplot2)
# Set seed for reproducibility
set.seed(1234)
# Generate random data: 16 uniform draws on (0, 1) per group
control.years <- runif(16)
treat.years <- runif(16)
# Define function to perform Fisher's exact test with multiple effect sizes
perform_fisher_exact_test <- function(control.years, treat.years,
                                      odds_ratios = c(1.5, 2, 3, 4, 6)) {
  # A draw at or below the threshold counts as a response
  response.control <- ifelse(control.years <= 0.05, 1, 0)
  response.treat <- ifelse(treat.years <= 0.30, 1, 0)
  # Create Fisher's exact test matrix: rows are groups, columns are (no, yes)
  fisher.matrix <- rbind(c(sum(response.control == 0), sum(response.control == 1)),
                         c(sum(response.treat == 0), sum(response.treat == 1)))
  # One row per hypothesized odds ratio
  results <- data.frame(effect_size = odds_ratios,
                        p_value_less = NA_real_,
                        p_value_greater = NA_real_)
  # Loop over odds ratios
  for (i in seq_along(odds_ratios)) {
    # Test the null hypothesis that the true odds ratio equals odds_ratios[i]
    result_less <- fisher.test(fisher.matrix, or = odds_ratios[i],
                               alternative = "less")
    result_greater <- fisher.test(fisher.matrix, or = odds_ratios[i],
                                  alternative = "greater")
    # Update results
    results$p_value_less[i] <- result_less$p.value
    results$p_value_greater[i] <- result_greater$p.value
  }
  results
}
# Perform Fisher's exact test with multiple effect sizes
data <- perform_fisher_exact_test(control.years, treat.years)
# Plot both one-sided p-values against the hypothesized odds ratio
ggplot(data, aes(x = effect_size)) +
  geom_line(aes(y = p_value_less), color = "blue") +
  geom_point(aes(y = p_value_less), color = "blue") +
  geom_line(aes(y = p_value_greater), color = "red") +
  geom_point(aes(y = p_value_greater), color = "red") +
  labs(x = "Hypothesized odds ratio", y = "p-value")
This code calculates two one-sided p-values for each hypothesized odds ratio and displays them as lines, blue for the "less" alternative and red for "greater". A small p-value at a given odds ratio means the data are hard to reconcile with that value in the direction tested, so we can use this output to compare different effect sizes.
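Another way to read "trying different effect sizes" is to ask how often the test would reach significance if the true odds ratio took each of these values. The sketch below estimates that by simulation. The helper estimate_power(), its arguments, the baseline response rate of 0.05, the group size of 16, and the number of simulations are all assumptions made for illustration.

```r
set.seed(1234)

# Hypothetical helper: share of simulated trials in which a one-sided
# Fisher's exact test rejects at level alpha, for a given true odds ratio
estimate_power <- function(p_control, odds_ratio, n_per_group = 16,
                           n_sim = 500, alpha = 0.05) {
  # Convert the control response rate and odds ratio to a treatment rate
  odds_treat <- odds_ratio * p_control / (1 - p_control)
  p_treat <- odds_treat / (1 + odds_treat)

  rejections <- replicate(n_sim, {
    yes_control <- rbinom(1, n_per_group, p_control)
    yes_treat <- rbinom(1, n_per_group, p_treat)
    tab <- rbind(c(n_per_group - yes_control, yes_control),
                 c(n_per_group - yes_treat, yes_treat))
    # conf.int = FALSE skips the interval, which is not needed here
    fisher.test(tab, alternative = "greater", conf.int = FALSE)$p.value < alpha
  })
  mean(rejections)
}

odds_ratios <- c(1.5, 2, 3, 4, 6)
power <- sapply(odds_ratios, function(or) estimate_power(0.05, or))
print(data.frame(odds_ratio = odds_ratios, power = power))
```

With only 16 observations per group, the estimated power is likely to stay low even for large odds ratios, which is itself a useful thing to learn before collecting data.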
Conclusion
In this article, we’ve discussed Fisher’s exact test and how it’s implemented using R. We covered two key concepts: effect size (odds ratio) and log odds. The code provided demonstrates how to try multiple effect sizes by looping over the odds ratios and calculating their corresponding p-values.
Last modified on 2023-12-14