Understanding Fisher’s Exact Test and How to Try Different Effect Sizes

Fisher’s exact test is a statistical method for testing whether two categorical variables are associated, most often by comparing the proportion of an outcome between two groups in a 2x2 contingency table. In this article, we’ll explore how to apply Fisher’s exact test in R and discuss ways to try different effect sizes.

Introduction to Fisher’s Exact Test

Fisher’s exact test is based on the hypergeometric distribution and is most useful when the sample size is small, where large-sample approximations such as the chi-squared test become unreliable. With the row and column totals of the table held fixed, it calculates the exact probability of each possible table of outcomes (e.g., the number of successes in each group) under the null hypothesis that group and outcome are independent. If the observed table, together with all tables at least as extreme, has a small total probability under the null hypothesis, the test indicates a statistically significant difference between groups.

Background on Hypergeometric Distribution

The hypergeometric distribution models the probability of getting k successes in n draws without replacement from a population of size N that contains K successes. In the context of Fisher’s exact test, we’re interested in the probability of observing k successes in one group, where:

N is the population size (the total number of observations in the table)
K is the number of successes in the population (the total across both groups)
n is the number of draws (the number of observations in the group of interest)
k is the observed number of successes in that group

The probability mass function (PMF) of the hypergeometric distribution gives the probability of getting exactly k successes in n draws.
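As a quick check of the definition above, the following sketch evaluates the hypergeometric PMF with R’s built-in dhyper() and compares it with the counting formula. The values of N, K, n and k are made up purely for illustration.

```r
# Illustrative values (assumptions, not taken from any dataset)
N <- 20  # population size
K <- 7   # successes in the population
n <- 5   # number of draws
k <- 2   # observed successes among the draws

# dhyper(x, m, n, k) takes, in order: successes drawn, successes in the
# population, failures in the population, and number of draws
p_builtin <- dhyper(k, K, N - K, n)

# Same probability from the counting formula
p_manual <- choose(K, k) * choose(N - K, n - k) / choose(N, n)

print(c(builtin = p_builtin, manual = p_manual))
```

Note that dhyper() is parameterised by the number of failures in the population (N - K) rather than by N itself, which is an easy place to slip.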

Fisher’s Exact Test Formula

Given a 2x2 matrix of counts for two groups, Fisher’s exact test calculates the p-value as follows:

The input is a 2x2 matrix with rows representing the groups and columns representing the observed outcomes
With the row and column totals held fixed, the hypergeometric probability mass function gives the probability of each possible table

The p-value is then the sum of the probabilities of all tables that are at least as extreme as the observed one. For a one-sided test these are the tables in the direction of the alternative; for the two-sided test, R’s fisher.test() sums over every table whose probability is no larger than that of the observed table.
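To make the summation concrete, here is a minimal sketch that rebuilds the p-value by hand from dhyper() and compares it with fisher.test(). The 2x2 counts are hypothetical, and the small relative tolerance is an assumption added to guard against floating-point ties.

```r
# Hypothetical 2x2 table: rows are groups, columns are outcomes
tab <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE)

# Fixed margins of the table
m <- sum(tab[, 1])  # first-column total
n <- sum(tab[, 2])  # second-column total
k <- sum(tab[1, ])  # first-row total
x <- tab[1, 1]      # observed top-left cell

# Every value the top-left cell can take, and the probability of each table
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x, m, n, k)

# Two-sided p-value: tables no more probable than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# One-sided p-value for alternative = "greater": top-left cell at least x
p_greater <- sum(probs[support >= x])

print(c(manual = p_two_sided, fisher = fisher.test(tab)$p.value))
print(c(manual = p_greater,
        fisher = fisher.test(tab, alternative = "greater")$p.value))
```

Because the margins are fixed, the top-left cell determines the whole table, so summing over its possible values is the same as summing over tables.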

How to Implement Fisher’s Exact Test in R

In this article, we’ll focus on implementing Fisher’s exact test using the fisher.test() function, which ships with base R in the stats package. However, it’s essential to understand how the underlying hypergeometric distribution works.

Here’s an example implementation:

# No extra packages are needed: fisher.test() is part of base R (stats)

# Set seed for reproducibility
set.seed(1234)

# Generate random data: 16 uniform draws on [0, 1] per group
control.years <- runif(16)
treat.years <- runif(16)

# A draw at or below the response rate counts as a response
# (5% response rate for control, 30% for treatment)
response.control <- ifelse(control.years <= 0.05, 1, 0)
response.treat <- ifelse(treat.years <= 0.30, 1, 0)

# Calculate counts
control.no <- sum(response.control == 0)
control.yes <- sum(response.control == 1)
treat.no <- sum(response.treat == 0)
treat.yes <- sum(response.treat == 1)

# Create Fisher's exact test matrix (rows: control, treatment; columns: no, yes)
fisher.matrix <- rbind(c(control.no, control.yes), c(treat.no, treat.yes))

# Perform the Fisher's exact test
# "greater" tests whether the odds of a response are higher under treatment
result <- fisher.test(fisher.matrix, alternative = "greater")

print(result)
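The printed summary is convenient to read, but the individual pieces are also available as list elements. The sketch below uses a small table of hypothetical counts (not the simulated data above) to show the fields most often needed.

```r
# Hypothetical counts: rows are control and treatment, columns are no and yes
tab <- rbind(control = c(no = 15, yes = 1),
             treatment = c(no = 11, yes = 5))

res <- fisher.test(tab, alternative = "greater")

print(res$p.value)     # p-value of the one-sided test
print(res$estimate)    # conditional maximum likelihood estimate of the odds ratio
print(res$conf.int)    # confidence interval for the odds ratio
print(res$null.value)  # odds ratio assumed under the null hypothesis (1 by default)
```

For a one-sided alternative the confidence interval is one-sided as well, so with alternative = "greater" its upper bound is Inf.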

Trying Different Effect Sizes

An effect size measures how strongly the groups differ, independently of whether that difference is statistically significant. For a 2x2 table, the two most common effect sizes are the odds ratio and its logarithm, the log odds ratio.

Odds Ratios

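The odds ratio compares the odds of an event in one group with the odds in the other: a value of 1 means the odds are equal, and values above 1 mean higher odds in the group in the numerator. The odds ratio for our example is calculated as follows:

# Odds of a response in control relative to treatment
odds_ratio_control <- (control.yes / control.no) / (treat.yes / treat.no)
# Odds of a response in treatment relative to control (the reciprocal)
odds_ratio_treat <- (treat.yes / treat.no) / (control.yes / control.no)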

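fisher.test() reports its own estimate of the odds ratio (a conditional maximum likelihood estimate, which can differ slightly from the sample ratio above), and its or argument lets us test the table against a hypothesized odds ratio other than 1.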
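With small samples a cell count of zero is common, and the sample odds ratio then becomes 0, Inf or NaN. One common workaround, shown here as a sketch rather than a recommendation, is to add 0.5 to every cell before taking the ratio (often called the Haldane-Anscombe correction). The helper name sample_odds_ratio and the counts are hypothetical.

```r
# Sample odds ratio for a 2x2 table laid out as rbind(c(no, yes), c(no, yes)),
# with rows control and treatment. The 0.5 correction is optional.
sample_odds_ratio <- function(tab, correction = FALSE) {
  if (correction) tab <- tab + 0.5
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

tab <- rbind(c(15, 1), c(11, 5))       # hypothetical counts
tab_zero <- rbind(c(16, 0), c(11, 5))  # hypothetical counts with an empty cell

or_plain <- sample_odds_ratio(tab)
or_zero <- sample_odds_ratio(tab_zero)
or_corrected <- sample_odds_ratio(tab_zero, correction = TRUE)

# Conditional estimate reported by fisher.test(), for comparison
or_fisher <- unname(fisher.test(tab)$estimate)

print(c(plain = or_plain, zero_cell = or_zero,
        corrected = or_corrected, fisher = or_fisher))
```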

Log Odds

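The log odds of an event is the logarithm of its odds, and the difference between the log odds of two groups is the log odds ratio. The log odds ratio is often easier to work with than the odds ratio itself because it is symmetric around 0 and its sampling distribution is approximately normal in large samples, making it easier to compute confidence intervals.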

The log odds can be calculated as follows:

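log_odds_control <- log(control.yes / control.no)
log_odds_treat <- log(treat.yes / treat.no)
# Log odds ratio (treatment relative to control)
log_odds_ratio <- log_odds_treat - log_odds_control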

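Fisher’s exact test itself works on the counts rather than on the log odds, but a hypothesized log odds ratio can be converted back with exp() and passed to fisher.test() through its or argument.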
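The normal approximation mentioned above leads to the usual Wald confidence interval for the odds ratio, sketched below with hypothetical counts. With cells this small the approximation is rough, so the exact interval from fisher.test() is printed alongside for comparison.

```r
# Hypothetical counts: rows are control and treatment, columns are no and yes
tab <- rbind(c(15, 1), c(11, 5))

# Log odds ratio and its approximate standard error
log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
se <- sqrt(sum(1 / tab))  # sqrt(1/a + 1/b + 1/c + 1/d)

# 95% Wald interval on the log scale, then back-transformed to the odds ratio scale
z <- qnorm(0.975)
ci_log <- log_or + c(-1, 1) * z * se
ci_or <- exp(ci_log)

print(c(odds_ratio = exp(log_or), lower = ci_or[1], upper = ci_or[2]))
print(fisher.test(tab)$conf.int)  # exact conditional interval
```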

Implementing Multiple Effect Sizes

To try different effect sizes, we’ll modify our code to test the same table against several hypothesized odds ratios and record the corresponding p-values. fisher.test() takes the hypothesized odds ratio through its or argument. Here’s an example:

# Load necessary libraries
library(ggplot2)

# Set seed for reproducibility
set.seed(1234)

# Generate random data: 16 uniform draws on [0, 1] per group
control.years <- runif(16)
treat.years <- runif(16)

# Define function to perform Fisher's exact test against multiple null odds ratios
perform_fisher_exact_test <- function(control.years, treat.years,
                                      odds_ratios = c(1.5, 2, 3, 4, 6)) {
  # Convert draws to binary responses (5% control rate, 30% treatment rate)
  response.control <- ifelse(control.years <= 0.05, 1, 0)
  response.treat <- ifelse(treat.years <= 0.30, 1, 0)

  # Calculate counts
  control.no <- sum(response.control == 0)
  control.yes <- sum(response.control == 1)
  treat.no <- sum(response.treat == 0)
  treat.yes <- sum(response.treat == 1)

  # Create Fisher's exact test matrix (rows: control, treatment; columns: no, yes)
  fisher.matrix <- rbind(c(control.no, control.yes), c(treat.no, treat.yes))

  # One row per hypothesized odds ratio
  results <- data.frame(effect_size = odds_ratios,
                        p_value_less = NA_real_,
                        p_value_greater = NA_real_)

  # Loop over odds ratios
  for (i in seq_along(odds_ratios)) {
    current_odds_ratio <- odds_ratios[i]

    # Test the observed table against the hypothesized odds ratio, in each direction
    result_less <- fisher.test(fisher.matrix, or = current_odds_ratio,
                               alternative = "less")
    result_greater <- fisher.test(fisher.matrix, or = current_odds_ratio,
                                  alternative = "greater")

    # Update results
    results$p_value_less[i] <- result_less$p.value
    results$p_value_greater[i] <- result_greater$p.value
  }
  return(results)
}

# Perform Fisher's exact test with multiple effect sizes
data <- perform_fisher_exact_test(control.years, treat.years)

# Plot both sets of p-values against the hypothesized odds ratio
ggplot(data, aes(x = effect_size)) +
  geom_line(aes(y = p_value_less), color = "blue") +
  geom_point(aes(y = p_value_less), color = "blue") +
  geom_line(aes(y = p_value_greater), color = "red") +
  geom_point(aes(y = p_value_greater), color = "red") +
  labs(x = "Hypothesized odds ratio", y = "p-value")

This code calculates a one-sided p-value in each direction for every hypothesized odds ratio and plots both sets against the odds ratio (blue for alternative = "less", red for alternative = "greater"). We can use this output to see which effect sizes are compatible with the observed data.
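Another way to read “trying different effect sizes” is to vary the true response rate in the simulation and see how often the test detects the difference. The sketch below is a small Monte Carlo estimate of power under assumed settings: 16 subjects per group, a 5% control response rate, 500 repetitions and a 0.05 significance level. The function name estimate_power and the treatment rates are hypothetical.

```r
set.seed(1234)

# Estimate how often a one-sided Fisher's exact test rejects at level alpha
# when the true response rates are p_control and p_treat (n subjects per group)
estimate_power <- function(p_control, p_treat, n = 16, reps = 500, alpha = 0.05) {
  rejections <- replicate(reps, {
    yes_control <- rbinom(1, n, p_control)
    yes_treat <- rbinom(1, n, p_treat)
    # Rows: control, treatment; columns: no, yes
    tab <- rbind(c(n - yes_control, yes_control),
                 c(n - yes_treat, yes_treat))
    fisher.test(tab, alternative = "greater")$p.value < alpha
  })
  mean(rejections)
}

# Treatment response rates to try (assumed values)
treat_rates <- c(0.05, 0.30, 0.70)
est_power <- sapply(treat_rates, function(p) estimate_power(0.05, p))

print(data.frame(treat_rate = treat_rates, power = est_power))
```

The estimates are subject to simulation noise, so increasing reps gives steadier numbers at the cost of run time.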

Conclusion

In this article, we’ve discussed Fisher’s exact test and how to run it in R. We covered two key effect size measures: the odds ratio and the log odds ratio. The code provided demonstrates how to try multiple effect sizes by looping over hypothesized odds ratios and calculating the corresponding p-values.

Last modified on 2023-12-14