Tabulating Coefficients from Linear Models in R: A Two-Approach Solution

Tabulating Coefficients from Linear Models

In this post, we will explore how to tabulate coefficients from linear models using R. The problem presented involves extracting specific information from multiple linear regression models and arranging it in a structured format.

Background

Linear regression is a widely used statistical technique for modeling the relationship between a dependent variable and one or more independent variables. In this context, we have 10 linear models with different combinations of predictor variables (e.g., Rainfall) and response variables (Qend). Each model provides estimates of the coefficients for the slope and intercept of the regression line.

Problem Statement

Given a dataset containing information from these 10 linear models, how can we extract and tabulate the coefficients of interest (slope, intercept, R-squared, adjusted R-squared, and p-value) in a structured format?

Solution

We will demonstrate two approaches to solving this problem using R.

Approach 1: Base R Solution

The first approach uses base R functions to achieve the desired outcome. We start by reading the dataset into R and then split it into separate datasets based on the CatChro column.

data <- read.csv("./data/so53933238.csv", header = TRUE)
dataList <- split(data, data$CatChro)

# Process each list of datasets using lm() and extract results to a data frame
lmResults <- lapply(dataList, function(x) {
  y <- summary(lm(Qend ~ Rainfall, data = x))
  Intercept <- y$coefficients[1, 1]
  Slope <- y$coefficients[2, 1]
  rSquared <- y$r.squared
  adjRSquared <- y$adj.r.squared
  f <- y$fstatistic[1]
  pValue <- pf(y$fstatistic[1], y$fstatistic[2], y$fstatistic[3], lower.tail = FALSE)
  data.frame(Slope, Intercept, rSquared, adjRSquared, pValue)
})

Next, we use the do.call(rbind, lmResults) function to combine the results into a single data frame (lmResultTable) and add a column for catChro.

lmResultTable <- do.call(rbind, lmResults)
lmResultTable$catChro <- names(dataList)

Approach 2: Using knitr and kable()

The second approach leverages the knitr package to render the output in a tabular format. We use kable() to create a nicely formatted table from the first few rows of lmResultTable.

library(knitr)
kable(lmResultTable[1:5], row.names = TRUE, digits = 5)

Example Use Case

Suppose we have the following dataset:

CatChroQendRainfall
A3D11020
A3D21530

We can use the base R solution to extract and tabulate the coefficients for each model:

# Load required libraries
library(knitr)

# Read in dataset
data <- read.csv("./data/so53933238.csv", header = TRUE)

# Split dataset into separate datasets based on CatChro column
dataList <- split(data, data$CatChro)

# Process each list of datasets using lm() and extract results to a data frame
lmResults <- lapply(dataList, function(x) {
  y <- summary(lm(Qend ~ Rainfall, data = x))
  Intercept <- y$coefficients[1, 1]
  Slope <- y$coefficients[2, 1]
  rSquared <- y$r.squared
  adjRSquared <- y$adj.r.squared
  f <- y$fstatistic[1]
  pValue <- pf(y$fstatistic[1], y$fstatistic[2], y$fstatistic[3], lower.tail = FALSE)
  data.frame(Slope, Intercept, rSquared, adjRSquared, pValue)
})

# Combine results into a single data frame
lmResultTable <- do.call(rbind, lmResults)

# Add CatChro column to the data frame
lmResultTable$catChro <- names(dataList)

# Print first 5 rows of the table
kable(lmResultTable[1:5], row.names = TRUE, digits = 5)

This will produce a nicely formatted table with the extracted coefficients for each model.


Last modified on 2023-07-17