Tabulating Coefficients from Linear Models
In this post, we will explore how to tabulate coefficients from linear models using R. The problem presented involves extracting specific information from multiple linear regression models and arranging it in a structured format.
Background
Linear regression is a widely used statistical technique for modeling the relationship between a dependent variable and one or more independent variables. In this context, we have 10 linear models with different combinations of predictor variables (e.g., Rainfall
) and response variables (Qend
). Each model provides estimates of the coefficients for the slope and intercept of the regression line.
Problem Statement
Given a dataset containing information from these 10 linear models, how can we extract and tabulate the coefficients of interest (slope, intercept, R-squared, adjusted R-squared, and p-value) in a structured format?
Solution
We will demonstrate two approaches to solving this problem using R.
Approach 1: Base R Solution
The first approach uses base R functions to achieve the desired outcome. We start by reading the dataset into R and then split it into separate datasets based on the CatChro
column.
data <- read.csv("./data/so53933238.csv", header = TRUE)
dataList <- split(data, data$CatChro)
# Process each list of datasets using lm() and extract results to a data frame
lmResults <- lapply(dataList, function(x) {
y <- summary(lm(Qend ~ Rainfall, data = x))
Intercept <- y$coefficients[1, 1]
Slope <- y$coefficients[2, 1]
rSquared <- y$r.squared
adjRSquared <- y$adj.r.squared
f <- y$fstatistic[1]
pValue <- pf(y$fstatistic[1], y$fstatistic[2], y$fstatistic[3], lower.tail = FALSE)
data.frame(Slope, Intercept, rSquared, adjRSquared, pValue)
})
Next, we use the do.call(rbind, lmResults)
function to combine the results into a single data frame (lmResultTable
) and add a column for catChro
.
lmResultTable <- do.call(rbind, lmResults)
lmResultTable$catChro <- names(dataList)
Approach 2: Using knitr and kable()
The second approach leverages the knitr
package to render the output in a tabular format. We use kable()
to create a nicely formatted table from the first few rows of lmResultTable
.
library(knitr)
kable(lmResultTable[1:5], row.names = TRUE, digits = 5)
Example Use Case
Suppose we have the following dataset:
CatChro | Qend | Rainfall |
---|---|---|
A3D1 | 10 | 20 |
A3D2 | 15 | 30 |
… | … | … |
We can use the base R solution to extract and tabulate the coefficients for each model:
# Load required libraries
library(knitr)
# Read in dataset
data <- read.csv("./data/so53933238.csv", header = TRUE)
# Split dataset into separate datasets based on CatChro column
dataList <- split(data, data$CatChro)
# Process each list of datasets using lm() and extract results to a data frame
lmResults <- lapply(dataList, function(x) {
y <- summary(lm(Qend ~ Rainfall, data = x))
Intercept <- y$coefficients[1, 1]
Slope <- y$coefficients[2, 1]
rSquared <- y$r.squared
adjRSquared <- y$adj.r.squared
f <- y$fstatistic[1]
pValue <- pf(y$fstatistic[1], y$fstatistic[2], y$fstatistic[3], lower.tail = FALSE)
data.frame(Slope, Intercept, rSquared, adjRSquared, pValue)
})
# Combine results into a single data frame
lmResultTable <- do.call(rbind, lmResults)
# Add CatChro column to the data frame
lmResultTable$catChro <- names(dataList)
# Print first 5 rows of the table
kable(lmResultTable[1:5], row.names = TRUE, digits = 5)
This will produce a nicely formatted table with the extracted coefficients for each model.
Last modified on 2023-07-17