How to Extract Summary Statistics from stargazer Objects in R

Introduction

The problem, originally raised in a Stack Overflow post, is how to obtain data frames from a list of results created with the stargazer function in R. stargazer prints a formatted table of summary statistics for a dataset, but what it returns is the text of that table as a character vector, not a data frame of the statistics. This makes the output difficult to work with directly.

Background

The stargazer function renders formatted tables (LaTeX, HTML, or plain text) from regression models and from data frames, for which it reports summary statistics by default. The split function divides a data frame into a list of data frames, one per level of a grouping variable. Applying stargazer to each piece of such a list therefore yields a list of rendered tables, not a list of data frames containing the statistics.
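To make the return types concrete, here is a minimal sketch using the built-in mtcars data. It assumes the stargazer package is installed; the object names by_am and out are chosen only for illustration.

```r
library(stargazer)  # assumed to be installed

# split() returns a named list with one data frame per level of the grouping variable
by_am <- split(mtcars[, c("mpg", "wt", "am")], mtcars$am)
names(by_am)         # "0" "1"
sapply(by_am, nrow)  # 19 rows for am = 0, 13 rows for am = 1

# stargazer() prints the table and invisibly returns its lines as a character vector
out <- stargazer(by_am[["0"]], type = "text", summary.stat = c("n", "mean", "sd"))
class(out)           # "character"
is.data.frame(out)   # FALSE
```

The printed table looks like a data frame, but the returned value is only the text of that table, which is why it cannot be filtered or joined like one.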

Solution

There are several ways to solve this problem, but we’ll explore two approaches here: using the dplyr package and using the map function from the purrr package.

Approach 1: Using dplyr

The most straightforward way to achieve the desired output is to skip stargazer and compute the statistics with dplyr. We can use select, group_by, and summarise from dplyr, together with pivot_longer from the tidyr package, to create a new data frame with only the summary statistics.

# Install (once) and load dplyr and tidyr; pivot_longer() comes from tidyr
install.packages(c("dplyr", "tidyr"))
library(dplyr)
library(tidyr)

# Create a data frame with summary statistics
mtcars_sumstat <- mtcars %>%
  select(mpg:qsec, am) %>%
  pivot_longer(-am) %>%
  group_by(am, name) %>%
  summarise(across(value, .fns=list(mean = mean, sd = sd, n = length), .names = "{fn}")) %>%
  group_split()

# View the output
mtcars_sumstat

This approach produces a list of two data frames, one per value of am, each with one row per variable and the columns mean, sd, and n.
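As a usage sketch, the list can be labelled by am value and checked against a direct base R calculation. The code below repeats the pipeline so that it runs on its own; the names stats and auto_mpg are illustrative, and dplyr and tidyr are assumed to be installed.

```r
library(dplyr)
library(tidyr)

# Same idea as the pipeline above, written with explicit summary columns
stats <- mtcars %>%
  select(mpg:qsec, am) %>%
  pivot_longer(-am) %>%
  group_by(am, name) %>%
  summarise(mean = mean(value), sd = sd(value), n = n(), .groups = "drop_last") %>%
  group_split()

# group_split() returns an unnamed list, so label each element by its am value
names(stats) <- vapply(stats, function(d) as.character(d$am[1]), character(1))

# Pull one row and compare it with a direct calculation
auto_mpg <- stats[["0"]] %>% filter(name == "mpg")
all.equal(auto_mpg$mean, mean(mtcars$mpg[mtcars$am == 0]))  # TRUE
```

Because every element is an ordinary tibble with numeric columns, the results can be filtered, joined, or bound back together with bind_rows.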

Approach 2: Using map

Another way to keep stargazer in the pipeline is to use the map function from the purrr package. We can apply stargazer to each group and then use map with tibble to convert each result to a data frame.

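# Load the packages this pipeline relies on
library(dplyr)
library(purrr)
library(tibble)
library(stargazer)

# Create a list holding the rendered summary table for each value of am
mtcars_sumstat <- mtcars %>%
  select(mpg:qsec, am) %>%
  as.data.frame() %>%
  split(.$am) %>%
  # stargazer() prints each table and invisibly returns its lines as a character vector
  map(~stargazer(.x, type = "text", summary.stat = c("n", "mean", "sd"))) %>%
  # wrap each character vector in a one-column tibble
  map(~tibble(line = .x))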

# View the output
mtcars_sumstat

This approach also produces a list of data frames, but each one holds the rendered table as lines of text rather than numeric columns of summary statistics.

Discussion

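The two approaches do not produce the same result: the dplyr pipeline returns numeric columns that can be used in further computation, while the stargazer pipeline returns formatted text. For that reason the dplyr approach is usually the better fit when the statistics themselves are needed. The map function remains useful whenever the same function has to be applied to each element of a list.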

However, it’s worth noting that using stargazer directly might not be the best approach in this case, since it is designed to render formatted tables for display rather than to return the statistics as data. Nevertheless, if you’re already familiar with stargazer, you can compute the summary statistics as a data frame first and use stargazer only to print them.
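One way to combine the two is to compute first and render last. The sketch below uses base R for the statistics and passes the finished data frame to stargazer with summary = FALSE, which prints a data frame as-is instead of summarising it. The variable selection and the names vars, manual, and stats are illustrative, and the stargazer package is assumed to be installed.

```r
library(stargazer)  # assumed to be installed

# Compute the statistics as an ordinary data frame (manual-transmission cars only)
vars <- c("mpg", "hp", "wt")
manual <- mtcars[mtcars$am == 1, vars]
stats <- data.frame(
  variable = vars,
  n    = vapply(manual, length, integer(1), USE.NAMES = FALSE),
  mean = vapply(manual, mean, numeric(1), USE.NAMES = FALSE),
  sd   = vapply(manual, sd, numeric(1), USE.NAMES = FALSE)
)

# Render only at the end; summary = FALSE prints the data frame unchanged
stargazer(stats, type = "text", summary = FALSE, rownames = FALSE)
```

With this ordering, stats stays available as a data frame for further work, and the formatted table is purely a presentation step.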

Conclusion

In conclusion, stargazer returns rendered text, so obtaining data frames of summary statistics means either computing them directly or converting its output afterwards. Both pipelines presented here yield a list of data frames, but only the dplyr pipeline yields numeric statistics that are ready for further analysis.


Last modified on 2023-05-23