Subsetting Quosures with a Custom strip() Function in R

Testing and Subsetting Elements of Quosures in R

In this article, we will explore how to test and subset the elements of quosures in R. Quosures come from the rlang package and underpin the tidy evaluation used by dplyr, where they allow flexible and expressive data manipulation. Testing and manipulating the contents of a quosure, however, can get complicated.

Introduction to Quosures

A quosure is an object created by the quo() function (or by enquo() inside a function). It wraps an unevaluated expression, such as a symbol or a call, together with the environment in which that expression was written. This lets data manipulation functions like dplyr’s select() evaluate the expression later, in the context of a data frame. Quosures can be used to specify columns that should be included in the output of a data manipulation operation.
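To make the two parts of a quosure concrete, here is a minimal sketch using rlang directly. The variable names (q, expr_part, env_part, result) are chosen for illustration only.

```r
library(rlang)

# Capture the expression `A + 1` without evaluating it
q <- quo(A + 1)

# A quosure has two components: an expression and an environment
expr_part <- quo_get_expr(q)  # the unevaluated call A + 1
env_part  <- quo_get_env(q)   # the environment in which quo() was called

# Evaluate the quosure later, supplying data in which `A` can be found
result <- eval_tidy(q, data = list(A = 1:3))
print(result)  # 2 3 4
```

Nothing is computed until eval_tidy() runs, which is what allows dplyr verbs to look up A among the columns of a data frame.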

Subsetting Quosures

One of the benefits of quosures is their ability to represent complex column selections. Subsetting them is harder: a quosure holds a single unevaluated expression, not a vector of names, so ordinary tools such as [ or %in% do not apply to its contents directly.

The Problem with Quosures

Let’s consider an example where we have a tibble tst_tb and want to select columns A and B if they exist in the data frame, but skip columns C and D. Because quo() captures a single expression, we wrap the column names in c():

library(tidyverse)

tst_tb <- tibble(A = 1:10, B = 11:20, C = 21:30, D = 31:40)

quosure <- quo(c(A, B))

However, the trouble starts when a column selection names a column that does not exist. For example, passing remove_cols = c(C, E) through to select() fails, because tst_tb has no column E:

# select(tst_tb, -c(C, E))
# Error: column `E` doesn't exist (the exact wording depends on the tidyselect version)

This is because select() resolves every name in the captured expression against the columns of the data and treats an unknown name as an error. To skip the missing column we would have to test each name against names(tst_tb) first, but the quosure holds the unevaluated call c(C, E) rather than a vector of names that we could filter.

The Proposed Solution

One possible solution to this problem is to convert the contents of the quosure into character strings, test them against the column names, and then use the syms() function to turn the surviving names back into symbols. However, this approach seems roundabout and inelegant: it throws away the captured expression only to rebuild it from strings.

Here’s an example implementation:

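# Extract the column names in the quosure as a character vector
quosure <- quo(c(A, B, E))  # E is not a column of tst_tb
quosure_chars <- all.vars(rlang::quo_get_expr(quosure))

# Keep only the names that exist, then use syms() to rebuild symbols
new_syms <- syms(intersect(quosure_chars, names(tst_tb)))
select(tst_tb, !!!new_syms)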

However, this approach still doesn’t solve the underlying problem of how to manipulate quosures in a flexible and expressive way.
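One way to avoid the round trip through strings, sketched here under the assumption that the columns are supplied as separate bare names, is to capture them with quos(). That returns a list with one quosure per column, and the list can be subset like any other. The names wanted, labels, and present are illustrative.

```r
library(dplyr)
library(rlang)

tst_tb <- tibble(A = 1:10, B = 11:20, C = 21:30, D = 31:40)

# quos() returns a list with one quosure per column expression
wanted <- quos(A, B, E)  # E is not a column of tst_tb

# Label each quosure, then keep those that name an existing column
labels  <- vapply(wanted, as_label, character(1))
present <- wanted[labels %in% names(tst_tb)]

# Splice the remaining quosures into select()
result <- select(tst_tb, !!!present)
names(result)
```

The strings are used only for the test against names(tst_tb); the objects passed to select() are still the original quosures.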

A Second Solution: A Custom strip() Function

A more elegant solution is to write a small helper function, here called strip(). It is not part of dplyr: it is defined below, captures its arguments with enquo(), and injects them into filter() and select() with !!, so the caller can pass bare conditions and column names:

library(dplyr)

tst_tb <- tibble(A = 1:10, B = 11:20, C=21:30, D = 31:40)

# Define a custom strip() helper that removes rows and columns from a data frame
strip <- function(tib, remove_rows = FALSE, remove_cols = NULL) {
  # Capture the caller's expressions without evaluating them
  remove_rows <- enquo(remove_rows)
  remove_cols <- enquo(remove_cols)

  # Get the name of the original data frame for the summary line
  tib_name <- deparse(substitute(tib))

  # Drop the rows that match remove_rows
  out <- tib %>%
    filter(!(!!remove_rows))

  # Drop the requested columns, if any were supplied
  if (!rlang::quo_is_null(remove_cols)) {
    out <- out %>% select(-!!remove_cols)
  }

  # Report the dimensions and column names of the result
  cat(tib_name, ": Length = ", nrow(out), "; Width = ", ncol(out), "\n", sep = "")
  cat("     Names: ", names(out), "\n")

  out
}

# Apply strip() function to remove specific columns from dataframe
new_tst_tb <- strip(tib = tst_tb, remove_rows = (A < 3 | D > 36), remove_cols = c(C, D))

# Print the resulting dataframe
print(new_tst_tb)

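This approach is more elegant and flexible than converting the quosure back into a character vector. The caller’s expressions stay as quosures from capture to evaluation, so the benefits of quosures are preserved while rows and columns can still be removed. Note that select() still stops with an error if remove_cols names a column that does not exist.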
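If missing columns should be skipped rather than reported, the two ideas can be combined. The sketch below uses a hypothetical helper, drop_if_present(), which is not part of dplyr and not the strip() function above. It assumes remove_cols contains only bare column names, as in c(C, E); tidyselect helpers such as starts_with() would not survive the all.vars() step.

```r
library(dplyr)

# Hypothetical helper, for illustration only
drop_if_present <- function(tib, remove_cols = NULL) {
  remove_cols <- enquo(remove_cols)

  # all.vars() lists the symbols used in the captured call, e.g. "C" and "E"
  col_names <- all.vars(rlang::quo_get_expr(remove_cols))

  # Nothing to remove: return the input unchanged
  if (length(col_names) == 0) {
    return(tib)
  }

  # any_of() ignores names that are not columns of tib
  select(tib, -any_of(col_names))
}

tst_tb <- tibble(A = 1:10, B = 11:20, C = 21:30, D = 31:40)

# E does not exist, so only C is removed
result <- drop_if_present(tst_tb, remove_cols = c(C, E))
names(result)
```

The trade-off is that the column test happens on strings, so this variant gives up some of the expressiveness that passing the quosure straight to select() provides.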

Conclusion

Quosures, provided by the rlang package and used throughout dplyr, allow for flexible and expressive data manipulation. However, when it comes to testing and subsetting their contents, things can get complicated, because a quosure holds an unevaluated expression rather than a vector of names. The solutions outlined above show two ways around this: converting the contents to strings and back, or capturing the expressions with enquo() inside a custom strip() function and injecting them with !!.

By understanding how quosures work and writing helpers such as strip(), we can write more flexible and expressive code that preserves the benefits of using quosures while still allowing for subsetting and manipulation.


Last modified on 2023-09-14