Mapping Values in DataFrames with Custom Column Names Using the Tidyverse

Mapping Values in a DataFrame to a Key with Values Specific to Each Column

This article explores how to replace each value in a data frame with an entry from a lookup key whose entries are specific to each column.

Introduction

The Stack Overflow post behind this article presents a problem in which the user wants to replace every value in a data frame with the entry of a named list that corresponds to its value-column pair. The list holds letters in order, and each item name combines a column name and a value, separated by an underscore. For example, if column “col.2” contains the value “3” and the key has an item named “col.2_3” whose value is “E”, then every “3” in “col.2” should be replaced with “E”.
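To make the naming convention concrete, here is a minimal sketch of a single lookup. The shortened key and the variable names (column_name, value, key_name) are illustrative assumptions, not code from the post.

```r
# A shortened key: each name is "<column name>_<value>"
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E")

column_name <- "col.2"
value <- 3

# Build the key name from the column name and the value
key_name <- paste0(column_name, "_", value)

# Look up the replacement for that value-column pair
replacement <- key[[key_name]]
print(key_name)
print(replacement)
```

The whole problem reduces to repeating this lookup for every cell, using the name of the column that the cell sits in.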

Problem Explanation

The user's difficulty is referring to a column whose name is stored in a variable: df$variable_name looks for a column literally called variable_name, whereas df[[variable_name]] uses the value held in the variable. The code in the question tries several approaches.
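The difference between the two forms of indexing can be sketched on a small data frame. The data and the variable name column_name are assumptions chosen for illustration.

```r
df <- data.frame(col.1 = c(1, 2), col.2 = c(2, 3))
column_name <- "col.2"

# `$` takes the name literally, so this looks for a column called "column_name"
literal <- df$column_name

# `[[` evaluates the variable, so this returns the column "col.2"
by_variable <- df[[column_name]]

# The same form works on the left-hand side of an assignment
df[[column_name]][df[[column_name]] == 3] <- "E"
print(df)
```

Note that assigning a letter into a numeric column converts the whole column to character.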

Approach 1: Using the purrr Package

library(purrr)
recodeY <- function(.x, .y) {
  split_name <- strsplit(.y, split="_")
  score <- split_name[2]
  column_name <- split_name[1]
  gsub(column_name = score, .x, df)
}
map2(key, names(key), recodeY)

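However, this approach does not work. strsplit() returns a list, so split_name[1] and split_name[2] are list elements rather than the column name and the score; gsub() has no column_name argument, so it cannot be pointed at a column this way; and the result is never assigned back to df.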
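One way to repair the purrr idea is to iterate over the columns of df rather than over the key, since imap() passes each column together with its name. This is a sketch under assumptions: the letter key matches the one implied by the article's output, and recode_column, lookup, and recoded are hypothetical names.

```r
library(purrr)

df <- data.frame(
  col.1 = c(1, 2, 2, 1, 1),
  col.2 = c(2, 1, 3, 1, 2),
  col.3 = c(2, 4, 1, 1, 2)
)
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E",
            col.3_1 = "F", col.3_2 = "G", col.3_3 = "H", col.3_4 = "I")

# Flatten the list into a named character vector for vectorised lookup
lookup <- unlist(key)

# Hypothetical helper: build "<column>_<value>" names and look them up
recode_column <- function(column, column_name) {
  unname(lookup[paste0(column_name, "_", column)])
}

# imap() calls the helper once per column with (column, column name)
recoded <- as.data.frame(imap(df, recode_column))
print(recoded)
```

Values with no matching key name come back as NA rather than raising an error.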

Approach 2: Using for Loops

for (i in 1:ncol(df)) {
  for (j in 1:length(key)) {
    col_name <- colnames(df[i])
    split_name <- unlist(strsplit(names(j), split="_"))
    item_name <- split_name[1]
    if (col_name == item_name) {
      score <- split_name[2]
      # str_replace(i, ?, ?)
      # gsub(i, j, finalCYOA2$i)
    }
  }
}

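This approach does not work either: j is an integer index, so names(j) is NULL rather than the j-th key name (names(key)[j] is needed), and the replacement step itself is left commented out.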
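A working loop can be sketched by iterating over the key names directly and indexing columns with [[ ]]. The sketch assumes column names contain no underscore, and recoded, parts, and hits are hypothetical names.

```r
df <- data.frame(
  col.1 = c(1, 2, 2, 1, 1),
  col.2 = c(2, 1, 3, 1, 2),
  col.3 = c(2, 4, 1, 1, 2)
)
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E",
            col.3_1 = "F", col.3_2 = "G", col.3_3 = "H", col.3_4 = "I")

# Write into a copy so every comparison is made against the original values
recoded <- df

for (key_name in names(key)) {
  # strsplit() returns a list, so take its first element
  parts <- strsplit(key_name, "_", fixed = TRUE)[[1]]
  column_name <- parts[1]
  score <- parts[2]

  # Skip key entries that refer to a column the data frame does not have
  if (!column_name %in% names(df)) next

  # Rows of this column whose original value equals the score
  hits <- as.character(df[[column_name]]) == score
  recoded[[column_name]][hits] <- key[[key_name]]
}
print(recoded)
```

Unlike the lookup-based approaches, this loop leaves a value unchanged (as text) when the key has no entry for it.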

Approach 3: Using tidyverse and pivot_longer

library(dplyr)
library(tidyr)

new <- key %>% 
  as.data.frame.list() %>% 
  pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
    names_sep="_")

df %>% 
  mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))

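This approach works. The ".value" entry in names_to tells pivot_longer to split each key name at the underscore, turning the part before it into a column of the lookup table new and the part after it into the grp column. Inside across(), cur_column() selects the lookup column that belongs to the current data column, and match() finds the row whose grp equals each value.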
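The lookup step can be sketched in base R for a single column. The lookup table here is written out by hand, as an assumption about what the reshaped key contains for the article's letter key; the names lookup, values, and rows are illustrative.

```r
# Hand-written stand-in for the reshaped key: one row per score,
# one column per data-frame column
lookup <- data.frame(
  grp   = c("1", "2", "3", "4"),
  col.1 = c("A", "B", NA, NA),
  col.2 = c("C", "D", "E", NA),
  col.3 = c("F", "G", "H", "I"),
  stringsAsFactors = FALSE
)

# The values found in column "col.2" of the data
values <- c(2, 1, 3, 1, 2)

# match() gives, for each value, the row of the lookup table with that score
rows <- match(values, lookup$grp)

# Indexing the matching lookup column by those rows yields the replacements
replaced <- lookup[["col.2"]][rows]
print(replaced)
```

match() compares the numeric values with the character scores after converting both to a common type, which is why no explicit conversion is needed.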

Alternative Approach Using base R

new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)), 
    ind = sub("_.*", "", ind))

df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))

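This approach also works. stack() turns the key into a two-column data frame of values and names, transform() splits each name into a column name (ind) and a score (grp), and split() produces one small lookup table per column. Map() then pairs each column of df with its lookup table by position, so the columns of df must be in the same order as the split groups.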
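To see what Map() receives, the intermediate objects can be inspected. This sketch reuses the article's transform() and split() calls on the letter key implied by the output; pieces is a hypothetical name.

```r
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E",
            col.3_1 = "F", col.3_2 = "G", col.3_3 = "H", col.3_4 = "I")

# One row per key entry: values, the column name (ind), and the score (grp)
new <- transform(stack(key),
                 grp = as.integer(sub(".*_", "", ind)),
                 ind = sub("_.*", "", ind))

# One lookup table per column, holding only values and grp
pieces <- split(new[-2], new$ind)

print(names(pieces))
print(pieces$col.2)
```

Each element of pieces is the lookup table for exactly one column, which is what lets the anonymous function call match() without knowing the column name.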

Example Usage

Let’s apply a tidyverse lookup to sample data. The code below builds each key name from the column name and the value, then looks that name up in the key:

library(dplyr)
library(tidyr)

# Create a dataframe
df <- data.frame(
  col.1 = c(1, 2, 2, 1, 1),
  col.2 = c(2, 1, 3, 1, 2),
  col.3 = c(2, 4, 1, 1, 2)
)

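# Create a named list of letters, one per value-column pair
key <- as.list(LETTERS[1:9])

# Each name has the form "<column name>_<value>"
names(key) <- c("col.1_1", "col.1_2",
                "col.2_1", "col.2_2", "col.2_3",
                "col.3_1", "col.3_2", "col.3_3", "col.3_4")

# Build the key name for every cell and look it up;
# unlist() flattens the list so it can be indexed by a vector of names
df %>% 
  mutate(across(everything(), ~ unname(unlist(key)[paste0(cur_column(), "_", .x)])))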

The output will be:

  col.1 col.2 col.3
1     A     D     G
2     B     C     I
3     B     E     F
4     A     C     F
5     A     D     G

Conclusion

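Several approaches solve this problem. The tidyverse solutions are compact: reshaping the key with pivot_longer gives an explicit lookup table, while across() with cur_column() applies the lookup to every column in one step. The base R version with stack, split, and Map reaches the same result without additional packages.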


Last modified on 2024-04-15