Mapping Values in a DataFrame to a Key with Values Specific to Each Column
This article will explore how to map values in a dataframe to a key with values specific to each column.
Introduction
The Stack Overflow post behind this article poses the following problem: the user wants to replace every unique value-column pair in a dataframe with the corresponding entry from a named list, the key. The key's values are ordered letters, and each item's name joins a column name and a value with an underscore. For example, if column “col.2” contains the value “3” and the key has an item “col.2_3” whose value is “E”, then every “3” in “col.2” should be replaced with “E”.
Problem Explanation
The user cannot figure out how to refer to a column whose name is stored in a variable: df$variable_name looks for a column literally named variable_name rather than using the variable's value. The code below works through several approaches to this issue.
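The difference between the two ways of accessing a column can be seen in a minimal sketch. The toy data frame and the variable name col_name are assumptions made for illustration; they do not come from the original post.

```r
# Toy data; column names and values are illustrative only
df <- data.frame(col.1 = c(1, 2), col.2 = c(3, 1))

# The column we want is stored in a variable
col_name <- "col.2"

# `$` treats its right-hand side as a literal name, so this looks for a
# column called "col_name", which does not exist
literal <- df$col_name

# `[[` evaluates the variable first, so this returns column col.2
by_variable <- df[[col_name]]

print(is.null(literal))
print(by_variable)
```

Every working approach in this article relies on the second form, or on a helper such as cur_column() that supplies the column name as a string.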
Approach 1: Using the purrr Package
library(purrr)

recodeY <- function(.x, .y) {
  split_name <- strsplit(.y, split="_")
  score <- split_name[2]
  column_name <- split_name[1]
  gsub(column_name = score, .x, df)
}

map2(key, names(key), recodeY)
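However, this attempt does not work. strsplit() returns a list, so split_name[1] and split_name[2] do not extract the column name and the score, and gsub() has no column_name argument, so the call fails before any value is replaced.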
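One way the purrr idea could be made to work is to iterate over the columns of the data frame instead of over the key, and to build each lookup name from the column name and the cell value. This is a sketch rather than the original poster's solution: the helper recode_column, the vector key_vec, and the reduced two-column data are assumptions introduced for illustration.

```r
library(purrr)

# Reduced example data, assumed for illustration
df <- data.frame(col.1 = c(1, 2, 2, 1, 1), col.2 = c(2, 1, 3, 1, 2))
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E")

# Flatten the list into a named character vector so that a missing name
# yields NA instead of silently dropping an element
key_vec <- unlist(key)

# Hypothetical helper: x is one column, col_name is that column's name
recode_column <- function(x, col_name) {
  unname(key_vec[paste0(col_name, "_", x)])
}

# imap() passes each column together with its name to the helper
recoded <- as.data.frame(imap(df, recode_column))
print(recoded)
```

Iterating over the columns means the column name is already available as a string, so no string splitting is needed.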
Approach 2: Using for Loops
for (i in 1:ncol(df)) {
  for (j in 1:length(key)) {
    col_name <- colnames(df[i])
    split_name <- unlist(strsplit(names(j), split="_"))
    item_name <- split_name[1]
    if (col_name == item_name) {
      score <- split_name[2]
      # str_replace(i, ?, ?)
      # gsub(i, j, finalCYOA2$i)
    }
  }
}
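This attempt does not work either. names(j) is called on the loop index rather than on key, so it returns NULL and no item name is ever recovered, and the replacement step itself is only sketched in comments.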
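For comparison, a loop-based version that runs might look like the sketch below. It loops over names instead of indices and uses [[ to reach the column by name. The result object recoded and the reduced data are assumptions made for illustration, not part of the original post.

```r
# Reduced example data, assumed for illustration
df <- data.frame(col.1 = c(1, 2, 2, 1, 1), col.2 = c(2, 1, 3, 1, 2))
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E")

# Write into a copy so comparisons always use the original values
recoded <- df

for (col_name in colnames(df)) {
  for (item_name in names(key)) {
    # strsplit() returns a list; [[1]] extracts the character vector
    parts <- strsplit(item_name, split = "_", fixed = TRUE)[[1]]
    if (parts[1] == col_name) {
      score <- parts[2]
      # Rows of this column whose original value equals the score
      hits <- df[[col_name]] == score
      recoded[[col_name]][hits] <- key[[item_name]]
    }
  }
}

print(recoded)
```

The first assignment into a column converts it from numeric to character, which is why the comparison reads from df rather than from recoded.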
Approach 3: Using tidyverse and pivot_longer
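library(dplyr)
library(tidyr)

# Reshape the key into a lookup table: one column per data frame column,
# plus a grp column holding the value to match
new <- key %>%
  as.data.frame.list() %>%
  pivot_longer(cols = everything(), names_to = c(".value", "grp"),
               names_sep = "_")

# For each column, find the lookup row whose grp equals the cell value
df %>%
  mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))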
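This approach works. pivot_longer() splits each key name at the underscore, turning the column part into a column of the lookup table and the value part into grp. across() then visits every column of df, and cur_column() supplies the current column's name as a string, which solves the original problem of referring to a variable column name.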
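To make the reshaping step concrete, the sketch below builds the lookup table on its own and performs one lookup by hand. The key shown here is an assumption reconstructed from the article's description (letters as values), and looked_up is a name introduced for illustration.

```r
library(tidyr)

# Assumed key: names are "<column>_<value>", values are letters
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E",
            col.3_1 = "F", col.3_2 = "G", col.3_3 = "H", col.3_4 = "I")

# One row per distinct value part (grp), one column per column part;
# combinations absent from the key are filled with NA
new <- pivot_longer(as.data.frame(key), cols = everything(),
                    names_to = c(".value", "grp"), names_sep = "_")
print(new)

# Manual lookup: which letter replaces the value 3 in col.2?
looked_up <- new[["col.2"]][match(3, new$grp)]
print(looked_up)
```

Note that grp is stored as character; match() coerces the numeric cell values before comparing, so no explicit conversion is needed here.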
Alternative Approach Using base R
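# Split each key name into its column part (ind) and its value part (grp)
new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)),
                 ind = sub("_.*", "", ind))
# Pair each column of df with its lookup table and replace values via match()
df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))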
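This approach also works and needs no extra packages. Note that Map() pairs the columns of df with the per-column lookup tables by position, so the columns of df must appear in the same order as the sorted column names that split() produces.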
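If the column order cannot be guaranteed, a variant that selects each lookup table by column name may be more robust. The sketch below is an adaptation, not the code from the original answer: the objects lookup and recoded are assumptions, and the columns of the toy data frame are deliberately reversed.

```r
# Toy data with the columns deliberately out of alphabetical order
df <- data.frame(col.2 = c(2, 1, 3), col.1 = c(1, 2, 2))
key <- list(col.1_1 = "A", col.1_2 = "B",
            col.2_1 = "C", col.2_2 = "D", col.2_3 = "E")

# Same reshaping as in the base R approach
new <- transform(stack(key),
                 grp = as.integer(sub(".*_", "", ind)),
                 ind = sub("_.*", "", ind))

# One small lookup table per column, named after that column
lookup <- split(new[c("values", "grp")], new$ind)

recoded <- df
recoded[] <- lapply(names(df), function(nm) {
  tab <- lookup[[nm]]  # pick the table by name, not by position
  as.character(tab$values)[match(df[[nm]], tab$grp)]
})

print(recoded)
```

Indexing lookup by name trades the compactness of Map() for independence from column order.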
Example Usage
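Let’s run a tidyverse solution on a complete example. For brevity, it looks up each value directly by its key name inside across() rather than reshaping the key with pivot_longer: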
library(dplyr)

# Create a dataframe
df <- data.frame(
  col.1 = c(1, 2, 2, 1, 1),
  col.2 = c(2, 1, 3, 1, 2),
  col.3 = c(2, 4, 1, 1, 2)
)

# Create the key: a named list whose names have the form "<column>_<value>"
key <- list(
  col.1_1 = "A", col.1_2 = "B",
  col.2_1 = "C", col.2_2 = "D", col.2_3 = "E",
  col.3_1 = "F", col.3_2 = "G", col.3_3 = "H", col.3_4 = "I"
)

# Flatten the list into a named character vector for lookup by name
key_vec <- unlist(key)

# Build each lookup name from the current column name and the cell value,
# then replace the cell with the matching key entry
df %>%
  mutate(across(everything(), ~ unname(key_vec[paste0(cur_column(), "_", .)])))
The output will be:
  col.1 col.2 col.3
1     A     D     G
2     B     C     I
3     B     E     F
4     A     C     F
5     A     D     G
Conclusion
Several approaches solve this problem. The tidyverse method built on pivot_longer is a good choice because it is easy to understand and implement, and the base R alternative gives the same result without extra packages.
Last modified on 2024-04-15