Finding Pairs of Elements Across Multiple Columns in R DataFrames
I see that you have a data frame with variables `col1`, `col2`, etc., and that you want to find every pair of rows that share a value, wherever that value appears among those columns. In the code below, the values from all columns are gathered into a single column named `element`.
Here’s the R code that solves your problem:
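    library(dplyr)
    library(tidyr)

    # Long format: one row per (row id, value); keep only values seen more than once
    dups <- data %>%
      mutate(name = row_number()) %>%
      pivot_longer(!name, names_to = 'variable', values_to = 'element') %>%
      drop_na() %>%
      group_by(element) %>%
      filter(n() > 1) %>%
      ungroup() %>%
      select(name, element)

    # Self-join on element; name.x < name.y keeps each pair of rows once
    dups %>%
      inner_join(dups, by = 'element') %>%
      filter(name.x < name.y) %>%
      select(name1 = name.x, name2 = name.y, element)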
This code does the following:
- Creates a new column `name` holding the row number of each entry.
- Converts the data from wide format to long format using `pivot_longer`.
- Drops any rows with missing values.
- Groups the data by `element` and keeps only the elements that occur more than once.
- Performs an inner join with `dups` (the long-format table of repeated elements) on `element`, so that each entry is matched with every entry sharing its value.
- Keeps only rows where `name.x < name.y`, which drops self-matches (`name.x == name.y`) and mirrored duplicates of the same pair.
- Selects only the desired columns.
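To make the steps above concrete, here is a minimal sketch run on a small made-up data frame. The toy `data`, the assignment of the intermediate result to `dups`, the `ungroup()` call, and the `select(name, element)` step are assumptions about how the pieces fit together, not something stated in the original answer. It also assumes all columns have the same type, since `pivot_longer` stacks them into one column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three rows, two character columns
data <- tibble(
  col1 = c("a", "b", "c"),
  col2 = c("b", "d", "a")
)

# Long format, keeping only values that occur more than once
dups <- data %>%
  mutate(name = row_number()) %>%
  pivot_longer(!name, names_to = "variable", values_to = "element") %>%
  drop_na() %>%
  group_by(element) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  select(name, element)

# Self-join on element; name.x < name.y keeps each pair of rows once
pairs <- dups %>%
  inner_join(dups, by = "element") %>%
  filter(name.x < name.y) %>%
  select(name1 = name.x, name2 = name.y, element)

print(pairs)
```

In this toy data, rows 1 and 3 share "a" and rows 1 and 2 share "b", so `pairs` should contain exactly those two pairs. Recent dplyr versions may warn that the join is many-to-many; that is expected for a self-join of this kind.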
Note that this assumes `data` is a data frame, and that `dups` (the long-format table of repeated elements) has been created before the join runs.
Last modified on 2025-01-04