Finding Pairs of Elements Across Multiple Columns in R DataFrames

As I understand it, you have a data frame with columns col1, col2, etc., whose values the code below gathers into a single column named element. You want to find all pairs of rows that share a value, even when that value appears in two different columns.

Here’s the R code that solves your problem:

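library(dplyr)
library(tidyr)

# Long format: one row per (row id, column, value), keeping only repeated values
dups <- data %>% 
  mutate(name = row_number()) %>% 
  pivot_longer(!name, names_to = 'variable', values_to = 'element') %>% 
  drop_na() %>% 
  group_by(element) %>% 
  filter(n() > 1) %>% 
  ungroup()  # replaces the invalid select(-n()) and drops the grouping

# Self-join on element so rows sharing a value are paired up.
# On dplyr >= 1.1.0, add relationship = 'many-to-many' to silence the join warning.
dups %>% 
  inner_join(dups, by = 'element') %>% 
  filter(name.x < name.y) %>%  # keeps each pair once and drops self-matches
  select(name1 = name.x, name2 = name.y, element)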

This code does the following:

  1. Creates a new column, name, holding the row number of each entry.
  2. Converts the data from wide format to long format using pivot_longer.
  3. Drops any rows with missing values.
  4. Groups the data by element and keeps only the values that occur more than once.
  5. Performs an inner join with dups on element, so each entry holding a repeated value is matched with every entry holding that same value.
  6. Keeps only rows where name.x < name.y, which drops self-matches (name.x == name.y) and mirrored duplicates of the same pair.
  7. Selects only the desired columns.

Note that this assumes data is a data frame, and that dups, the table joined against, has name and element columns covering every repeated value.
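
As a rough check of the steps above, here is a small sketch on made-up data. The toy data frame and the result name pairs are assumptions for illustration only, and dups is built here as the long-format table of repeated values, which is one reading of what dups should hold.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: each row is one entry, each column holds one value.
data <- tibble(
  col1 = c("a", "b", "c"),
  col2 = c("x", "a", NA),
  col3 = c("y", "c", "z")
)

# Steps 1-4: number the rows, go long, drop NAs, keep repeated values only.
dups <- data %>%
  mutate(name = row_number()) %>%
  pivot_longer(!name, names_to = "variable", values_to = "element") %>%
  drop_na() %>%
  group_by(element) %>%
  filter(n() > 1) %>%
  ungroup()

# Steps 5-7: self-join on element, keep each pair of rows once, tidy up.
pairs <- dups %>%
  inner_join(dups, by = "element") %>%
  filter(name.x < name.y) %>%
  select(name1 = name.x, name2 = name.y, element)

print(pairs)
```

On this input, rows 1 and 2 share "a" and rows 2 and 3 share "c", so pairs has two rows: (1, 2, "a") and (2, 3, "c"). On dplyr 1.1.0 or later the self-join warns about a many-to-many relationship; passing relationship = "many-to-many" to inner_join silences it.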


Last modified on 2025-01-04