Creating One-Hot Encoded Interaction Terms in R Using model.matrix()

Here is the code with comments and explanations:

# model.matrix() comes from the stats package, which R attaches by default, so this call is optional
library(stats)

# Fix the random seed so the sampled values are reproducible
set.seed(42)

# Create a data frame with 30 rows and 5 logical (TRUE/FALSE) columns: alfa, beta, gamma, delta, epsilon
df <- data.frame(
    alfa = sample(c(TRUE, FALSE), 30, replace = TRUE),
    beta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    gamma = sample(c(TRUE, FALSE), 30, replace = TRUE),
    delta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

# Build the design matrix: an intercept, one 0/1 column per variable, and every interaction up to order 5
# (model.matrix() returns a numeric matrix, not a data frame)
df_dummy <- model.matrix(~ .^5, data = df)

# Print the column names of the design matrix
colnames(df_dummy)

# Adding + 0 or - 1 to the formula removes only the intercept; the main effects stay in the matrix

# The formula ~ .^5 keeps the intercept (this reproduces df_dummy)
df_dummy_intercept <- model.matrix(~ .^5, data = df)
colnames(df_dummy_intercept)

# Use ~ .^5 - 1 to exclude the intercept; R then codes alfa with two columns (alfaFALSE and alfaTRUE)
df_dummy_no_intercept <- model.matrix(~ .^5 - 1, data = df)
colnames(df_dummy_no_intercept)

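This code creates a data frame df with five logical columns (alfa, beta, gamma, delta, epsilon). It then calls model.matrix() to build df_dummy, which is a numeric matrix rather than a data frame. With the formula ~ .^5, the matrix has 32 columns: the intercept, five main-effect columns such as alfaTRUE, and 26 interaction columns covering every combination of two to five variables. Each interaction column is the product of its component 0/1 columns, so it equals 1 only in rows where all of those variables are TRUE. Finally, the column names of df_dummy are printed.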
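
As a quick check of the column layout described above, the following sketch counts the columns of the design matrix by interaction order. The seed value and the helper names mm and term_order are assumptions made for this illustration; they are not part of the original code.

```r
# Assumption: a fixed seed, used only to make this sketch reproducible
set.seed(1)
df <- data.frame(
    alfa = sample(c(TRUE, FALSE), 30, replace = TRUE),
    beta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    gamma = sample(c(TRUE, FALSE), 30, replace = TRUE),
    delta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

# Full design matrix: intercept, main effects, and all interactions
mm <- model.matrix(~ .^5, data = df)

is.matrix(mm)  # TRUE: the result is a matrix, not a data frame
dim(mm)        # 30 rows, 32 columns

# Interaction order = number of names joined by ":"; the intercept counts as order 0
term_order <- lengths(strsplit(colnames(mm), ":", fixed = TRUE))
term_order[colnames(mm) == "(Intercept)"] <- 0
table(term_order)  # counts per order 0..5: 1, 5, 10, 10, 5, 1

# Convert to a data frame only if later code needs one
mm_df <- as.data.frame(mm)
```

The counts per order follow the binomial coefficients choose(5, k), which is why the total is 2^5 = 32 columns.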

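Appending + 0 or - 1 to the formula removes only the intercept; it does not remove the main effects. df_dummy_intercept uses the default formula ~ .^5, so it is identical to df_dummy and keeps the (Intercept) column. df_dummy_no_intercept is built from a formula ending in - 1, so it has no (Intercept) column. To keep only the interaction columns, select the columns of df_dummy whose names contain a colon, since model.matrix() names each interaction by joining its component column names with ":" (for example alfaTRUE:betaTRUE).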
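
One way to keep only the interaction columns is to filter the full design matrix by column name, as in the minimal sketch below. The seed and the names mm, interaction_cols, and mm_interactions are assumptions introduced for illustration.

```r
# Assumption: a fixed seed, used only to make this sketch reproducible
set.seed(1)
df <- data.frame(
    alfa = sample(c(TRUE, FALSE), 30, replace = TRUE),
    beta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    gamma = sample(c(TRUE, FALSE), 30, replace = TRUE),
    delta = sample(c(TRUE, FALSE), 30, replace = TRUE),
    epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

# Build the full matrix first, then drop the intercept and main effects by name
mm <- model.matrix(~ .^5, data = df)
interaction_cols <- grepl(":", colnames(mm), fixed = TRUE)
mm_interactions <- mm[, interaction_cols, drop = FALSE]

ncol(mm_interactions)  # 26 = 32 total - 1 intercept - 5 main effects

# Each interaction column is 1 only where all of its variables are TRUE
all(mm_interactions[, "alfaTRUE:betaTRUE"] == as.numeric(df$alfa & df$beta))
```

Filtering after the fact is used here instead of subtracting the main effects in the formula, because removing lower-order terms from the formula can change how R codes the remaining interaction terms.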


Last modified on 2025-02-13