Creating One-Hot Encoded Interaction Terms in R Using model.matrix()
Here is the code with comments and explanations:
```r
# Load necessary libraries (stats is attached by default, so this line is optional)
library(stats)

# Make the random sample reproducible
set.seed(123)

# Create a data frame with 30 rows and 5 logical (TRUE/FALSE) columns,
# one for each of the variables alfa, beta, gamma, delta and epsilon
df <- data.frame(
  alfa    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  beta    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  gamma   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  delta   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

# Build a design matrix (a numeric matrix, not a data frame) with the intercept,
# the five main effects and every interaction up to the five-way term.
# Each logical variable is dummy-coded as one 0/1 column (e.g. alfaTRUE).
df_dummy <- model.matrix(~ .^5, data = df)

# Print the column names of the design matrix
colnames(df_dummy)

# Note: ~ .^5 + 0 and ~ .^5 - 1 remove only the intercept; the main effects are kept.

# ~ .^5 includes the intercept by default (this matrix is identical to df_dummy)
df_dummy_intercept <- model.matrix(~ .^5, data = df)
colnames(df_dummy_intercept)

# ~ .^5 - .^4 - 1 removes the intercept and every term of order 4 or lower,
# leaving only the five-way interaction. With none of its marginal terms present,
# R codes it with full indicators: one column per TRUE/FALSE combination (2^5 = 32).
df_dummy_no_intercept <- model.matrix(~ .^5 - .^4 - 1, data = df)
colnames(df_dummy_no_intercept)
```
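As a quick sanity check, the minimal sketch below counts the columns of `df_dummy` by interaction order. It assumes the same kind of simulated data; the seed value and the helper vector `term_order` are illustrative choices, not part of the original code. Because each logical variable contributes a single dummy column, `~ .^5` should produce choose(5, k) columns of order k: 1 intercept + 5 main effects + 10 two-way + 10 three-way + 5 four-way + 1 five-way = 32 columns.

```r
# Illustrative seed; any value works
set.seed(123)
df <- data.frame(
  alfa    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  beta    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  gamma   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  delta   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

df_dummy <- model.matrix(~ .^5, data = df)

# Hypothetical helper: interaction order of each column = number of variables
# joined by ":" in its name; the intercept is reported as order 0
term_order <- ifelse(colnames(df_dummy) == "(Intercept)", 0L,
                     lengths(strsplit(colnames(df_dummy), ":", fixed = TRUE)))

print(table(term_order))  # 1, 5, 10, 10, 5, 1 columns of order 0 to 5
print(ncol(df_dummy))     # 32
```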
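This code creates a data frame `df` with 30 random logical (TRUE/FALSE) values for each of the variables alfa, beta, gamma, delta and epsilon. It then uses the `model.matrix()` function to build `df_dummy`, a numeric design matrix (not a data frame) containing an intercept column, one 0/1 dummy column per variable (`alfaTRUE`, `betaTRUE`, and so on) and one column for every possible interaction between them, up to the five-way term. The column names of `df_dummy` are printed.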
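To see what an interaction column actually holds, the following sketch checks that a column such as `alfaTRUE:betaTRUE` is the element-wise product of the `alfaTRUE` and `betaTRUE` dummies, so it is 1 only in rows where both variables are TRUE. It assumes the same simulated data; the seed and the short names `a`, `b`, `ab` and `abcde` are illustrative.

```r
set.seed(123)  # illustrative seed
df <- data.frame(
  alfa    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  beta    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  gamma   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  delta   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)
df_dummy <- model.matrix(~ .^5, data = df)

# Main-effect dummies: 1 where the variable is TRUE, 0 otherwise
a <- df_dummy[, "alfaTRUE"]
b <- df_dummy[, "betaTRUE"]

# The two-way interaction column is their element-wise product
ab <- df_dummy[, "alfaTRUE:betaTRUE"]
print(all(ab == a * b))                              # TRUE

# Equivalently, it flags the rows where both logicals are TRUE
print(all(ab == as.numeric(df$alfa & df$beta)))      # TRUE

# The same holds for the five-way term
abcde <- df_dummy[, "alfaTRUE:betaTRUE:gammaTRUE:deltaTRUE:epsilonTRUE"]
print(all(abcde == as.numeric(df$alfa & df$beta & df$gamma &
                              df$delta & df$epsilon)))  # TRUE
```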
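Adding `+ 0` or `- 1` to the formula (`~ .^5 + 0` or `~ .^5 - 1`) removes only the intercept column; the main effects remain, so neither form yields the interaction terms alone. `df_dummy_intercept` is built with the default `~ .^5`, so it keeps the intercept and is identical to `df_dummy`. `df_dummy_no_intercept` is built with `~ .^5 - .^4 - 1`, which removes the intercept together with every main effect and lower-order interaction, leaving only the five-way interaction. Because none of that term's marginal terms are in the formula, `model.matrix()` codes it with a full set of indicators: 32 columns, one for each combination of TRUE/FALSE values, with exactly one 1 in each row.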
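The sketch below compares the three formulas discussed above on the same kind of simulated data. It confirms that `- 1` drops only the intercept, shows one way to keep just the interaction columns (subsetting the full matrix to columns whose names contain `:`, a technique that is not part of the original code), and checks that `~ .^5 - .^4 - 1` gives one indicator per TRUE/FALSE combination. The seed and the object names `X_full`, `X_minus1`, `X_inter` and `X_cells` are illustrative.

```r
set.seed(123)  # illustrative seed
df <- data.frame(
  alfa    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  beta    = sample(c(TRUE, FALSE), 30, replace = TRUE),
  gamma   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  delta   = sample(c(TRUE, FALSE), 30, replace = TRUE),
  epsilon = sample(c(TRUE, FALSE), 30, replace = TRUE)
)

X_full   <- model.matrix(~ .^5, data = df)
X_minus1 <- model.matrix(~ .^5 - 1, data = df)

# "- 1" removes the intercept but keeps the main effects
print("(Intercept)" %in% colnames(X_minus1))  # FALSE
print("alfaTRUE" %in% colnames(X_minus1))     # TRUE

# Keep only interaction columns: names containing ":" (two-way and higher)
X_inter <- X_full[, grepl(":", colnames(X_full), fixed = TRUE), drop = FALSE]
print(ncol(X_inter))  # 10 + 10 + 5 + 1 = 26

# Only the five-way term, fully indicator-coded: one column per combination
X_cells <- model.matrix(~ .^5 - .^4 - 1, data = df)
print(ncol(X_cells))               # 2^5 = 32
print(all(rowSums(X_cells) == 1))  # each row falls in exactly one cell
```

Subsetting by name keeps the dummy coding of the full model, whereas dropping the lower-order terms from the formula changes how `model.matrix()` codes the remaining term, which is why the two approaches give different column counts.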
Last modified on 2025-02-13