get_leadvars_GLM.Rdget_leadvars_GLM screens some predictors as "leading variables" based on predictor-response associations in generalized linear models.
get_leadvars_GLM(y, X, method = c("topk", "fixedetasqthresh", "percetasqthresh"), param)Predictor matrix. Can be a base matrix or something as.matrix() can coerce. No missing values are allowed.
Screening rule, one of c("topk", "fixedthresh", "percthresh"). The association measure is eta-squared. "topk" keeps the predictors with the largest \(k\) association values; "fixedthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percthresh" keeps predictors whose association is within a given percentage of the best.
Tuning parameter for method. If "topk", supply an integer \(k\) (keep the top \(k\)). If "fixedthresh", supply a numeric threshold (keep predictors with association \(\ge\) threshold). If "percthresh", supply a percentage in \((0,100]\) (keep predictors with association \(\ge\) that percent of the highest association).
A character vector containing the names of the leading varibales.
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Select leading variables
leadvars <- get_leadvars_GLM(y = y, X = X, method = "topk", param = list(k=2))
leadvars
#> [1] "V32" "V1"