Screening Predictors As 'Leading Variables' By Evaluating Predictor-Response Associations In Generalized Linear Models — get_leadvars_GLM • S3VS

get_leadvars_GLM screens the predictors and selects a subset as "leading variables", based on predictor-response associations in generalized linear models.

Usage

get_leadvars_GLM(y, X, method = c("topk", "fixedetasqthresh", "percetasqthresh"), param)

Arguments

y: Binary response. A numeric, integer, or logical vector with values in {0, 1}.

X: Predictor matrix. A base matrix or any object that as.matrix() can coerce to one. No missing values are allowed.

method: Screening rule, one of c("topk", "fixedetasqthresh", "percetasqthresh"). The association measure is eta-squared. "topk" keeps the \(k\) predictors with the largest association values; "fixedetasqthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percetasqthresh" keeps predictors whose association is at least a given percentage of the highest association.
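To make the association measure concrete, here is a minimal sketch of one common way to compute eta-squared for a binary response: treat y as a two-level grouping factor and take the between-group sum of squares of a predictor divided by its total sum of squares. This is an assumption for illustration only; the page does not say how S3VS computes eta-squared internally, and eta_squared is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of S3VS): eta-squared of one predictor,
# using the binary response y as the grouping factor.
eta_squared <- function(x, y) {
  group_means <- tapply(x, y, mean)    # mean of x within each class of y
  group_sizes <- tapply(x, y, length)  # number of observations per class
  ss_between <- sum(group_sizes * (group_means - mean(x))^2)
  ss_total <- sum((x - mean(x))^2)
  ss_between / ss_total
}

# Small simulated data set: only V1 drives the response
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("V", 1:5)))
y <- rbinom(n, size = 1, prob = 1 / (1 + exp(-2 * X[, 1])))

# One association value per column, sorted from strongest to weakest
assoc <- sort(apply(X, 2, eta_squared, y = y), decreasing = TRUE)
assoc
```

Under this definition the value lies in [0, 1] and equals the R-squared from regressing the predictor on factor(y), so larger values indicate predictors whose distribution differs more between the two response classes.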

param: Tuning parameter for method. If "topk", supply an integer \(k\), e.g. param = list(k = 2), to keep the top \(k\) predictors. If "fixedetasqthresh", supply a numeric threshold (keep predictors with association \(\ge\) threshold). If "percetasqthresh", supply a percentage in \((0,100]\) (keep predictors with association \(\ge\) that percent of the highest association).
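The three screening rules can be sketched on a plain named vector of association values. The helper screen_by_rule, its rule names ("topk", "fixed", "perc"), and the values in assoc are all hypothetical and chosen for illustration; this is not the package's implementation, only a reading of the rules as described above.

```r
# Hypothetical helper (not part of S3VS): apply one screening rule to a
# named vector of association values and return the retained names.
screen_by_rule <- function(assoc, method = c("topk", "fixed", "perc"), value) {
  method <- match.arg(method)
  switch(method,
    # keep the `value` largest associations (capped at the vector length)
    topk  = names(sort(assoc, decreasing = TRUE))[seq_len(min(value, length(assoc)))],
    # keep associations at or above an absolute threshold
    fixed = names(assoc)[assoc >= value],
    # keep associations at or above `value` percent of the maximum
    perc  = names(assoc)[assoc >= (value / 100) * max(assoc)]
  )
}

# Made-up association values for four predictors
assoc <- c(V1 = 0.30, V2 = 0.12, V3 = 0.02, V4 = 0.25)

screen_by_rule(assoc, "topk", 2)      # "V1" "V4"
screen_by_rule(assoc, "fixed", 0.10)  # "V1" "V2" "V4"
screen_by_rule(assoc, "perc", 80)     # "V1" "V4"
```

Note that the top-k rule returns names ordered by association, while the two threshold rules keep the original column order; whether get_leadvars_GLM orders its output the same way is not stated here.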

Value

A character vector containing the names of the leading variables.

Author

Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>

Examples

# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Select leading variables
leadvars <- get_leadvars_GLM(y = y, X = X, method = "topk", param = list(k=2))
leadvars
#> [1] "V32" "V1"