get_leadvars_SURV screens predictors and identifies a subset as "leading variables" based on predictor-response associations in survival models.

get_leadvars_SURV(y, X, surv_model = c("AFT", "COX"), 
  method = c("topk", "fixedmuthresh", "percmuthresh"), param, 
  varsselected = NULL, varsleft = colnames(X), parallel = FALSE)

Arguments

y

Response. A list with components time and status (1 = event, 0 = censored).

X

Predictor matrix. Can be a base matrix or any object that as.matrix() can coerce to one. No missing values are allowed.

surv_model

Character string specifying the survival model. Must be supplied explicitly. Values are "COX" for proportional hazards models and "AFT" for accelerated failure time models.

method

Screening rule, one of c("topk", "fixedmuthresh", "percmuthresh"). The association measure is marginal utility. "topk" keeps the \(k\) predictors with the largest association values; "fixedmuthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percmuthresh" keeps predictors whose association is at least a given percentage of the highest association.

param

Tuning parameter for method. If "topk", supply an integer \(k\) (keep the top \(k\)). If "fixedmuthresh", supply a numeric threshold (keep predictors with association \(\ge\) threshold). If "percmuthresh", supply a percentage in \((0,100]\) (keep predictors with association \(\ge\) that percent of the highest association).

varsselected

A character vector containing the predictors already selected in previous iterations. The association measure, conditional utility, is computed controlling for these predictors. Defaults to NULL.

varsleft

A character vector containing the predictors that have been neither selected nor removed from consideration in previous iterations. Leading predictors are chosen from these predictors. Defaults to colnames(X).

parallel

Logical. If TRUE, attempts to perform some computations in parallel mode in binomial and survival families, which is strongly recommended for faster execution. Defaults to FALSE.
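
The three screening rules described under method and param can be illustrated on a small, made-up vector of association values. This is only a sketch of the selection logic: screen_by_rule() and its rule labels ("topk", "fixed", "percent") are hypothetical and not part of the package, the association values are invented, and the percentage rule assumes positive association values.

```r
# Hypothetical helper (not a package function): applies the three
# screening rules to a named vector of association values.
screen_by_rule <- function(assoc, rule = c("topk", "fixed", "percent"), param) {
  rule <- match.arg(rule)
  assoc <- sort(assoc, decreasing = TRUE)
  keep <- switch(rule,
    # keep the k largest association values
    topk    = assoc[seq_len(min(param, length(assoc)))],
    # keep values greater than or equal to a fixed threshold
    fixed   = assoc[assoc >= param],
    # keep values at least param percent of the largest value
    percent = assoc[assoc >= (param / 100) * max(assoc)]
  )
  names(keep)
}

# Invented association values, for illustration only
assoc <- c(V1 = 9.0, V2 = 7.5, V3 = 4.0, V4 = 1.0)

top2   <- screen_by_rule(assoc, "topk", 2)
fixed5 <- screen_by_rule(assoc, "fixed", 5)
perc40 <- screen_by_rule(assoc, "percent", 40)
```

Here the top-2 and threshold-5 rules both keep V1 and V2, while the 40 percent rule also keeps V3, because 4.0 exceeds 40 percent of the largest value, 9.0.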

Value

A character vector containing the names of the leading variables.

Author

Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>

Examples

# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Select leading variables
leadvars <- get_leadvars_SURV(y = y_surv, X = X, surv_model = "COX", 
                              method = "topk", param = list(k=2), 
                              varsselected = NULL, varsleft = colnames(X))
leadvars
#> [1] "V1"  "V28"
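
The requirements stated in the Arguments section (y as a list with time and status, no missing values in X, and varsleft holding predictors not already selected) can be checked before calling the function. The following is a minimal sketch under those stated requirements; check_surv_inputs() is a hypothetical helper, not a package function, and the function itself may perform different or additional checks.

```r
# Hypothetical pre-flight checks (not package code), based only on the
# argument descriptions above.
check_surv_inputs <- function(y, X, varsselected = NULL, varsleft = colnames(X)) {
  X <- as.matrix(X)
  stopifnot(
    is.list(y), all(c("time", "status") %in% names(y)),
    length(y$time) == nrow(X), length(y$status) == nrow(X),
    all(y$status %in% c(0, 1)),                       # 1 = event, 0 = censored
    !anyNA(X),                                        # no missing values allowed
    !is.null(colnames(X)),
    all(c(varsselected, varsleft) %in% colnames(X)),  # names must exist in X
    length(intersect(varsselected, varsleft)) == 0    # selected and left are disjoint
  )
  invisible(TRUE)
}

set.seed(1)
X_small <- matrix(rnorm(20), nrow = 5, dimnames = list(NULL, paste0("V", 1:4)))
y_small <- list(time = rexp(5), status = c(1, 0, 1, 1, 0))
ok <- check_surv_inputs(y_small, X_small)

# A missing value in X should be rejected
X_bad <- X_small
X_bad[1, 1] <- NA
bad <- tryCatch(check_surv_inputs(y_small, X_bad), error = function(e) "rejected")
```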