get_leadvars_SURV screens the predictors and returns the "leading variables", chosen by the strength of their predictor-response association in a survival model.
Predictor matrix. Either a base matrix or any object that as.matrix() can coerce to one. Missing values are not allowed.
Character string specifying the survival model. It must be provided explicitly; there is no default. Accepted values are "Cox" for proportional hazards models and "AFT" for accelerated failure time models.
Screening rule, one of c("topk", "fixedthresh", "percthresh"). The association measure is marginal utility. "topk" keeps the predictors with the \(k\) largest association values; "fixedthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percthresh" keeps predictors whose association is at least a given percentage of the largest association.
Tuning parameter for method. If "topk", supply an integer \(k\) (keep the top \(k\)). If "fixedthresh", supply a numeric threshold (keep predictors with association \(\ge\) threshold). If "percthresh", supply a percentage in \((0,100]\) (keep predictors with association \(\ge\) that percent of the highest association).
A character vector containing the predictors already selected in previous iterations. The association measure, conditional utility, is computed controlling for these predictors. Defaults to NULL.
A character vector containing the names of the leading variables.
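The three screening rules described above can be illustrated on a plain named vector of association values. This is only a sketch: screen_by_rule is a hypothetical helper, not a function from the package, and it assumes that larger association values mean stronger association and that all values are positive (so that a percentage of the maximum is meaningful).

```r
# Hypothetical helper (not part of the package): apply one of the three
# screening rules to a named vector of association values.
screen_by_rule <- function(assoc,
                           method = c("topk", "fixedthresh", "percthresh"),
                           param) {
  method <- match.arg(method)
  keep <- switch(method,
    # keep the `param` largest values (ties broken by position)
    topk        = rank(-assoc, ties.method = "first") <= param,
    # keep values at or above the fixed threshold
    fixedthresh = assoc >= param,
    # keep values at or above `param` percent of the largest value
    percthresh  = assoc >= (param / 100) * max(assoc)
  )
  names(assoc)[keep]
}

assoc <- c(V1 = 9.0, V2 = 4.0, V3 = 7.5, V4 = 1.0)
screen_by_rule(assoc, "topk", 2)         # "V1" "V3"
screen_by_rule(assoc, "fixedthresh", 4)  # "V1" "V2" "V3"
screen_by_rule(assoc, "percthresh", 80)  # "V1" "V3" (cutoff is 0.8 * 9 = 7.2)
```

Note that "topk" always returns exactly \(k\) names, whereas the two threshold rules can return any number of predictors, depending on how the association values are spread.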
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Select leading variables
leadvars <- get_leadvars_SURV(y = y_surv, X = X, surv_model = "COX",
                              method = "topk", param = list(k = 2),
                              varsselected = NULL, varsleft = colnames(X))
leadvars
#> [1] "V1" "V28"
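To make the distinction between marginal and conditional utility more concrete, the sketch below scores predictors with the survival package. It rests on an assumption that this page does not confirm: utility is taken to be the gain in Cox partial log-likelihood from adding one candidate predictor, either to the null model (marginal) or to a model that already contains the selected predictors (conditional). cox_utility is a hypothetical helper, and the measure used internally by get_leadvars_SURV may differ.

```r
library(survival)  # recommended package distributed with R

# Hypothetical helper: gain in Cox partial log-likelihood from adding
# `candidate` to a model containing `selected` (assumed utility definition).
cox_utility <- function(time, status, X, candidate, selected = NULL) {
  vars <- c(selected, candidate)
  dat <- data.frame(time = time, status = status, X[, vars, drop = FALSE])
  cox_formula <- function(v) {
    as.formula(paste("Surv(time, status) ~", paste(v, collapse = " + ")))
  }
  full <- coxph(cox_formula(vars), data = dat)
  if (is.null(selected)) {
    # Marginal case: loglik[1] is the partial log-likelihood at beta = 0,
    # i.e. the model with no covariates.
    base_ll <- full$loglik[1]
  } else {
    # Conditional case: compare against the model with only `selected`.
    base_ll <- coxph(cox_formula(selected), data = dat)$loglik[2]
  }
  full$loglik[2] - base_ll
}

# Small simulated data set in the same spirit as the example above.
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
T_event <- rexp(n, rate = 0.05 * exp(X[, 1] + 0.5 * X[, 2]))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)

# Marginal utility of every predictor (nothing selected yet).
marginal <- sapply(colnames(X), function(v) cox_utility(time, status, X, v))

# Conditional utility of the remaining predictors once V1 is selected.
conditional <- sapply(setdiff(colnames(X), "V1"), function(v) {
  cox_utility(time, status, X, v, selected = "V1")
})
```

Under this definition both vectors are non-negative, because each comparison is between nested models. Feeding either vector into a screening rule such as "topk" mirrors the two-stage logic described for varsselected.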