S3VS_SURV performs variable selection based on the structured screen-and-select framework in survival models.
S3VS_SURV(y, X, surv_model = c("COX", "AFT"),
          method_xy = c("topk", "fixedmuthresh", "percmuthresh"), param_xy,
          method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
          vsel_method = c("LASSO", "ENET", "AFTGEE", "BRIDGE", "PVAFT"),
          alpha = 0.5,
          method_sel = c("conservative", "liberal"),
          method_rem = c("conservative_begin", "conservative_end", "liberal"),
          m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
Design matrix of predictors. Can be a base matrix or any object that as.matrix() can coerce. No missing values are allowed.
Character string specifying the survival model. Must be explicitly provided; there is no default. Valid values are "COX" for proportional hazards models and "AFT" for accelerated failure time models.
Rule for screening some predictors as "leading variables" based on their association with the response; one of c("topk", "fixedmuthresh", "percmuthresh"). The association measure is marginal utility.
"topk" keeps the predictors with the \(k\) largest association values; "fixedmuthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percmuthresh" keeps predictors whose association is at least a given percentage of the highest association.
Tuning parameter for method_xy. If "topk", supply a list with an integer k (keep the top \(k\)). If "fixedmuthresh", supply a list with a numeric threshold thresh (keep predictors with association \(\ge\) thresh). If "percmuthresh", supply a list with a numeric percentage thresh in \((0,100]\) (keep predictors with association \(\ge\) that percentage of the highest association).
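The three screening rules can be illustrated with a small base-R sketch. The helper screen_by_rule() and the score vector mu are hypothetical names introduced only for illustration; they are not part of the package, and the package's own handling of ties and edge cases may differ.

```r
# Hypothetical helper: apply one screening rule to a named vector of
# association scores (larger means stronger association with the response).
screen_by_rule <- function(mu,
                           rule = c("topk", "fixedmuthresh", "percmuthresh"),
                           param) {
  rule <- match.arg(rule)
  switch(rule,
    # keep the k largest scores
    topk = names(sort(mu, decreasing = TRUE))[seq_len(min(param$k, length(mu)))],
    # keep scores at or above an absolute threshold
    fixedmuthresh = names(mu)[mu >= param$thresh],
    # keep scores at or above thresh percent of the largest score
    percmuthresh = names(mu)[mu >= (param$thresh / 100) * max(mu)]
  )
}

# Made-up association scores for four predictors
mu <- c(V1 = 0.90, V2 = 0.60, V3 = 0.50, V4 = 0.10)
screen_by_rule(mu, "topk", list(k = 2))
screen_by_rule(mu, "fixedmuthresh", list(thresh = 0.5))
screen_by_rule(mu, "percmuthresh", list(thresh = 60))
```

With these scores, the top-2 rule and the 60 percent rule both keep V1 and V2, while the fixed threshold of 0.5 also keeps V3.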
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of c("topk", "fixedcorthresh", "perccorthresh"), with the same interpretation as the corresponding rules for method_xy.
Tuning parameter for method_xx; same interpretation as param_xy but applied to inter-predictor association (absolute value of the correlation coefficient).
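A minimal sketch of the "topk" rule for building a leading set is given here. The helper leading_set_topk() and the matrix Z are hypothetical. The sketch assumes that the leading variable itself counts toward \(k\), which is only an inference from the example output, where k = 3 yields three input variables per iteration.

```r
# Hypothetical helper: rank all predictors by absolute correlation with the
# leading variable and keep the k most correlated ones. The leading variable
# has |correlation| 1 with itself, so it ranks first.
leading_set_topk <- function(X, lead, k) {
  r <- abs(cor(X[, lead], X))[1, ]   # named vector of |correlations|
  names(sort(r, decreasing = TRUE))[seq_len(min(k, length(r)))]
}

set.seed(1)
Z <- matrix(rnorm(200 * 4), 200, 4, dimnames = list(NULL, paste0("V", 1:4)))
Z[, "V2"] <- Z[, "V1"] + 0.1 * rnorm(200)  # make V2 strongly correlated with V1
leading_set_topk(Z, lead = "V1", k = 2)
```

Here the leading set for V1 consists of V1 and its near-duplicate V2.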
Character string specifying the variable selection method to be used within each leading set. Available options are "LASSO" and "ENET" for surv_model = "COX", and "AFTGEE", "BRIDGE", and "PVAFT" for surv_model = "AFT".
Elastic net mixing parameter, with \(\alpha \in (0,1)\). Only used when vsel_method == "ENET". Defaults to 0.5.
Policy for aggregating predictors selected across leading sets in an iteration; one of c("conservative","liberal"). "conservative" selects the smallest admissible set of predictors by intersecting the selected sets of predictors across leading sets, beginning with all and gradually reducing from the end until a non-empty intersection is found; this ensures only predictors consistently selected across leading sets are retained. "liberal" selects the largest admissible set of predictors by taking the union of all selected sets of predictors, so any predictor chosen in at least one leading set is included. If no predictor is selected from the first leading set, the iteration does not contribute to final selection and exclusion rules (method_rem) are applied instead.
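The two aggregation policies can be sketched in base R as follows. The helper aggregate_selected() and the toy sets are hypothetical; the sketch follows the description above and is not the package's internal code.

```r
# Hypothetical helper: combine the selections made within each leading set
# during a single iteration.
aggregate_selected <- function(sel_sets, policy = c("conservative", "liberal")) {
  policy <- match.arg(policy)
  if (policy == "liberal") {
    # union: anything selected in at least one leading set
    return(Reduce(union, sel_sets))
  }
  # conservative: intersect sets 1..j, dropping sets from the end
  # until the intersection is non-empty
  for (j in rev(seq_along(sel_sets))) {
    common <- Reduce(intersect, sel_sets[seq_len(j)])
    if (length(common) > 0) return(common)
  }
  character(0)
}

# Selections from three leading sets (made-up values)
sel_sets <- list(c("V1", "V2"), c("V2", "V5"), c("V7"))
aggregate_selected(sel_sets, "conservative")
aggregate_selected(sel_sets, "liberal")
```

In this toy case the intersection of all three sets is empty, so the conservative policy falls back to the first two sets and returns V2; the liberal policy returns V1, V2, V5, and V7.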
Policy for excluding predictors when no selections are made in an iteration; one of c("conservative_begin","conservative_end","liberal"). "conservative_begin" excludes the smallest admissible set of predictors by intersecting the non-selected sets of predictors starting from the first leading set; "conservative_end" does the same but begins from the last leading set and moves backward; "liberal" excludes the largest admissible set of predictors by taking the union of all non-selected sets of predictors. Predictors excluded under this rule are removed from subsequent iterations.
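One possible reading of the three exclusion policies is sketched below. The helpers anchored_intersection() and exclude_predictors() are hypothetical, and the interpretation of "conservative_begin" and "conservative_end" as intersections anchored at the first and last leading set, respectively, is an assumption based on the wording above.

```r
# Hypothetical helper: intersect a list of sets, anchored at the first set,
# dropping sets from the far end until the intersection is non-empty.
anchored_intersection <- function(sets) {
  for (j in rev(seq_along(sets))) {
    common <- Reduce(intersect, sets[seq_len(j)])
    if (length(common) > 0) return(common)
  }
  character(0)
}

# Hypothetical helper: choose which non-selected predictors to exclude.
exclude_predictors <- function(nonsel_sets,
                               policy = c("conservative_begin",
                                          "conservative_end", "liberal")) {
  policy <- match.arg(policy)
  switch(policy,
    conservative_begin = anchored_intersection(nonsel_sets),
    conservative_end   = anchored_intersection(rev(nonsel_sets)),
    liberal            = Reduce(union, nonsel_sets)
  )
}

# Non-selected predictors from three leading sets (made-up values)
nonsel_sets <- list(c("V3", "V4"), c("V4", "V6"), c("V8", "V9"))
exclude_predictors(nonsel_sets, "conservative_begin")
exclude_predictors(nonsel_sets, "conservative_end")
exclude_predictors(nonsel_sets, "liberal")
```

Under this reading, "conservative_begin" excludes V4, "conservative_end" excludes V8 and V9, and "liberal" excludes all five predictors.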
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to 3.
Logical. If TRUE, prints detailed progress information at each iteration (e.g., iteration number, predictors selected or removed). Defaults to FALSE.
If supplied, sets the random seed via set.seed() to ensure reproducibility of stochastic components. If NULL, no seed is set.
For a survival-type response, S3VS considers two choices of model: the Cox model $$ \lambda(t\mid \boldsymbol{x}_i) = \lambda_0(t) \exp(\boldsymbol{x}_i^T \boldsymbol{\beta}) $$ and the AFT model $$ \log(\boldsymbol{T}) = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} $$
For the S3VS algorithm, see the manual of the top-level function S3VS.
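To make the AFT model concrete, the following sketch simulates right-censored data from \(\log(\boldsymbol{T}) = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\). The normal error distribution, the coefficient values, the exponential censoring, and the object names X_aft and y_aft are all assumptions chosen for illustration.

```r
# Simulate right-censored data from log(T) = X beta + eps
# (normal errors are an assumption made for this sketch)
set.seed(123)
n <- 100
p <- 150
X_aft <- matrix(rnorm(n * p), n, p)
colnames(X_aft) <- paste0("V", 1:p)
beta <- c(1, 0.5, rep(0, p - 2))                     # only V1 and V2 are active
log_T <- drop(X_aft %*% beta) + rnorm(n, sd = 0.5)   # linear predictor plus error
T_event <- exp(log_T)                                # event times
C <- rexp(n, rate = 0.1)                             # independent censoring times
y_aft <- list(time = pmin(T_event, C),
              status = as.integer(T_event <= C))
```

The list y_aft mirrors the structure of y_surv in the Cox example below; that the same structure is accepted with surv_model = "AFT" is assumed rather than confirmed here.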
A list with the following components:
A character vector of predictor names that were selected across all iterations.
A list recording the predictors selected at each iteration, in the order they were considered.
Runtime in seconds.
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run S3VS for survival models (Cox)
res_surv <- S3VS(y = y_surv, X = X, family = "survival",
                 surv_model = "COX",
                 method_xy = "topk", param_xy = list(k = 1),
                 method_xx = "topk", param_xx = list(k = 3),
                 vsel_method = "COXGLMNET",
                 method_sel = "conservative", method_rem = "conservative_begin",
                 sel_regout = FALSE, rem_regout = FALSE,
                 m = 100, nskip = 3, verbose = TRUE, seed = 123)
#> -------------
#> Iteration 1
#> -------------
#> Input Variables: V1 V119 V70
#> Selected Variables: V1 V119 V70
#> -------------
#> Iteration 2
#> -------------
#> Input Variables: V2 V43 V17
#> Selected Variables:
#> *** nskip= 1 ***
#> -------------
#> Iteration 3
#> -------------
#> Input Variables: V88 V35 V128
#> Selected Variables:
#> *** nskip= 2 ***
#> -------------
#> Iteration 4
#> -------------
#> Input Variables: V28 V117 V94
#> Selected Variables:
#> *** nskip= 3 ***
#> =================================
#> Number of selected variables: 3
#> Time taken: 0.1 sec
#> =================================
# View selected predictors
res_surv$selected
#> [1] "V1" "V119" "V70"