S3VS_SURV performs variable selection based on the structured screen-and-select framework in survival models.

S3VS_SURV(y, X, surv_model = c("COX", "AFT"), 
  method_xy = c("topk", "fixedmuthresh", "percmuthresh"), param_xy, 
  method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx, 
  vsel_method = c("LASSO", "ENET", "AFTGEE", "BRIDGE", "PVAFT"), 
  alpha = 0.5,
  method_sel = c("conservative", "liberal"), 
  method_rem = c("conservative_begin", "conservative_end", "liberal"), 
  m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)

Arguments

y

Response. A list with components time and status (1 = event, 0 = censored).

X

Design matrix of predictors. A base matrix or any object that as.matrix() can coerce to one. Missing values are not allowed.

surv_model

Character string specifying the survival model. Must be explicitly provided; there is no default. Values are "COX" for proportional hazards models and "AFT" for accelerated failure time models.

method_xy

Rule for screening some predictors as "leading variables" based on their association with the response; one of c("topk", "fixedthresh", "percthresh"). The association measure is marginal utility.

"topk" keeps the predictors with the largest \(k\) association values; "fixedthresh" keeps predictors whose association is greater than or equal to a specified threshold; "percthresh" keeps predictors whose association is within a given percentage of the best.

param_xy

Tuning parameter for method_xy. If "topk", supply a list with an integer k (keep the top \(k\)). If "fixedthresh", supply a list with a numeric threshold thresh (keep predictors with association \(\ge\) threshold). If "percthresh", supply a list with a numeric percentage thresh in \((0,100]\) (keep predictors with association \(\ge\) that percent of the highest association).
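The three screening rules can be illustrated on a named vector of association values. This is a minimal base R sketch, not the package's internal code: the helper screen_by_rule and the toy vector assoc are hypothetical, and the rule names follow the argument description above.

```r
# Hypothetical helper for illustration only; not part of the S3VS package.
# assoc: named numeric vector of association values (larger = stronger).
screen_by_rule <- function(assoc, method, param) {
  switch(method,
    # keep the k predictors with the largest association
    topk = names(sort(assoc, decreasing = TRUE))[seq_len(min(param$k, length(assoc)))],
    # keep predictors with association >= a fixed threshold
    fixedthresh = names(assoc)[assoc >= param$thresh],
    # keep predictors with association >= thresh percent of the maximum
    percthresh = names(assoc)[assoc >= (param$thresh / 100) * max(assoc)],
    stop("unknown method: ", method)
  )
}

assoc <- c(V1 = 0.90, V2 = 0.60, V3 = 0.45, V4 = 0.10)

top2  <- screen_by_rule(assoc, "topk", list(k = 2))
fixed <- screen_by_rule(assoc, "fixedthresh", list(thresh = 0.4))
perc  <- screen_by_rule(assoc, "percthresh", list(thresh = 60))
```

Here top2 and perc both contain V1 and V2 (60 percent of the maximum 0.90 is 0.54), while fixed also includes V3.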

method_xx

Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of c("topk", "fixedthresh", "percthresh") with same interpretation as method_xy.

param_xx

Tuning parameter for method_xx; same interpretation as param_xy but applied to inter-predictor association (absolute value of the correlation coefficient).
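As a rough illustration of how a leading set could be formed under "topk", the sketch below ranks predictors by absolute correlation with one leading variable. It is an assumption that the leading variable itself counts toward k; the example output at the end of this page lists three input variables per iteration with param_xx = list(k = 3), which is consistent with that reading but does not confirm it. The data and variable names are made up.

```r
set.seed(1)
X <- matrix(rnorm(50 * 6), 50, 6, dimnames = list(NULL, paste0("V", 1:6)))
X[, 2] <- X[, 1] + rnorm(50, sd = 0.1)  # make V2 strongly correlated with V1

leading_var <- "V1"
k <- 3

# Absolute correlation of every predictor with the leading variable.
# The leading variable has |cor| = 1 with itself, so it ranks first.
abs_cor <- abs(cor(X)[, leading_var])

# Assumed convention: the top-k set includes the leading variable.
leading_set <- names(sort(abs_cor, decreasing = TRUE))[seq_len(k)]
leading_set
```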

vsel_method

Character string specifying the variable selection method to be used within each leading set. Available options are "LASSO" and "ENET" for surv_model = "COX", and "AFTGEE", "BRIDGE", and "PVAFT" for surv_model = "AFT".

alpha

Elastic net mixing parameter, with \(\alpha \in (0,1)\). Used only when vsel_method == "ENET". Defaults to 0.5.

method_sel

Policy for aggregating predictors selected across leading sets in an iteration; one of c("conservative","liberal"). "conservative" selects the smallest admissible set of predictors by intersecting the selected sets of predictors across leading sets, beginning with all and gradually reducing from the end until a non-empty intersection is found; this ensures only predictors consistently selected across leading sets are retained. "liberal" selects the largest admissible set of predictors by taking the union of all selected sets of predictors, so any predictor chosen in at least one leading set is included. If no predictor is selected from the first leading set, the iteration does not contribute to final selection and exclusion rules (method_rem) are applied instead.
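The two aggregation policies described above can be sketched in base R as follows. The helper aggregate_selected is hypothetical and only mirrors the description: "conservative" intersects the selected sets of all leading sets and drops sets from the end until the intersection is non-empty, while "liberal" takes the union.

```r
# Hypothetical helper for illustration only; not part of the S3VS package.
# sel_sets: list of character vectors, one per leading set, in order.
aggregate_selected <- function(sel_sets,
                               method_sel = c("conservative", "liberal")) {
  method_sel <- match.arg(method_sel)
  if (method_sel == "liberal") {
    return(Reduce(union, sel_sets))
  }
  # conservative: start with all sets, then drop sets from the end
  for (j in rev(seq_along(sel_sets))) {
    common <- Reduce(intersect, sel_sets[seq_len(j)])
    if (length(common) > 0) return(common)
  }
  character(0)
}

sel_sets <- list(c("V1", "V7"), c("V1", "V7", "V9"), c("V3"))

cons <- aggregate_selected(sel_sets, "conservative")
lib  <- aggregate_selected(sel_sets, "liberal")
```

With these toy sets, intersecting all three gives nothing, so the conservative rule falls back to the first two sets and returns V1 and V7; the liberal rule returns V1, V7, V9, and V3.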

method_rem

Policy for excluding predictors when no selections are made in an iteration; one of c("conservative_begin","conservative_end","liberal"). "conservative_begin" excludes the smallest admissible set of predictors by intersecting the non-selected sets of predictors starting from the first leading set; "conservative_end" does the same but begins from the last leading set and moves backward; "liberal" excludes the largest admissible set of predictors by taking the union of all non-selected sets of predictors. Predictors excluded under this rule are removed from subsequent iterations.

m

Integer. Maximum number of S3VS iterations to perform. Defaults to 100.

nskip

Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to 3.
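The interplay of m and nskip can be pictured with a small loop over pre-computed per-iteration selections. This is only a sketch of the stopping rule as described: the helper run_until_stop is hypothetical, and it assumes the skip counter is cumulative and never reset, which the description does not state either way.

```r
# Hypothetical helper for illustration only; not part of the S3VS package.
# selected_per_iter: list of character vectors, the selections of each iteration.
run_until_stop <- function(selected_per_iter, m = 100, nskip = 3) {
  skipped <- 0
  selected <- character(0)
  for (iter in seq_len(min(m, length(selected_per_iter)))) {
    new_vars <- setdiff(selected_per_iter[[iter]], selected)
    if (length(new_vars) == 0) {
      skipped <- skipped + 1          # iteration added nothing new
      if (skipped >= nskip) break     # assumed: counter is never reset
    } else {
      selected <- c(selected, new_vars)
    }
  }
  list(selected = selected, iterations = iter)
}

# Selections shaped like the example output on this page:
# one productive iteration followed by empty ones.
iters <- list(c("V1", "V119", "V70"), character(0), character(0),
              character(0), "V5")
out <- run_until_stop(iters, m = 100, nskip = 3)
```

In this toy run the loop stops after iteration 4, so V5 from the fifth iteration is never reached.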

verbose

Logical. If TRUE, prints detailed progress information at each iteration (e.g., iteration number, predictors selected or removed). Defaults to FALSE.

seed

If supplied, sets the random seed via set.seed() to ensure reproducibility of stochastic components. If NULL, no seed is set.

parallel

Logical. If TRUE, attempts to perform some computations in parallel mode, which is strongly recommended for faster execution. Defaults to FALSE.

Details

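For a survival-type response, S3VS considers two models: the Cox model $$ \lambda(t\mid \boldsymbol{x}_i) = \lambda_0(t) \exp(\boldsymbol{x}_i^T \boldsymbol{\beta}) $$ and the AFT model $$ \log(\boldsymbol{T}) = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}. $$ Here \(\lambda_0(t)\) is the baseline hazard, \(\boldsymbol{T}\) is the vector of survival times, and \(\boldsymbol{\epsilon}\) is the vector of errors.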
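Since the Examples section only simulates Cox data, the following sketch generates right-censored data from the AFT model above and packs it into the list format expected for y. The normal error distribution, the exponential censoring, and all numeric settings are illustrative assumptions, not requirements of the package.

```r
set.seed(42)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)

# Only the first two predictors carry signal
beta <- c(1, 0.5, rep(0, p - 2))

# AFT model: log(T) = X beta + epsilon (normal errors give a log-normal AFT)
log_T <- drop(X %*% beta) + rnorm(n, sd = 0.5)
T_event <- exp(log_T)

# Independent exponential censoring times (assumed for illustration)
C <- rexp(n, rate = 0.2)

# Response in the documented format: time and status (1 = event, 0 = censored)
y_aft <- list(time = pmin(T_event, C), status = as.integer(T_event <= C))
```

A response built this way has the same structure as y_surv in the Examples section and would be paired with surv_model = "AFT".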

For the S3VS algorithm, see the manual of the top-level function S3VS.

Value

A list with the following components:

selected

A character vector of predictor names that were selected across all iterations.

selected_iterwise

A list recording the predictors selected at each iteration, in the order they were considered.

runtime

Runtime in seconds.

Author

Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>

Examples

# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run S3VS for survival models (Cox)
res_surv <- S3VS(y = y_surv, X = X, family = "survival", 
                 surv_model = "COX", 
                 method_xy = "topk", param_xy = list(k = 1),
                 method_xx = "topk", param_xx = list(k = 3),
                 vsel_method = "COXGLMNET",
                 method_sel = "conservative", method_rem = "conservative_begin",
                 sel_regout = FALSE, rem_regout = FALSE, 
                 m = 100, nskip = 3, verbose = TRUE, seed = 123)
#> -------------
#> Iteration 1
#> -------------
#> Input Variables: V1 V119 V70 
#> Selected Variables: V1 V119 V70 
#> -------------
#> Iteration 2
#> -------------
#> Input Variables: V2 V43 V17 
#> Selected Variables: 
#> *** nskip= 1 *** 
#> -------------
#> Iteration 3
#> -------------
#> Input Variables: V88 V35 V128 
#> Selected Variables: 
#> *** nskip= 2 *** 
#> -------------
#> Iteration 4
#> -------------
#> Input Variables: V28 V117 V94 
#> Selected Variables: 
#> *** nskip= 3 *** 
#> =================================
#> Number of selected variables: 3
#> Time taken: 0.1 sec
#> =================================
# View selected predictors
res_surv$selected
#> [1] "V1"   "V119" "V70"