
Fits an XGBoost model by grid search on a subsample of the input dataset, drawn at the defined rate (r_subsample). With proper settings, users can utilize graphics processing units (GPUs) to speed up training.

Usage

fit_base_xgb(
  dt_imputed,
  folds = NULL,
  tune_mode = "grid",
  tune_bayes_iter = 50L,
  learn_rate = 0.1,
  yvar = "Arithmetic.Mean",
  xvar = seq(5, ncol(dt_imputed)),
  vfold = 5L,
  device = "cuda:0",
  trim_resamples = TRUE,
  return_best = FALSE,
  ...
)

Arguments

dt_imputed

The input data table to be used for fitting.

folds

A pre-generated rset object with a minimal number of columns. If NULL, vfold must be numeric and is passed to rsample::vfold_cv.

tune_mode

character(1). Hyperparameter tuning mode. One of "grid" (default) or "bayes".

tune_bayes_iter

integer(1). The number of iterations for Bayesian optimization. Default is 50. Only used when tune_mode = "bayes".

learn_rate

numeric(1). The learning rate for the model, exposed as a separate argument so that tuning runs can branch on it. Default is 0.1.

yvar

The name of the target variable. Default is "Arithmetic.Mean".

xvar

The predictor variables. Defaults to all columns from the fifth onward (seq(5, ncol(dt_imputed))).

vfold

integer(1). The number of cross-validation folds. Used only when folds is NULL.

device

character(1). The device to be used for training. Default is "cuda:0". Make sure that your system is equipped with CUDA-enabled graphics processing units.

trim_resamples

logical(1). Default is TRUE, which replaces the actual data frames in the splits column of the tune_results object with NA to reduce the size of the returned object.

return_best

logical(1). If TRUE, the best tuned model is returned.

...

Additional arguments to be passed.
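For the folds argument, a compatible rset object can be pre-generated with the rsample package. A minimal sketch, assuming dt_imputed is already available as a data frame:

```r
library(rsample)

# Pre-generate a 5-fold cross-validation rset to pass as `folds`;
# when this is supplied, the `vfold` argument is ignored.
folds <- vfold_cv(dt_imputed, v = 5L)
```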

Value

The fitted workflow.

Details

Hyperparameters mtry, ntrees, and learn_rate are tuned. With tune_mode = "grid", users can modify learn_rate explicitly, while the other hyperparameters are drawn from a predefined grid (30 combinations per learn_rate value).
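A typical grid-search call might look like the following. This is a hypothetical sketch: dt_imputed is assumed to exist, and the learn_rate value shown is illustrative rather than the default.

```r
# Grid-search tuning on GPU; learn_rate overrides the 0.1 default,
# while mtry and ntrees come from the predefined grid.
fitted <- fit_base_xgb(
  dt_imputed = dt_imputed,
  tune_mode = "grid",
  learn_rate = 0.05,
  vfold = 5L,
  device = "cuda:0",
  return_best = TRUE
)
```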

Note

The tune package must be version 1.2.0 or higher, and xgboost must be installed with GPU support.
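These prerequisites can be probed up front. A sketch, assuming both packages are installed and a recent xgboost (2.0+) in which the device parameter selects the accelerator; a build without GPU support will error at training time:

```r
# Verify the tune version requirement noted above
stopifnot(utils::packageVersion("tune") >= "1.2.0")

# Probe GPU support by training a tiny throwaway model on the requested device
xgboost::xgb.train(
  params = list(device = "cuda:0", objective = "reg:squarederror"),
  data = xgboost::xgb.DMatrix(matrix(rnorm(20), 10, 2), label = rnorm(10)),
  nrounds = 1L
)
```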