This function subsets the full data by column subsampling (default rate = 50%). The optimal hyperparameter search is performed using spatiotemporal cross-validation schemes. As of version 0.4.5, users can define the metric used for selecting the best hyperparameter set (default = "rmse").
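The column subsampling step can be illustrated with a minimal base R sketch. The helper `subsample_columns()` and the toy column names (`pm25`, `pred_1` to `pred_4`) are hypothetical and only mirror the `c_subsample`, `yvar`, and `target_cols` arguments described below; the package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: always keep the identifier
# columns and the outcome, then sample a fraction of the remaining columns.
subsample_columns <- function(data, yvar, target_cols, c_subsample = 0.5) {
  keep <- intersect(c(target_cols, yvar), names(data))
  candidates <- setdiff(names(data), keep)
  # Pick at least one candidate column, but never more than are available.
  n_pick <- min(length(candidates), max(1, floor(length(candidates) * c_subsample)))
  picked <- sample(candidates, size = n_pick)
  data[, c(keep, picked), drop = FALSE]
}

set.seed(1)
toy <- data.frame(
  site_id = rep(c("A", "B"), each = 3),
  time = rep(1:3, times = 2),
  pm25 = runif(6),
  pred_1 = runif(6), pred_2 = runif(6), pred_3 = runif(6), pred_4 = runif(6),
  stringsAsFactors = FALSE
)

# With four candidate predictors and a rate of 0.5, two are retained.
sub <- subsample_columns(toy, yvar = "pm25", target_cols = c("site_id", "time"))
names(sub)
```

Keeping the identifier columns outside the sampling pool is what allows the spatiotemporal cross-validation folds to be built on the subsampled data.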
Arguments
- data
data.frame. Full data.frame of base learner predictions and AQS spatiotemporal identifiers. See attach_pred.
- c_subsample
numeric(1). Rate of column resampling. Default is 0.5.
- r_subsample
numeric(1). The proportion of rows to be used. Default is 1.0, which uses the full dataset, but the setting is required to balance groups generated with make_subdata.
- yvar
character(1). Outcome variable name.
- target_cols
character. Columns from data to be retained during column resampling. Default is c("site_id", "time", "Event.Type", "lon", "lat").
- args_generate_cv
List of arguments to be passed to the switch_generate_cv_rset function.
- tune_iter
integer(1). Bayesian optimization iterations. Default is 50.
- trim_resamples
logical(1). Default is TRUE, which replaces the actual data.frames in the splits column of the tune_results object with NA. Passed to fit_base_tune.
- return_best
logical(1). If TRUE, the best tuned model is returned. Passed to fit_base_tune.
- metric
character(1). The metric to be used for selecting the best hyperparameter set. Must be one of "rmse", "rsq", "mae". Default is "rmse". Passed to fit_base_tune.
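How the metric argument affects selection can be sketched in base R. The helper `pick_best()` and the performance table are hypothetical illustrations, not the package's code. The point is that "rmse" and "mae" are minimized while "rsq" is maximized, so the chosen hyperparameter set can change with the metric.

```r
# Hypothetical helper: choose the best row of a performance table by metric.
pick_best <- function(perf, metric = c("rmse", "rsq", "mae")) {
  # match.arg() rejects anything other than the three allowed values.
  metric <- match.arg(metric)
  # rsq is better when larger; rmse and mae are better when smaller.
  idx <- if (metric == "rsq") {
    which.max(perf[[metric]])
  } else {
    which.min(perf[[metric]])
  }
  perf[idx, , drop = FALSE]
}

# Made-up performance records for three candidate configurations.
perf <- data.frame(
  config = c("a", "b", "c"),
  rmse = c(2.1, 1.8, 1.9),
  rsq = c(0.70, 0.74, 0.78),
  mae = c(1.5, 1.4, 1.2),
  stringsAsFactors = FALSE
)

best_rmse <- pick_best(perf)$config          # "b"
best_rsq <- pick_best(perf, "rsq")$config    # "c"
best_mae <- pick_best(perf, "mae")$config    # "c"
```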
Value
List of 3, including the best-fit model, the best hyperparameters, and all performance records from tune::tune_bayes().
Note that the meta learner function returns the best-fit model,
not predicted values.
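Because the return value holds a fitted model rather than predictions, a separate prediction step is needed afterwards. The sketch below uses a stand-in list whose element names (`best_fit`, `best_params`, `all_results`) are hypothetical, with an `lm` fit in place of the tuned model; the real element names and model class should be checked on the returned object.

```r
# Stand-in for the returned list of 3; the element names are hypothetical.
set.seed(1)
train <- data.frame(x = 1:20)
train$y <- 2 * train$x + rnorm(20)

result <- list(
  best_fit = lm(y ~ x, data = train),            # stand-in for the best-fit model
  best_params = list(penalty = 0.1),             # stand-in for the best hyperparameters
  all_results = data.frame(rmse = c(1.2, 1.0))   # stand-in for the tuning records
)

# Predictions are not part of the result; they come from calling predict().
new_data <- data.frame(x = c(21, 22))
preds <- predict(result$best_fit, newdata = new_data)
length(preds)
```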