This function generates a spatio-temporal cross-validation index using the anticlust package. It first computes the spatial clustering index, using the balanced_clustering function by default. If cv_pairs is provided, it then generates rank-based pairs based on the proximity between cluster centroids.

Usage

generate_cv_index(
  data,
  target_cols = c("lon", "lat", "time"),
  preprocessing = c("none", "normalize", "standardize"),
  cv_fold = 5L,
  cv_pairs = NULL,
  pairing = c("1", "2"),
  cv_mode = "spt",
  ...
)

Arguments

data

data.table with X, Y, and time information.

target_cols

character(3). Names of columns for X, Y, and time. Default is c("lon", "lat", "time"). Order insensitive.

preprocessing

character(1). Preprocessing method.

  • "none": no preprocessing.

  • "normalize": normalize the data.

  • "standardize": standardize the data.
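The three options can be sketched in base R. This is only an illustration: it assumes "normalize" means min-max scaling to [0, 1] and "standardize" means centering to zero mean and unit standard deviation, which the text above does not spell out. The helper `rescale_cols()` and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): rescale selected columns.
rescale_cols <- function(df, cols,
                         method = c("none", "normalize", "standardize")) {
  method <- match.arg(method)
  for (col in cols) {
    x <- as.numeric(df[[col]])
    df[[col]] <- switch(
      method,
      none = x,
      # Assumed meaning of "normalize": min-max scaling to [0, 1].
      normalize = (x - min(x)) / (max(x) - min(x)),
      # Assumed meaning of "standardize": zero mean, unit standard deviation.
      standardize = (x - mean(x)) / sd(x)
    )
  }
  df
}

# Made-up coordinates and time steps.
toy <- data.frame(
  lon = c(-80, -79, -78, -77),
  lat = c(35, 36, 37, 38),
  time = c(1, 2, 3, 4)
)
toy_norm <- rescale_cols(toy, c("lon", "lat", "time"), "normalize")
toy_std <- rescale_cols(toy, c("lon", "lat", "time"), "standardize")
```

Rescaling of this kind is commonly used so that columns measured in different units (degrees versus time steps) contribute comparably to distance-based clustering.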

cv_fold

integer(1). Number of folds for cross-validation. Default is 5L.

cv_pairs

integer(1). Number of pairs for cross-validation. This value is used to generate rank-based pairs based on target_cols values. Default is NULL.

pairing

character(1). Pair selection method.

  • "1": search for the nearest neighbor of each cluster first; the remaining pairs are then selected by rank.

  • "2": rank the pairwise distances directly.
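To make the difference between the two methods concrete, here is a minimal base R sketch on four made-up cluster centroids. It reflects one plausible reading of the descriptions above, not the package's actual implementation: all variable names, the Euclidean distance, and the tie-breaking are assumptions.

```r
# Toy cluster centroids in (lon, lat, time) space; values are made up.
centroids <- rbind(
  c1 = c(0, 0, 0),
  c2 = c(1, 0, 0),
  c3 = c(0, 3, 0),
  c4 = c(5, 5, 0)
)
cv_pairs <- 3L

# All unordered cluster pairs with their Euclidean centroid distance.
d <- as.matrix(dist(centroids))
idx <- which(upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(
  a = rownames(d)[idx[, "row"]],
  b = colnames(d)[idx[, "col"]],
  dist = d[idx],
  key = paste(idx[, "row"], idx[, "col"])
)
pairs_ranked <- pairs[order(pairs$dist), ]

# Assumed reading of pairing = "2": take the closest pairs overall.
pairs_m2 <- head(pairs_ranked, cv_pairs)

# Assumed reading of pairing = "1": each cluster's nearest neighbor comes
# first, then the remaining pairs fill in by distance rank.
d_off <- d
diag(d_off) <- Inf
nn <- unname(apply(d_off, 1, which.min))
nn_key <- unique(paste(pmin(seq_along(nn), nn), pmax(seq_along(nn), nn)))
is_nn <- pairs_ranked$key %in% nn_key
pairs_m1 <- head(
  rbind(pairs_ranked[is_nn, ], pairs_ranked[!is_nn, ]),
  cv_pairs
)
```

In this toy case the two readings disagree on the third pair: the direct ranking keeps c2 and c3, while the nearest-neighbor-first variant keeps c3 and c4 so that the isolated cluster c4 is paired at all.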

cv_mode

character(1). Spatiotemporal cross-validation indexing mode. Default is "spt".

...

Additional arguments to be passed.

Value

rsample::manual_rset() object.
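For readers unfamiliar with this return type, the sketch below shows how a fold index can be turned into such an object with rsample::make_splits() and rsample::manual_rset(). The toy data and the `fold` vector are made up, and this is not necessarily how the function builds its result internally. The code is guarded so that it is skipped when rsample is not installed.

```r
if (requireNamespace("rsample", quietly = TRUE)) {
  # Made-up data and fold index (three folds of two rows each).
  toy <- data.frame(id = 1:6, y = c(0.1, 0.4, 0.2, 0.9, 0.5, 0.7))
  fold <- rep(1:3, each = 2)
  fold_ids <- sort(unique(fold))

  # One split per fold: rows in the fold are held out for assessment.
  splits <- lapply(fold_ids, function(k) {
    rsample::make_splits(
      list(analysis = which(fold != k), assessment = which(fold == k)),
      data = toy
    )
  })

  rset <- rsample::manual_rset(splits, ids = paste0("Fold", fold_ids))
}
```

Each row of `rset` holds one split, which can be passed to the usual rsample accessors such as rsample::analysis() and rsample::assessment().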

Note

nrow(data) %% cv_fold should be 0; that is, the number of rows in data should be evenly divisible by cv_fold.
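A quick way to check this requirement before calling the function is shown below. The helper `trim_to_folds()` is hypothetical and not part of the package. Dropping trailing rows is only one possible way to satisfy the constraint and may not suit every dataset.

```r
# Hypothetical helper (not part of the package): drop trailing rows so that
# the row count becomes a multiple of cv_fold.
trim_to_folds <- function(df, cv_fold) {
  n_keep <- nrow(df) - (nrow(df) %% cv_fold)
  df[seq_len(n_keep), , drop = FALSE]
}

cv_fold <- 5L
toy <- data.frame(x = 1:103)

remainder <- nrow(toy) %% cv_fold     # rows left over after even folds
toy_trimmed <- trim_to_folds(toy, cv_fold)
is_divisible <- nrow(toy_trimmed) %% cv_fold == 0L
```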

Author

Insang Song