Good practice of using `chopin` in HPC • chopin

Assumptions

Users can access multi-threaded central processing units and sufficient memory (i.e., at least 4GB/thread)
Single-node calculation: as of version 0.3.0, chopin only supports single-node

Consider using try() or tryCatch() to let an error will not halt all the work without results, especially when using higher-level functions

Use chopin::par_make_gridset to make exhaustive and padded grid objects in SpatVector class
chopin::par_grid filters features that intersect with each grid. Post-processing is necessary as polygons may be taken more than twice in parallelized calculation.

chopin way to parallelize is to iterate calculation over a list then to use future.apply function
If one wants to customize parallelization workflow like chopin, consider:
- Make list objects with:
  - An extent vector in each element
  - Preprocessed vector objects with respect to each extent vector in the first list
  - Preferably all lists are named
- terra’s SpatRaster objects made from external files are Rcpp pointers, which cannot be exported to processes in parallelization. Thus, users should
  - Use file or database table paths instead of SpatRaster objects in future parallel processing workflow

Most of HPC systems have CPU-memory usage quota for users
Always run a small amount of data to estimate the total computational demand.
- profvis::profvis() helps to summarize runtime per function call and memory usage
For testing, use future::plan(future::sequential) to see single-thread. Keep chopin::par_make_gridset arguments the same then only take one or two grids using lapply. Preferably the hardest case will be tested to estimate the maximum peak memory usage per thread.
Submit a job with a proper amount of computational assets: each HPC management system offers tools to see statistics and task summaries. Use these records to plan asset allocation for the next time.