Computational Considerations
Kyle Messier, with assistance from GitHub Copilot
2026-05-20
Source: vignettes/computational_considerations.Rmd
Why parallelize amadeus workflows?
amadeus relies on spatial tools such as
terra and exactextractr. These packages use
efficient C++ implementations for raster, vector, and extraction
operations, so each individual operation is already optimized for common
workflows.
Wall-clock time can still grow quickly as spatial extents, time
ranges, variables, or extraction locations increase. Many
amadeus workflows parallelize naturally because dates,
variables, and location chunks can often be processed independently and
recombined after each worker writes its result.
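The chunk-and-recombine pattern described above can be sketched in base R. This is a minimal illustration, not amadeus code: the locations table, the chunk size, and process_chunk() are all hypothetical stand-ins, and lapply() runs the chunks sequentially where a parallel map would normally go.

```r
# Hypothetical extraction locations; real workflows would use actual sites.
locations <- data.frame(
  site_id = sprintf("site-%02d", 1:10),
  lon = seq(-80, -71, length.out = 10),
  lat = seq(35, 44, length.out = 10)
)

# Assign consecutive rows to chunks of at most four locations each.
chunk_size <- 4
chunk_id <- ceiling(seq_len(nrow(locations)) / chunk_size)
chunks <- split(locations, chunk_id)

# Stand-in for an expensive per-chunk extraction step (assumption).
process_chunk <- function(chunk) {
  chunk$value <- chunk$lon + chunk$lat
  chunk
}

# Each chunk is processed independently of the others.
results <- lapply(chunks, process_chunk)

# Recombine the per-chunk results into a single table.
combined <- do.call(rbind, results)
rownames(combined) <- NULL
```

Because no chunk depends on another, the lapply() call is the only line that would need to change to move this sketch onto a parallel backend.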
We provide a few examples: a sequential purrr baseline, followed by
parallel versions using future + furrr,
mirai, and targets.
Reference: sequential purrr baseline
if (requireNamespace("purrr", quietly = TRUE)) {
  library(purrr)
  dates <- seq.Date(
    as.Date("2022-01-01"),
    as.Date("2022-01-05"),
    by = "day"
  )
  # Process one day at a time; c(d, d) is a single-day date range.
  results <- purrr::map(dates, function(d) {
    amadeus::process_covariates(
      covariate = "narr",
      date = c(d, d),
      variable = "weasd",
      path = "/path/to/narr"
    )
  })
}
Parallel with future + furrr
terra::SpatRaster objects should not be returned across
worker boundaries. Instead, write each worker result to a file path and
load those files back in the parent process.
if (
  requireNamespace("future", quietly = TRUE) &&
    requireNamespace("furrr", quietly = TRUE) &&
    requireNamespace("terra", quietly = TRUE)
) {
  dates <- seq.Date(
    as.Date("2022-01-01"),
    as.Date("2022-01-05"),
    by = "day"
  )
  # Start four background R sessions as workers.
  future::plan(future::multisession, workers = 4)
  raster_paths <- furrr::future_map_chr(dates, function(d) {
    # tempdir() here is the worker's own session directory.
    worker_dir <- file.path(tempdir(), paste0("amadeus-", format(d)))
    dir.create(worker_dir, recursive = TRUE, showWarnings = FALSE)
    processed <- amadeus::process_covariates(
      covariate = "narr",
      date = c(d, d),
      variable = "weasd",
      path = "/path/to/narr"
    )
    out_path <- file.path(worker_dir, paste0("weasd-", format(d), ".tif"))
    terra::writeRaster(processed, out_path, overwrite = TRUE)
    # Return only the file path, never the SpatRaster itself.
    out_path
  })
  # terra::rast() opens the files lazily; cell values stay on disk.
  rasters <- lapply(raster_paths, terra::rast)
  # Reset the plan only once the worker files are no longer needed: R
  # deletes a session's tempdir() when that session exits.
  future::plan(future::sequential)
}
Parallel with mirai
The same file-path handoff pattern applies when using
mirai workers.
if (
  requireNamespace("mirai", quietly = TRUE) &&
    requireNamespace("terra", quietly = TRUE)
) {
  dates <- seq.Date(
    as.Date("2022-01-01"),
    as.Date("2022-01-05"),
    by = "day"
  )
  # Start four background daemons.
  mirai::daemons(4)
  # mirai_map() returns immediately; `[]` waits for and collects the results.
  raster_paths <- mirai::mirai_map(dates, .f = function(d) {
    # tempdir() here is the daemon's own session directory.
    worker_dir <- file.path(tempdir(), paste0("amadeus-", format(d)))
    dir.create(worker_dir, recursive = TRUE, showWarnings = FALSE)
    processed <- amadeus::process_covariates(
      covariate = "narr",
      date = c(d, d),
      variable = "weasd",
      path = "/path/to/narr"
    )
    out_path <- file.path(worker_dir, paste0("weasd-", format(d), ".tif"))
    terra::writeRaster(processed, out_path, overwrite = TRUE)
    # Return only the file path, never the SpatRaster itself.
    out_path
  })[]
  rasters <- lapply(unlist(raster_paths), terra::rast)
  # Shut the daemons down only once their temporary files are no longer needed.
  mirai::daemons(0)
}
Reproducible pipelines with targets
A _targets.R file can make date grids explicit and skip
work that is already up to date.
if (
  requireNamespace("targets", quietly = TRUE) &&
    requireNamespace("tarchetypes", quietly = TRUE) &&
    requireNamespace("terra", quietly = TRUE)
) {
  library(targets)
  # Packages listed here are loaded before each target runs.
  tar_option_set(packages = c("amadeus", "terra"))
  dates <- seq.Date(
    as.Date("2022-01-01"),
    as.Date("2022-01-05"),
    by = "day"
  )
  list(
    tar_target(date_grid, dates),
    # One processed_path target is generated per date.
    tarchetypes::tar_map(
      values = data.frame(date = dates),
      tar_target(
        processed_path,
        {
          processed <- process_covariates(
            covariate = "narr",
            date = c(date, date),
            variable = "weasd",
            path = "/path/to/narr"
          )
          # tempdir() is session-specific; a persistent project directory
          # is a safer home for outputs tracked with format = "file".
          out_path <- file.path(
            tempdir(),
            paste0("weasd-", format(date), ".tif")
          )
          terra::writeRaster(processed, out_path, overwrite = TRUE)
          out_path
        },
        format = "file"
      )
    )
  )
}
Caveats and gotchas
- terra::SpatRaster objects cannot safely cross worker boundaries; pass file paths between workers and the parent process instead.
- Be respectful of upstream APIs and rate-limit downloads. A sequential pre-download step is often safer than parallel downloads.
- Aggregate disk usage can grow quickly. Use a worker-specific tempdir() path and clean up intermediate files when they are no longer needed.
- For very large grids, Dask or spatial chunking with terra::makeTiles() may outperform process-level parallelism.
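The disk-usage caveat can be handled with a per-task scratch directory that is removed as soon as the task finishes. The sketch below uses only base R; with_scratch_dir() and the "scratch-" naming are hypothetical and not part of amadeus, and the CSV file stands in for any intermediate output.

```r
# Hypothetical helper: give a task its own scratch directory and remove
# that directory when the task finishes, even if the task errors.
with_scratch_dir <- function(task_id, fun) {
  scratch <- file.path(tempdir(), paste0("scratch-", task_id))
  dir.create(scratch, recursive = TRUE, showWarnings = FALSE)
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)
  fun(scratch)
}

# Stand-in task: write an intermediate file, return only a small summary.
n_lines <- with_scratch_dir("2022-01-01", function(dir) {
  tmp <- file.path(dir, "intermediate.csv")
  write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
  length(readLines(tmp))
})

# The scratch directory no longer exists once the task has returned.
scratch_left <- dir.exists(file.path(tempdir(), "scratch-2022-01-01"))
```

Final outputs that the parent process reads back, such as the GeoTIFF paths in the examples above, would need to be written outside the scratch directory so that they survive the cleanup.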