US EPA Toxic Release Inventory (TRI)
Kyle Messier, with assistance from GitHub Copilot
2026-05-20
Source: vignettes/tri_workflow.Rmd
This article demonstrates a compact workflow for EPA TRI facility emissions data.
This vignette runs its live workflow when rendered locally. The heavy
download, processing, extraction, and plotting chunks are skipped
automatically on CI, CRAN checks, and pkgdown builds; set
AMADEUS_RUN_VIGNETTES=true to force live execution in those
environments.
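The gating described above could be sketched roughly as follows. This is an illustration only, not the vignette's actual setup chunk: the helper name should_run_live() is hypothetical, and apart from AMADEUS_RUN_VIGNETTES the environment variables it inspects (CI, IN_PKGDOWN, _R_CHECK_PACKAGE_NAME_) are assumptions about how those environments are commonly detected.

```r
# Hypothetical helper: decide whether the heavy chunks should run.
# Only AMADEUS_RUN_VIGNETTES is named in this article; the other
# variables are common conventions assumed for illustration.
should_run_live <- function() {
  is_true <- function(name) {
    identical(tolower(Sys.getenv(name, unset = "")), "true")
  }
  # An explicit opt-in overrides every other check.
  if (is_true("AMADEUS_RUN_VIGNETTES")) {
    return(TRUE)
  }
  on_ci <- is_true("CI")
  on_pkgdown <- is_true("IN_PKGDOWN")
  # Assumed signal for R CMD check (and therefore CRAN checks).
  on_check <- nzchar(Sys.getenv("_R_CHECK_PACKAGE_NAME_", unset = ""))
  !(on_ci || on_pkgdown || on_check)
}

# Possible use in a setup chunk (assumes knitr is rendering the document):
# knitr::opts_chunk$set(eval = should_run_live())
```

Checking the opt-in first keeps the override simple: setting AMADEUS_RUN_VIGNETTES=true wins regardless of which automated environment is detected.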
Available inputs and data availability
download_tri() works with EPA’s annual TRI basic data
files and exposes a small set of high-value selectors:
- jurisdiction supports the nationwide file ("US"), any two-letter state or territory code such as "NC", or the tribal file ("tbl").
- year accepts a single year or a start/end pair, so multi-year requests download one annual TRI file per year.
- TRI downloads are delivered directly as CSV files rather than zip archives.
- Output names reflect the jurisdiction requested: U.S.-wide files keep the historical tri_raw_<year>.csv pattern, while state and tribal requests append a jurisdiction suffix such as _NC or _tbl.
- TRI does not require authentication. Because these are annual facility-reported releases and waste-management totals, temporal resolution is yearly.
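To make the year and naming rules above concrete, here is a small sketch. The helper tri_file_names() is hypothetical and is not part of the package; in particular, placing the jurisdiction suffix directly before the .csv extension is an assumption based on the description above, so the real file names may differ.

```r
# Hypothetical helper illustrating the naming rule described above.
# Where exactly the suffix is placed is an assumption.
tri_file_names <- function(year, jurisdiction = "US") {
  # A start/end pair expands to one annual file per year.
  years <- if (length(year) == 2L) seq(year[1], year[2]) else year
  # U.S.-wide files carry no suffix; other jurisdictions append one.
  suffix <- if (identical(jurisdiction, "US")) "" else paste0("_", jurisdiction)
  paste0("tri_raw_", years, suffix, ".csv")
}

tri_file_names(2023)
tri_file_names(c(2021, 2023), jurisdiction = "NC")
```

The second call returns three names, one per year from 2021 through 2023, which mirrors how a multi-year request results in one download per annual file.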
Download representative requests
directory_to_save <- file.path(tempdir(), "tri_workflow")
download_data(
  dataset_name = "tri",
  year = 2023L,
  jurisdiction = "US",
  directory_to_save = directory_to_save,
  acknowledgement = TRUE
)
Demonstrate processing and covariate calculation for a single chemical
The helper function get_tri_info() lists the available
chemicals in the downloaded files, which can be used to filter the
processing and covariate calculation steps. Here we demonstrate with
Polychlorinated biphenyls (PCBs), a group of persistent organic
pollutants that were widely used in industrial applications.
chems <- get_tri_info(path = directory_to_save, type = "chemicals")
processed_pcb <- process_covariates(
  covariate = "tri",
  path = directory_to_save,
  chemical = c("Polychlorinated biphenyls"),
  year = 2023
)
# Note that extent is an option in process_covariates() to limit the domain
Calculate covariates at points
domain_x <- c(terra::xmin(processed_pcb), terra::xmax(processed_pcb))
domain_y <- c(terra::ymin(processed_pcb), terra::ymax(processed_pcb))
domain_dx <- diff(domain_x)
domain_dy <- diff(domain_y)
candidate_xy <- expand.grid(
  lon = seq(domain_x[1] + 0.12 * domain_dx, domain_x[2] - 0.12 * domain_dx, length.out = 5),
  lat = seq(domain_y[1] + 0.12 * domain_dy, domain_y[2] - 0.12 * domain_dy, length.out = 5)
)
example_points_sf <- sf::st_as_sf(
  candidate_xy,
  coords = c("lon", "lat"),
  crs = 4326
)
example_points_sf$site_id <- paste0("site_", seq_len(nrow(example_points_sf)))
point_values_pcb <- calculate_covariates(
  covariate = "tri",
  from = processed_pcb,
  locs = example_points_sf,
  locs_id = "site_id",
  decay_range = 5000, # 5 km decay range for illustrative purposes
  use_threshold = FALSE,
  geom = "sf"
)
Plot the covariates at points along with the facility locations
pcb_sf <- sf::st_as_sf(processed_pcb)
point_basemap <-
  sf::st_as_sf(maps::map("usa", plot = FALSE, fill = FALSE))
ggplot() +
  geom_sf(data = point_basemap, fill = "gray80", color = "white") +
  geom_sf(data = pcb_sf, color = "red", size = 2) +
  geom_sf(data = point_values_pcb, aes(color = STACK_AIR_0001336363_05000), size = 3)
Demonstrate with multiple chemicals and total emissions
processed_chems <- process_covariates(
  covariate = "tri",
  path = directory_to_save,
  chemical = c("Polychlorinated biphenyls", "Vinyl Chloride", "Tetrachloroethylene"),
  year = 2023,
  variables = "ON-SITE RELEASE TOTAL"
)
point_values_chems <- calculate_covariates(
  covariate = "tri",
  from = processed_chems,
  locs = example_points_sf,
  locs_id = "site_id",
  decay_range = 5000, # 5 km decay range for illustrative purposes
  use_threshold = FALSE,
  geom = "sf"
)
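For intuition about what decay_range controls, the sketch below computes a distance-weighted emissions sum at a single point. It is only an illustration: the kernel exp(-3 * d / r), under which a facility at exactly the decay range contributes about 5% of its emissions, is an assumed form rather than something this article confirms for calculate_covariates(), and both the helper name sedc_sketch() and the input values are made up.

```r
# Illustrative only: the exp(-3 * d / r) kernel is an assumption,
# not a statement of what calculate_covariates() does internally.
sedc_sketch <- function(emissions, distance_m, decay_range = 5000) {
  stopifnot(length(emissions) == length(distance_m))
  # Each facility's emissions are down-weighted by its distance to the point.
  sum(emissions * exp(-3 * distance_m / decay_range))
}

# Made-up values: three facilities at 0 m, 2.5 km, and 5 km from one point
sedc_sketch(
  emissions = c(100, 100, 100),
  distance_m = c(0, 2500, 5000)
)
```

Under this assumed kernel, a larger decay_range lets distant facilities contribute more, while a smaller one concentrates the covariate on nearby sources.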