Stadtman Tenure Track Investigator
National Institute of Environmental Health Sciences
Division of Translational Toxicology
ENVR 500 Guest Lecture: February 22, 2024
\[ \newcommand\by{{\mathbf{y}}} \newcommand\bY{{\mathbf{Y}}} \]
Eva Marques
Daniel Zilber
Ranadeep Daw
Mariana Alifa
Insang Song
Kyle Messier
Mitchell Manware (Not Pictured)
Matheron and Krige developed geostatistical methods to predict ore content from core samples
Matheron coined the term “Kriging” after Krige
“Nugget” is a term used to describe random noise because predicting where gold nuggets were was so difficult
Matérn developed correlation models for spatial variation for applications in Forestry
To this day, we use the “Matérn” covariance function
Cressie, 1990: Statistics for Spatial Data
Waller and Gotway, 2004: Applied Statistics for Public Health Data
Wide-scale adoption by statisticians and engineers in ecological and human exposure and risk applications
Questions:
In figure C, what is an example of geospatial health data geometry at a point? Check all that apply.
In figure I, what is an example of geospatial health data geometry at an area? Check all that apply.
Questions:
Many types of models are used for geospatial exposure assessment.
Proximity
Index
Land Use Regression
Geographically Weighted Regression
Kriging and Gaussian Processes
Machine Learning
Mechanistic Models
Satellite Imagery
Hybrid / Ensemble Models
Response
\[ Y_i, \quad i \in \{1, \ldots, n\} \]
\(p\) covariates:
\[ X_i = (X_{i1}, \ldots, X_{ip}) \]
The model is:
\[ Y_i = \beta_0 + X_{i1}\beta_1 + \cdots + X_{ip}\beta_p + \varepsilon_i \]
\[ E(Y_i) = \mu_i = \beta_0 + X_{i1}\beta_1 + ... + X_{ip}\beta_p \]
\[ \varepsilon_i = Y_i - \mu_i \]
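A minimal sketch of fitting this linear model by ordinary least squares. The data are simulated, and the names `X_design`, `beta_true`, and the noise scale are hypothetical choices for illustration, not values from the lecture.

```python
import numpy as np

# Simulated example: n observations, p covariates (all values hypothetical).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))

# Assumed "true" coefficients: beta_0 (intercept) followed by beta_1..beta_p.
beta_true = np.array([2.0, 0.5, -1.0, 0.25])

# Design matrix with a leading column of ones for the intercept.
X_design = np.column_stack([np.ones(n), X])
y = X_design @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares estimate of beta.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Fitted mean mu_i and residuals epsilon_i = Y_i - mu_i.
mu_hat = X_design @ beta_hat
resid = y - mu_hat
```

Because the model includes an intercept, the residuals average to zero up to floating point error.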
As a method, not a terrible idea
Unbiased
Overconfident error estimates (p-values, SE)
Model Selection (Type 1 and 2 error)
Linear regression for spatial data
\[ \bY(s) = X(s)\beta + \varepsilon \]
\[ X(s_i) = (X_1(s_i), \ldots, X_p(s_i)) \]
\[ \bY(s) = \mu(s) + \varepsilon + \eta(s) \]
\(\mu(s)\) can take many forms such as linear, nonlinear, or even machine learning models such as random forest.
More details on a spatial covariance model later
Proximity-based metrics are the most basic form of an exposure assessment because they rely only on the distance between a pollution source and the observed outcome location.
A proximity model is simply a deterministic covariate based on distance:
\[ Y(s) = X(s) \]
Given a matrix of distances between monitoring locations and pollution sources, \(d_{ij}\), minimum and average distance are:
\[ X_i^{min} = \min_j d_{ij} \]
\[ \overline{X}_i = \frac{1}{n_j}\sum_{j=1}^{n_j}d_{ij} \]
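A small sketch of these two proximity metrics. The coordinates are made up, and straight-line (Euclidean) distance in a planar coordinate system is an assumption; real monitoring data would typically need projected coordinates or geodesic distances.

```python
import numpy as np

# Hypothetical planar coordinates (e.g. km) for monitors and pollution sources.
monitors = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])
sources = np.array([[0.0, 1.0], [6.0, 8.0]])

# Distance matrix d[i, j]: monitor i to source j (Euclidean, by assumption).
diff = monitors[:, None, :] - sources[None, :, :]
d = np.sqrt((diff ** 2).sum(axis=2))

# Proximity covariates: minimum and average distance for each monitor.
x_min = d.min(axis=1)
x_mean = d.mean(axis=1)
```

Each monitor ends up with one deterministic value per metric, which can then be used directly as the exposure covariate.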
Questions:
Index variables are deterministic quantities that summarize (e.g. via principal components) multiple complex variables of interest into a simple and interpretable metric.
Covariates are made up of geographic variables across different domains
Social Vulnerability Index, Climate Vulnerability Index, etc.
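One way such an index could be built is as the first principal component of standardized variables. The sketch below uses simulated data; it is not how any named index (e.g. the Social Vulnerability Index) is actually constructed, and the variable names are hypothetical.

```python
import numpy as np

# Simulated example: 4 correlated geographic variables for 100 areas
# (hypothetical data driven by one shared latent factor).
rng = np.random.default_rng(1)
n = 100
latent = rng.normal(size=n)
V = np.column_stack([latent + 0.3 * rng.normal(size=n) for _ in range(4)])

# Standardize each variable so no single unit of measurement dominates.
Z = (V - V.mean(axis=0)) / V.std(axis=0)

# Principal components via the singular value decomposition.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Index = score on the first principal component (its sign is arbitrary).
index = Z @ Vt[0]

# Share of total variance captured by the index.
explained = S[0] ** 2 / (S ** 2).sum()
```

The `explained` share gives a quick check on how much information the single index retains from the original variables.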
\[ \bY(s) = X(s)\beta + \varepsilon \]
where \(\bY(s)\) are the \(n \times 1\) observations for the variable of interest (e.g. PM\(_{2.5}\), \(NO_3^{-}\), etc.) and \(X(s)\) is an \(n \times k\) design matrix of \(k\) spatial geographic covariates
Key Steps
\[ \bY(s) = \mu(s) + \eta(s) \]
\(\mu(s)\) can take many forms such as linear, nonlinear, or even machine learning models such as random forest
\(\eta \sim N_n(0,\Sigma_{\theta}+\tau^2I)\)
\(\Sigma_{\theta}\) is a covariance matrix with parameters, \(\theta\), that accounts for correlation between spatial and temporal locations
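A minimal sketch of prediction under this model, assuming a known constant mean and known covariance parameters (simple kriging). The exponential covariance (a Matérn with smoothness 1/2), the parameter values, and the helper `exp_cov` are illustrative assumptions; in practice \(\theta\) and \(\tau^2\) are estimated from data.

```python
import numpy as np

def exp_cov(a, b, sigma2, rho):
    """Exponential covariance between two sets of 2-D coordinates."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    return sigma2 * np.exp(-d / rho)

# Hypothetical setup: 50 monitoring locations and assumed parameters.
rng = np.random.default_rng(2)
n = 50
s = rng.uniform(0, 10, size=(n, 2))
sigma2, rho, tau2, mu = 1.0, 2.0, 0.1, 5.0

# Covariance of the observations: Sigma_theta + tau^2 I.
K = exp_cov(s, s, sigma2, rho) + tau2 * np.eye(n)

# Simulate observations Y = mu + eta with eta ~ N(0, K).
y = mu + np.linalg.cholesky(K) @ rng.normal(size=n)

# Predict the spatial process at two new locations.
s0 = np.array([[5.0, 5.0], [2.0, 8.0]])
k0 = exp_cov(s0, s, sigma2, rho)        # cross-covariance, shape (2, n)
weights = np.linalg.solve(K, k0.T).T    # kriging weights, shape (2, n)

pred_mean = mu + weights @ (y - mu)
pred_var = sigma2 - np.einsum("ij,ij->i", weights, k0)
```

The prediction variance is for the spatial process without the nugget, so it lies between 0 and `sigma2` and shrinks for locations near observed monitors.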
Question:
Machine learning (ML) is the general culture, philosophy, or school of thought for predictive modeling that focuses on the learning algorithm and out-of-sample prediction generalization
I would be remiss if I didn’t mention another entirely different class of exposure models
Mechanistic models are not statistical models
Mechanistic models are based on physics and chemistry
Requia, Weeberb J., et al. “An ensemble learning approach for estimating high spatiotemporal resolution of ground-level ozone in the contiguous United States.” Environmental science & technology 54.18 (2020): 11037-11047.
Yu, Wenhua, et al. “Deep ensemble machine learning framework for the estimation of PM 2.5 concentrations.” Environmental Health Perspectives 130.3 (2022): 037004.
Questions