Skip to contents

This method fits a PrestoGP model given a set of locations and predictor and outcome variables.

Usage

# S4 method for PrestoGPModel
prestogp_fit(
  model,
  Y,
  X,
  locs,
  scaling = NULL,
  apanasovich = NULL,
  covparams = NULL,
  beta.hat = NULL,
  tol = 0.999999,
  max_iters = 100,
  center.y = NULL,
  impute.y = FALSE,
  lod = NULL,
  verbose = FALSE,
  optim.method = "Nelder-Mead",
  optim.control = list(trace = 0, reltol = 0.001, maxit = 5000),
  family = c("gaussian", "binomial"),
  nfolds = 10,
  foldid = NULL,
  parallel = FALSE
)

Arguments

model

The PrestoGP model object being fit.

Y

The values of the response variable(s). Should be a matrix or vector for univariate models or a list for multivariate models.

X

The values of the predictor variable(s). Should be a matrix for univariate models or a list for multivariate models.

locs

The values of the locations. Should be a matrix for univariate models or a list for multivariate models.

scaling

A vector of consecutive positive integers that is used to specify which columns of locs should have the same scaling parameter. For example, in a spatiotemporal model with two spatial measures and a time measure, the value of scaling would be c(1, 1, 2). The length of scaling must match the number of columns of locs. If it is not specified, all columns of locs will have a common scale parameter.

apanasovich

Should the multivariate Matern model described in Apanasovich et al. (2012) be used? Defaults to TRUE if there is only one scale parameter for each outcome and FALSE otherwise.

covparams

The initial covariance parameters estimate (optional).

beta.hat

The initial beta parameters estimates (optional).

tol

The model is considered converged when error is not less than tol*previous_error (optional). Defaults to 0.999999.

max_iters

Maximum number of iterations for the model fitting procedure. Defaults to 100.

center.y

Should the Y's be mean centered before fitting the model? Defaults to TRUE for gaussian models and FALSE for binomial models.

impute.y

Should missing Y's be imputed? Defaults to FALSE.

lod

Limit of detection value(s). Any Y value less than lod is assumed to be missing when performing missing data imputation. Should be numeric for univariate models and a list for multivariate models, where each element of the list corresponds to an outcome. The ith element of the lod is the limit of detection for observation i. Alternatively, one can specify a single limit of detection that is assumed to be the same for all observations. If not specified, it is assumed that no limit of detection exists. Ignored if impute.y is FALSE.

verbose

If TRUE, additional information about model fit will be printed. Defaults to FALSE.

optim.method

Optimization method to be used for the maximum likelihood estimation that is passed to optim. Defaults to "Nelder-Mead". See optim.

optim.control

Control parameter that is passed to optim. See optim.

family

Family parameter for the glmnet model. Currently only "gaussian" and "binomial" are supported. Defaults to "gaussian". See glmnet.

nfolds

Number of cross-validation folds for cv.glmnet. Defaults to 10. See cv.glmnet.

foldid

Optional vector of values between 1 and "nfolds" specifying what fold each observation should be assigned to in the cv.glmnet cross-validation. See cv.glmnet.

parallel

Should cv.glmnet use parallel "foreach" to fit each fold? Defaults to FALSE. See cv.glmnet.

Value

A PrestoGPModel object with slots updated based on the results of the model fitting procedure. See PrestoGPModel-class for details.

References

  • Apanasovich, T.V., Genton, M.G. and Sun, Y. "A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components", Journal of the American Statistical Association (2012) 107(497):180-193.

  • Messier, K.P. and Katzfuss, M. "Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide", The Annals of Applied Statistics (2021) 15(2):688-710.

Examples

data(soil)
soil <- soil[!is.na(soil[,5]),] # remove rows with NA's
y <- soil[,4]                   # predict moisture content
X <- as.matrix(soil[,5:9])
locs <- as.matrix(soil[,1:2])

# Vecchia model
soil.vm <- new("VecchiaModel", n_neighbors = 10)
soil.vm <- prestogp_fit(soil.vm, y, X, locs)

# Impute missing y's
miss <- sample(1:nrow(soil), 20)
y.miss <- y
y.miss[miss] <- NA
soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
soil.vm2 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE)

# Impute y's missing due to limit of detection
soil.lod <- quantile(y, 0.1)
y.lod <- y
y.lod[y.lod <= soil.lod] <- NA
soil.vm3 <- new("VecchiaModel", n_neighbors = 10)
soil.vm3 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE, lod = soil.lod)

# Full model
soil.fm <- new("FullModel")
soil.fm <- prestogp_fit(soil.fm, y, X, locs)

# Multivariate model
ym <- list()
ym[[1]] <- soil[,5]             # predict two nitrogen concentration levels
ym[[2]] <- soil[,7]
Xm <- list()
Xm[[1]] <- Xm[[2]] <- as.matrix(soil[,c(4,6,8,9)])
locsm <- list()
locsm[[1]] <- locsm[[2]] <- locs

soil.mvm <-  new("MultivariateVecchiaModel", n_neighbors = 10)
soil.mvm <- prestogp_fit(soil.mvm, ym, Xm, locsm)

# Space/elevation model
data(soil250, package="geoR")
y2 <- soil250[,7]               # predict pH level
X2 <- as.matrix(soil250[,c(4:6,8:22)])
# columns 1+2 are location coordinates; column 3 is elevation
locs2 <- as.matrix(soil250[,1:3])

soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
# fit separate scale parameters for location and elevation
soil.vm2 <- prestogp_fit(soil.vm2, y2, X2, locs2, scaling = c(1, 1, 2))