Train a PrestoGP model.
prestogp_fit-PrestoGPModel-method.Rd
This method fits a PrestoGP model given a set of locations and predictor and outcome variables.
Usage
# S4 method for class 'PrestoGPModel'
prestogp_fit(
model,
Y,
X,
locs,
Y.names = NULL,
X.names = NULL,
scaling = NULL,
apanasovich = NULL,
covparams = NULL,
beta.hat = NULL,
tol = 0.999999,
max_iters = 100,
center.y = NULL,
impute.y = FALSE,
lod = NULL,
quiet = FALSE,
verbose = FALSE,
optim.method = "Nelder-Mead",
optim.control = list(trace = 0, reltol = 0.001, maxit = 5000),
family = c("gaussian", "binomial"),
nfolds = 10,
foldid = NULL,
parallel = FALSE
)
Arguments
- model
The PrestoGP model object being fit.
- Y
The values of the response variable(s). Should be a matrix or vector for univariate models or a list for multivariate models.
- X
The values of the predictor variable(s). Should be a matrix for univariate models or a list for multivariate models.
- locs
The values of the locations. Should be a matrix for univariate models or a list for multivariate models.
- Y.names
The name(s) for the response variable(s). Should be a vector with length equal to the number of response variables. Defaults to NULL.
- X.names
The names for the predictor variables. Should be a vector for univariate models or a list for multivariate models.
- scaling
A vector of consecutive positive integers that is used to specify which columns of locs should have the same scaling parameter. For example, in a spatiotemporal model with two spatial measures and a time measure, the value of scaling would be c(1, 1, 2). The length of scaling must match the number of columns of locs. If it is not specified, all columns of locs will have a common scale parameter.
- apanasovich
Should the multivariate Matern model described in Apanasovich et al. (2012) be used? Defaults to TRUE if there is only one scale parameter for each outcome and FALSE otherwise.
- covparams
The initial covariance parameters estimate (optional).
- beta.hat
The initial beta parameters estimates (optional).
- tol
The model is considered converged when error is not less than tol*previous_error (optional). Defaults to 0.999999.
- max_iters
Maximum number of iterations for the model fitting procedure. Defaults to 100.
- center.y
Should the Y's be mean centered before fitting the model? Defaults to TRUE for gaussian models and FALSE for binomial models.
- impute.y
Should missing Y's be imputed? Defaults to FALSE.
- lod
Limit of detection value(s). Any Y value less than lod is assumed to be missing when performing missing data imputation. Should be numeric for univariate models and a list for multivariate models, where each element of the list corresponds to an outcome. The ith element of the lod is the limit of detection for observation i. Alternatively, one can specify a single limit of detection that is assumed to be the same for all observations. If not specified, it is assumed that no limit of detection exists. Ignored if impute.y is FALSE.
- quiet
If FALSE, the penalized log likelihood and the model MSE will be printed for each iteration of the model fitting procedure. No intermediate output will be printed if TRUE. Defaults to FALSE.
- verbose
If TRUE, the estimated theta/beta parameters will be printed for each iteration of the model fitting procedure. Defaults to FALSE. Ignored if quiet is TRUE.
- optim.method
Optimization method to be used for the maximum likelihood estimation that is passed to optim. Defaults to "Nelder-Mead". See
optim
.- optim.control
Control parameter that is passed to optim. See
optim
.- family
Family parameter for the glmnet model. Currently only "gaussian" and "binomial" are supported. Defaults to "gaussian". See
glmnet
.- nfolds
Number of cross-validation folds for cv.glmnet. Defaults to 10. See
cv.glmnet
.- foldid
Optional vector of values between 1 and "nfolds" specifying what fold each observation should be assigned to in the cv.glmnet cross-validation. See
cv.glmnet
.- parallel
Should cv.glmnet use parallel "foreach" to fit each fold? Defaults to FALSE. See
cv.glmnet
.
Value
A PrestoGPModel object with slots updated based on the results of
the model fitting procedure. See PrestoGPModel-class
for
details.
References
Apanasovich, T.V., Genton, M.G. and Sun, Y. "A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components", Journal of the American Statistical Association (2012) 107(497):180-193.
Messier, K.P. and Katzfuss, M. "Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide", The Annals of Applied Statistics (2021) 15(2):688-710.
Examples
data(soil)
soil <- soil[!is.na(soil[,5]),] # remove rows with NA's
y <- soil[,4] # predict moisture content
X <- as.matrix(soil[,5:9])
locs <- as.matrix(soil[,1:2])
# Vecchia model
soil.vm <- new("VecchiaModel", n_neighbors = 10)
soil.vm <- prestogp_fit(soil.vm, y, X, locs)
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: 487.181
#> Current MSE: 9.104869
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: 487.1105
#> Current MSE: 9.107302
#> Beginning iteration 3
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 3 complete
#> Current penalized likelihood: 487.1105
#> Current MSE: 9.107302
# Impute missing y's
miss <- sample(1:nrow(soil), 20)
y.miss <- y
y.miss[miss] <- NA
soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
soil.vm2 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE)
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: 487.393
#> Current MSE: 9.104869
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: 487.3047
#> Current MSE: 9.107217
#> Beginning iteration 3
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 3 complete
#> Current penalized likelihood: 487.0089
#> Current MSE: 9.107062
#> Beginning iteration 4
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 4 complete
#> Current penalized likelihood: 487.0079
#> Current MSE: 9.107144
#> Beginning iteration 5
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 5 complete
#> Current penalized likelihood: 487.0079
#> Current MSE: 9.107144
# Impute y's missing due to limit of detection
soil.lod <- quantile(y, 0.1)
y.lod <- y
y.lod[y.lod <= soil.lod] <- NA
soil.vm3 <- new("VecchiaModel", n_neighbors = 10)
soil.vm3 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE, lod = soil.lod)
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: 487.232
#> Current MSE: 9.104869
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: 487.1455
#> Current MSE: 9.107444
#> Beginning iteration 3
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 3 complete
#> Current penalized likelihood: 487.1455
#> Current MSE: 9.107444
# Full model
soil.fm <- new("FullModel")
soil.fm <- prestogp_fit(soil.fm, y, X, locs)
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: 486.3133
#> Current MSE: 9.104869
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: 486.2743
#> Current MSE: 9.105179
#> Beginning iteration 3
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 3 complete
#> Current penalized likelihood: 486.2743
#> Current MSE: 9.105179
# Multivariate model
ym <- list()
ym[[1]] <- soil[,5] # predict two nitrogen concentration levels
ym[[2]] <- soil[,7]
Xm <- list()
Xm[[1]] <- Xm[[2]] <- as.matrix(soil[,c(4,6,8,9)])
locsm <- list()
locsm[[1]] <- locsm[[2]] <- locs
soil.mvm <- new("MultivariateVecchiaModel", n_neighbors = 10)
soil.mvm <- prestogp_fit(soil.mvm, ym, Xm, locsm)
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: 1392.561
#> Current MSE: 53.9135
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: 1380.552
#> Current MSE: 132.9825
#> Beginning iteration 3
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 3 complete
#> Current penalized likelihood: 1380.552
#> Current MSE: 222.1078
# Space/elevation model
data(soil250, package="geoR")
y2 <- soil250[,7] # predict pH level
X2 <- as.matrix(soil250[,c(4:6,8:22)])
# columns 1+2 are location coordinates; column 3 is elevation
locs2 <- as.matrix(soil250[,1:3])
soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
# fit separate scale parameters for location and elevation
soil.vm2 <- prestogp_fit(soil.vm2, y2, X2, locs2, scaling = c(1, 1, 2))
#>
#> Beginning iteration 1
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 1 complete
#> Current penalized likelihood: -255.9419
#> Current MSE: 0.009739877
#> Beginning iteration 2
#> Estimating theta...
#> Estimation of theta complete
#> Estimating beta...
#> Estimation of beta complete
#> Iteration 2 complete
#> Current penalized likelihood: -255.9419
#> Current MSE: 0.01132906