4 Raster Data

Profile-CMP Profile-SBS Profile-STU

Raster Data Access, Preparation, and Exploratory Analysis in R

Date Modified: January 21, 2025

Authors: Mitchell Manware author-mm, Kyle P. Messier author-kpm

Key Terms: Geospatial Data, Weather

Programming Language: R

4.1 Introduction

Air temperature data from the United States National Oceanic and Atmospheric Administration’s (NOAA) North American Regional Reanalysis (NARR) data set will be used to demonstrate using raster (grid) data with the terra package (9).

This tutorial will demonstrate the following steps with raster data in R:

  • Downloading data from a URL
  • Importing data
  • Checking data type, structure, and class
  • Reclassifying data
  • Computing summary and zonal statistics
  • Plotting individual and multiple data sets

The exploratory analyses in this unit are designed for educational purposes only. The results of the following analyses are not peer-reviewed findings, nor are they based on any hypotheses.

4.2 Access and Download

The website URL where the NOAA NARR exists is year-specific, meaning there is a unique URL for each annual data set. For the purpose of these exploratory analyses, air temperature data from the year 2021 will be used

Define the variable year according to the year of interest.

year <- "2021"

The utils::download.file() function downloads the file according to the defined URL and destination file.

Multiple chunks of code in this tutorial will contain ./dataset/, a fixed directory path to run the tutorial code. To run the code on your machine, substitute ./dataset/ with the file path where you would like to store the tutorial data.

# specify the URL where data is stored based on year variable of interest
url_narr <- paste0(
  "https://downloads.psl.noaa.gov//Datasets/NARR/Dailies/monolevel/air.2m.",
  year,
  ".nc"
)

# specify where to save downloaded data
destination_narr <- paste0(
  "./dataset/narr/air.2m_",
  year,
  ".nc"
)

# download the data
download.file(
  url_narr,
  destination_narr
)

Identify the desired data file with utils::list.files()

list.files("./dataset/narr")

The file downloaded from NOAA’s NARR data set is an .nc, or netCDF, file. NetCDF files are common for raster data, and do not need to be unzipped.

4.3 Data Preparation

4.3.1 Import

Now that the data file of interest has been downloaded and identified, import the data with terra::rast().

narr <- terra::rast("./dataset/narr/air.2m_2021.nc")

4.3.2 Inspect Structure

Inspect the structure of narr to see its class, dimensions, variables, and layer names.

narr
## class       : SpatRaster 
## dimensions  : 277, 349, 365  (nrow, ncol, nlyr)
## resolution  : 32462.99, 32463  (x, y)
## extent      : -16231.49, 11313351, -16231.5, 8976020  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs 
## source      : RtmppKZsnDair.2m_2021.nc:air 
## varname     : air (Daily Air Temperature at 2 m) 
## names       : air_1, air_2, air_3, air_4, air_5, air_6, ... 
## unit        :     K,     K,     K,     K,     K,     K, ... 
## time        : 2021-01-01 to 2021-12-31 UTC

When working with raster data, the dimensions of the raster refer to the number of rows (nrow) and columns (ncol) of grid cells that make up the raster. Similarly, the number of layers in the raster object (nlyr) represents the number of observations of the data. These can also be inspected individually with nrow(), ncol(), and nlyr(), respectively.

nrow(narr)
## [1] 277
ncol(narr)
## [1] 349
terra::nlyr(narr)
## [1] 365

4.3.3 Rename

The narr data set contains 365 layers, one for each daily observation of air temperature in 2021. When working with raster data that contains multiple layers, it is important to know and recognize the naming structure of each layer. In this case, the layer names are air_ followed by the day of the year (i.e., January 1 = air_1).

names(narr)[1:5]
## [1] "air_1" "air_2" "air_3" "air_4" "air_5"

Renaming raster layers can be useful for calculating summary statistics or when combining rasters with potentially identical layer names. Using the time() and gsub() functions, the layers can be renamed according to their date.

names(narr) <- paste0(
  "air_",
  gsub(
    "-",
    "",
    as.character(terra::time(narr))
  )
)

Check the layer names again to ensure proper renaming.

names(narr)[1:5]
## [1] "air_20210101" "air_20210102" "air_20210103" "air_20210104" "air_20210105"

4.3.4 Coordinate Reference System and Projection

Check the coordinate reference system of a SpatRaster object with terra::crs().

terra::crs(
  narr,
  describe = TRUE
)
##      name authority code area       extent
## 1 unnamed      <NA> <NA> <NA> NA, NA, ....

narr has a native coordinate reference system, but it is unnamed and was not identifiable by terra::rast(). The area of interest for these exploratory analyses is the conterminous United States, so we can project narr to the Albers Equal Area projected coordinate system (EPSG code: 5070).

narr_p <- terra::project(
  narr,
  "EPSG:5070"
)

We want to create plots with and analyze the air temperature data as it relates to the conterminous United States state boundaries. Subset states_t to remove Alaska, Hawaii, and the United States territories.

See the Polygon Data Chapter Section Plot (Multiple) for the steps for obtaining the states_t data set.

remove <- c(
  "Alaska",
  "Hawaii",
  "Puerto Rico",
  "United States Virgin Islands",
  "Commonwealth of the Northern Mariana Islands",
  "Guam",
  "American Samoa"
)

conus_t <- subset(
  states_t,
  !states_t$NAME %in% remove
)

Project conus_t to the Albers Equal Area projected coordinate system (EPSG code: 5070).

conus_t <- terra::project(
  conus_t,
  "EPSG:5070"
)

Ensure that both data sets have the same coordinate reference system.

terra::crs(conus_t) == terra::crs(narr_p)
## [1] TRUE

4.4 Exploratory Analysis

4.4.1 Plot (Multiple)

Now that the data sets and coordinate reference systems have been prepared, create a plot with ggplot2::gglot(). Identifying the data sets to be plotted within the tidyterra::geom_spatraster() and tidyterra::geom_spatvector() argument informs the function that the narr_p and conus_t data sets are SpatRaster and SpatVector objects, respectively (10).

Only the first layer of the narr_p data set will be plotted.

ggplot2::ggplot() +
  tidyterra::geom_spatraster(data = narr_p$air_20210101) +
  ggplot2::scale_fill_continuous(
    type = "viridis",
    na.value = NA,
    "Temperature (°K)"
  ) +
  ggplot2::ggtitle("Air Temperature") +
  tidyterra::geom_spatvector(
    data = conus_t,
    fill = "transparent",
    color = "black"
  ) +
  ggpubr::theme_pubr(legend = "bottom") +
  ggplot2::theme(plot.title = element_text(hjust = 0.5)) +
  ggpubr::grids()

4.4.2 Crop

The terra::crop() function can be used to reduce a SpatRaster to an area determined by SpatVector polygons. The mask = TRUE argument crops to the border of the polygons, whereas mask = FALSE crops to the bounding box surrounding the polygons. In this example, crop the narr data to the conterminous United States state boundaries.

narr_crop <- terra::crop(
  narr_p,
  conus_t,
  mask = TRUE
)

Plot the cropped temperature data and the conterminous United States state boundaries.

Only the first layer of the narr data set will be plotted.

ggplot2::ggplot() +
  tidyterra::geom_spatraster(data = narr_crop$air_20210101) +
  ggplot2::scale_fill_continuous(
    type = "viridis",
    na.value = NA,
    "Temperature (°K)"
  ) +
  ggplot2::ggtitle("Air Temperature (cropped)") +
  tidyterra::geom_spatvector(
    data = conus_t,
    fill = "transparent",
    color = "black"
  ) +
  ggpubr::theme_pubr(legend = "bottom") +
  ggplot2::theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank()
  )

4.4.3 Units

The previous plot depicts the 2 m air temperature data in degrees Kelvin. To convert the units to degrees Celsius, simply subtract the values in narr_crop by 273.15. This subtraction will be applied to every grid cell within every layer of narr_crop.

narr_crop_c <- narr_crop - 273.15

4.4.4 Summary Statistics

Similar to mathematical operations, calculating summary statistics is very straight forward with raster data. For this example, calculate the annual mean 2m air temperature and range of 2m air temperatures at each grid cell, and save these as a new layer in narr_crop_c

narr_crop_c$mean <- mean(narr_crop_c)
narr_crop_c$range <- max(narr_crop_c) - min(narr_crop_c)

Inspect the results of the mean and range calculations.

summary(narr_crop_c$mean)
##       mean       
##  Min.   :-1.593  
##  1st Qu.: 9.397  
##  Median :12.493  
##  Mean   :13.203  
##  3rd Qu.:17.011  
##  Max.   :25.966  
##  NA's   :5192
summary(narr_crop_c$range)
##      range       
##  Min.   : 6.333  
##  1st Qu.:35.074  
##  Median :39.336  
##  Mean   :41.220  
##  3rd Qu.:47.790  
##  Max.   :63.791  
##  NA's   :5192

With the summary statistic layers prepared, create a plot with ggplot2::ggplot(). Identifying the data sets to be plotted within the tidyterra::geom_spatraster() and tidyterra::geom_spatvector() argument informs the function that the narr_crop_c and conus_t data sets are SpatRaster and SpatVector objects, respectively. Additionally, the facet_wrap(~lyr) argument creates a plot for each layer specified in geom_spatraster(data = narr_crop_c[[c("mean", "range")]]).

ggplot2::ggplot() +
  tidyterra::geom_spatraster(data = narr_crop_c[[c("mean", "range")]]) +
  ggplot2::scale_fill_continuous(
    type = "viridis",
    na.value = NA,
    "Temperature (°C)"
  ) +
  ggplot2::facet_wrap(~lyr) +
  ggplot2::ggtitle("Air Temperature") +
  tidyterra::geom_spatvector(
    data = conus_t,
    fill = "transparent",
    color = "black"
  ) +
  ggpubr::theme_pubr(legend = "bottom") +
  ggplot2::theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank()
  )

4.4.5 Zonal Statistics

Looking closely at the previous plot, it is clear that the annual mean and range of temperatures differ between states. The terra package can be used to calculate zonal statistics of a SpatRaster object based on SpatVector polygons.

The terra::zonal() function can be used to calculate the average annual temperature in each state based on the annual grid cell temperatures stored in narr_crop_c$mean.

conus_t$MEAN <- terra::zonal(
  narr_crop_c$mean,
  conus_t,
  fun = "mean"
)

Plot the state-specific annual mean temperatures with ggplot2::ggplot(). Identifying the data set to be plotted within tidyterra::geom_spatvector() argument informs the function that the data is a SpatVector object.

ggplot2::ggplot() +
  tidyterra::geom_spatvector(
    data = conus_t,
    aes(fill = MEAN)
  ) +
  ggplot2::scale_fill_continuous(
    type = "viridis",
    na.value = NA,
    "Temperature (°C)"
  ) +
  ggplot2::ggtitle("Air Temperature (state mean)") +
  ggpubr::theme_pubr(legend = "bottom") +
  ggplot2::theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank()
  )

4.4.6 Reclassify

Raster data is most often continuous numeric data. Sometimes, however, it is important to classify the continuous numeric raster data into discrete classes. For this example, we will reclassify the annual mean temperature data into three discrete classes: <10°C, 10°C - 20°C, >20°C.

The first step in the reclassification process is to create a matrix storing the “from”, “to”, and “becomes” values. As the names imply, the “from” and “to” values identify the discrete ranges to be reclassified, and “becomes” is the new value that data within this range will take (i.e., “from” 0 “to” 5 “becomes” 1 means that values ranging from 0 to 5 will be reclassified as 1).

Create the reclassification matrix.

from <- c(-Inf, 10, 20)
to <- c(10, 20, Inf)
becomes <- 1:3
reclass <- matrix(
  c(
    from,
    to,
    becomes
  ),
  ncol = 3
)

Now that the reclassification matrix has been prepared, reclassify the annual mean temperatures. The right = TRUE argument indicates that intervals are open on the left and closed on the right (i.e., (0,10] becomes 1).

narr_reclass <- terra::classify(
  narr_crop_c$mean,
  rcl = reclass,
  right = TRUE
)

Although narr_reclass contains the reclassified mean air temperature data, the data is still continuously numeric. The following chunk of code converts the narr_reclass$mean layer from numeric to character based on the defined levels.

narr_reclass contains the reclassified mean air temperature data, but the data is still numeric. The terra::set.cats() function assigns categories to numeric data based on values stored in a data frame.

level_values <- data.frame(
  c(1:3),
  c("1", "2", "3")
)
colnames(level_values) <- c("mean_continuous", "mean_discrete")
terra::set.cats(
  narr_reclass,
  layer = "mean",
  value = level_values
)

Plot the discretely reclassified mean temperature data.

ggplot2::ggplot() +
  tidyterra::geom_spatraster(data = narr_reclass$mean_discrete) +
  ggplot2::scale_fill_viridis_d(
    "",
    labels = c(
      "≤10°",
      "10°< & ≤20°",
      "20°<",
      ""
    ),
    na.value = NA
  ) +
  ggplot2::ggtitle("Air Temperature (reclassified)") +
  tidyterra::geom_spatvector(
    data = conus_t,
    fill = "transparent",
    color = "black"
  ) +
  ggpubr::theme_pubr(legend = "bottom") +
  ggplot2::theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank()
  )

References

9.
Hijmans, Robert J. 2024. terra: Spatial Data Analysis. R Package Version 1.7-71. https://CRAN.R-project.org/package=terra
10.
Hernangómez, Diego. 2023. “Using the tidyverse with terra Objects: The tidyterra Package.” Journal of Open Source Software 8 (91): 5751. https://doi.org/10.21105/joss.05751