3 Polygon Data

Profile-CMP Profile-SBS Profile-STU

Polygon Data Access, Preparation, and Exploratory Analysis in R

Date Modified: May 6, 2024

Authors: Mitchell Manware author-mm, Kyle P. Messier author-kpm

Key Terms: Geospatial Data, Wildfire

Programming Language: R

3.1 Introduction

Wildfire smoke plume coverage data from the United States National Oceanic and Atmospheric Administration (NOAA) will be used to demonstrate using polygon data. This tutorial will cover polygon data with both the sf (1) and terra (9) packages separately, but the steps for accessing, downloading, and unzipping the data is the same for both packages.

This tutorial will demonstrate the following steps with polygon data in R:

  • Downloading data from a URL
  • Importing data
  • Checking data type, structure, and class
  • Reclassifying data
  • Computing summary and zonal statistics
  • Plotting individual and multiple data sets

The exploratory analyses in this unit are designed for educational purposes only. The results of the following analyses are not peer-reviewed findings, nor are they based on any hypotheses.

3.2 Access, Download, and Unzip

The website URL where the NOAA wildfire smoke plume data exists is date-specific, meaning there is a unique URL for each daily data set. For the purpose of these exploratory analyses, wildfire smoke plume data from September 1, 2023 will be used.

Define three variables for day, month, and year according to the date of interest.

day <- "01"
month <- "09"
year <- "2023"

The utils::download.file() function downloads the file according to the defined URL and destination file.

url_noaa <- paste0(
  "https://satepsanone.nesdis.noaa.gov/pub/FIRE/web/HMS/Smoke_Polygons",
  "/Shapefile/",
  year,
  "/",
  month,
  "/hms_smoke",
  year,
  month,
  day,
  ".zip"
)

destination_noaa <- paste0(
  "/   YOUR FILE PATH   /noaa_smoke",
  year,
  month,
  day,
  ".zip"
)

download.file(
  url_noaa,
  destination_noaa
)

The file downloaded from the NOAA website is a .zip file. Zip files need to be unzipped (decompressed) in order to access the data within. Unzip the NOAA wildfire smoke plume coverage file with utils::unzip().

Unzipping a .zip file will decompress the contents within. Spatial data sets can be very large (i.e., > 1 GB), so check the size of the data before unzipping on your machine.

The numeric value size of each file is listed under Length.

unzip("/   YOUR FILE PATH   /noaa_smoke20230901.zip",
  list = TRUE
)

After inspecting the file sizes, unzip noaa_smoke20230901.zip.

unzip("/   YOUR FILE PATH   /noaa_smoke20230901.zip")

Inspecting the file with utils::unzip(list = TRUE) returned the size of the file, but also the name of the data file of interest. The desired data file can also be identified with list.files().

list.files("/   YOUR FILE PATH   /")

Listing the contents of the unzipped file reveals four individual files. The data to be imported is stored in the hms_smoke20230901.shp, but the other files contain important information for the .shp file.

Deleting any of the supporting files (i.e., *.dbf, *.prj, or *.shx) will disrupt the data import.

3.3 Data Preparation with sf

This section will focus on exploratory analyses with polygon data using the sf package.

3.3.1 Import

Now that the contents of the zip file have been saved on your machine and the data file of interest has been identified, import the data with sf::st_read().

Although the supporting files are required to import a shapefile, only the file ending in .shp needs to be imported

smoke_sf <- st_read("/   YOUR FILE PATH   /hms_smoke20230901.shp")

Importing hms_smoke20230901.shp does not return a Warning: message because the data set has native spatial features, and is therefore imported as an sf object.

3.3.2 Inspect Structure

Inspect the structure of smoke_sf to see its class, column names, column classes, and the first two (specified by vec.len = 2) data points.

str(smoke_sf,
  vec.len = 2
)

As mentioned previously, the smoke_sf data set has native spatial features. These are reflected by the data set having classes of sf and data.frame, and the $geometry feature.

class(smoke_sf)

3.3.3 Reclassify

The main parameter of interest in this data set is $Density, which discretely categorizes each wildfire smoke plume as “Light”, “Medium”, or “Heavy”. Checking its class shows that $Density is class character.

class(smoke_sf$Density)

Nominal data, data without fixed order or rank system, can be stored as class character (i.e., State names). However, it is best to store ordinal data as class factor for conducting analyses in R.

Converting data from class character to class factor can be done with factor(). The levels = c() argument in the function specifies both the level names and the ranked order of the levels.

smoke_sf$Density <- factor(smoke_sf$Density,
  levels = c("Light", "Medium", "Heavy")
)

Check the class of $Density again to ensure proper reclassification.

class(smoke_sf$Density)

3.3.4 Coordinate Reference System and Projection

Check the coordinate reference system of an sf object with sf::st_crs().

st_crs(smoke_sf)

smoke_sf has a native coordinate reference system which was imported during the sf::st_read() step. The area of interest for these exploratory analyses is the conterminous United States, so we can transform smoke_sf to the Albers Equal Area projected coordinate system (EPSG code: 5070).

smoke_sf <- st_transform(
  smoke_sf,
  5070
)

3.4 Exploratory Analysis with sf

3.4.1 Plot (Single)

With the data prepared, plot the wildfire smoke plume polygons with ggplot2::ggplot().

Now that the parameters of interest and coordinate reference system have been prepared, create a plot with ggplot2::ggplot(). Identifying the data set to be plotted within the geom_sf() argument informs the function that the data is an sf object.

ggplot() +
  geom_sf(
    data = smoke_sf,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  ggtitle("Wildfire Smoke Plumes") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

The wildfire smoke plume polygons are clearly visible and colored according to their individual smoke density classification. This plot, however, is difficult to interpret for two reasons. First, there are multiple polygons for each smoke density classification. Multiple borders and overlapping polygons with the same smoke density type can be confusing. To make the polygons more clear, individual polygons for each smoke density classification can be combined.

For the purposes of these exploratory analyses, the satellite travelling direction and time of collection will be ignored.

3.4.2 Union

Individual polygons can be unioned (combined) into one multi-part polygon with sf::st_union. The group_by(Density) argument specifies that the polygons should be combined based on the value stored in $Density. Adding the Date = paste0(... argument within the dplyr::summarise() function creates a parameter to store the date based on the year, month, and day of the data.

smoke_sf_density <-
  smoke_sf %>%
  group_by(Density) %>%
  summarise(
    geometry = st_union(geometry),
    Date = paste0(
      year,
      month,
      day
    )
  )

The resulting data set contains three multi-polygons, a column for the smoke plume classification, and a column for the date.

smoke_sf_density

Creating a new plot with smoke_sf_density.

ggplot() +
  geom_sf(
    data = smoke_sf_density,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  ggtitle("Wildfire Smoke Plumes (unioned)") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

The plot is still difficult to interpret because it lacks geospatial context. The grid lines provide latitude and longitude references, but physical or geopolitical boundaries can help show where the wildfire smoke plumes are relative to the study area of interest. To provide geospatial context to the wildfire smoke plume polygons, we can add the United States state boundary polygons to the plot.

3.4.3 Plot (Multiple)

The steps taken to access, download, unzip, and import the United States state boundary data are the same as those taken for the wildfire smoke plume coverage data. Refer to Sections Access, Download, and Unzip and Import for detailed descriptions.

url_states <-
  "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip"

destination_states <- "/   YOUR FILE PATH   /states.zip"

download.file(
  url_states,
  destination_states
)
unzip("/   YOUR FILE PATH   /states.zip",
  list = TRUE
)
unzip("/   YOUR FILE PATH   /states.zip")
list.files("/   YOUR FILE PATH   /")
states <- st_read("/   YOUR FILE PATH   /cb_2018_us_state.shp")

Inspect the structure of states_sf.

str(states_sf,
  vec.len = 2
)

For the purpose of these exploratory analyses, only the conterminous United States (CONUS) will be used. Subset states_sf to remove Alaska, Hawaii, and United States territories.

remove <- c(
  "Alaska",
  "Hawaii",
  "Puerto Rico",
  "United States Virgin Islands",
  "Commonwealth of the Northern Mariana Islands",
  "Guam",
  "American Samoa"
)

conus_sf <- subset(
  states_sf,
  !NAME %in% remove
)

Check the coordinate reference system.

st_crs(conus_sf)

When analyzing multiple spatial data sets together, all data sets must have the same coordinate reference system or projected coordinate system.

Transform conus_sf to match the projected coordinate system of the smoke_sf_density data set.

conus_sf <- st_transform(
  conus_sf,
  st_crs(smoke_sf_density)
)

Plot the conterminous United States state boundaries.

ggplot() +
  geom_sf(data = conus_sf) +
  ggtitle("Coterminous United States' State Boundaries") +
  theme_pubr() +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

With the wildfire smoke plume and conterminous United States polygons imported and prepared, ensure that they have the same coordinate reference system.

st_crs(smoke_sf_density) == st_crs(conus_sf)

Create a plot which shows the distribution of wildfire smoke plumes over the conterminous United States state boundaries.

ggplot() +
  geom_sf(
    data = smoke_sf_density,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  geom_sf(
    data = conus_sf,
    fill = "transparent"
  ) +
  ggtitle("Wildfire Smoke Plumes (with states)") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

This plot provides important geospatial context for understanding where the wildfire smoke plumes are in relation to the study area of interest.

3.4.4 Crop

The sf::st_crop() function can be used to reduce the extent of a set of polygons to a specific rectangle, typically the bounding box of another spatial data set. In this example we can crop the smoke_sf_density polygons to the bounding box surrounding the conterminous United States.

smoke_sf_crop <- st_crop(
  smoke_sf_density,
  conus_sf
)

Plot the cropped wildfire smoke plume polygons and the conterminous United States state boundaries.

ggplot() +
  geom_sf(
    data = smoke_sf_crop,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  geom_sf(
    data = conus_sf,
    fill = "transparent"
  ) +
  ggtitle("Wildfire Smoke Plumes (cropped)") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

3.5 Data Preparation with terra

This section will focus on exploratory analyses with polygon data using the terra package (9).

3.5.1 Import

Now that the contents of the zip files have been saved on your machine and the data files of interest have been identified, import both the wildfire smoke plume coverage data and the United States state boundary data with terra::vect().

See Sections Access, Download, and Unzip and Plot (Multiple) for obtaining the wildfire smoke plume coverage and United States state boundary data sets, respectively.

smoke_t <- vect("/   YOUR FILE PATH   /hms_smoke20230901.shp")
states_t <- vect("/   YOUR FILE PATH   /cb_2018_us_state_500k.shp")

3.5.2 Inspect Structure

Inspect the structures of smoke_t and states_t to see their classes, column names, and column classes.

smoke_t
states_t

Both smoke_t and states_t have native spatial features. These are reflected by the type of spatial data in geometry:, and the spatial attributes extent: and coord. ref.:

3.5.3 Reclassify

The main parameter of interest in this data set is $Density, which discretely categorizes each wildfire smoke plume as “Light”, “Medium”, or “Heavy”. Checking its class shows that $Density is class character.

class(smoke_t$Density)

Nominal data, data without fixed order or rank system, can be stored as class character (i.e., State names). However, it is best to store ordinal data as class factor for conducting analyses in R.

Converting data from class character to class factor can be done with factor(). The levels = c() argument in the function specifies both the level names and the ranked order of the levels.

smoke_t$Density <- factor(smoke_t$Density,
  levels = c("Light", "Medium", "Heavy")
)

Check the class of $Density again to ensure proper reclassification.

class(smoke_t$Density)

For the purpose of these exploratory analyses, only the conterminous United States (CONUS) will be used. Subset states_t to remove Alaska, Hawaii, and the United States territories.

remove <- c(
  "Alaska",
  "Hawaii",
  "Puerto Rico",
  "United States Virgin Islands",
  "Commonwealth of the Northern Mariana Islands",
  "Guam",
  "American Samoa"
)

conus_t <- subset(
  states_t,
  !states_t$NAME %in% remove
)

3.5.4 Coordinate Reference System and Projection

Check the coordinate reference systems of SpatVector objects with terra::crs().

crs(smoke_t,
  describe = TRUE
)
crs(conus_t,
  describe = TRUE
)

Both data sets have native coordinate reference systems which were imported during the terra::vect() step. The two data sets, however, have different coordinate reference systems from each other. The area of interest for these exploratory analyses is the conterminous United States, so we can project smoke_t and conus_t to the Albers Equal Area projected coordinate system (EPSG code: 5070).

smoke_t <- project(
  smoke_t,
  "EPSG:5070"
)
conus_t <- project(
  conus_t,
  "EPSG:5070"
)

Although both data sets were transformed to the same projected coordinate system, it is important to ensure that all data sets have the same coordinate reference system or projected coordinate system.

crs(smoke_t) == crs(conus_t)

3.6 Exploratory Analysis with terra

3.6.1 Plot (Multiple)

Plot both data sets together in one plot with ggplot2::ggplot().

Now that the parameters of interest and coordinate reference systems have been prepared, create a plot with ggplot2::ggplot(). Identifying the data sets to be plotted within the tidyterra::geom_spatvector() arguments informs the function that the data are SpatVector objects (10).

ggplot() +
  geom_spatvector(
    data = smoke_t,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  geom_spatvector(
    data = conus_t,
    fill = "transparent"
  ) +
  ggtitle("Wildfire Smoke Plumes (with states)") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

The wildfire smoke plume polygons are clearly visible and colored according to their individual smoke density classification. The plot, however, is difficult to interpret because there are multiple polygons for each smoke density classification. Multiple borders and overlapping polygons with the same smoke density type can be confusing. To make the polygons more clear, individual polygons for each smoke density classification can be combined.

For the purposes of these exploratory analyses, the satellite travelling direction and time of collection will be ignored.

3.6.2 Aggregate

Individual polygons an be aggregated (combined) into one multi-part polygon with terra::aggregate(). The by = "Density" argument specifies that the polygons should be combined based on the value stored in $Density.

smoke_t_density <- terra::aggregate(smoke_t,
  by = "Density",
  dissolve = TRUE
)

Aggregating the polygons based on the values stored in the $Density column can result in the other columns containing NA values. To remove these columns, subset smoke_t_density to remove $Satellite, $Start, and $End.

smoke_t_density <- smoke_t_density[
  seq_len(nrow(smoke_t_density)),
  c("Density", "agg_n")
]

The resulting data set contains three multi-polygons, a column for the smoke plume classification, and a count of the number of individual polygons that were aggregated to create the multi-polygon. This last column, $agg_n is automatically calculated by the terra::aggregate() function.

smoke_t_density

Create a new plot with smoke_t_density.

ggplot() +
  geom_spatvector(
    data = smoke_t_density,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  geom_spatvector(
    data = conus_t,
    fill = "transparent"
  ) +
  ggtitle("Wildfire Smoke Plumes (aggregated)") +
  theme_pubr(legend = "bottom") +
  theme(plot.title = element_text(hjust = 0.5)) +
  grids()

3.6.3 Crop

The terra::crop() function can be used to reduce SpatVector to an area determined by another SpatVector. In this example, we can crop the smoke_t_density polygons to the conterminous United States state boundaries.

smoke_t_crop <- terra::crop(
  smoke_t_density,
  conus_t
)

Plot the cropped wildfire smoke plume polygons and the conterminous United States state boundaries.

ggplot() +
  geom_spatvector(
    data = smoke_t_crop,
    aes(fill = Density)
  ) +
  scale_fill_manual("Smoke Density",
    values = c("lightgreen", "lightgoldenrod", "tomato")
  ) +
  geom_spatvector(
    data = conus_t,
    fill = "transparent"
  ) +
  ggtitle("Wildfire Smoke Plumes (cropped)") +
  theme_pubr(legend = "bottom") +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank()
  )

3.6.4 Zonal Statistics

Looking closely at the previous plot, it is clear that wildfire smoke plumes cover each state differently. The terra package can be used to identify which states are covered by each classification of wildfire smoke plumes.

The terra::relate() function can be used to identify spatial relationships between two SpatVector objects. The relation = "intersects" argument logically identifies if any portion of each state is or is not covered by each of the three wildfire smoke plume classification multi-polygons.

The output of terra::relate() is a wide matrix. The nested data.frame() and t() wrappers convert the output from a wide matrix to a long data frame, which is required to combine the results with the conus_t data set.

conus_smoke <- data.frame(
  t(
    relate(smoke_t_density,
      conus_t,
      relation = "intersects"
    )
  )
)

Set the column names of conus_smoke to match the smoke density classifications.

The order of the columns in conus_smoke are based on the ordered factor levels in smoke_t_density$Density (see Section Reclassify).

colnames(conus_smoke) <- c("Light", "Medium", "Heavy")

Combine the wildfire smoke plume indicator data frame with the the conterminous United States state boundaries data.

conus_t <- cbind(
  conus_t,
  conus_smoke
)

The conus_t data set now contains separate columns indicating the presence or absence of “Light”, “Medium”, and “Heavy” wildfire smoke plumes for each conterminous state.

names(conus_t)

3.6.5 Plot (for Loop)

A for loop can be used to create indicator plots for each wildfire smoke plume classification. The layout of a for loop can look complicated, but it simply applies the same set of functions to a given list of inputs.

The list of inputs must first be created. As the goal is to plot each of the wildfire smoke plume density classifications, create a character vector of the three classification names.

This “list of inputs” must first be created. Store the three wildfire smoke plume classifications in a vector of class character.

dens_c <- c("Light", "Medium", "Heavy")

Create a for loop that creates a plot for each wildfire smoke plume density stored within dens_c.

Code line 1 tells the for loop to apply the following functions to each observation in dens_c.

Code lines 3 through 9 define the plotting colors based on the wildfire smoke plume classification (dens_c[d]). As in previous plots, “Light” smoke plumes will be colored green, “Medium” smoke plumes will be covered yellow, and “Heavy” smoke plumes will be colored red.

Code lines 12 through 32 create the plot based on the wildfire smoke plume classification (dens_c[d]), and previously defined plotting colors (color_values).

for (d in seq_along(dens_c)) {
  # define color palette based on smoke density
  if (dens_c[d] == "Light") {
    color_values <- c("lightgrey", "lightgreen")
  } else if (dens_c[d] == "Medium") {
    color_values <- c("lightgrey", "lightgoldenrod")
  } else if (dens_c[d] == "Heavy") {
    color_values <- c("lightgrey", "tomato")
  }

  # create plot
  print(
    ggplot() +
      geom_spatvector(
        data = conus_t,
        aes_string(fill = dens_c[d])
      ) +
      scale_fill_manual(
        paste0(
          dens_c[d],
          " Smoke Plume Coverage Present"
        ),
        values = color_values
      ) +
      theme_pubr(legend = "bottom") +
      theme(
        plot.title = element_text(hjust = 0.5),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank()
      )
  )
}

References

1.
Pebesma, Edzer. 2018. Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009
9.
Hijmans, Robert J. 2024. terra: Spatial Data Analysis. R Package Version 1.7-71. https://CRAN.R-project.org/package=terra
10.
Hernangómez, Diego. 2023. “Using the tidyverse with terra Objects: The tidyterra Package.” Journal of Open Source Software 8 (91): 5751. https://doi.org/10.21105/joss.05751