pacu: Precision Agriculture Computational Utilities - Satellite data

Satellite data

Interacting with the OData API

Setting up your credentials

The first step is to register your Copernicus Data Space Ecosystem credentials in the R environment using pa_initialize_dataspace().

un <- 'my-username'
pw <- 'my-password'
pa_initialize_dataspace(username = un, password = pw)

Browsing the Data Space catalog

Let us define our area of interest. In this example, we are interested in a field in Ames, Iowa, U.S.

extd.dir <- system.file("extdata", package = "pacu")
area.of.interest <- sf::st_read(file.path(extd.dir, 'cobs_a_aoi.shp'), quiet = TRUE)

Now we can browse the Data Space catalog for images that meet our requirements by passing arguments to pa_browse_dataspace(). The function lets us filter satellite products by date, cloud cover, and collection. In this example, we are interested in Sentinel-2 images from 2023 with at most 50% cloud cover.

available.images <- pa_browse_dataspace(aoi = area.of.interest,
                                        max.cloud.cover = 50,
                                        start.date = '2023-01-01',
                                        end.date = '2023-12-31',
                                        collection.name = 'SENTINEL-2')

Each entry in the data frame returned by the function is a satellite image available to be downloaded from Data Space. In this case, there are 69 images that meet our requirements.

available.images
## Search parameters
## Start date: 2023-01-01 
## End date: 2023-12-31 
## Max. cloud cover: 50%
## Collection name:  SENTINEL-2 
## 
## Results
## Total:  69 
## Online:  69

Using the “summary” function, we can extract the number of available images for every month in the data set.

summary(available.images)
## ------------------
## Year  Month  Count 
## ---   ---    ---   
## 2023  1      4     
## 2023  2      6     
## 2023  3      4     
## 2023  4      6     
## 2023  5      7     
## 2023  6      6     
## 2023  7      8     
## 2023  8      7     
## 2023  9      6     
## 2023  10     5     
## 2023  11     7     
## 2023  12     3     
## ------------------
## Total    69

Downloading satellite images from Data Space

The next step is to download the images. If we supply the available.images object to pa_download_dataspace(), the function will download all 69 available images. For this demonstration, we can focus only on two images. Let’s pick the 32nd and 33rd images because these were captured in mid-July, so there should be maize or soybeans in the field.

An important consideration is that these images can occupy a considerable amount of storage, roughly 500 MB each. That means downloading all 69 images would require about 35 GB. If you provide an area of interest (aoi), the function will crop the downloaded imagery to the relevant extent, which helps reduce storage needs when working with many images.

out.dir <- tempdir()
available.images <- available.images[32:33, ]
pa_download_dataspace(x = available.images,
                      dir.path = out.dir,
                      aoi = area.of.interest,
                      verbose = FALSE)

Visualizing downloaded images

Now that we have downloaded the data, we might be interested in looking at RGB images.

s2.files <- list.files(out.dir, '\\.zip', full.names = TRUE)
rgb.img <- pa_get_rgb(s2.files,
                      aoi = area.of.interest,
                      verbose = FALSE)

pa_plot(rgb.img)

Similarly, we might be interested in NDVI or NDRE.

ndvi.img <- pa_compute_vi(s2.files,
                          vi = 'ndvi', 
                          aoi = area.of.interest,
                          verbose = FALSE)

ndre.img <- pa_compute_vi(s2.files, 
                          vi = 'ndre',
                          aoi = area.of.interest,
                          verbose = FALSE)

pa_plot(ndvi.img, main = 'NDVI')

pa_plot(ndre.img, main = 'NDRE')

Area-level summary

In many precision agriculture applications, we are interested in a summary value for each experimental unit to evaluate the effect of a treatment. This statistic is often the mean or median of the pixels within a polygon representing a research plot or another area of interest. To illustrate this in pacu, we can split the area of interest into four smaller polygons and then compute and visualize summary statistics.

split.aoi <- st_make_grid(area.of.interest, n = c(4, 1))
split.aoi <- st_as_sf(split.aoi)
split.aoi <- st_transform(split.aoi, st_crs(ndvi.img))
ndvi.mean <- summary(ndvi.img, by = split.aoi, fun = mean)
pa_plot(ndvi.mean)

Interacting with the Copernicus Statistical API

The Data Space Statistics API allows users to request area-level statistics derived from satellite imagery without downloading the imagery itself. The API uses OAuth2.0 authentication with a client ID and client secret. To use these functions, you will need to register an OAuth client.

Setting up your credentials

cid <- 'my-client-id'
cs <- 'my-client-secret'
pa_initialize_oauth(client_id = cid, client_secret = cs)

Requesting areal statistics

Now that your OAuth client credentials have been registered, you can request vegetation index statistics using the Statistics API. Let us request NDVI data for our area of interest.

ndvi.statistics <- pa_get_vi_stats(aoi = area.of.interest,
                                   start.date = '2022-01-01',
                                   end.date = '2022-12-31',
                                   vegetation.index = 'ndvi',
                                   agg.time = 'P1D')

The pa_plot() function helps visualize the results in a spatial context. Because we requested NDVI statistics for a single polygon, each facet representing a date is expected to contain only one color.

pa_plot(ndvi.statistics)
## Warning: number of facets. Number of facets for mode "plot" is limited to 64. Change the
## option facet.max to allow more facets, with `tmap_options(facet.max = 70)` or
## `+ tm_options(facet.max = 70)`

If we are more interested in a time-series view, we can specify plot.type.

pa_plot(ndvi.statistics, 
        plot.type = 'timeseries')

We might also want to compare index values across different regions of the field. Let us split the previous field into four parts and retrieve NDVI separately for each section.

Note: using the by.feature argument sends multiple requests to the Statistical API, so you should be mindful of your quota. Quotas and limitations are described here.

ndvi.statistics.2 <- pa_get_vi_stats(aoi = split.aoi,
                                     start.date = '2022-01-01',
                                     end.date = '2022-05-01',
                                     vegetation.index = 'ndvi',
                                     agg.time = 'P1D',
                                     by.feature = TRUE)

Now we can visualize the different parts of the field in a spatial and temporal context.

pa_plot(ndvi.statistics.2)

Similarly, we can examine the same data in a time-series format. In this case, specifying by = "id" plots a different series for each polygon.

pa_plot(ndvi.statistics.2, 
        plot.type = 'timeseries',
        by = c('id'),
        legend.outside = TRUE)