We propose the following workflow for processing raw yield monitor data:
To illustrate the yield monitor workflow, we will look at data from the research paper Prairie strips improve biodiversity and the delivery of multiple ecosystem services from corn–soybean croplands. Specifically, we will use the 2012 data.
extd.dir <- system.file("extdata", package = "pacu")
raw.yield <- st_read(file.path(extd.dir, '2012-basswood.shp'), quiet = TRUE)
boundary <- st_read(file.path(extd.dir, 'boundary.shp'), quiet = TRUE)We can see that the raw yield data has multiple columns. Some of them are more or less relevant depending on how we choose to process the yield data and produce a yield map.
## [1] "ID" "LONGITUDE" "LATITUDE" "FLOW" "TIME"
## [6] "CYCLES" "DISTANCE" "SWATH" "MOISTURE" "STATUS"
## [11] "PASS" "SERIAL" "FIELD" "LOAD" "CROP"
## [16] "GPS" "PDOP" "ALTITUDE" "DRY_BU_AC" "DAY"
## [21] "MONTH" "DAYOFMONTH" "HOUR" "MINUTE" "SECOND"
## [26] "TIMELAPSE" "SPEED" "geometry"
Our main objective in this step of the proposed workflow is to make sure that the data conforms to what we would expect of yield monitor data. For instance, we can examine the raw yield data or the distance between measurements.
The boxplot shows some observations that appear to be abnormally high. These can be a result of sudden speed or direction change, GPS errors, and other sources of unknown variability.
In this step, pa_check_yield() searches for potential
problems in the data so they can be addressed before the yield map is
generated. For instance, when units are not supplied to
pa_yield(), the function attempts to guess them. Because
pa_check_yield() reports those guesses, the user can
confirm or override them before processing.
The user can check for issues relevant to all available algorithms or target a specific algorithm.
## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## --------------------------------
## variable column mean units
## --- --- --- ---
## yield DRY_BU_AC 116.7 bu/ac
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## --------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield 10.25 372 118.553 0 6
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
##
##
## Algorithm: RITAS
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## mass - - -
## flow FLOW 18.80 lb/s
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## angle - - -
## width SWATH 240 in
## distance DISTANCE 147.2 in
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## mass - - - - -
## flow 0.3 40.2 18.4000 0 0
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## angle - - - - -
## width 240 240 240 0 0
## distance 5 304 153 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
## Warning in print.check.yield(x): Column angle not found but can be estimated
## within pa_yield
## Median overlap between harvest polygons is 3.1 %
## If this value seems high, please check the units of distance and swath.
pa_check_yield() searches for several kinds of issues
that could prevent pa_yield() from working correctly. It is
therefore worth addressing the identified issues before processing the
data.
Let us imagine a scenario in which we want to use the simple algorithm to process the data. To do so, we need a column with the yield data. The pa_yield() and pa_check_yield() functions will try to identify which columns of the input data frame contain the relevant information, and the units if those are not supplied. We can create a toy example in which the yield column has an uncommon name, preventing the functions from identifying it.
toy.example <- raw.yield
names(toy.example) <- gsub('DRY_BU_AC', 'NOT_A_COMMON_NAME', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## yield - - -
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield - - - - -
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
## Warning in print.check.yield(x): Columns yield and moisture are needed for the
## simple algorithm
To deal with this issue, we can rename the column to a more common name such as “yield” or “YIELD”. Additionally, a valid solution would be to provide the column name to the pa_yield() function directly when processing the data. Here, we rename the column simply to “yield” and we can see that the algorithm is able to identify it once again.
names(toy.example) <- gsub('NOT_A_COMMON_NAME', 'yield', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## yield yield 116.7 bu/ac
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield 10.25 372 118.553 0 6
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
The pa_yield() function can produce yield maps using two algorithms: simple and ritas. The simple algorithm allows moisture standardization, data cleaning, and smoothing. By default, it does not conduct any area-based aggregations. The ritas algorithm (Damiano & Niemi, 2020) follows a constructive framework, in which the steps are: rectangle creation; intersection assignment; tessellation; apportioning; and smoothing.
When algorithm is simple, the function will search for the columns that indicate yield, moisture, and time. Note that, since different crops have a different conversion factors from bushel to pounds, it is important to specify the lbs.per.bushel argument when the US standard system of units is used. In this case, we are producing a yield map for a maize crop, thus, 56 lbs/bushel. Additionally, the function supports the output of the yield map in both U.S. standard and metric unit systems. In this example, we have chosen “metric”.
ymp1 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
lbs.per.bushel = 56,
unit.system = 'metric',
verbose = FALSE)The “ymp1” object contains information on the yield processing algorithm, smoothing method, conversion factor used, moisture, and a summary of the yield data.
## Yield data processing algorithm: simple
## Smoothing method: none
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 0.635
## 1st Qu. 4.4947
## Median 7.4521
## Mean 7.3107
## 3rd Qu. 10.0796
## Max. 23.054
## -----------------
We can visualize the yield map by plotting it
In the previous map, we can see that the units are “t/ha”. However, a user might want to produce yield maps using US standard units. Additionally, the user can set the moisture level to the market standard moisture. When no moisture adjustment is provided, the function adjusts the moisture level to the average moisture in the data set.
ymp2 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'standard',
moisture.adj = 15.5,
lbs.per.bushel = 56,
verbose = FALSE)## Yield data processing algorithm: simple
## Smoothing method: none
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 15.5%
## Yield summary (bushel/acre)
## -----------------
## Min. 9.9172
## 1st Qu. 70.247
## Median 116.47
## Mean 114.26
## 3rd Qu. 157.53
## Max. 360.31
## -----------------
We can visualize the yield map in bushels/acre and at a moisture level of 15.5%
In the previous maps we used simple aggregation and moisture standardization. We can also add a cleaning step to remove outliers from the raw yield data.
ymp3 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
lbs.per.bushel = 56,
verbose = FALSE)This cleaning step will remove observations outside of a 3 standard deviation range from the field mean. This will often leave empty spots in the field if no smoothing method is selected.
If we want to interpolate the data, the function can interpolate the yield map using two prediction methods: inverse distance weighted (IDW) and kriging. The IDW method is deterministic and is much faster than the kriging method. The kriging method is stochastic and the computation time scales with \(O(n^{3})\), where n is the number of observations.
Here, we can produce an interpolated yield map using the IDW method.
By default, the function uses a power of 2, but that can be overridden
by passing the argument idp to the function.
If the argument grid is not specified, the function will provide smoothed predictions at the same points as the input data. This provides predictions for locations where observations may have been removed during the cleaning step.
ymp4 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
smooth.method = 'idw',
lbs.per.bushel = 56,
verbose = FALSE)## Yield data processing algorithm: simple
## Smoothing method: idw
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 0.635
## 1st Qu. 4.4886
## Median 7.4437
## Mean 7.2890
## 3rd Qu. 10.0523
## Max. 17.426
## -----------------
Alternatively, we can interpolate the maps using the Kriging method. As noted earlier, the computational time scales quickly as the number of observations increases. Therefore, we will add the argument “maxdist” to decrease computational time. It should be noted that the value at which the user sets the “maxdist” argument should be determined after examining the variogram.
ymp5 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
smooth.method = 'krige',
lbs.per.bushel = 56,
verbose = FALSE,
maxdist = 50)## Yield data processing algorithm: simple
## Smoothing method: krige
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 1.2440
## 1st Qu. 5.1989
## Median 7.6114
## Mean 7.4753
## 3rd Qu. 9.7014
## Max. 13.910
## -----------------
## Variogram model: Mat (Matern)
## Partial sill: 122865.5
## Range: 107.4692
## Kappa: 0.3
We can also investigate the variogram
When algorithm is ritas, the function expects an
argument data.columns that indicates crop mass or crop
flow, moisture, interval, angle, swath, and distance. Additionally, the
function expects a data.units argument to indicate the
units of the data columns. When neither of these arguments is provided,
the function uses a dictionary of common names to search for likely
matches and then attempts to guess the units.
The ritas algorithm is more computationally
intensive than the simple algorithm. Some of the steps
can be sped up by specifying the cores argument, which
allows some of the processing steps to run in parallel.
ymp6 <- pa_yield(input = raw.yield,
algorithm = 'ritas',
lbs.per.bushel = 56,
unit.system = 'metric',
verbose = FALSE)In this case, since the function is able to guess the columns and the units correctly, this would be the same as the chunk below.
ymp6 <- pa_yield(input = raw.yield,
data.columns = c(flow = 'FLOW', moisture = 'MOISTURE', interval = 'CYCLES', width = 'SWATH', distance = 'DISTANCE'),
data.units = c(flow = 'lb/s', moisture = '%', interval = 's', width = 'in', distance = 'in'),
unit.system = 'metric',
algorithm = 'ritas',
verbose = FALSE) In the previous yield map, we did not apply smoothing, so technically
we did not complete the full ritas workflow. Following
the algorithm’s steps, the best results are often obtained by setting
smooth.method = "krige". The function also includes the
option to krige on the log scale when the yield distribution is heavily
skewed. In that case, set fun = "log".
ymp7 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'ritas',
smooth.method = 'krige',
unit.system = 'metric',
lbs.per.bushel = 56,
verbose = FALSE,
maxdist = 50)## Yield data processing algorithm: ritas
## Smoothing method: krige
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 1.6147
## 1st Qu. 6.2176
## Median 9.0567
## Mean 8.8636
## 3rd Qu. 11.634
## Max. 17.456
## -----------------
## Variogram model: Mat (Matern)
## Partial sill: 194472.9
## Range: 197.6451
## Kappa: 0.3