-
-
Notifications
You must be signed in to change notification settings - Fork 592
/
Copy path04-spatial-operations.Rmd
1161 lines (941 loc) · 70.7 KB
/
04-spatial-operations.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Spatial data operations {#spatial-operations}
```{r, include=FALSE}
source("code/before_script.R")
```
## Prerequisites {-}
- This chapter requires the same packages used in Chapter \@ref(attr):
```{r 04-spatial-operations-1, message=FALSE, results='hide'}
library(sf)
library(terra)
library(dplyr)
library(spData)
```
## Introduction
Spatial operations, including spatial joins between vector datasets and local and focal operations on raster datasets, are a vital part of geocomputation\index{geocomputation}.
This chapter shows how spatial objects can be modified in a multitude of ways based on their location and shape.
Many spatial operations have a non-spatial (attribute) equivalent, so concepts such as subsetting and joining datasets demonstrated in the previous chapter are applicable here.
This is especially true for *vector* operations: Section \@ref(vector-attribute-manipulation) on vector attribute manipulation provides the basis for understanding its spatial counterpart, namely spatial subsetting (covered in Section \@ref(spatial-subsetting)).
Spatial joining (Sections \@ref(spatial-joining), \@ref(non-overlapping-joins) and \@ref(incongruent)) and aggregation (Section \@ref(spatial-aggr)) also have non-spatial counterparts, covered in the previous chapter.
Spatial operations differ from non-spatial operations in a number of ways, however:
spatial joins, for example, can be done in a number of ways --- including matching entities that intersect with or are within a certain distance of the target dataset --- while the attribution joins discussed in Section \@ref(vector-attribute-joining) in the previous chapter can only be done in one way (except when using fuzzy joins, as described in the documentation of the [**fuzzyjoin**](https://cran.r-project.org/package=fuzzyjoin) package).
Different *types* of spatial relationship between objects, including intersects and disjoint, are described in Sections \@ref(topological-relations) and \@ref(DE-9IM-strings).
\index{spatial operations}
Another unique aspect of spatial objects is distance: all spatial objects are related through space, and distance calculations can be used to explore the strength of this relationship, as described in the context of vector data in Section \@ref(distance-relations).
Spatial operations on raster objects include subsetting --- covered in Section \@ref(spatial-raster-subsetting).
*Map algebra* covers a range of operations that modify raster cell values, with or without reference to surrounding cell values.
The concept of map algebra, vital for many applications, is introduced in Section \@ref(map-algebra); local, focal and zonal map algebra operations are covered in sections \@ref(local-operations), \@ref(focal-operations), and \@ref(zonal-operations), respectively.
Global map algebra operations, which generate summary statistics representing an entire raster dataset, and distance calculations on rasters, are discussed in Section \@ref(global-operations-and-distances).
Next, the relationships between map algebra and vector operations are discussed in Section \@ref(map-algebra-counterparts-in-vector-processing).
In the Section \@ref(merging-rasters),1 the process of merging two raster datasets is discussed and demonstrated with reference to a reproducible example.
```{block2 04-spatial-operations-2, type='rmdnote'}
It is important to note that spatial operations that use two spatial objects rely on both objects having the same coordinate reference system, a topic that was introduced in Section \@ref(crs-intro) and which will be covered in more depth in Chapter \@ref(reproj-geo-data).
```
## Spatial operations on vector data {#spatial-vec}
This section provides an overview of spatial operations on vector geographic data represented as simple features in the **sf** package.
Section \@ref(spatial-ras) presents spatial operations on raster datasets using classes and functions from the **terra** package.
### Spatial subsetting
Spatial subsetting is the process of taking a spatial object and returning a new object containing only features that *relate* in space to another object.
Analogous to *attribute subsetting* (covered in Section \@ref(vector-attribute-subsetting)), subsets of `sf` data frames can be created with square bracket (`[`) operator using the syntax `x[y, , op = st_intersects]`, where `x` is an `sf` object from which a subset of rows will be returned, `y` is the 'subsetting object' and `, op = st_intersects` is an optional argument that specifies the topological relation (also known as the binary predicate) used to do the subsetting.
The default topological relation used when an `op` argument is not provided is `st_intersects()`: the command `x[y, ]` is identical to `x[y, , op = st_intersects]` shown above but not `x[y, , op = st_disjoint]` (the meaning of these and other topological relations is described in the next section).
The `filter()` function from the **tidyverse**\index{tidyverse (package)} can also be used, but this approach is more verbose, as we will see in the examples below.
\index{vector!subsetting}
\index{spatial!subsetting}
To demonstrate spatial subsetting, we will use the `nz` and `nz_height` datasets in the **spData** package, which contain geographic data on the 16 main regions and 101 highest points in New Zealand, respectively (Figure \@ref(fig:nz-subset)), in a projected coordinate reference system.
The following code chunk creates an object representing Canterbury, then uses spatial subsetting to return all high points in the region.
```{r 04-spatial-operations-3}
canterbury = nz |> filter(Name == "Canterbury")
canterbury_height = nz_height[canterbury, ]
```
```{r nz-subset, echo=FALSE, warning=FALSE, fig.cap="Spatial subsetting, with red triangles representing 101 high points in New Zealand, clustered near the central Canterbuy region (left). The points in Canterbury were created with the `[` subsetting operator (highlighted in gray, right).", fig.scap="Spatial subsetting.", message=FALSE}
library(tmap)
p_hpnz1 = tm_shape(nz) +
tm_polygons(fill = "white") +
tm_shape(nz_height) +
tm_symbols(shape = 2, col = "red", size = 0.5, col_alpha = 0.75) +
tm_title("High points in New Zealand") +
tm_layout(bg.color = "lightblue")
p_hpnz2 = tm_shape(nz) +
tm_polygons(fill = "white") +
tm_shape(canterbury) +
tm_fill(col = "gray") +
tm_shape(canterbury_height) +
tm_symbols(shape = 2, col = "red", size = 0.5, col_alpha = 0.75) +
tm_title("High points in Canterbury") +
tm_layout(bg.color = "lightblue")
tmap_arrange(p_hpnz1, p_hpnz2, ncol = 2)
```
Like attribute subsetting, the command `x[y, ]` (equivalent to `nz_height[canterbury, ]`) subsets features of a *target* `x` using the contents of a *source* object `y`.
Instead of `y` being a vector of class `logical` or `integer`, however, for spatial subsetting both `x` and `y` must be geographic objects.
Specifically, objects used for spatial subsetting in this way must have the class `sf` or `sfc`: both `nz` and `nz_height` are geographic vector data frames and have the class `sf`, and the result of the operation returns another `sf` object representing the features in the target `nz_height` object that intersect with (in this case high points that are located within) the `canterbury` region.
Various *topological relations*\index{topological relations} can be used for spatial subsetting which determine the type of spatial relationship that features in the target object must have with the subsetting object to be selected.
These include *touches*, *crosses* or *within*, as we will see shortly in Section \@ref(topological-relations).
The default setting `st_intersects` is a 'catch all' topological relation that will return features in the target that *touch*, *cross* or are *within* the source 'subsetting' object.
Alternative spatial operators can be specified with the `op =` argument, as demonstrated in the following command which returns the opposite of `st_intersects()`, points that do not intersect with Canterbury (see Section \@ref(topological-relations)).
```{r 04-spatial-operations-4, eval=FALSE}
nz_height[canterbury, , op = st_disjoint]
```
```{block2 04-spatial-operations-5, type='rmdnote'}
Note the empty argument --- denoted with `, ,` --- in the preceding code chunk is included to highlight `op`, the third argument in `[` for `sf` objects.
One can use this to change the subsetting operation in many ways.
`nz_height[canterbury, 2, op = st_disjoint]`, for example, returns the same rows but only includes the second attribute column (see `` sf::`[.sf` `` and the `?sf` for details).
```
For many applications, this is all you'll need to know about spatial subsetting for vector data: it just works.
If you are impatient to learn about more topological relations, beyond `st_intersects()` and `st_disjoint()`, skip to the next section (\@ref(topological-relations)).
If you're interested in the details, including other ways of subsetting, read on.
Another way of doing spatial subsetting uses objects returned by topological operators.
These objects can be useful in their own right, for example when exploring the graph network of relationships between contiguous regions, but they can also be used for subsetting, as demonstrated in the code chunk below.
```{r 04-spatial-operations-6, out.lines=9}
sel_sgbp = st_intersects(x = nz_height, y = canterbury)
class(sel_sgbp)
sel_sgbp
sel_logical = lengths(sel_sgbp) > 0
canterbury_height2 = nz_height[sel_logical, ]
```
The above code chunk creates an object of class `sgbp` (a sparse geometry binary predicate, a list of length `x` in the spatial operation) and then converts it into a logical vector `sel_logical` (containing only `TRUE` and `FALSE` values, something that can also be used by **dplyr**'s filter function).
\index{binary predicate|see {topological relations}}
The function `lengths()` identifies which features in `nz_height` intersect with *any* objects in `y`.
In this case, 1 is the greatest possible value, but for more complex operations one could use the method to subset only features that intersect with, for example, 2 or more features from the source object.
```{block2 04-spatial-operations-7, type='rmdnote'}
Note: another way to return a logical output is by setting `sparse = FALSE` (meaning 'return a dense matrix not a sparse one') in operators such as `st_intersects()`. The command `st_intersects(x = nz_height, y = canterbury, sparse = FALSE)[, 1]`, for example, would return an output identical to `sel_logical`.
Note: the solution involving `sgbp` objects is more generalizable though, as it works for many-to-many operations and has lower memory requirements.
```
The same result can be also achieved with the **sf** function `st_filter()` which was [created](https://github.com/r-spatial/sf/issues/1148) to increase compatibility between `sf` objects and **dplyr** data manipulation code:
```{r}
canterbury_height3 = nz_height |>
st_filter(y = canterbury, .predicate = st_intersects)
```
```{r 04-spatial-operations-7b-old, eval=FALSE, echo=FALSE}
# Additional tests of subsetting
canterbury_height4 = nz_height |>
filter(st_intersects(x = _, y = canterbury, sparse = FALSE))
canterbury_height5 = nz_height |>
filter(sel_logical)
identical(canterbury_height3, canterbury_height4)
identical(canterbury_height3, canterbury_height5)
identical(canterbury_height2, canterbury_height4)
identical(canterbury_height, canterbury_height4)
waldo::compare(canterbury_height2, canterbury_height4)
```
At this point, there are three identical (in all but row names) versions of `canterbury_height`, one created using the `[` operator, one created via an intermediary selection object, and another using **sf**'s convenience function `st_filter()`.
<!-- RL: commented out for now as old. Todo: if we ever update that vignette uncomment the next line. -->
<!-- To explore spatial subsetting in more detail, see the supplementary vignettes on `subsetting` and [`tidyverse-pitfalls`](https://geocompr.github.io/geocompkg/articles/) on the [geocompkg website](https://geocompr.github.io/geocompkg/articles/). -->
The next section explores different types of spatial relation, also known as binary predicates, that can be used to identify whether two features are spatially related or not.
### Topological relations
Topological relations\index{topological relations} describe the spatial relationships between objects.
"Binary topological relationships", to give them their full name, are logical statements (in that the answer can only be `TRUE` or `FALSE`) about the spatial relationships between two objects defined by ordered sets of points (typically forming points, lines and polygons) in two or more dimensions [@egenhofer_mathematical_1990].
That may sound rather abstract and, indeed, the definition and classification of topological relations is based on mathematical foundations first published in book form in 1966 [@spanier_algebraic_1995], with the field of algebraic topology continuing beyond the year 2000 [@dieck_algebraic_2008].
Despite their mathematical origins, topological relations can be understood intuitively with reference to visualizations of commonly used functions that test for common types of spatial relationships.
Figure \@ref(fig:relations) shows a variety of geometry pairs and their associated relations.
The third and fourth pairs in Figure \@ref(fig:relations) (from left to right and then down) demonstrate that, for some relations, order is important.
While the relations *equals*, *intersects*, *crosses*, *touches* and *overlaps* are symmetrical, meaning that if `function(x, y)` is true, `function(y, x)` will also be true, relations in which the order of the geometries are important such as *contains* and *within* are not.
Notice that each geometry pair has a "DE-9IM" string such as FF2F11212, described in the next section.
\index{topological relations}
```{r relations, echo=FALSE, fig.cap="Topological relations between vector geometries, inspired by figures 1 and 2 in Egenhofer and Herring (1990). The relations for which the function(x, y) is true are printed for each geometry pair, with x represented in pink and y represented in blue. The nature of the spatial relationship for each pair is described by the Dimensionally Extended 9-Intersection Model string.", fig.show='hold', message=FALSE, fig.asp=0.66, warning=FALSE}
# source("https://github.com/geocompx/geocompr/raw/main/code/de_9im.R")
source("code/de_9im.R")
library(sf)
xy2sfc = function(x, y) st_sfc(st_polygon(list(cbind(x, y))))
p1 = xy2sfc(x = c(0, 0, 1, 1, 0), y = c(0, 1, 1, 0.5, 0))
p2 = xy2sfc(x = c(0, 1, 1, 0), y = c(0, 0, 0.5, 0))
p3 = xy2sfc(x = c(0, 1, 1, 0), y = c(0, 0, 0.7, 0))
p4 = xy2sfc(x = c(0.7, 0.7, 0.9, 0.7), y = c(0.8, 0.5, 0.5, 0.8))
p5 = xy2sfc(x = c(0.6, 0.7, 1, 0.6), y = c(0.7, 0.5, 0.5, 0.7))
p6 = xy2sfc(x = c(0.1, 1, 1, 0.1), y = c(0, 0, 0.3, 0))
p7 = xy2sfc(x = c(0.05, 0.05, 0.6, 0.5, 0.05), y = c(0.4, 0.97, 0.97, 0.4, 0.4))
# todo: add 3 more with line/point relations?
tmap::tmap_arrange(de_9im(p1, p2), de_9im(p1, p3), de_9im(p1, p4),
de_9im(p7, p1), de_9im(p1, p5), de_9im(p1, p6), nrow = 2)
```
In `sf`, functions testing for different types of topological relations are called 'binary predicates', as described in the vignette *Manipulating Simple Feature Geometries*, which can be viewed with the command [`vignette("sf3")`](https://r-spatial.github.io/sf/articles/sf3.html), and in the help page [`?geos_binary_pred`](https://r-spatial.github.io/sf/reference/geos_binary_ops.html).
To see how topological relations work in practice, let's create a simple reproducible example, building on the relations illustrated in Figure \@ref(fig:relations) and consolidating knowledge of how vector geometries are represented from a previous chapter (Section \@ref(geometry)).
Note that to create tabular data representing coordinates (x and y) of the polygon vertices, we use the base R function `cbind()` to create a matrix representing coordinates points, a `POLYGON`, and finally an `sfc` object, as described in Chapter \@ref(spatial-class):
```{r}
polygon_matrix = cbind(
x = c(0, 0, 1, 1, 0),
y = c(0, 1, 1, 0.5, 0)
)
polygon_sfc = st_sfc(st_polygon(list(polygon_matrix)))
```
We will create additional geometries to demonstrate spatial relations with the following commands which, when plotted on top of the polygon created above, relate in space to one another, as shown in Figure \@ref(fig:relation-objects).
Note the use of the function `st_as_sf()` and the argument `coords` to efficiently convert from a data frame containing columns representing coordinates to an `sf` object containing points:
```{r}
point_df = data.frame(
x = c(0.2, 0.7, 0.4),
y = c(0.1, 0.2, 0.8)
)
point_sf = st_as_sf(point_df, coords = c("x", "y"))
```
```{r relation-objects, echo=FALSE, fig.cap="Points, line and polygon objects arranged to illustrate topological relations.", fig.asp=1, out.width="50%", fig.scap="Demonstration of topological relations."}
par(pty = "s")
plot(polygon_sfc, border = "red", col = "gray", axes = TRUE)
plot(point_sf, add = TRUE, lab = 1:4, cex = 2)
text(point_df[, 1] + 0.02, point_df[, 2] + 0.04, 1:3, cex = 1.3)
```
A simple query is: which of the points in `point_sf` intersect in some way with polygon `polygon_sfc`?
The question can be answered by inspection (points 1 and 3 are touching and within the polygon, respectively).
This question can be answered with the spatial predicate `st_intersects()` as follows:
```{r 04-spatial-operations-9, eval=FALSE}
st_intersects(point_sf, polygon_sfc)
#> Sparse geometry binary predicate... `intersects'
#> 1: 1
#> 2: (empty)
#> 3: 1
```
The result should match your intuition:
positive (`1`) results are returned for the first and third point, and a negative result (represented by an empty vector) for the second are outside the polygon's border.
What may be unexpected is that the result comes in the form of a list of vectors.
This *sparse matrix* output only registers a relation if one exists, reducing the memory requirements of topological operations on multi-feature objects.
As we saw in the previous section, a *dense matrix* consisting of `TRUE` or `FALSE` values is returned when `sparse = FALSE`.
```{r 04-spatial-operations-10}
st_intersects(point_sf, polygon_sfc, sparse = FALSE)
```
In the above output each row represents a feature in the target (argument `x`) object, and each column represents a feature in the selecting object (`y`).
In this case, there is only one feature in the `y` object `polygon_sfc` so the result, which can be used for subsetting as we saw in Section \@ref(spatial-subsetting), has only one column.
`st_intersects()` returns `TRUE` even in cases where the features just touch: *intersects*\index{intersects} is a 'catch-all' topological operation which identifies many types of spatial relation, as illustrated in Figure \@ref(fig:relations).
More restrictive questions include which points lie within the polygon, and which features are on or contain a shared boundary with `y`?
These can be answered as follows (results not shown):
```{r 04-spatial-operations-9-2, eval=FALSE}
st_within(point_sf, polygon_sfc)
st_touches(point_sf, polygon_sfc)
```
Note that although the first point *touches* the boundary polygon, it is not within it; the third point is within the polygon but does not touch any part of its border.
The opposite of `st_intersects()` is `st_disjoint()`, which returns only objects that do not spatially relate in any way to the selecting object (note `[, 1]` converts the result into a vector).
```{r 04-spatial-operations-11}
st_disjoint(point_sf, polygon_sfc, sparse = FALSE)[, 1]
```
The function `st_is_within_distance()` detects features that *almost touch* the selection object, which has an additional `dist` argument.
It can be used to set how close target objects need to be before they are selected.
The 'is within distance' binary spatial predicate is demonstrated in the code chunk below, the results of which show that every point is within 0.2 units of the polygon.
```{r 04-spatial-operations-14}
st_is_within_distance(point_sf, polygon_sfc, dist = 0.2, sparse = FALSE)[, 1]
```
Note that although point 2 is more than 0.2 units of distance from the nearest vertex of `polygon_sfc`, it is still selected when the distance is set to 0.2.
This is because distance is measured to the nearest edge, in this case the part of the polygon that lies directly above point 2 in Figure \@ref(fig:relation-objects).
(You can verify the actual distance between point 2 and the polygon is 0.13 with the command `st_distance(point_sf, polygon_sfc)`.)
```{r, eval=FALSE, echo=FALSE}
# verify distances to the polygon with reference to paragraph above:
st_distance(point_sf, polygon_sfc)
# [,1]
# [1,] 0.0000000
# [2,] 0.1341641
# [3,] 0.0000000
```
```{block2 04-spatial-operations-15, type='rmdnote'}
Functions for calculating topological relations use spatial indices to largely speed up spatial query performance.
They achieve that using the Sort-Tile-Recursive (STR) algorithm.
The `st_join` function, mentioned in the next section, also uses the spatial indexing.
You can learn more at https://www.r-spatial.org/r/2017/06/22/spatial-index.html.
```
```{r 04-spatial-operations-16, eval=FALSE, echo=FALSE}
# other tests
st_overlaps(point_sf, polygon_sfc, sparse = FALSE)
st_covers(point_sf, polygon_sfc, sparse = FALSE)
st_covered_by(point_sf, polygon_sfc, sparse = FALSE)
```
```{r 04-spatial-operations-17, eval=FALSE, echo=FALSE}
st_contains(a, p[2, ], sparse = TRUE)
```
```{r 04-spatial-operations-18, eval=FALSE, echo=FALSE}
# starting simpler so commented
a1 = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))))
a2 = st_polygon(list(rbind(c(2, 0), c(2, 2), c(3, 2), c(3, 0), c(2, 0))))
a = st_sfc(a1, a2)
b1 = a1 * 0.5
b2 = a2 * 0.4 + c(1, 0.5)
b = st_sfc(b1, b2)
l1 = st_linestring(x = matrix(c(0, 3, -1, 1), , 2))
l2 = st_linestring(x = matrix(c(-1, -1, -0.5, 1), , 2))
l = st_sfc(l1, l2)
p = st_multipoint(x = matrix(c(0.5, 1, -1, 0, 1, 0.5), , 2))
plot(a, border = "red", axes = TRUE)
plot(b, border = "green", add = TRUE)
plot(l, add = TRUE)
plot(p, add = TRUE)
```
### Distance relations
While the topological relations presented in the previous section are binary (a feature either intersects with another or does not) distance relations are continuous\index{distance relations}.
The distance between two `sf` objects is calculated with `st_distance()`, which is also used behind the scenes in Section \@ref(non-overlapping-joins) for distance-based joins.
This is illustrated in the code chunk below, which finds the distance between the highest point in New Zealand and the geographic centroid of the Canterbury region, created in Section \@ref(spatial-subsetting):
\index{vector!distance relations}
```{r 04-spatial-operations-31, warning=FALSE}
nz_highest = nz_height |> slice_max(n = 1, order_by = elevation)
canterbury_centroid = st_centroid(canterbury)
st_distance(nz_highest, canterbury_centroid)
```
There are two potentially surprising things about the result:
- It has `units`, telling us the distance is 100,000 meters, not 100,000 inches, or any other measure of distance
- It is returned as a matrix, even though the result only contains a single value
This second feature hints at another useful feature of `st_distance()`, its ability to return *distance matrices* between all combinations of features in objects `x` and `y`.
This is illustrated in the command below, which finds the distances between the first three features in `nz_height` and the Otago and Canterbury regions of New Zealand represented by the object `co`.
```{r 04-spatial-operations-32}
co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co)
```
Note that the distance between the second and third features in `nz_height` and the second feature in `co` is zero.
This demonstrates the fact that distances between points and polygons refer to the distance to *any part of the polygon*.
The second and third points in `nz_height` are *in* Otago, which can be verified by plotting them (result not shown):
```{r 04-spatial-operations-33, eval=FALSE}
plot(st_geometry(co)[2])
plot(st_geometry(nz_height)[2:3], add = TRUE)
```
### DE-9IM strings {#DE-9IM-strings}
Underlying the binary predicates demonstrated in the previous section is the Dimensionally Extended 9-Intersection Model (DE-9IM)\index{topological relations!DE-9IM}.
As the cryptic name suggests, this is not an easy topic to understand, but it is worth knowing about because it underlies many spatial operations and enables the creation of custom spatial predicates.
The model was originally labelled "DE + 9IM" by its inventors, referring to the "dimension of the intersections of boundaries, interiors, and exteriors of two features" [@clementini_comparison_1995], but it is now referred to as DE-9IM [@shen_classification_2018].
DE-9IM is applicable to two-dimensional objects (points, lines and polygons) in Euclidean space, meaning that the model (and software implementing it such as GEOS) assumes you are working with data in a projected coordinate reference system, described in Chapter \@ref(reproj-geo-data).
```{r de-9im, echo=FALSE, eval=FALSE}
# Todo one day: revive this
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
bsf = sf::st_sf(data.frame(Object = c("a", "b")), geometry = b)
b9 = replicate(bsf, n = 9, simplify = FALSE)
b9sf = do.call(rbind, b9)
domains = c("Interior", "Boundary", "Exterior")
b9sf$domain_a = rep(rep(domains, 3), each = 2)
b9sf$domain_b = rep(rep(domains, each = 3), each = 2)
library(ggplot2)
ggplot(b9sf) +
geom_sf() +
facet_grid(domain_a ~ domain_b)
plot(b9sf)
tmap_arrange(
tm_shape(b) + tm_polygons(alpha = 0.5) + tm_layout(title = "Interior-Interior"),
tm_shape(b) + tm_polygons(alpha = 0.5) + tm_layout(title = "Interior-Boundary"),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
tm_shape(b) + tm_polygons(alpha = 0.5),
nrow = 3
)
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text
```
To demonstrate how DE-9IM strings work, let's take a look at the various ways that the first geometry pair in Figure \@ref(fig:relations) relate.
Figure \@ref(fig:de9imgg) illustrates the 9-intersection model (9IM) which shows the intersections between every combination of each object's interior, boundary and exterior: when each component of the first object `x` is arranged as columns, and each component of `y` is arranged as rows, a facetted graphic is created with the intersections between each element highlighted.
```{r de9imgg, echo=FALSE, warning=FALSE, fig.cap="Illustration of how the Dimensionally Extended 9 Intersection Model (DE-9IM) works. Colors not in the legend represent the overlap between different components. The thick lines highlight two-dimensional intersections, e.g., between the boundary of object x and the interior of object y, shown in the middle top facet.", message=FALSE}
p1_2 = st_as_sf(c(p1, p3))
ii = st_as_sf(st_intersection(p1, p3))
ii$Object = "Intersection"
ii$domain_a = "Interior"
ii$domain_b = "Interior"
bi = st_sf(x = st_intersection(
st_cast(p1, "LINESTRING"),
st_difference(p3, st_buffer(st_cast(p3, "LINESTRING"), dist = 0.01))
))
bi = st_buffer(bi, dist = 0.01)
bi$Object = "Intersection"
bi$domain_a = "Boundary"
bi$domain_b = "Interior"
ei = st_sf(x = st_difference(p3, p1))
ei$Object = "Intersection"
ei$domain_a = "Exterior"
ei$domain_b = "Interior"
ib = st_sf(x = st_intersection(
st_cast(p3, "LINESTRING"),
st_difference(p1, st_buffer(st_cast(p1, "LINESTRING"), dist = 0.005))
))
ib = st_buffer(ib, dist = 0.01)
ib$Object = "Intersection"
ib$domain_a = "Interior"
ib$domain_b = "Boundary"
bb = st_cast(ii, "POINT")
bb_line = st_sf(x = st_sfc(st_linestring(matrix(c(1, 0.5, 1, 0.7), nrow = 2, byrow = TRUE))))
bb_line_buffer = st_buffer(bb_line, dist = 0.01)
bb_buffer = st_buffer(bb, dist = 0.01)
bb = st_union(bb_buffer, bb_line_buffer)
bb$Object = "Intersection"
bb$domain_a = "Boundary"
bb$domain_b = "Boundary"
eb = st_sf(x = st_difference(
st_cast(p3, "LINESTRING"),
p1
))
eb = st_buffer(eb, dist = 0.01)
eb$Object = "Intersection"
eb$domain_a = "Exterior"
eb$domain_b = "Boundary"
ie = st_sf(x = st_difference(p1, p3))
ie$Object = "Intersection"
ie$domain_a = "Interior"
ie$domain_b = "Exterior"
be = st_sf(x = st_difference(
st_cast(p1, "LINESTRING"),
p3
))
be = st_buffer(be, dist = 0.01)
be$Object = "Intersection"
be$domain_a = "Boundary"
be$domain_b = "Exterior"
ee = st_sf(x = st_difference(
st_buffer(st_union(p1, p3), 0.02),
st_union(p1, p3)
))
ee$Object = "Intersection"
ee$domain_a = "Exterior"
ee$domain_b = "Exterior"
b9 = replicate(p1_2, n = 9, simplify = FALSE)
b9sf = do.call(rbind, b9)
b9sf$Object = rep(c("x", "y"), 9)
domains = c("Interior", "Boundary", "Exterior")
b9sf$domain_a = rep(rep(domains, 3), each = 2)
b9sf$domain_b = rep(rep(domains, each = 3), each = 2)
b9sf = rbind(b9sf, ii, bi, ei, ib, bb, eb, ie, be, ee)
b9sf$domain_a = ordered(b9sf$domain_a, levels = c("Interior", "Boundary", "Exterior"))
b9sf$domain_b = ordered(b9sf$domain_b, levels = c("Interior", "Boundary", "Exterior"))
b9sf = b9sf |>
mutate(alpha = case_when(
Object == "x" ~ 0.1,
Object == "y" ~ 0.1,
TRUE ~ 0.2
))
library(ggplot2)
ggplot(b9sf) +
geom_sf(aes(fill = Object, alpha = alpha)) +
facet_grid(domain_b ~ domain_a) +
scale_fill_manual(values = c("red", "lightblue", "yellow"), position = "top", name = "") +
scale_alpha_continuous(range = c(0.3, 0.9)) +
guides(alpha = "none") +
theme_void() +
theme(legend.position = "top")
```
DE-9IM strings are derived from the dimension of each type of relation.
In this case, the red intersections in Figure \@ref(fig:de9imgg) have dimensions of 0 (points), 1 (lines), and 2 (polygons), as shown in Table \@ref(tab:de9emtable).
```{r de9emtable, echo=FALSE}
# See https://github.com/geocompx/geocompr/issues/699
pattern = st_relate(p1, p3)
matrix_de_9im = function(pattern) {
string = unlist(strsplit(pattern , ""))
matrix_de_9im = matrix(string, nrow = 3, byrow = TRUE)
colnames(matrix_de_9im) = c("I", "B", "E")
row.names(matrix_de_9im) = c("I", "B", "E")
return(matrix_de_9im)
}
m = matrix_de_9im(pattern)
colnames(m) = c("Interior (x)", "Boundary (x)", "Exterior (x)")
rownames(m) = c("Interior (y)", "Boundary (y)", "Exterior (y)")
knitr::kable(m, caption = "Relations between interiors, boundaries and exteriors of geometries x and y.")
```
Flattening this matrix 'row-wise' (meaning concatenating the first row, then the second, then the third) results in the string `212111212`.
Another example will serve to demonstrate the system:
the relation shown in Figure \@ref(fig:relations) (the third polygon pair in the third column and 1st row) can be defined in the DE-9IM system as follows:
- The intersections between the *interior* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of 2, 1 and 2, respectively
- The intersections between the *boundary* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 1, respectively, where 'F' means 'false', the objects are disjoint
- The intersections between the *exterior* of `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 2, respectively: the exterior of the larger object does not touch the interior or boundary of `y`, but the exterior of the smaller and larger objects cover the same area
These three components, when concatenated, create the string `212`, `FF1`, and `FF2`.
This is the same as the result obtained from the function `st_relate()` (see the source code of this chapter to see how other geometries in Figure \@ref(fig:relations) were created):
```{r}
xy2sfc = function(x, y) st_sfc(st_polygon(list(cbind(x, y))))
x = xy2sfc(x = c(0, 0, 1, 1, 0), y = c(0, 1, 1, 0.5, 0))
y = xy2sfc(x = c(0.7, 0.7, 0.9, 0.7), y = c(0.8, 0.5, 0.5, 0.8))
st_relate(x, y)
```
Understanding DE-9IM strings allows new binary spatial predicates to be developed.
The help page `?st_relate` contains function definitions for 'queen' and 'rook' relations in which polygons share a border or only a point, respectively.
'Queen' relations mean that 'boundary-boundary' relations (the cell in the second column and the second row in Table \@ref(tab:de9emtable), or the 5th element of the DE-9IM string) must not be empty, corresponding to the pattern `F***T****`, while for 'rook' relations, the same element must be 1 (meaning a linear intersection) (see Figure \@ref(fig:queens)).
These are implemented as follows:
```{r}
st_queen = function(x, y) st_relate(x, y, pattern = "F***T****")
st_rook = function(x, y) st_relate(x, y, pattern = "F***1****")
```
Building on the object `x` created previously, we can use the newly created functions to find out which elements in the grid are a 'queen' and 'rook' in relation to the middle square of the grid as follows:
```{r queenscode, fig.show='hide'}
grid = st_make_grid(x, n = 3)
grid_sf = st_sf(grid)
grid_sf$queens = lengths(st_queen(grid, grid[5])) > 0
plot(grid, col = grid_sf$queens)
grid_sf$rooks = lengths(st_rook(grid, grid[5])) > 0
plot(grid, col = grid_sf$rooks)
```
```{r queens, fig.cap="Demonstration of custom binary spatial predicates for finding queen (left) and rook (right) relations to the central square in a grid with 9 geometries.", echo=FALSE, warning=FALSE}
st_crs(grid_sf) = "EPSG:2180"
tm_shape(grid_sf) +
tm_fill(fill = c("queens", "rooks"),
fill.scale = tm_scale(values = c("white", "black"))) +
tm_shape(grid_sf) +
tm_borders(col = "gray", lwd = 2) +
tm_layout(frame = FALSE, legend.show = FALSE,
panel.labels = c("queen", "rook"))
```
<!-- Another of a custom binary spatial predicate is 'overlapping lines' which detects lines that overlap for some or all of another line's geometry. -->
<!-- This can be implemented as follows, with the pattern signifying that the intersection between the two line interiors must be a line: -->
```{r, echo=FALSE, eval=FALSE}
st_lineoverlap = function(x, y) st_relate(x, y, pattern = "T*1******")
line1 = st_sfc(st_linestring(cbind(
x = c(0, 0.8),
y = c(0, 0)
)))
line2 = st_sfc(st_linestring(cbind(
x = c(0.1, 0.5),
y = c(0, 0)
)))
line3 = st_sfc(st_linestring(cbind(
x = c(0, 0.5),
y = c(0, 0.2)
)))
st_queen(line1, line2)
st_relate(line1, line2)
st_relate(line1, line3)
st_lineoverlap(line1, line2)
st_lineoverlap(line1, line3)
de_9im(line1, line2)
# test the function
rnet = pct::get_pct_rnet(region = "isle-of-wight")
osm_net = osmextract::oe_get_network(place = "isle-of-wight", mode = "driving")
sel = st_relate(rnet, osm_net, pattern = "T*1******")
summary(lengths(sel) > 0)
rnet_joined1 = st_join(rnet, osm_net, join = st_lineoverlap)
rnet_joined2 = st_join(rnet, osm_net, join = st_relate, pattern = "T*1******")
rnet_joined3 = st_join(rnet, osm_net)
summary(is.na(rnet_joined1$osm_id))
summary(is.na(rnet_joined2$osm_id))
summary(is.na(rnet_joined3$osm_id))
sel_relates = st_relate(rnet[1, ], osm_net)
dim(sel_relates)
sel_table = table(sel_relates)
sel_table
dim(sel_table)
sel_restrictive = sel_relates[1, ] == "0F1FF0102"
summary(sel_restrictive)
nrow(osm_net)
mapview::mapview(rnet[1, ]) + mapview::mapview(osm_net[sel_restrictive, ])
rnet_approx = rnet
st_precision(rnet_approx) = 100
head(st_coordinates(rnet_approx))
sel_relates = st_relate(rnet_approx[1, ], osm_net)
dim(sel_relates)
sel_table = table(sel_relates)
sel_table
```
### Spatial joining
Joining two non-spatial datasets relies on a shared 'key' variable, as described in Section \@ref(vector-attribute-joining).
Spatial data joining applies the same concept, but instead relies on spatial relations, described in the previous section.
As with attribute data, joining adds new columns to the target object (the argument `x` in joining functions), from a source object (`y`).
\index{join!spatial}
\index{spatial!join}
The process is illustrated by the following example: imagine you have ten points randomly distributed across the Earth's surface and you ask, for the points that are on land, which countries are they in?
Implementing this idea in a [reproducible example](https://github.com/geocompx/geocompr/blob/main/code/04-spatial-join.R) will build your geographic data-handling skills and will showhow spatial joins work.
The starting point is to create points that are randomly scattered over the Earth's surface.
```{r 04-spatial-operations-19}
set.seed(2018) # set seed for reproducibility
(bb = st_bbox(world)) # the world's bounds
random_df = data.frame(
x = runif(n = 10, min = bb[1], max = bb[3]),
y = runif(n = 10, min = bb[2], max = bb[4])
)
random_points = random_df |>
st_as_sf(coords = c("x", "y"), crs = "EPSG:4326") # set coordinates and CRS
```
The scenario illustrated in Figure \@ref(fig:spatial-join) shows that the `random_points` object (top left) lacks attribute data, while the `world` (top right) has attributes, including country names shown for a sample of countries in the legend.
Spatial joins are implemented with `st_join()`, as illustrated in the code chunk below.
The output is the `random_joined` object which is illustrated in Figure \@ref(fig:spatial-join) (bottom left).
Before creating the joined dataset, we use spatial subsetting to create `world_random`, which contains only countries that contain random points, to verify that number of country names returned in the joined dataset should be four (Figure \@ref(fig:spatial-join), top right panel).
```{r 04-spatial-operations-20, message=FALSE}
world_random = world[random_points, ]
nrow(world_random)
random_joined = st_join(random_points, world["name_long"])
```
```{r spatial-join, echo=FALSE, fig.cap="Illustration of a spatial join. A new attribute variable is added to random points (top left) from source world object (top right) resulting in the data represented in the final panel.", fig.asp=0.5, warning=FALSE, message=FALSE, out.width="100%", fig.scap="Illustration of a spatial join."}
source("code/04-spatial-join.R", print.eval = TRUE)
```
By default, `st_join()` performs a left join, meaning that the result is an object containing all rows from `x` including rows with no match in `y` (see Section \@ref(vector-attribute-joining)), but it can also do inner joins by setting the argument `left = FALSE`.
Like spatial subsetting, the default topological operator used by `st_join()` is `st_intersects()`, which can be changed by setting the `join` argument (see `?st_join` for details).
The example above demonstrates the addition of a column from a polygon layer to a point layer, but the approach works regardless of geometry types.
In such cases, for example when `x` contains polygons, each of which matches multiple objects in `y`, spatial joins will result in duplicate features by creating a new row for each match in `y`.
### Distance-based joins {#non-overlapping-joins}
Sometimes two geographic datasets do not intersect but still have a strong geographic relationship due to their proximity.
The datasets `cycle_hire` and `cycle_hire_osm`, already attached in the **spData** package, provide a good example.
Plotting them shows that they are often closely related, but they do not touch, as shown in Figure \@ref(fig:cycle-hire), a base version of which is created with the following code below:
\index{join!non-overlapping}
```{r 04-spatial-operations-21, eval=FALSE}
plot(st_geometry(cycle_hire), col = "blue")
plot(st_geometry(cycle_hire_osm), add = TRUE, pch = 3, col = "red")
```
We can check if any points are the same using `st_intersects()` as shown below:
```{r 04-spatial-operations-22, message=FALSE}
any(st_intersects(cycle_hire, cycle_hire_osm, sparse = FALSE))
```
```{r 04-spatial-operations-23, echo=FALSE, eval=FALSE}
# included to show alternative ways of showing there's no overlap
sum(st_geometry(cycle_hire) %in% st_geometry(cycle_hire_osm))
sum(st_coordinates(cycle_hire)[, 1] %in% st_coordinates(cycle_hire_osm)[, 1])
```
```{r cycle-hire, fig.cap="The spatial distribution of cycle hire points in London based on official data (blue) and OpenStreetMap data (red).", echo=FALSE, warning=FALSE, fig.scap="The spatial distribution of cycle hire points in London."}
if (knitr::is_latex_output()){
knitr::include_graphics("images/cycle-hire-1.png")
} else if (knitr::is_html_output()){
# library(tmap)
# osm_tiles = tmaptools::read_osm(tmaptools::bb(cycle_hire, ext = 1.3), type = "https://korona.geog.uni-heidelberg.de/tiles/roadsg/x={x}&y={y}&z={z}")
# qtm(osm_tiles) +
# tm_shape(cycle_hire) +
# tm_bubbles(col = "blue", alpha = 0.5, size = 0.2) +
# tm_shape(cycle_hire_osm) +
# tm_bubbles(col = "red", alpha = 0.5, size = 0.2) +
# tm_scale_bar()
library(leaflet)
leaflet() |>
# addProviderTiles(providers$OpenStreetMap.BlackAndWhite) |>
addCircles(data = cycle_hire) |>
addCircles(data = cycle_hire_osm, col = "red")
}
```
Imagine that we need to join the `capacity` variable in `cycle_hire_osm` onto the official 'target' data contained in `cycle_hire`.
This is when a non-overlapping join is needed.
The simplest method is to use the binary predicate `st_is_within_distance()`, as demonstrated below using a threshold distance of 20 m.
One can set the threshold distance in metric units also for unprojected data (e.g., lon/lat CRSs such as WGS84), if the spherical geometry engine (S2) is enabled, as it is in **sf** by default (see Section \@ref(s2)).
```{r 04-spatial-operations-24}
sel = st_is_within_distance(cycle_hire, cycle_hire_osm,
dist = units::set_units(20, "m"))
summary(lengths(sel) > 0)
```
This shows that there are `r sum(lengths(sel) > 0)` points in the target object `cycle_hire` within the threshold distance of `cycle_hire_osm`.
How to retrieve the *values* associated with the respective `cycle_hire_osm` points?
The solution is again with `st_join()`, but with an additional `dist` argument (set to 20 m below):
```{r 04-spatial-operations-25}
z = st_join(cycle_hire, cycle_hire_osm, st_is_within_distance,
dist = units::set_units(20, "m"))
nrow(cycle_hire)
nrow(z)
```
Note that the number of rows in the joined result is greater than the target.
This is because some cycle hire stations in `cycle_hire` have multiple matches in `cycle_hire_osm`.
To aggregate the values for the overlapping points and return the mean, we can use the aggregation methods learned in Chapter \@ref(attr), resulting in an object with the same number of rows as the target.
```{r 04-spatial-operations-26}
z = z |>
group_by(id) |>
summarize(capacity = mean(capacity))
nrow(z) == nrow(cycle_hire)
```
The capacity of nearby stations can be verified by comparing a plot of the capacity of the source `cycle_hire_osm` data with the results in this new object (plots not shown):
```{r 04-spatial-operations-27, eval=FALSE}
plot(cycle_hire_osm["capacity"])
plot(z["capacity"])
```
The result of this join has used a spatial operation to change the attribute data associated with simple features; the geometry associated with each feature has remained unchanged.
### Spatial aggregation {#spatial-aggr}
As with attribute data aggregation, spatial data aggregation *condenses* data: aggregated outputs have fewer rows than non-aggregated inputs.
Statistical *aggregating functions*, such as mean average or sum, summarize multiple values \index{statistics} of a variable, and return a single value per *grouping variable*.
Section \@ref(vector-attribute-aggregation) demonstrated how `aggregate()` and `group_by() |> summarize()` condense data based on attribute variables, this section shows how the same functions work with spatial objects.
\index{aggregation!spatial}
Returning to the example of New Zealand, imagine you want to find out the average height of high points in each region: it is the geometry of the source (`y` or `nz` in this case) that defines how values in the target object (`x` or `nz_height`) are grouped.
This can be done in a single line of code with base R's `aggregate()` method.
```{r 04-spatial-operations-28}
nz_agg = aggregate(x = nz_height, by = nz, FUN = mean)
```
The result of the previous command is an `sf` object with the same geometry as the (spatial) aggregating object (`nz`), which you can verify with the command `identical(st_geometry(nz), st_geometry(nz_agg))`.
The result of the previous operation is illustrated in Figure \@ref(fig:spatial-aggregation), which shows the average value of features in `nz_height` within each of New Zealand's 16 regions.
The same result can also be generated by piping the output from `st_join()` into the 'tidy' functions `group_by()` and `summarize()` as follows:
```{r spatial-aggregation, echo=FALSE, fig.cap="Average height of the top 101 high points across the regions of New Zealand.", fig.asp=1, message=FALSE, out.width="50%"}
library(tmap)
tm_shape(nz_agg) +
tm_fill("elevation",
fill.scale = tm_scale(breaks = seq(27, 30, by = 0.5) * 1e2)) +
tm_borders() +
tm_layout(scale = 1.8)
```
```{r 04-spatial-operations-29}
nz_agg2 = st_join(x = nz, y = nz_height) |>
group_by(Name) |>
summarize(elevation = mean(elevation, na.rm = TRUE))
```
```{r test-tidy-spatial-join, eval=FALSE, echo=FALSE}
plot(nz_agg)
plot(nz_agg2)
# aggregate looses the name of aggregating objects
```
The resulting `nz_agg` objects have the same geometry as the aggregating object `nz` but with a new column summarizing the values of `x` in each region using the function `mean()`.
Other functions could be used instead of `mean()` here, including `median()`, `sd()` and other functions that return a single value per group.
Note: one difference between the `aggregate()` and `group_by() |> summarize()` approaches is that the former results in `NA` values for unmatching region names, while the latter preserves region names.
The 'tidy' approach is thus more flexible in terms of aggregating functions and the column names of the results.
Aggregating operations that also create new geometries are covered in Section \@ref(geometry-unions).
### Joining incongruent layers {#incongruent}
Spatial congruence\index{spatial congruence} is an important concept related to spatial aggregation.
An *aggregating object* (which we will refer to as `y`) is *congruent* with the target object (`x`) if the two objects have shared borders.
Often this is the case for administrative boundary data, whereby larger units --- such as Middle Layer Super Output Areas ([MSOAs](https://www.ons.gov.uk/methodology/geography/ukgeographies/censusgeography)) in the UK or districts in many other European countries --- are composed of many smaller units.
*Incongruent* aggregating objects, by contrast, do not share common borders with the target [@qiu_development_2012].
This is problematic for spatial aggregation (and other spatial operations) illustrated in Figure \@ref(fig:areal-example): aggregating the centroid of each sub-zone will not return accurate results.
Areal interpolation overcomes this issue by transferring values from one set of areal units to another, using a range of algorithms including simple area weighted approaches and more sophisticated approaches such as 'pycnophylactic' methods [@tobler_smooth_1979].
```{r areal-example, echo=FALSE, fig.cap="Congruent (left) and incongruent (right) areal units with respect to larger aggregating zones (translucent red borders).", fig.asp=0.33, fig.scap="Congruent and incongruent areal units."}
source("code/04-areal-example.R", print.eval = TRUE)
```
The **spData** package contains a dataset named `incongruent` (colored polygons with black borders in the right panel of Figure \@ref(fig:areal-example)) and a dataset named `aggregating_zones` (the two polygons with the translucent blue border in the right panel of Figure \@ref(fig:areal-example)).
Let us assume that the `value` column of `incongruent` refers to the total regional income in million Euros.
How can we transfer the values of the underlying nine spatial polygons into the two polygons of `aggregating_zones`?
The simplest useful method for this is *area weighted* spatial interpolation, which transfers values from the `incongruent` object to a new column in `aggregating_zones` in proportion with the area of overlap: the larger the spatial intersection between input and output features, the larger the corresponding value.
This is implemented in `st_interpolate_aw()`, as demonstrated in the code chunk below.
```{r 04-spatial-operations-30}
iv = incongruent["value"] # keep only the values to be transferred
agg_aw = st_interpolate_aw(iv, aggregating_zones, extensive = TRUE)
agg_aw$value
```
In our case it is meaningful to sum up the values of the intersections falling into the aggregating zones since total income is a so-called spatially extensive variable (which increases with area), assuming income is evenly distributed across the smaller zones (hence the warning message above).
This would be different for spatially [intensive](https://geodacenter.github.io/workbook/3b_rates/lab3b.html#spatially-extensive-and-spatially-intensive-variables) variables such as *average* income or percentages, which do not increase as the area increases.
`st_interpolate_aw()` works equally with spatially intensive variables: set the `extensive` parameter to `FALSE` and it will use an average rather than a sum function when doing the aggregation.
## Spatial operations on raster data {#spatial-ras}
This section builds on Section \@ref(manipulating-raster-objects), which highlights various basic methods for manipulating raster datasets, to demonstrate more advanced and explicitly spatial raster operations, and it uses the objects `elev` and `grain` manually created in Section \@ref(manipulating-raster-objects).
For the reader's convenience, these datasets can be also found in the **spData** package.
```{r}
elev = rast(system.file("raster/elev.tif", package = "spData"))
grain = rast(system.file("raster/grain.tif", package = "spData"))
```
### Spatial subsetting {#spatial-raster-subsetting}
The previous chapter (Section \@ref(manipulating-raster-objects)) demonstrated how to retrieve values associated with specific cell IDs or row and column combinations.
Raster objects can also be extracted by location (coordinates) and other spatial objects.
To use coordinates for subsetting, one can 'translate' the coordinates into a cell ID with the **terra** function `cellFromXY()`.
An alternative is to use `terra::extract()` (be careful, there is also a function called `extract()` in the **tidyverse**\index{tidyverse (package)}) to extract values.
Both methods are demonstrated below to find the value of the cell that covers a point located at coordinates of 0.1, 0.1.
\index{raster!subsetting}
\index{spatial!subsetting}
```{r 04-spatial-operations-34, eval = FALSE}
id = cellFromXY(elev, xy = matrix(c(0.1, 0.1), ncol = 2))
elev[id]
# the same as
terra::extract(elev, matrix(c(0.1, 0.1), ncol = 2))
```
Raster objects can also be subset with another raster object, as demonstrated in the code chunk below:
```{r 04-spatial-operations-35, eval=FALSE}
clip = rast(xmin = 0.9, xmax = 1.8, ymin = -0.45, ymax = 0.45,
resolution = 0.3, vals = rep(1, 9))
elev[clip]
# we can also use extract
# terra::extract(elev, ext(clip))
```
This amounts to retrieving the values of the first raster object (in this case `elev`) that fall within the extent of a second raster (here: `clip`), as illustrated in Figure \@ref(fig:raster-subset).
```{r raster-subset, echo = FALSE, fig.cap = "Original raster (left), raster mask (middle), and output of masking a raster (right).", fig.scap="Subsetting raster values."}
knitr::include_graphics("images/04_raster_subset.png")
```
The example above returned the values of specific cells, but in many cases spatial outputs from subsetting operations on raster datasets are needed.
This can be done by setting the `drop` argument of the `[` operator to `FALSE`.
The code below returns the first two cells of `elev`, i.e., the first two cells of the top row, as a raster object (only the first 2 lines of the output is shown):
```{r 04-spatial-operations-36, eval=FALSE}
elev[1:2, drop = FALSE] # spatial subsetting with cell IDs
#> class : SpatRaster
#> dimensions : 1, 2, 1 (nrow, ncol, nlyr)
#> ...
```
```{r 04-spatial-operations-37, echo=FALSE, eval=FALSE}
# aim: illustrate the result of previous spatial subsetting example
x = elev[1, 1:2, drop = FALSE]
plot(x)
```
Another common use case of spatial subsetting is when a raster with `logical` (or `NA`) values is used to mask another raster with the same extent and resolution, as illustrated in Figure \@ref(fig:raster-subset).
In this case, the `[` and `mask()` functions can be used (results not shown).
```{r 04-spatial-operations-38, eval=FALSE}
# create raster mask
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
```
In the code chunk above, we have created a mask object called `rmask` with values randomly assigned to `NA` and `TRUE`.
Next, we want to keep those values of `elev` which are `TRUE` in `rmask`.
In other words, we want to mask `elev` with `rmask`.
```{r 04-spatial-operations-38b, eval=FALSE}
# spatial subsetting
elev[rmask, drop = FALSE] # with [ operator
# we can also use mask
# mask(elev, rmask)
```
The above approach can be also used to replace some values (e.g., expected to be wrong) with NA.
```{r 04-spatial-operations-38c, eval=FALSE}
elev[elev < 20] = NA
```
These operations are in fact Boolean local operations since we compare cell-wise two rasters.
The next subsection explores these and related operations in more detail.
### Map algebra
\index{map algebra}
The term 'map algebra' was coined in the late 1970s to describe a "set of conventions, capabilities, and techniques" for the analysis of geographic raster *and* (although less prominently) vector data [@tomlin_map_1994].
In this context, we define map algebra more narrowly, as operations that modify or summarize raster cell values, with reference to surrounding cells, zones, or statistical functions that apply to every cell.
Map algebra operations tend to be fast, because raster datasets only implicitly store coordinates, hence the [old adage](https://geozoneblog.wordpress.com/2013/04/19/raster-vs-vector/) "raster is faster but vector is corrector".
The location of cells in raster datasets can be calculated by using its matrix position and the resolution and origin of the dataset (stored in the header).
For the processing, however, the geographic position of a cell is barely relevant as long as we make sure that the cell position is still the same after the processing.
Additionally, if two or more raster datasets share the same extent, projection and resolution, one could treat them as matrices for the processing.
This is the way that map algebra works with the **terra** package.
First, the headers of the raster datasets are queried and (in cases where map algebra operations work on more than one dataset) checked to ensure the datasets are compatible.
Second, map algebra retains the so-called one-to-one locational correspondence, meaning that cells cannot move.
This differs from matrix algebra, in which values change position, for example when multiplying or dividing matrices.
Map algebra (or cartographic modeling with raster data) divides raster operations into four sub-classes [@tomlin_geographic_1990], with each working on one or several grids simultaneously:
1. *Local* or per-cell operations
2. *Focal* or neighborhood operations.
Most often the output cell value is the result of a 3 x 3 input cell block
3. *Zonal* operations are similar to focal operations, but the surrounding pixel grid on which new values are computed can have irregular sizes and shapes
4. *Global* or per-raster operations.
That means the output cell derives its value potentially from one or several entire rasters
This typology classifies map algebra operations by the number of cells used for each pixel processing step and the type of the output.
For the sake of completeness, we should mention that raster operations can also be classified by discipline such as terrain, hydrological analysis, or image classification.
The following sections explain how each type of map algebra operations can be used, with reference to worked examples.
### Local operations
\index{map algebra!local operations}
**Local** operations comprise all cell-by-cell operations in one or several layers.
This includes adding or subtracting values from a raster, squaring and multiplying rasters.
Raster algebra also allows logical operations such as finding all raster cells that are greater than a specific value (5 in our example below).
The **terra** package supports all these operations and more, as demonstrated below (Figure \@ref(fig:04-local-operations)):
```{r 04-spatial-operations-41, eval = FALSE}
elev + elev
elev^2
log(elev)
elev > 5
```
```{r 04-local-operations, echo=FALSE, fig.cap="Examples of different local operations of the elev raster object: adding two rasters, squaring, applying logarithmic transformation, and performing a logical operation."}
knitr::include_graphics("images/04-local-operations.png")
```
Another good example of local operations is the classification of intervals of numeric values into groups such as grouping a digital elevation model into low (class 1), middle (class 2) and high elevations (class 3).
Using the `classify()` command, we need first to construct a reclassification matrix, where the first column corresponds to the lower and the second column to the upper end of the class.
The third column represents the new value for the specified ranges in columns one and two.
```{r 04-spatial-operations-40}
rcl = matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE)
rcl
```
Here, we assign the raster values in the ranges 0--12, 12--24 and 24--36 are *reclassified* to take values 1, 2 and 3, respectively.
```{r 04-spatial-operations-40b, eval = FALSE}
recl = classify(elev, rcl = rcl)
```
The `classify()` function can be also used when we want to reduce the number of classes in our categorical rasters.
We will perform several additional reclassifications in Chapter \@ref(location).
Apart from applying arithmetic operators directly, one can also use the `app()`, `tapp()` and `lapp()` functions.
They are more efficient, hence, they are preferable in the presence of large raster datasets.
Additionally, they allow you to save an output file directly.
The `app()` function applies a function to each cell of a raster and is used to summarize (e.g., calculating the sum) the values of multiple layers into one layer.
`tapp()` is an extension of `app()`, allowing us to select a subset of layers (see the `index` argument) for which we want to perform a certain operation.
Finally, the `lapp()` function allows us to apply a function to each cell using layers as arguments -- an application of `lapp()` is presented below.
The calculation of the normalized difference vegetation index (NDVI) is a well-known local (pixel-by-pixel) raster operation.
It returns a raster with values between -1 and 1; positive values indicate the presence of living plants (mostly > 0.2).
NDVI is calculated from red and near-infrared (NIR) bands of remotely sensed imagery, typically from satellite systems such as Landsat or Sentinel.
Vegetation absorbs light heavily in the visible light spectrum, and especially in the red channel, while reflecting NIR light. Here's the NDVI formula:
$$
\begin{split}
NDVI&= \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\\
\end{split}
$$
Let's calculate NDVI for the multi-spectral satellite file of Zion National Park.
```{r}
multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
multi_rast = rast(multi_raster_file)
```
Our raster object has four satellite bands from the Landsat 8 satellite: blue, green, red, and NIR.
Importantly, Landsat level-2 products are stored as integers to save disk space, and thus we need to convert them to floating-point numbers before doing any calculations.