Building spatial databases using locations

name: 1
class: center middle main-title section-title-4

# Building spatial databases using locations

.class-info[

**Session 13**

.light[HES597: Introduction to Spatial Data in R<br>
Boise State University Human-Environment Systems<br>
Fall 2021]

]

---

# Outline for today

- Update on assignments and mini-projects

- Refresher: Building a spatial analysis workflow

- Building a database for an analysis (part 2) based on location

---

# Update on assignments

- Looking good for the most part
  - _Please knit your final documents_

- Class projects
  - A full description of each will be on the class webpage by 8 Oct 2021
  - Updated due dates
    - Mini-project 1: October 25
    - Mini-project 2: November 12
  - Schedule updated by end of today
  
---
class: center middle
# Revisiting spatial analysis

---
name: workflows
class: center middle main-title section-title-4

# Workflows for spatial analysis
---
# Workflows for spatial analysis

.pull-left[
<figure>
  <img src="img/07/Process.png" alt="General workflow for spatial analysis" title="General workflow" width="80%">
</figure>
.caption[
courtesy of [Humboldt State University](http://gsp.humboldt.edu/olm/Lessons/GIS/06%20Vector%20Analysis%20Attributes/00_SpatialAnalysis.html)
  ]
]

.pull-right[

- Align processing with objectives

- Imagining the visualizations and analysis clarifies file formats and variables

- Helps build reproducibility 
]

---

# Databases and attributes

.pull-left[
<figure>
  <img src="img/06/4.1.png" alt="Orientation of a database table" title="DB orientation" width="100%">
</figure>
.caption[
courtesy of [Giscommons](https://giscommons.org/data-tables-and-data-preprocessing/)
]
]
.pull-right[

- Attributes: Information that further describes a spatial feature

- Attributes &rightarrow; predictors for analysis

- Last week focused on thematic relations between datasets
  - Shared 'keys' help define linkages between objects

- Sometimes we are interested in attributes that describe location (overlaps, contains, distance)

- Sometimes we want to join based on location rather than thematic connections
  - __Must have the same CRS__
]
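
---
# Databases and attributes

A minimal sketch of the CRS check that should precede a location-based join. The `pts` layer and its coordinates are hypothetical; `nc` is the North Carolina shapefile that ships with `sf`.

```r
library(sf)

# North Carolina counties bundled with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical point layer stored in WGS84 (EPSG:4326)
pts <- st_sf(id = 1:2,
             geometry = st_sfc(st_point(c(-80.0, 35.5)),
                               st_point(c(-78.6, 35.8)),
                               crs = 4326))

# Compare the two coordinate reference systems
same_before <- st_crs(nc) == st_crs(pts)

# Reproject the points to match the counties before any spatial operation
pts_nc <- st_transform(pts, st_crs(nc))
same_after <- st_crs(nc) == st_crs(pts_nc)
```

Reprojecting the smaller layer is usually cheaper, but which layer to transform depends on the analysis.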

---

# Calculating attributes from geometry and location (vector data)

- Attributes like area and length can be useful for a number of analyses
  - Estimates of 'effort' in sampling designs
  - Offsets for modeling rates (e.g., Poisson regression)
  
- The result of the function must be assigned to a column in the data frame (e.g., with `$`, `mutate`, or `summarize`)

- Often useful to test before assigning
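
---
# Calculating attributes from geometry and location

One way to "test before assigning", sketched with the `nc` data bundled with `sf`. The column names (`area_m2`, `area_km2`, `total_km2`) are made up for illustration.

```r
library(sf)
library(dplyr)
library(units)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Test first: inspect the class and length of the result
test_area <- st_area(nc)
class(test_area)                 # a "units" vector
length(test_area) == nrow(nc)    # one value per feature

# Then assign, either with `$` or with mutate()
nc$area_m2 <- test_area
nc <- nc %>% mutate(area_km2 = set_units(area_m2, km^2))

# summarize() collapses rows; here, the total area across all counties
nc_total <- nc %>%
  st_drop_geometry() %>%
  summarize(total_km2 = sum(area_km2))
```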

---
# Estimating area

- `sf` bases area (and length) calculations on the map units of the CRS

- The `units` package allows conversion into a variety of units

- `st_length` can be used in the same way

.pull-left[

```r
nz.sf <- nz %>% 
  mutate(area = set_units(st_area(nz), km^2))
head(nz.sf$area, 3)
```

```
## Units: [km^2]
## [1] 12890.576  4911.565 24588.820
```
]

.pull-right[

```r
nz.sf$areaagain <- set_units(st_area(nz), km^2)
head(nz.sf$areaagain, 3)
```

```
## Units: [km^2]
## [1] 12890.576  4911.565 24588.820
```
]
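
---
# Estimating length

A short sketch of `st_length` on a made-up transect. The coordinates and the choice of UTM zone 11N (EPSG:32611) are assumptions for illustration; because the CRS is projected in meters, the length comes back in meters.

```r
library(sf)
library(units)

# Hypothetical transect: 3 km east and 4 km north of its start
transect <- st_sfc(st_linestring(rbind(c(500000, 4800000),
                                       c(503000, 4804000))),
                   crs = 32611)

len_m  <- st_length(transect)     # in the map units of the CRS (m)
len_km <- set_units(len_m, km)    # converted with the units package
```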

---
# Extending area

- Sometimes we want to estimate the area of overlap between two vectors
  - How much of home range _a_ occurs on soil type _b_
  - How much of each Census tract is contained within a service provision area?
- `st_intersection`, `st_union`, and `st_difference` return new geometries whose area we can estimate
  
.pull-left[
<img src="07-slides_files/figure-html/overlap-1.png" width="504" style="display: block; margin: auto;" />
]

.pull-right[

```r
intersect_pct <- st_intersection(nc, tr_buff) %>% 
   mutate(intersect_area = st_area(.)) %>%   # create new column with shape area
   dplyr::select(NAME, intersect_area) %>%   # only select columns needed to merge
   st_drop_geometry()

nc <- mutate(nc, county_area = st_area(nc))

# Merge by county name
nc <- merge(nc, intersect_pct, by = "NAME", all.x = TRUE)

# Calculate coverage
nc <- nc %>% 
   mutate(coverage = as.numeric(intersect_area/county_area))
```
]
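
---
# Extending area

To see what each operation returns, here is a toy sketch with two overlapping squares. The helper `sq()` and both squares are hypothetical; with no CRS set, the areas are unitless.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with lower-left corner (x0, y0)
sq <- function(x0, y0, size = 2) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0),
                        c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}

a <- st_sfc(sq(0, 0))   # covers (0,0) to (2,2)
b <- st_sfc(sq(1, 1))   # covers (1,1) to (3,3)

area_overlap <- st_area(st_intersection(a, b))  # shared by a and b
area_union   <- st_area(st_union(a, b))         # covered by a or b
area_a_only  <- st_area(st_difference(a, b))    # in a but not in b

share_of_a <- area_overlap / st_area(a)         # fraction of a inside b
```

The last line is the same ratio used for `coverage` in the county example, just without units.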

---
# Extending area

```r
ggplot() +
  geom_sf(data = nc, aes(fill=coverage)) +
  geom_sf(data = tr_buff, fill=NA, color="red")
```

<img src="07-slides_files/figure-html/plotover-1.png" width="504" style="display: block; margin: auto;" />
---
# Estimating distance

- As a covariate

- For use in covariance matrices

- As a means of assigning connections in networks
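
---
# Estimating distance

A rough sketch of how one distance matrix can serve these uses. The three sites and the threshold of 6 are invented; with no CRS set, `st_distance` returns plain Euclidean distances.

```r
library(sf)

# Three hypothetical sites on a plane
sites <- st_sfc(st_point(c(0, 0)), st_point(c(3, 4)), st_point(c(6, 8)))

# Pairwise distances: a symmetric 3 x 3 matrix
d <- st_distance(sites)

# As a covariate: distance from every site to the first site
dist_to_first <- d[, 1]

# As network connections: link distinct sites closer than an arbitrary threshold
adjacency <- d > 0 & d < 6
```

The full matrix `d` is also the usual starting point for a distance-based covariance function.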

---

# Estimating distance
- `grepl` returns a `logical` (TRUE/FALSE) result for items in `nz$Name` that (partially) match "Canter" or (`|`) "Otag"

- `nz_height[canterbury, ]` subsets the `nz_height` dataset based on the `canterbury` polygon

- The first point (`nz_height[1, ]`) falls outside Otago (in red), while the next two are _in_ Otago (so their distance is 0)
 
.pull-left[

```r
canterbury = nz %>% filter(Name == "Canterbury")
canterbury_height = nz_height[canterbury, ]
co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co)
```

```
## Units: [m]
##           [,1]     [,2]
## [1,] 123537.16 15497.72
## [2,]  94282.77     0.00
## [3,]  93018.56     0.00
```
]
.pull-right[

```r
plot(st_geometry(co)[2], col="red") 
plot(st_geometry(nz_height)[1], col="blue", add=TRUE)
plot(st_geometry(nz_height)[2:3], add = TRUE, col="black")
```

<img src="07-slides_files/figure-html/distplot-1.png" width="324" style="display: block; margin: auto;" />
]

---
# Topological Subsetting

- Topological relations describe the spatial relationships between objects
- We can use the overlap (or not) of vector data to subset the data based on topology
- The easiest way is `[` notation, but it is also the most restrictive

.pull-left[

```r
canterbury_height = nz_height[canterbury, ]
```
]
.pull-right[

```r
plot(st_geometry(canterbury))
plot(st_geometry(nz_height), col="red", add=TRUE)
plot(st_geometry(canterbury_height), col="blue", add=TRUE)
```

<img src="07-slides_files/figure-html/plotsub-1.png" width="360" style="display: block; margin: auto;" />
]
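---
# Topological Subsetting

By default `[` keeps features that intersect the second object; the `op` argument swaps in another predicate. A small sketch with a made-up square `region` and three made-up points:

```r
library(sf)

# Hypothetical square region from (0,0) to (2,2)
region <- st_sf(name = "region",
                geometry = st_sfc(st_polygon(list(
                  rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))))

# Three hypothetical points: inside, on the edge, and outside the region
pts <- st_sf(id = c("inside", "edge", "outside"),
             geometry = st_sfc(st_point(c(1, 1)),
                               st_point(c(2, 1)),
                               st_point(c(3, 3))))

hits     <- pts[region, ]                     # default: op = st_intersects
within   <- pts[region, , op = st_within]     # interior only, edge excluded
disjoint <- pts[region, , op = st_disjoint]   # no contact with the region
```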
---
# Topological Subsetting
- Lots of verbs in `sf` for doing this (e.g., `st_intersects`, `st_contains`, `st_touches`)
- See `?geos_binary_pred` for a full list
- The `sparse` option controls how the results are returned
- We can then find out if one or more elements satisfy the criteria

__Using `sparse=TRUE`__

```r
st_intersects(nz_height, co, sparse = TRUE)[1:3] 
```

```
## [[1]]
## integer(0)
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 2
```

```r
lengths(st_intersects(nz_height, co, sparse = TRUE))[1:3] > 0
```

```
## [1] FALSE  TRUE  TRUE
```
---
# Topological Subsetting

- The `sparse` option controls how the results are returned
- We can then find out if one or more elements satisfy the criteria

__Using `sparse=FALSE`__

```r
st_intersects(nz_height, co, sparse = FALSE)[1:3] 
```

```
## [1] FALSE FALSE FALSE
```

```r
apply(st_intersects(nz_height, co, sparse = FALSE), 1, any)[1:3]
```

```
## [1] FALSE  TRUE  TRUE
```
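
---
# Topological Subsetting

A toy comparison of the two return types, using made-up squares and points. It also shows why `[1:3]` on the dense matrix reads down the first column only, while `apply(..., 1, any)` checks every column for each row.

```r
library(sf)

# Two hypothetical unit squares side by side
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
polys <- st_sfc(sq1, sq2)

# Three hypothetical points: in sq1, in sq2, in neither
pts <- st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)), st_point(c(5, 5)))

sparse <- st_intersects(pts, polys)                  # list of matching indices
dense  <- st_intersects(pts, polys, sparse = FALSE)  # 3 x 2 logical matrix

any_sparse <- lengths(sparse) > 0      # TRUE TRUE FALSE
any_dense  <- apply(dense, 1, any)     # TRUE TRUE FALSE
first_col  <- dense[1:3]               # column 1 only: TRUE FALSE FALSE
```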

---
# Topological Subsetting

.pull-left[

```r
canterbury_height3 = nz_height %>%
  # [, 1] turns the one-column logical matrix into a plain logical vector
  filter(st_intersects(x = ., y = canterbury, sparse = FALSE)[, 1])
```
]
.pull-right[
<img src="07-slides_files/figure-html/subsetplot-1.png" width="504" style="display: block; margin: auto;" />
]
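
---
# Topological Subsetting

`sf` also provides `st_filter`, which takes the predicate as an argument and avoids handling the logical matrix by hand. A sketch with a hypothetical `region` and two hypothetical points:

```r
library(sf)

# Hypothetical square region from (0,0) to (2,2)
region <- st_sf(name = "region",
                geometry = st_sfc(st_polygon(list(
                  rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))))

# Two hypothetical points, one inside and one far outside
pts <- st_sf(id = c("in", "out"),
             geometry = st_sfc(st_point(c(1, 1)), st_point(c(5, 5))))

kept    <- st_filter(pts, region)                             # st_intersects by default
dropped <- st_filter(pts, region, .predicate = st_disjoint)   # the complement
```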

---
# Spatial Joins

- The `sf` package provides `st_join` for vector data

- Allows joins based on the predicates (`st_intersects`, `st_touches`, `st_is_within_distance`, etc.)

- Default is a left join
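
---
# Spatial Joins

A minimal sketch of the three points above. The zones, the points, and the distance of 0.6 are invented for illustration, and no CRS is set, so distances are in plain coordinate units.

```r
library(sf)

# Hypothetical helper: a unit square with lower-left corner (x0, y0)
unit_sq <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two zones separated by a gap between x = 1 and x = 2
zones <- st_sf(zone = c("A", "B"),
               geometry = st_sfc(unit_sq(0, 0), unit_sq(2, 0)))

# Three points: in zone A, in the gap, in zone B
pts <- st_sf(id = 1:3,
             geometry = st_sfc(st_point(c(0.5, 0.5)),
                               st_point(c(1.5, 0.5)),
                               st_point(c(2.5, 0.5))))

left_joined  <- st_join(pts, zones)                 # left join: all points kept
inner_joined <- st_join(pts, zones, left = FALSE)   # unmatched points dropped
near_joined  <- st_join(pts, zones,                 # match by proximity instead
                        join = st_is_within_distance, dist = 0.6)
```

In the left join the gap point keeps its row with `zone = NA`; in the proximity join it matches both zones and therefore appears twice.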