Modelling probability of occurrence

name: 1
class: center middle main-title section-title-4

# Modelling probability of occurrence

.class-info[

**Session 27**

.light[HES597: Introduction to Spatial Data in R<br>
Boise State University Human-Environment Systems<br>
Fall 2021]

]
---
# Goals for today

* Describe the general analysis situation for event distribution models

* Introduce common data structures and analyses for distribution models

* Discuss inferential limitations for different model types

---
name: motivations
class: center middle main-title section-title-4

# Motivations
---
# Why do we create distribution models?
.pull-left[

* To identify important correlations between predictors and the occurrence of an event

* To generate maps of the 'range' or 'niche' of events

* To understand spatial patterns of event co-occurrence

* To forecast changes in event distributions due to changes in those predictors

]
.pull-right[
<figure>
  <img src="img/14/climchange.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Wiens et al. 2009](https://www.pnas.org/content/106/Supplement_2/19729)
]
]
---
# General analysis situation for spatial models of occurrence
.pull-left[
<figure>
  <img src="img/14/SDMfigure1resized.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Long](https://www.biodiversityscience.com/2011/04/27/species-distribution-modelling/)
]
]
.pull-right[

* Spatially referenced locations of events `$(\mathbf{y})$` sampled from the study extent

* A matrix of predictors `$(\mathbf{X})$` that can be assigned to each event based on spatial location

* Additional (non-spatial) predictors `$(\mathbf{Z})$` may describe the sampling process
]

* __Goal__: Estimate the probability of occurrence of events across unsampled regions of the study area based on correlations with predictors
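A minimal sketch of how the predictor matrix `$(\mathbf{X})$` is assembled: each event location is matched to the raster cell it falls in, and that cell's value is read from each predictor layer. The grid layout (origin at the top-left corner, square cells) and the names `extract_at_points`, `elev` and `precip` are hypothetical. In `R` this step would normally be done with a raster package (e.g., `terra::extract()`) rather than by hand.

```python
# Hypothetical predictor rasters stored as row-major grids (row 0 = northern edge)
ORIGIN_X, ORIGIN_Y = 0.0, 300.0   # coordinates of the top-left corner
CELL = 100.0                      # cell size in map units

elev = [[1200, 1350, 1500],
        [1100, 1250, 1400],
        [1000, 1150, 1300]]
precip = [[300, 320, 350],
          [280, 300, 330],
          [260, 270, 310]]

def cell_index(x, y):
    """Return (row, col) of the raster cell containing point (x, y)."""
    col = int((x - ORIGIN_X) // CELL)
    row = int((ORIGIN_Y - y) // CELL)
    if not (0 <= row < len(elev) and 0 <= col < len(elev[0])):
        raise ValueError(f"point ({x}, {y}) falls outside the raster")
    return row, col

def extract_at_points(points, rasters):
    """Build the predictor matrix X: one row per point, one column per raster."""
    X = []
    for x, y in points:
        row, col = cell_index(x, y)
        X.append([r[row][col] for r in rasters])
    return X

# Event locations (y) and the matching rows of X
points = [(50.0, 250.0), (150.0, 150.0), (280.0, 20.0)]
X = extract_at_points(points, [elev, precip])
print(X)  # [[1200, 300], [1250, 300], [1300, 310]]
```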
---
name: logistic
class: center middle main-title section-title-4

# Modelling Presence-Absence Data
---

# The sampling situation
.pull-left[
* Random or systematic sample of the study region

* The presence (or absence) of the event is recorded for each point

* Hypothesized predictors of occurrence are measured (or extracted) at each point 
]
.pull-right[
<figure>
  <img src="img/14/Predicting_habitats.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Ragnvald (own work), CC BY-SA 3.0](https://commons.wikimedia.org/w/index.php?curid=2107716)
]
]
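The two designs above can be sketched as follows: a systematic sample places points on a regular grid across the study extent, and a simple random sample draws them uniformly. This is a minimal illustration that assumes a rectangular extent. The function names and the spacing are hypothetical, and presence/absence would then be recorded at each point.

```python
import random

def systematic_sample(xmin, xmax, ymin, ymax, spacing):
    """Regular grid of sample points, offset half a spacing from the edges."""
    points = []
    y = ymin + spacing / 2
    while y < ymax:
        x = xmin + spacing / 2
        while x < xmax:
            points.append((x, y))
            x += spacing
        y += spacing
    return points

def random_sample(xmin, xmax, ymin, ymax, n, seed=1):
    """Simple random sample of n points from the same extent."""
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

grid = systematic_sample(0, 1000, 0, 500, spacing=250)
print(len(grid), grid[0])  # 8 (125.0, 125.0)
rand = random_sample(0, 1000, 0, 500, n=8)
```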
---
# Logistic regression
.pull-left[
* When we have presences and absences across the study area, we can model the __probability__ of occurrence using a logistic regression:

$$ y_{i} \sim \text{Bern}(p_i)\\
\text{link}(p_i) = \mathbf{x_i}'\beta + \alpha
$$
* A _link_ function connects `$p_i$` to the linear predictor `$(\mathbf{x_i}'\beta + \alpha)$`; its inverse maps the linear predictor, which can take any real value, onto the support (0-1) for probabilities

* Most `R` modelling packages use the logit link `$\ln\big(\frac{p}{1-p}\big)$` as the default for binary data, so the linear predictor is interpreted as the _log-odds_ of occurrence

* Estimates of `$\alpha$` and `$\beta$` can then be applied to the predictor values in every cell to generate 'wall-to-wall' spatial predictions
]

.pull-right[
<figure>
  <img src="img/14/Probit.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Mendoza](https://www.ou.edu/faculty/M/Jorge.L.Mendoza-1/comparison_of_probit.htm)
]
]
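To make the model concrete, here is a minimal from-scratch sketch in Python (in `R` one would normally call `glm(..., family = binomial)`). It simulates presence/absence at hypothetical sample points with a single predictor. It then estimates `$\alpha$` and `$\beta$` by Newton-Raphson on the Bernoulli log-likelihood with the logit link. Finally, it applies the inverse link to every cell of a small hypothetical predictor grid to produce 'wall-to-wall' probabilities. The true coefficients and grid values are made up for illustration.

```python
import math
import random

def inv_logit(eta):
    """Inverse of the logit link: maps any real number onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, iters=25):
    """Newton-Raphson estimates of (alpha, beta) for logit(p) = alpha + beta * x."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient) and information (negated Hessian) of the Bernoulli log-likelihood
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(a + b * xi)
            w = p * (1 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: theta += I^-1 g, using the closed-form 2x2 inverse
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Simulate presence/absence at 500 sample points (true alpha = -1, beta = 2)
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [1 if rng.random() < inv_logit(-1 + 2 * xi) else 0 for xi in x]
alpha, beta = fit_logistic(x, y)

# 'Wall-to-wall' prediction: apply the fitted model to every cell of a predictor grid
grid = [[-1.5, -0.5, 0.5],
        [-1.0, 0.0, 1.0],
        [-0.5, 0.5, 1.5]]
prob = [[inv_logit(alpha + beta * v) for v in row] for row in grid]
for row in prob:
    print(" ".join(f"{p:.2f}" for p in row))
```

The predicted values are probabilities because the inverse link bounds them to (0, 1), whatever the predictor values in a cell.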
---
# Key assumptions of logistic regression

* Dependent variable must be binary

* Observations must be independent (important for spatial analyses)

* Predictors should not be strongly collinear

* Predictors should be linearly related to the log-odds

* Adequate __sample size__ (especially when presences or absences are rare relative to the number of predictors)
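A quick screening step for the collinearity assumption is to compute pairwise correlations among predictors before fitting. Below is a minimal sketch. The `|r| > 0.7` cut-off is a commonly used rule of thumb, not a fixed requirement. The predictor names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def flag_collinear(predictors, threshold=0.7):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(predictors.items(), 2):
        r = pearson(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged

# Hypothetical predictor values extracted at six sample points
predictors = {
    "elevation": [1000, 1200, 1400, 1600, 1800, 2000],
    "temperature": [14.0, 12.9, 11.6, 10.5, 9.1, 8.2],   # falls with elevation
    "slope": [5, 22, 8, 15, 3, 12],
}
print(flag_collinear(predictors))
```

Only the elevation/temperature pair is flagged here. A usual response is to drop or combine one of the two predictors before fitting.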
---
# Alternatives to logistic regression
.pull-left[
* Classification and regression trees

* Random Forests

* Support-Vector Machines

* Artificial Neural Networks

* Much more detail in [An Introduction to Statistical Learning](https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf)
]

.pull-right[
<figure>
  <img src="img/14/randomforest.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Misra et al. 2020](https://www.sciencedirect.com/topics/engineering/random-forest)
]
]
---
name: maxent
class: center middle main-title section-title-4

# Modelling Presence-Background Data
---
# The sampling situation
.pull-left[
<figure>
  <img src="img/14/maxentresult.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Lentz et al. 2008](https://www.journals.uchicago.edu/doi/full/10.1086/528754)
]
]

.pull-right[

* Opportunistic collection of presences only

* Hypothesized predictors of occurrence are measured (or extracted) at each presence

* Background points (or pseudoabsences) generated for comparison
]
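One simple way to generate background points is to draw them uniformly at random across the study extent. Pseudo-absences are sometimes drawn the same way, but kept a minimum distance away from known presences. Below is a minimal sketch of both approaches, assuming a rectangular extent. The buffer distance and the function names are hypothetical. Real workflows usually sample raster cells within a study-area polygon and may weight the draw to mimic sampling bias.

```python
import math
import random

def background_points(n, extent, seed=0):
    """Uniform random background points across a rectangular extent."""
    xmin, xmax, ymin, ymax = extent
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def pseudo_absences(n, extent, presences, min_dist, seed=0, max_tries=100_000):
    """Random points kept at least min_dist from every presence (rejection sampling)."""
    xmin, xmax, ymin, ymax = extent
    rng = random.Random(seed)
    out, tries = [], 0
    while len(out) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough points; reduce min_dist or n")
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(p, q) >= min_dist for q in presences):
            out.append(p)
    return out

extent = (0.0, 100.0, 0.0, 100.0)
presences = [(20.0, 30.0), (25.0, 35.0), (70.0, 80.0)]   # opportunistic records
bg = background_points(1000, extent)
pa = pseudo_absences(200, extent, presences, min_dist=10.0)
```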
---
# Maximum Entropy models
.pull-left[
* Commonly referred to as MaxEnt (after the original software)

* Relies on the generation of _plausible_ background points across the remainder of the study area

* Iterative fitting finds the distribution that is as close as possible to a spatially uniform one (maximum entropy), subject to matching the mean predictor values observed at the presences

* Tuning parameters help account for differences in sampling effort, placement of background points, etc.

* The development of the model is beyond the scope of this course, but see [Elith et al. 2010](https://web.stanford.edu/~hastie/Papers/maxent_explained.pdf)
]
.pull-right[
<figure>
  <img src="img/14/maxentschem.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Elith et al. 2010](https://web.stanford.edu/~hastie/Papers/maxent_explained.pdf)
]
]
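The core idea can be sketched with a single predictor. MaxEnt's fitted distribution over background cells has the exponential (Gibbs) form `$\pi(z) \propto \exp(\lambda f(z))$`. Without regularization, the maximum-entropy solution is the `$\lambda$` for which the mean of the feature under `$\pi$` equals its mean at the presences. The sketch below finds that `$\lambda$` by bisection for one linear feature on simulated data. It is a deliberately stripped-down illustration under those assumptions. The real software adds other feature classes (e.g., quadratic, hinge), regularization, and output transforms.

```python
import math
import random

def gibbs_mean(lam, feats):
    """Mean feature value under pi(z) proportional to exp(lam * f(z)) over background cells."""
    m = max(lam * f for f in feats)                 # subtract max for numerical stability
    w = [math.exp(lam * f - m) for f in feats]
    return sum(wi * f for wi, f in zip(w, feats)) / sum(w)

def fit_maxent_1d(presence_feats, background_feats, lo=-20.0, hi=20.0, tol=1e-10):
    """Find lambda so the Gibbs mean matches the presence mean (unregularized constraint)."""
    target = sum(presence_feats) / len(presence_feats)
    # gibbs_mean increases with lambda (its derivative is a variance), so bisection works
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gibbs_mean(mid, background_feats) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical standardized predictor at 2,000 background cells
rng = random.Random(7)
background = [rng.gauss(0, 1) for _ in range(2000)]
# Simulated presences: cells preferentially selected where the predictor is high
presences = [f for f in background if rng.random() < 1 / (1 + math.exp(-2 * f))][:150]

lam = fit_maxent_1d(presences, background)
# Relative suitability ('raw'-style output): weights over background cells summing to 1
m = max(lam * f for f in background)
weights = [math.exp(lam * f - m) for f in background]
total = sum(weights)
raw = [w / total for w in weights]
```

The output is a distribution over the background cells, not a probability of occurrence at each cell. This is the first challenge listed on the next slide.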
---
# Challenges with MaxEnt

* Output is not a _probability_ of occurrence, but a relative likelihood of occurrence

* Sampling bias affects estimation (but can be mitigated using tuning parameters)

* Theoretical issues with background points and the intercept (prevalence cannot be estimated from presence-background data alone)

* Recent developments relate MaxEnt (with the cloglog link) to inhomogeneous Poisson point process models
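The connection between the cloglog link and point-process models can be shown with a short calculation. Suppose the number of events in a cell of area `$A$` is Poisson with intensity `$e^{\eta}$`. The probability of at least one event is then `$1 - \exp(-A e^{\eta})$`, which is exactly the inverse complementary log-log link applied to `$\eta + \ln A$`. The sketch below checks that identity numerically. The `eta` and `area` values are arbitrary. This illustrates the link function itself, not MaxEnt's own output formula.

```python
import math

def cloglog(p):
    """Complementary log-log link: maps (0, 1) onto the real line."""
    return math.log(-math.log(1 - p))

def inv_cloglog(eta):
    """Inverse cloglog link: maps the real line onto (0, 1)."""
    return 1 - math.exp(-math.exp(eta))

def prob_at_least_one(intensity, area):
    """P(N > 0) when N ~ Poisson(intensity * area)."""
    return 1 - math.exp(-intensity * area)

eta = -0.3          # arbitrary linear predictor (log-intensity)
area = 2.0          # arbitrary cell area
p_poisson = prob_at_least_one(math.exp(eta), area)
p_cloglog = inv_cloglog(eta + math.log(area))   # log(area) enters as an offset
print(math.isclose(p_poisson, p_cloglog))  # True
```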

---
name: occupancy
class: center middle main-title section-title-4

# Modelling data when detection is imperfect
---
# The sampling situation

.pull-left[
* Random or systematic sample of the study region

* The presence (or absence) of the event is recorded for each point

* Hypothesized predictors of occurrence are measured (or extracted) at each point

* Imperfect detection makes _absences_ __ambiguous__ (a non-detection may be a true absence or a missed presence)

* Repeated measurements at the same location can help separate the two (replicates can be repeated in time or spread in space)
]
.pull-right[
<figure>
  <img src="img/14/birds.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
]
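Simple arithmetic shows why repeated visits help. If an occupied site is detected with probability `$p$` on each of `$K$` independent visits, the chance it is missed on every visit is `$(1-p)^K$`. Below is a minimal sketch using a hypothetical `$p = 0.4$`. It also inverts the formula to find the number of visits needed to push that miss rate below a chosen target.

```python
import math

def prob_missed(p, k):
    """Probability an occupied site yields no detection in k independent visits."""
    return (1 - p) ** k

def visits_needed(p, max_miss=0.05):
    """Smallest k with (1 - p)^k <= max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1 - p))

for k in (1, 2, 3, 5):
    print(k, round(prob_missed(0.4, k), 3))
# 1 0.6
# 2 0.36
# 3 0.216
# 5 0.078
print(visits_needed(0.4))  # 6
```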
---
# Occupancy models

.pull-left[
* Mixture of both an ecological process and an observation process
<figure>
  <img src="img/14/occ1.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
<figure>
  <img src="img/14/occ2.png" alt="ZZZ" title="ZZZ" width="80%">
</figure>
]
.pull-right[
<figure>
  <img src="img/14/williamsonocc.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Williamson et al. 2021](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/cobi.13673)
]
]
* __Predictor effects__ estimated conditional on detection probability
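A single-season occupancy model can be written as a likelihood over each site's detection history. Let `$\psi$` be the occupancy probability and `$p$` the per-visit detection probability. A site with `$d$` detections in `$K$` visits then contributes `$\psi p^{d}(1-p)^{K-d}$`, plus `$(1-\psi)$` if `$d = 0$`. The sketch below simulates detection histories and fits constant `$\psi$` and `$p$` by a coarse grid search. The parameter values and grid are hypothetical, and there are no covariates. In `R`, functions such as `occu()` in the `unmarked` package fit these models by numerical optimization, with covariates on both levels.

```python
import math
import random
from collections import Counter

def site_loglik(k, d, psi, p):
    """Log-likelihood of a site with d detections in k visits."""
    lik = psi * p ** d * (1 - p) ** (k - d)     # occupied and detected d times
    if d == 0:
        lik += 1 - psi                          # or simply unoccupied
    return math.log(lik)

def fit_occupancy(histories, step=0.01):
    """Grid-search maximum likelihood estimates of constant psi and p."""
    counts = Counter((len(h), sum(h)) for h in histories)   # sites sharing (k, d) share a likelihood
    grid = [i * step for i in range(1, int(round(1 / step)))]
    best_ll, best = -math.inf, None
    for psi in grid:
        for p in grid:
            ll = sum(n * site_loglik(k, d, psi, p) for (k, d), n in counts.items())
            if ll > best_ll:
                best_ll, best = ll, (psi, p)
    return best

# Simulate 200 sites with 3 visits each (true psi = 0.6, p = 0.4)
rng = random.Random(3)
histories = []
for _ in range(200):
    z = rng.random() < 0.6                                  # true occupancy state
    histories.append([int(z and rng.random() < 0.4) for _ in range(3)])

psi_hat, p_hat = fit_occupancy(histories)
naive = sum(1 for h in histories if any(h)) / len(histories)
print(f"naive occupancy {naive:.2f}, psi_hat {psi_hat:.2f}, p_hat {p_hat:.2f}")
```

The naive estimate (the share of sites with at least one detection) treats every non-detection as an absence. It therefore sits below `psi_hat`, which corrects for sites that were occupied but never detected.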
---
# Implications of accounting for detection
<figure>
  <img src="img/14/mappeddif.png" alt="ZZZ" title="ZZZ" width="60%">
</figure>
.caption[
From [Williamson et al. 2021](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/cobi.13673)
]
---
# Parting thoughts

* This is a _very_ brief (and rushed) introduction; there is a large literature and many new developments

* Any of these models may turn out to be no good (we'll look at how to tell on Thursday and Monday)

* Bayesian extensions; applications with telemetry; changes through time