name: 1
class: center middle main-title section-title-4

# Modelling probability of occurrence

.class-info[

**Session 27**

.light[HES597: Introduction to Spatial Data in R<br>
Boise State University Human-Environment Systems<br>
Fall 2021]

]

---

# Goals for today

* Describe the general analysis situation for event distribution models
* Introduce common data structures and analyses for distribution models
* Discuss inferential limitations for different model types

---
name: motivations
class: center middle main-title section-title-4

# Motivations

---

# Why do we create distribution models?

.pull-left[
* To identify important correlations between predictors and the occurrence of an event
* To generate maps of the 'range' or 'niche' of events
* To understand spatial patterns of event co-occurrence
* To forecast changes in event distributions due to changes in those predictors
]

.pull-right[
<figure>
  <img src="img/14/climchange.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Wiens et al.
2009](https://www.pnas.org/content/106/Supplement_2/19729)
]
]

---

# General analysis situation for spatial models of occurrence

.pull-left[
<figure>
  <img src="img/14/SDMfigure1resized.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Long](https://www.biodiversityscience.com/2011/04/27/species-distribution-modelling/)
]
]

.pull-right[
* Spatially referenced locations of events `\((\mathbf{y})\)` sampled from the study extent
* A matrix of predictors `\((\mathbf{X})\)` that can be assigned to each event based on spatial location
* Additional (non-spatial) predictors `\((\mathbf{Z})\)` may describe the sampling process
]

* __Goal__: Estimate the probability of occurrence of events across unsampled regions of the study area based on correlations with predictors

---
name: logistic
class: center middle main-title section-title-4

# Modelling Presence-Absence Data

---

# The sampling situation

.pull-left[
* Random or systematic sample of the study region
* The presence (or absence) of the event is recorded for each point
* Hypothesized predictors of occurrence are measured (or extracted) at each point
]

.pull-right[
<figure>
  <img src="img/14/Predicting_habitats.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [By Ragnvald - Own work, CC BY-SA 3.0](https://commons.wikimedia.org/w/index.php?curid=2107716)
]
]

---

# Logistic regression

.pull-left[
* When we have presences and absences across the study area we can model the __probability__ of occurrence using a logistic regression:

$$
y_{i} \sim \text{Bern}(p_i)\\
\text{link}(p_i) = \mathbf{x_i}'\beta + \alpha
$$

* A _link_ function is used to map the linear predictor `\((\mathbf{x_i}'\beta + \alpha)\)` onto the support (0-1) for probabilities
* Most `R` modelling packages use the logit link `\(\ln\big(\frac{p}{1-p}\big)\)` as the default; interpreted as the _log-odds_
* Estimates of `\(\beta\)` can then be used to generate 'wall-to-wall' spatial predictions
]

.pull-right[
<figure>
  <img
src="img/14/Probit.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Mendoza](https://www.ou.edu/faculty/M/Jorge.L.Mendoza-1/comparison_of_probit.htm)
]
]

---

# Key assumptions of logistic regression

* Dependent variable must be binary
* Observations must be independent (important for spatial analyses)
* Predictors should not be collinear
* Predictors should be linearly related to the log-odds
* __Sample size__ must be sufficiently large

---

# Alternatives to logistic regression

.pull-left[
* Classification and regression trees
* Random Forests
* Support-Vector Machines
* Artificial Neural Nets
* Lots of info in [Introduction to Statistical Learning](https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf)
]

.pull-right[
<figure>
  <img src="img/14/randomforest.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Misra et al. 2020](https://www.sciencedirect.com/topics/engineering/random-forest)
]
]

---
name: maxent
class: center middle main-title section-title-4

# Modelling Presence-Background Data

---

# The sampling situation

.pull-left[
<figure>
  <img src="img/14/maxentresult.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Lentz et al. 2008](https://www.journals.uchicago.edu/doi/full/10.1086/528754)
]
]

.pull-right[
* Opportunistic collection of presences only
* Hypothesized predictors of occurrence are measured (or extracted) at each presence
* Background points (or pseudoabsences) generated for comparison
]

---

# Maximum Entropy models

.pull-left[
* Commonly referred to as MaxEnt (after the original software)
* Relies on the generation of _plausible_ background points across the remainder of the study area
* Iterative fitting to maximize the distance between the model's predictions and those generated by a spatially uniform model
* Tuning parameters to account for differences in sampling effort, placement of background points, etc.
* Development of the model is beyond the scope of this course, but see [Elith et al.
2010](https://web.stanford.edu/~hastie/Papers/maxent_explained.pdf)
]

.pull-right[
<figure>
  <img src="img/14/maxentschem.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Elith et al. 2010](https://web.stanford.edu/~hastie/Papers/maxent_explained.pdf)
]
]

---

# Challenges with MaxEnt

* Not measuring _probability_, but relative likelihood of occurrence
* Sampling bias affects estimation (but can be mitigated using tuning parameters)
* Theoretical issues with background points and the intercept
* Recent developments relate MaxEnt (with cloglog links) to Inhomogeneous Point Process models

---
name: occupancy
class: center middle main-title section-title-4

# Modelling data when detection is imperfect

---

# The sampling situation

.pull-left[
* Random or systematic sample of the study region
* The presence (or absence) of the event is recorded for each point
* Hypothesized predictors of occurrence are measured (or extracted) at each point
* Imperfect detection makes _absences_ __ambiguous__
* Repeated measurements at the same location can help (time vs. space)
]

.pull-right[
<figure>
  <img src="img/14/birds.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
]

---

# Occupancy models

.pull-left[
* Mixture of both an ecological process and an observation process

<figure>
  <img src="img/14/occ1.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
<figure>
  <img src="img/14/occ2.png" alt="ZZZ" title="ZZZ" width="80%">
</figure>
]

.pull-right[
<figure>
  <img src="img/14/williamsonocc.png" alt="ZZZ" title="ZZZ" width="100%">
</figure>
.caption[
From [Williamson et al. 2021](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/cobi.13673)
]
]

* __Predictor effects__ estimated conditional on detection probability

---

# Implications of accounting for detection

<figure>
  <img src="img/14/mappeddif.png" alt="ZZZ" title="ZZZ" width="60%">
</figure>
.caption[
From [Williamson et al.
2021](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/cobi.13673)
]

---

# Parting thoughts

* This is a _very_ brief (and rushed) introduction; there is lots of literature and many new developments
* None of these models may be any good (we'll look at that on Thurs and Mon)
* Bayesian extensions; applications with telemetry; changes through time
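
---

# Worked example: from log-odds to probability

A small sketch of how the logit link from the logistic regression slide turns a linear predictor into a probability. It assumes a single predictor, and the values `\(\alpha = -1\)`, `\(\beta = 2\)`, and `\(x_i = 1\)` are made up for illustration; they are not estimates from any dataset.

$$
\ln\Big(\frac{p_i}{1-p_i}\Big) = \alpha + \beta x_i = -1 + 2(1) = 1\\
p_i = \frac{1}{1 + e^{-(\alpha + \beta x_i)}} = \frac{1}{1 + e^{-1}} \approx 0.73
$$

* Under these assumed values, the odds of occurrence at `\(x_i = 1\)` are `\(e^{1} \approx 2.72\)`
* Each one-unit increase in `\(x_i\)` multiplies the odds by `\(e^{\beta} = e^{2} \approx 7.39\)`; the change in `\(p_i\)` itself depends on where you start on the curve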