Add an extra vignette
Lei Song committed Dec 1, 2022
1 parent c529a40 commit a052fa2
Showing 36 changed files with 595 additions and 3 deletions.
8 changes: 8 additions & 0 deletions NEWS.md
@@ -1,3 +1,11 @@
# itsdm 0.2.0

- Make the Shapley value-based functions usable with external models (as described in issue #3), and add examples to the function documentation and vignettes showing how to use them.
- Add a function `detect_envi_change` that uses Shapley values to analyze the potential spatial impacts of changing environmental variables.
- Modify function `isotree_po` to accept presence-absence datasets as well (as described in issue #7). To support this, add a function `format_observation` that helps users convert their data to fit the `itsdm` workflow.
- Reorganize the reference section of the online documentation to make it more user-friendly.
- Fix a few bugs in the functions.

# itsdm 0.1.3

- Fix a bug in function `print.VariableAnalysis` mentioned in issue #2: if any value is negative then it would fail.
2 changes: 1 addition & 1 deletion R/variable_contrib.R
@@ -112,7 +112,7 @@ variable_contrib <- function(model,
out <- explain(model, X = var_occ,
nsim = shap_nsim,
newdata = var_occ_analysis,
-               pred_wrapper = .pfun_shap)
+               pred_wrapper = pfun)
out <- list(shapley_values = out,
feature_values = var_occ_analysis)
class(out) <- append("VariableContribution", class(out))
6 changes: 4 additions & 2 deletions README.md
@@ -13,11 +13,13 @@
- A few functions to download environmental variables.
- Outlier tree-based suspicious environmental outliers detection.
- Isolation forest-based environmental suitability modeling.
-- Response curves of environmental variable.
+- Non-spatial response curves of environmental variables.
+- Spatial response maps of environmental variables.
- Variable importance analysis.
- Presence-only model evaluation.
- Method to convert predicted suitability to presence-absence map.
- Variable contribution analysis for the target observations.
+- Method to analyze the spatial impacts of changing environment.

## Installation

@@ -122,7 +124,7 @@ pfun <- function(X.model, newdata) {
}
# Use a fixed value
-climate_changes <- detect_climate_change(
+climate_changes <- detect_envi_change(
model = mod_rf,
var_occ = model_data %>% select(-occ),
variables = env_vars,
Binary file modified vignettes/intro-evaluation-1.png
Binary file modified vignettes/intro-independent_responses-1.png
Binary file modified vignettes/intro-marginal_responses-1.png
Binary file modified vignettes/intro-pa-1.png
Binary file modified vignettes/intro-prediction-1.png
Binary file modified vignettes/intro-raw_suit-1.png
Binary file modified vignettes/intro-remove_outliers-1.png
Binary file modified vignettes/intro-spatial_response-1.png
Binary file modified vignettes/intro-var_contrib-1.png
Binary file modified vignettes/intro-var_contrib_plot-1.png
Binary file modified vignettes/intro-var_inter_dependence-1.png
Binary file modified vignettes/intro-variable_analysis-1.png
Binary file modified vignettes/intro-variable_dependence-1.png
Binary file modified vignettes/intro-virtualspecies-1.png
Binary file modified vignettes/itsdm-eval-1.png
Binary file modified vignettes/itsdm-independent_responses-1.png
Binary file modified vignettes/itsdm-marginal_responses-1.png
Binary file modified vignettes/itsdm-outliers-1.png
Binary file modified vignettes/itsdm-pa-1.png
Binary file modified vignettes/itsdm-prediction-1.png
Binary file modified vignettes/itsdm-var_contrib-1.png
Binary file modified vignettes/itsdm-var_contrib_general-1.png
Binary file modified vignettes/itsdm-var_inter_dependence-1.png
Binary file modified vignettes/itsdm-variable_dependence-1.png
Binary file added vignettes/shap-chanEnv-1.png
Binary file added vignettes/shap-envCtr-1.png
Binary file added vignettes/shap-envCtr-2.png
Binary file added vignettes/shap-predSuit-1.png
Binary file added vignettes/shap-rspCurve-1.png
Binary file added vignettes/shap-rspCurve-2.png
Binary file added vignettes/shap-rspMap-1.png
302 changes: 302 additions & 0 deletions vignettes/shap_application.Rmd
@@ -0,0 +1,302 @@
---
title: "Applications of Shapley values on SDM explanation"
subtitle: "with an example of a Random Forest model"
author: "Lei Song"
date: "2022-12-01"
output:
  rmarkdown::html_document:
    theme: readable
vignette: >
  %\VignetteIndexEntry{Applications of Shapley values on SDM explanation}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
---



## Introduction

In `itsdm`, the Shapley value-based functions work both with the internal iForest model and with external models fitted outside of `itsdm`. These functions can analyze spatial and non-spatial variable responses, the contributions of environmental variables to any observation or prediction, and the areas likely to be affected by changing variables.

In this vignette, we show how to use an external model with these functions, taking a Random Forest (RF) model of baobab trees in Madagascar as an example.
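
For readers new to Shapley values, the following base R sketch illustrates the idea on a toy problem. It is not how `itsdm` computes them: the package approximates Shapley values by simulation (see the `nsim = shap_nsim` argument in the `variable_contrib` hunk above), while this sketch computes them exactly for two features. The model `f`, the `background` data, and the observation `obs` are all hypothetical.

```r
# Hypothetical toy "model" with an interaction between two variables
f <- function(d) 0.5 * d$x1 + d$x1 * d$x2

set.seed(1)
background <- data.frame(x1 = runif(50), x2 = runif(50))
obs <- data.frame(x1 = 0.8, x2 = 0.3)

# v(S): mean prediction over the background data when the variables in S
# are fixed to their observed values
value <- function(S) {
  d <- background
  for (v in S) d[[v]] <- obs[[v]]
  mean(f(d))
}

# Exact Shapley values: average marginal contribution over both orderings
phi_x1 <- 0.5 * (value("x1") - value(character(0))) +
  0.5 * (value(c("x1", "x2")) - value("x2"))
phi_x2 <- 0.5 * (value("x2") - value(character(0))) +
  0.5 * (value(c("x1", "x2")) - value("x1"))

# Efficiency property: the contributions sum to the prediction for the
# observation minus the mean prediction over the background data
efficiency_ok <- isTRUE(all.equal(phi_x1 + phi_x2,
                                  f(obs) - mean(f(background))))
```

Each value says how much a variable moves the prediction for this observation away from the average prediction, which is the quantity the response curves, response maps, and contribution plots below are built on.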

## Load libraries


```r
# Load libraries
library(itsdm)
library(dplyr)
library(stars)
library(virtualspecies)
library(dismo)
library(randomForest)
library(ggplot2)
library(ggpubr)
library(rnaturalearth)
library(rgbif)
library(lubridate)
select <- dplyr::select
```

## Baobab trees of Madagascar


```r
# Set study area, Madagascar
study_area <- ne_countries(
  scale = 10,
  continent = 'africa', returnclass = 'sf') %>%
  filter(admin == 'Madagascar') %>%
  select()

# Get training data
## Search via GBIF
occ <- occ_search(
  scientificName = "Adansonia za Baill.",
  hasCoordinate = TRUE,
  limit = 200000,
  hasGeospatialIssue = FALSE) %>%
  `[[`("data") %>%
  select(decimalLongitude, decimalLatitude)

## Clean the occurrences spatially
occ <- occ %>%
  st_as_sf(coords = c('decimalLongitude', 'decimalLatitude'),
           crs = 4326)
occ <- st_intersection(study_area, occ)
```

## Environmental variables


```r
# Get environmental variables for current and future
bios_current <- worldclim2(
  var = 'bio', res = 2.5,
  bry = study_area,
  path = tempdir(),
  nm_mark = 'africa') %>%
  st_normalize()

# Remove highly correlated variables
bios_current <- dim_reduce(
  bios_current,
  threshold = 0.7,
  preferred_vars = c(paste0("bio", c(1:3, 13))))
bios_current <- bios_current$img_reduced

# Query the future variables
bios_future <- future_worldclim2(
  var = 'bioc', res = 2.5,
  bry = study_area,
  interval = "2041-2060",
  path = tempdir(),
  nm_mark = 'sa') %>%
  st_set_dimensions("band", values = paste0("bio", 1:19)) %>%
  dplyr::slice("band", st_get_dimension_values(bios_current, "band")) %>%
  st_normalize()
```

## Make training samples


```r
## Spatial deduplication: keep at most one occurrence per raster cell
template <- bios_current %>%
  dplyr::slice("band", 1) %>%
  mutate(reduced_image = NA)
occ <- st_rasterize(
  occ, template) %>%
  st_xy2sfc(as_points = TRUE) %>% st_as_sf() %>%
  select(geometry)
rm(template)

## Extract environmental values
training <- st_extract(
  bios_current %>% split("band"), occ) %>%
  st_drop_geometry() %>%
  mutate(occ = 1)

## Get background values
set.seed(124)
background <- randomPoints(
  as(bios_current %>% dplyr::slice("band", 1), "Raster"), 1000)
background <- st_extract(bios_current, background) %>%
  as.data.frame() %>% na.omit() %>%
  mutate(occ = 0)
names(background) <- c(st_get_dimension_values(bios_current, "band"), "occ")

# Put them together
training <- rbind(training, background) %>%
  na.omit() %>%
  select(c("occ", st_get_dimension_values(bios_current, "band")))
```

## Fit the model


```r
# Convert the response variable to a factor so that RF does classification.
training$occ <- as.factor(training$occ)

# Calculate class frequency
prNum <- as.numeric(table(training$occ)["1"]) # number of presences
bgNum <- as.numeric(table(training$occ)["0"]) # number of backgrounds
# Draw as many background points as presences for each tree
samsize <- c("0" = prNum, "1" = prNum)

# Fit the down-sampling RF
set.seed(123)
mod_rf <- randomForest(
  occ ~ .,
  data = training,
  ntree = 1000,
  sampsize = samsize,
  replace = TRUE)
```
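
The effect of `sampsize` can be checked on simulated data. The sketch below is only an illustration under assumed data (the `toy` data frame and its variables are hypothetical, not part of the vignette): it fits the same kind of down-sampled RF on an imbalanced sample, so that each tree is grown on an equal number of presences and background points.

```r
library(randomForest)

# Hypothetical imbalanced sample: 30 presences and 300 background points
set.seed(42)
toy <- data.frame(
  occ = factor(c(rep(1, 30), rep(0, 300))),
  x1 = c(rnorm(30, mean = 2), rnorm(300, mean = 0)),
  x2 = rnorm(330))

# Per-class sample sizes, given in the order of the factor levels ("0", "1")
n_pres <- sum(toy$occ == "1")
rf_toy <- randomForest(
  occ ~ .,
  data = toy,
  ntree = 200,
  sampsize = c("0" = n_pres, "1" = n_pres),
  replace = TRUE)

# Out-of-bag confusion matrix, useful to see per-class error
rf_toy$confusion
```

Without down-sampling, the majority background class tends to dominate each tree; balancing the per-tree sample is one common way to reduce that effect.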

## Make the predictions under current and future environment


```r
# Reformat the variables
bios_current <- bios_current %>% split("band")
bios_future <- bios_future %>% split("band")

# Suitability under current and future conditions
suit_current <- predict(bios_current, mod_rf, type = "prob")["1"]
suit_future <- predict(bios_future, mod_rf, type = "prob")["1"]

# Plot them
preds <- c(suit_current, suit_future)
names(preds) <- c("Current", "Future")

ggplot() +
  geom_stars(data = preds %>% merge(name = "band"),
             na.action = na.omit) +
  scale_fill_viridis_c("Suitability") +
  facet_wrap(~band) +
  coord_equal() +
  theme_void() +
  theme(strip.text.x = element_text(size = 12))
```

<img src="shap-predSuit-1.png" alt="plot of chunk predSuit" style="display: block; margin: auto;" />

## Environmental response curves

### Prediction wrapper function

The prediction wrapper is probably the most important argument to set correctly in order to get proper results. Here is the wrapper for the Random Forest SDM used in this vignette:


```r
## Define the wrapper function for RF
## Getting this right is essential for correct results
pfun <- function(X.model, newdata) {
  # newdata is a data.frame; return the probability of presence (class "1")
  predict(X.model, newdata, type = "prob")[, "1"]
}
```

As shown above, the wrapper function must take at least two arguments: the model object and the new data. The function body must then make the proper prediction on the new data. For instance, we have to set `type = "prob"` so that RF returns probabilities, and we have to subset the result to keep only the probability of presence.
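
The same pattern applies to other model types. As a hedged illustration that is not part of the vignette's workflow, the sketch below writes a wrapper for a logistic regression fitted with `glm` on simulated data. The names `toy`, `mod_glm`, and `pfun_glm` are hypothetical; only the two-argument signature follows the `pfun` defined above.

```r
# Hypothetical presence/absence data with two environmental variables
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$occ <- rbinom(200, 1, plogis(1.5 * toy$x1 - 0.5))

# A logistic regression standing in for an external SDM
mod_glm <- glm(occ ~ x1 + x2, data = toy, family = binomial())

# Wrapper: model object first, new data second, returning a plain
# numeric vector of presence probabilities
pfun_glm <- function(X.model, newdata) {
  as.numeric(predict(X.model, newdata = newdata, type = "response"))
}

# Quick check on a few rows before passing the wrapper on
p <- pfun_glm(mod_glm, toy[1:5, c("x1", "x2")])
```

Whatever the model, it is worth confirming that the wrapper returns one numeric value per row of `newdata` before using it in the functions below.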


```r
# Make the response curves
responses <- shap_dependence(
  model = mod_rf,
  var_occ = training[, 2:ncol(training)],
  variables = bios_current,
  pfun = pfun)

# Check bio13, Precipitation of Wettest Month, for example
plot(responses, target_var = "bio13")
```

<img src="shap-rspCurve-1.png" alt="plot of chunk rspCurve" style="display: block; margin: auto;" />

```r

# Check the relationship between bio13 and bio2, for example
# These plots are ggplot2 objects, so they can be extended like this:
plot(responses, target_var = "bio13",
     related_var = "bio2", smooth_line = FALSE) +
  theme_bw() +
  theme(text = element_text(size = 16))
```

<img src="shap-rspCurve-2.png" alt="plot of chunk rspCurve" style="display: block; margin: auto;" />

## Environmental response maps


```r
rsp_maps <- shap_spatial_response(
  model = mod_rf,
  var_occ = training[, 2:ncol(training)],
  variables = bios_current,
  pfun = pfun)

# Check the response map of bio13, for example
plot(rsp_maps, target_var = "bio13")
```

<img src="shap-rspMap-1.png" alt="plot of chunk rspMap" style="display: block; margin: auto;" />

## Analyze environmental contributions to observations


```r
# Take some observations for example
set.seed(124)
occ_to_check <- randomPoints(
  as(bios_current %>% select("bio1"), "Raster"), 4)
vars_to_check <- st_extract(bios_current, occ_to_check) %>%
  as.data.frame()

# Do the calculation
var_ctris <- variable_contrib(
  model = mod_rf,
  var_occ = training[, 2:ncol(training)],
  var_occ_analysis = vars_to_check,
  pfun = pfun)

# Check it
## Spatial locations
ggplot() +
  geom_sf(data = study_area, fill = "transparent", color = "black",
          linewidth = 0.8) +
  geom_sf(data = st_as_sf(data.frame(occ_to_check),
                          coords = c("x", "y"), crs = 4326),
          color = "blue") + theme_void()
```

<img src="shap-envCtr-1.png" alt="plot of chunk envCtr" style="display: block; margin: auto;" />

```r

# The contributions of variables to each observation
plot(var_ctris, plot_each_obs = TRUE, num_features = 6)
```

<img src="shap-envCtr-2.png" alt="plot of chunk envCtr" style="display: block; margin: auto;" />

## Effects of a changing environment


```r
bio13_changes <- detect_envi_change(
  model = mod_rf,
  var_occ = training[, 2:ncol(training)],
  variables = bios_current,
  target_var = "bio13",
  variables_future = bios_future,
  pfun = pfun)

# Check the result
plot(bio13_changes)
```

<img src="shap-chanEnv-1.png" alt="plot of chunk chanEnv" style="display: block; margin: auto;" />
