GitHub - bcaradima/inv_sdm: Prepare data and apply Bayesian (joint hierarchical) generalized linear models to stream invertebrate communities in Switzerland

Background

This repository provides a complete set of requisite R and Python scripts that were used to produce results for two papers (aka, paper 1 and 2) published based on findings from the Human Impacts project at Eawag:

Caradima, B., Schuwirth, N., & Reichert, P. (2019). From individual to joint species distribution models: A comparison of model complexity and predictive performance. Journal of Biogeography, 46(10), 2260-2274. doi:10.1111/jbi.13668, Institutional Repository.
Caradima, B., Reichert, P., & Schuwirth, N. (2020). Effects of site selection and taxonomic resolution on the inference of stream invertebrate responses to environmental conditions. Freshwater Science, 39(3), 415-432. doi:10.1086/709024, Institutional Repository.

These published papers (along with others) are also included as chapters in the following doctoral research thesis:

Caradima, M. B. (2020). Statistical species distribution modelling of stream invertebrate and fish communities (Doctoral dissertation, ETH Zurich). Institutional Repository.

Note that these scripts are part of a wider RStudio project and R workspace referred to as model_dev in the documentation below. model_dev includes input data, intermediary and final outputs, documentation, and additional files. I have uploaded these scripts and readme for others to review.

Workspace Organization

model_dev includes the following folders:

/doc contains documentation needed for frequent reference
/scripts contains all scripts for supporting the entire model development process
- /python Python scripts for spatial data processing using ArcPy library in ArcGIS 10.1 SP1 with 64-bit geoprocessing installed. Input data is not included for these due to disk space limitations. Scripts were used to develop predictor variables (e.g., F.EDO, FRI).
/inputs contains all input data read by model_dev scripts (usually new pre-processed inputs were copied from Human Impacts folder on server and periodically updated based on date in filename)
/outputs contains model results and processed results (e.g., plots) that are manually copied to the respective paper folder:
- /sampled observations contains intermediate outputs used as observational data for the models: randomly sampled (with set.seed) stream invertebrate observations (i.e., of entire dataset and k-fold datasets) for all paper submissions as CSV files
- effective_parameters example workspaces for each model in Caradima et al. (2019) that can be used in calculating effective number of parameters (see Appendix for paper 1 and script test_pD.R)
- /paper 1 accepted results for “paper 1 accepted” paper
- /paper 2 accepted results for “paper 2 accepted” paper

Script Organization

Separate script files in /scripts are created for distinct steps in the model development process. First, input data is imported by 1-setup.R and observational and input data ready for modelling is prepared by 3-inputs.R. Second, user-defined custom functions are sourced from 2-functions.R and organized using comments. Exploratory and final results are generated in 4-results_p1.R and 5-results_P2.R. To generate these results, the setup script must be run first (to read data, and source the inputs and functions scripts) before turning to a results_# script.

Workflow

The scripts are run as follows for both papers:

1-setup.R:

source function.R to define all functions
install/load required packages with custom ipak() function
read all necessary /input datasets, with minimal or no processing

2-functions.R:

contains commonly used functions with full comments defining their usage
key functions: prepare.inputs, run.isdm, deploy.jsdm, cv.isdm, extract.jsdm, cv.jsdm, cv.jsdm.sample (see script for description and code for each function)
note that custom functions are used extensively in generating results, but that some functions are found in the results scripts, often because they have to be modified slightly before being called

3-inputs.R:

pre-process and calculate all potential influence factors for subsequent preparation for model inputs
prepares and samples observation data (for BDM, CFCH, and CFp datasets) for both papers

variable-selection.R:

given $p$ potential explanatory variables and $n$ sampled observations, performs an exhaustive search of individual models with $p$ parameters (e.g., 4-10 parameters) by doing 3-fold cross-validation for each individual model and taxon in the community
computes predictive performance for each model for individual taxa
excludes models with collinear variables from validation process
stores only minimum statistics required for model selection in /outputs
see paper 1 for detailed methodology

model_selection.R:

no longer used; most of its outputs are incorporated in the results_ scripts for specific papers
post-processing of output variable selection workspaces for final model selection
produces plots of mean predictive performance for community for each model

jsdm_inference.R:

prepares community and input data and calls Stan for Bayesian inference according to the hierarchical multi-species and joint models defined in Caradima et al. 2019
includes Stan definitions of hierarchical and joint generalized linear models
initial settings in script determine Stan code construction and final name of workspace file (this is important later on for processing results; see how the identify.jsdm() function works in the functions script)
identify.jsdm() is later called within extract.jsdm(), cv.jsdm(), cv.jsdm.sample().

4-results_p1.R:

generates results of paper 1 accepted, with comments (e.g., # Figure 1 ####) indicating code corresponding to the paper 1 submission 2 figures

5-results_p2.R:

generates results of paper 2 accepted, with comments indicating code corresponding to the paper 2 submission 1 figures

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
scripts		scripts
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Background

Workspace Organization

Script Organization

Workflow

About

Releases

Packages

Languages

bcaradima/inv_sdm

Folders and files

Latest commit

History

Repository files navigation

Background

Workspace Organization

Script Organization

Workflow

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages