Maxim Jaffe's geospatial analytics demonstration
This project showcases my geospatial data analytics skills with a case study species.
The case study is the great xenops (*Megaxenops parnaguae*), a typical furnariid bird of the Brazilian Caatinga (Wikipedia, BirdLife Factsheet).
Image source: Wikimedia (João Quental CC BY 2.0)
The project creates a Species Distribution Model (SDM) for the case study species. It uses a prototype tool, MAXDM (Maxim's Species Distribution Models), coded specifically for this demonstration.
MAXDM SDMs predict distribution patterns based on the similarity of environmental variables to those at occurrence sites. It implements a geometric median similarity (GMS) method and a k nearest neighbours similarity (KNNS) method.
These similarity methods are applicable to presence-only data and are relatively straightforward to calculate and reason about.
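As a rough illustration of the two measures, here is a conceptual sketch (not the actual MAXDM code): environmental variables are assumed to be standardised, and the `1 / (1 + d)` distance-to-similarity transform is an illustrative choice rather than MAXDM's exact formula.

```python
# Conceptual sketch of GMS and KNNS (not the actual MAXDM implementation).
# Rows of occ_env / cell_env are sites, columns are standardised environmental variables.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def geometric_median(X, n_iter=100, eps=1e-7):
    """Geometric median of the occurrence sites via Weiszfeld's algorithm."""
    m = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), eps)
        w = 1.0 / d
        m_new = (X * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < eps:
            return m_new
        m = m_new
    return m


def gms(occ_env, cell_env):
    """Geometric median similarity: closeness of each cell to the median occurrence."""
    d = np.linalg.norm(cell_env - geometric_median(occ_env), axis=1)
    return 1.0 / (1.0 + d)


def knns(occ_env, cell_env, k=5):
    """k nearest neighbours similarity: closeness to the k nearest occurrences."""
    nn = NearestNeighbors(n_neighbors=k).fit(occ_env)
    d, _ = nn.kneighbors(cell_env)
    return 1.0 / (1.0 + d.mean(axis=1))
```

Both measures need only the presence records themselves, which is why they suit the presence-only GBIF data used here.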
To better understand the author's choices for the project, see the justifications below.
Example of a map for a technical report:
Summary of tasks and tools:
- Setup project (folders, scripts, packages, GRASS GIS): `Makefile`, `bash`
- Download base data (WorldClim, 'Natural Earth', GBIF): `wget`, `bash`
- Process data: `bash`, GRASS GIS
- Fit and apply model: `python` (MAXDM), GRASS GIS python API (`grass.script`)
- Visualise model results (PNG map): `bash`, GRASS GIS
Current setup is for Linux Mint 20.3.
In the project root, run `make` on the command line. For specific tasks run:
- `make setup`
- `make download`
- `make process`
- `make model`
- `make visualise`

To list subtasks run `make summary`; for further details read the `Makefile`.
Look in the `scripts` folder for specific bash or python scripts; these have names similar to the tasks defined in the `Makefile`.
External data is downloaded into the `data/external` folder. Internal data is stored in the `data/internal` folder, including GRASS GIS data. Generated maps are saved into the `maps` folder.
Most scripts are written in bash, as it integrates well with GRASS GIS. Python is used for the more complex components.
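As a small example of the Python side, the GRASS GIS python API can drive the same modules the bash scripts call. This is an illustrative sketch only: the raster name `bio1` is hypothetical, and it assumes the code runs inside an active GRASS session (e.g. one started from the project's `grassdata` database).

```python
# Illustrative sketch: drive GRASS GIS modules from Python via grass.script.
# Assumes an active GRASS session; the raster name "bio1" is hypothetical.
import grass.script as gs

# Align the computational region with an environmental raster.
gs.run_command("g.region", raster="bio1")

# Parse the shell-style output of `r.univar -g` into a dictionary of statistics.
stats = gs.parse_command("r.univar", map="bio1", flags="g")
print(stats["mean"], stats["stddev"])
```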
If you get any warnings due to the GRASS GIS environment, run the GUI with the following inputs at startup:
- Database directory: analytics-demo/data/internal/grassdata/
- Location: WGS_84
- Mapset: FFI
The entire setup can be cleaned up as follows:
- `make clean`: removes the `data` folder for a clean data setup (still keeps installed packages)
- `make clean-grass`: removes the `grassdata` folder for a clean GRASS GIS setup
Dependencies:
- GRASS GIS 7.8
- python 3.8
- pandas 0.25
- xarray 0.16
- scikit-learn 0.22
- bash 5.0
- wget 1.20
- gawk 1:5.0.1
Porting to other operating systems should be possible:
- Linux distributions:
  - Adapt `setup-packages` in `Makefile` to use the OS package manager (`apt`, `yum`, etc.)
- Mac OS:
  - Adapt `setup-packages` in `Makefile` to use MacPorts or another ports/package manager
- Windows:
I chose the great xenops as I have a great interest in the Caatinga seasonally dry tropical forest and in ornithology. The species is interesting because it is closely associated with dense Caatinga while also tolerating degraded Caatinga. It is also an iconic Caatinga species.
I chose data sources with worldwide coverage to demonstrate how the project could be adapted for other target species/taxa. WorldClim 2.5-minute data was selected as a compromise between resolution and download time.
This project uses GRASS GIS, the Python ecosystem, `bash`, `make`, and other Linux/UNIX commands (e.g. `wget`, `awk`) for geospatial analysis. It is completely based on open source software and tools.
GRASS GIS is particularly apt for dealing with raster data, which is common in SDMs. It has good integration with python and bash, which makes it particularly suited to automated and reproducible data analysis.
It also provides a good user interface that is useful for interactive data analysis, for prototyping batch analyses, and for verifying batch analysis results.
GRASS GIS provides a more robust, homogeneous, and well-integrated geospatial analysis experience compared to using exclusively Python ecosystem packages (e.g. fiona, geopandas, rasterio, xarray, cartopy). A similar argument can be made for R. Nevertheless, GRASS GIS integrates well with both ecosystems.
GRASS GIS is also open source, which makes it particularly well suited for use in resource-constrained environments (e.g. conservation projects in the Global South).
Python is particularly useful due to the following packages (see the sketch after this list):
- numerical computation (numpy, scipy, xarray)
- data processing (pandas, numpy)
- machine learning and statistical modelling (scikit-learn, etc.)
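For instance, pandas and xarray can be combined to attach WorldClim values to GBIF occurrence records. This is an illustrative sketch only: the file names are hypothetical and it assumes `rasterio` is available for xarray's GeoTIFF reader.

```python
# Illustrative sketch: sample a WorldClim bioclim raster at GBIF occurrence points.
# File names are hypothetical; xarray's open_rasterio requires rasterio.
import pandas as pd
import xarray as xr

# GBIF occurrence downloads are tab-separated with Darwin Core column names.
occ = pd.read_csv("data/external/occurrences.csv", sep="\t")
occ = occ.dropna(subset=["decimalLatitude", "decimalLongitude"])

# Read one bioclim layer and drop the singleton band dimension.
bio1 = xr.open_rasterio("data/external/wc2.1_2.5m_bio_1.tif").squeeze("band")

# Nearest-cell lookup at each occurrence coordinate (vectorised selection).
occ["bio1"] = bio1.sel(
    x=xr.DataArray(occ["decimalLongitude"].values, dims="points"),
    y=xr.DataArray(occ["decimalLatitude"].values, dims="points"),
    method="nearest",
).values
```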
`make` is a useful tool for organising data analysis pipelines, as it allows different tasks and their data dependencies to be defined in a `Makefile`.
This is more flexible than a single 'task' script, since specific tasks can easily be run on their own. When dependencies are already met (e.g. data files have been downloaded), this also avoids repeating work.
- `wget`: easy-to-use tool for downloading data
- `awk`: useful language for text/CSV processing
I prototyped MAXDM to demonstrate my ability to develop tools/models, in this case using a flexible package (scikit-learn) with off-the-shelf components. This similarity/distance based approach was selected as it could be implemented in a short period of time (2-3 days).
Note that in previous positions I have worked heavily with the following kinds of modelling techniques / tools:
- Generalised Linear Models (GLMs) based on abundance monitoring data (using `statsmodels` and `scikit-learn`)
- Hybrid ecological models linking GLMs to land use / land cover dynamic models (agent-based / system dynamics models, using NetLogo and Stella)
Data sources:
- GBIF
  - Megaxenops parnaguae Reiser, 1905 occurrences with coordinates (presence-only)
- WorldClim 2.1 historical climate data, 2.5 minutes resolution
  - Bioclimatic variables
  - Elevation
- Natural Earth 1:10m
  - Cultural Vectors: Admin 1 – States, Provinces
References:
- GBIF.org (15 April 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.mcet5w
- Fick, S.E. and R.J. Hijmans, 2017. WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12): 4302-4315.