The 2-minute video at this link gives an overview of the idea behind the project, its machine learning and cloud computing applications, and its relevance for public policy.
It can be found in this PDF.
The data files are too large for GitHub. You can download the databases and shapefiles from Google Drive using these links.
You can find the code in this folder. It is structured as follows:
i) 1_transition_mat: Computes the transition matrices using Dask
ii) 2_Model_EMR_: Computes the model using PySpark on an AWS EMR cluster. Alternatively, 2_Model.ipynb runs the same computation on local nodes.
iii) 3_small_area_estimation: Computes the small area estimates
The original idea for this project comes from my paper with J. Herrera (2016). The project was explored further by the Peruvian Census Bureau, where I serve as a Technical Advisor. The vulnerability map is now a national indicator used for public policy. The official report can be found here.