Scripts that combine historical emissions records from several datasets, like CEDS and GFED, to create complete historical emissions files that are input to the IAM emissions harmonization algorithms in IAMconsortium/concordia (regional harmonization and spatial gridding for ESMs) and iiasa/climate-assessment (global climate emulator workflow).
- prototype: the project is just starting up and the code is all prototype
We do all our environment management using pixi. To get started, you will need to make sure that pixi is installed (instructions here; we found that using the pixi-provided script was best on a Mac).
To create the virtual environment, run

```sh
pixi install
pixi run pre-commit install
```

These steps are also captured in the `Makefile`, so if you want a single command you can instead simply run `make virtual-environment`.
Having installed your virtual environment, you can now run commands in your virtual environment using

```sh
pixi run <command>
```

For example, to run Python within the virtual environment, run

```sh
pixi run python
```

As another example, to run a notebook server, run

```sh
pixi run jupyter lab
```
Some of our data is managed using git lfs. To install it, please follow the instructions here. Then, before doing anything else, run

```sh
git lfs install
```

Once you have git lfs installed, you can grab all the files we track with

```sh
git lfs fetch --all
```

To grab a specific file, use

```sh
git lfs pull --include="path/to/file"
# e.g.
git lfs pull --include="data/national/gfed/data_aux/iso_mask.nc"
```

For more info, see, for example, here.
Note that this repository focuses on processing data and does not currently (re)host input data files. Files that need to be downloaded so that you can run the notebooks are specified in README files in the relevant data subfolders, such as `data/national/ceds/data_raw/README.txt` for the CEDS data download and `data/national/gfed/data_raw/README.txt` for the GFED data download.
Data is processed by the Jupyter notebooks (saved as `.py` scripts using jupytext, under the `notebooks` folder). The output paths are generally specified at the beginning of each notebook. For instance, you will find processed CEDS data at `data/national/ceds/processed` and processed GFED data at `data/national/gfed/processed`.
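As a purely illustrative sketch (the variable names and exact paths below are assumptions, not the notebooks' actual definitions), the start of a notebook typically pins down its input and output locations along these lines:

```python
# Illustrative only: variable names and paths are assumptions,
# check the top of each notebook for the real definitions.
from pathlib import Path

RAW_DATA_DIR = Path("data") / "national" / "ceds" / "data_raw"
PROCESSED_DATA_DIR = Path("data") / "national" / "ceds" / "processed"
PROCESSED_DATA_DIR.mkdir(parents=True, exist_ok=True)
```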
Install and run instructions are the same as the above (this is a simple repository, without tests etc., so there are no development-only dependencies).
If there is a dependency missing, you can add it with pixi. Please only add dependencies with pixi, as this ensures that all the other developers will get the same dependencies as you (if you add dependencies directly with conda or pip, they are not added to the `pixi.lock` file, so other developers will not realise they are needed!).
To add a conda dependency:

```sh
pixi add <dependency-name>
```

To add a PyPI/pip dependency:

```sh
pixi add --pypi <dependency-name>
```
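For example (the package names below are purely illustrative, not packages you need to add), adding pandas from conda and pyam-iamc from PyPI would look like:

```sh
# conda dependency
pixi add pandas
# PyPI dependency
pixi add --pypi pyam-iamc
```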
The full documentation can be found here in case you have a more exotic use case.
These are the main processing scripts. They are saved as plain `.py` files using jupytext. Jupytext will let you open the plain `.py` files as Jupyter notebooks.
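If you prefer working with `.ipynb` files directly, one option (the file name below is purely illustrative) is to convert a script with the jupytext command-line tool:

```sh
# convert a jupytext .py script into an .ipynb notebook you can open in Jupyter
pixi run jupytext --to notebook notebooks/0101_example.py
```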
In general, you should run the notebooks in numerical order. We do not currently have a comprehensive way of capturing the dependencies between notebooks. We try to make it so that notebooks in each `YY**` series are independent (i.e. you can run `02**` without running `01**`), but we do not guarantee this. Hence, if in doubt, run the notebooks in numerical order.
Overview of notebooks:

- `01**`: preparing input data for IAMconsortium/concordia.
- `02**`: preparing input data for iiasa/climate-assessment.
We have a local package, `emissions_harmonization_historical`, that lives in `src`, which we use to share general functions across the notebooks.
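For example, a notebook can import the package directly (a minimal sketch; the function named in the comment is hypothetical, so check the modules in `src/emissions_harmonization_historical` for what actually exists):

```python
# The package is importable inside the pixi environment; `some_shared_function`
# is a hypothetical name used only to illustrate the pattern.
import emissions_harmonization_historical as ehh

# e.g. ehh.some_shared_function(...)
```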
All data files should be saved in `data`. We divide data sources into `national`, i.e. those that are used for country-level data (e.g. CEDS, GFED), and `global`, i.e. those that are used for global-level data (e.g. GCB). Within each data source's folder, we use `data_raw` for raw data. Where raw data is not included, we include a `README.txt` file which explains how to generate the data.
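The resulting layout looks roughly like this (only a subset of sources is shown; the exact folders depend on which datasets you have downloaded and processed):

```
data/
├── national/
│   ├── ceds/
│   │   ├── data_raw/    # raw CEDS download, see README.txt there
│   │   └── processed/
│   └── gfed/
│       ├── data_raw/    # raw GFED download, see README.txt there
│       └── processed/
└── global/
    └── ...
```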
In this repository, we use the following tools:
- git for version-control (for more on version control, see general principles: version control)
  - for these purposes, git is a great version-control system, so we don't complicate things any further. For an introduction to Git, see this introduction from Software Carpentry.
- Pixi for environment management (for more on environment management, see general principles: environment management)
  - there are lots of environment management systems. Pixi works well in our experience and, for projects that need conda, it is the only solution we have tried that worked really well.
  - we track the `pixi.lock` file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
- pre-commit with some very basic settings to get some easy wins in terms of maintenance (see the example after this list for running the hooks manually), specifically:
  - code formatting with ruff
  - basic file checks (removing unneeded whitespace, not committing large files etc.)
  - (for more thoughts on the usefulness of pre-commit, see general principles: automation)
- track your notebooks using jupytext (for more thoughts on the usefulness of Jupytext, see tips and tricks: Jupytext)
  - this avoids nasty merge conflicts and incomprehensible diffs
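If you want to run all the pre-commit hooks manually (e.g. before pushing a larger change), the following works from within the pixi environment; the hooks themselves are defined in this repository's `.pre-commit-config.yaml`:

```sh
pixi run pre-commit run --all-files
```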
This project was generated from this template: basic python repository. The template is managed and distributed using copier.