This expands on comments by @glatterf42 here and @gidden here. We update the description if there are alternate proposals or to link to PRs.
Summary
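The validity of outputs from MESSAGEix-GLOBIOM scenarios (produced with the current repo message-ix-models and various branches of message_data) depends on the particular GAMS implementation of MESSAGE (and/or MACRO) in the message_ix package.
There are PRs to message_ix that modify the GAMS code, for instance message_ix#430 (Correct MACRO GDP reporting & update docs), message_ix#726 (Adjust calculation of PRICE_EMISSION), and others. Often these are to correct known bugs.
message_ix contains: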
Self-contained tests that are narrowly targeted to certain behaviours:
These tests use the suite of simplified models (Dantzig, Austria, Westeros) that are included with the MESSAGE code.
These tests are run automatically.
PRs that touch the GAMS code expand or adjust these tests.
There are (still) some "nightly" tests that download and run MESSAGEix-GLOBIOM scenarios, and make certain checks against their outputs. These scenarios are by now quite old.
message-ix-models has a test suite with high coverage.
message_data branch dev has low test coverage, and other branches even lower.
For message_ix GAMS PRs, there is often a concern that the following could happen:
The message_ix test suite, including expanded/added tests for the specific changes in a PR, passes, but
message-ix-models or message_data tests are broken, or
Untested message_data or message-ix-models behaviour (the "particular outputs" mentioned in the first bullet) changes in a way that's not obvious.
Things to do
There are a variety of things to do about this.
Manual checks
This is more or less what we have done thus far.
The message_ix PR author(s) or reviewer(s) say: I think this PR may have consequences "further up the stack" / "downstream", and describe what those impacts could be.
Someone (manually) re-runs some code or (manual) workflows or steps and makes some (manual) checks to ensure there are no unexpected impacts. To be clear, this is done in an ad hoc way every time; there is no HOWTO for these: which branches/code to use, which command(s) to run, which checks to make.
Comments and discussion on the PR either determine (a) there is no impact, and the PR is good to merge, or (b) there are impacts, and the PR is adjusted.
Practices, e.g. pip freeze
iiasa/message_data#546 points to another option: using pip freeze. This is currently documented under one particular "Known issue" in the message_data Install instructions, here, but we could maybe move this to a more prominent location, like the “Reproducibility” page of the message-ix-models docs.
As a general rule, we could express it like this:
If:
A particular branch/workflow of message_data/message-ix-models is known to work with specific versions of other packages in the stack (message-ix-models, message_ix, ixmp, genno, pandas, any others) and
either:
There are no tests of that message_data/message-ix-models branch/workflow; or
The versions of dependencies are on branches other than main, not yet merged or associated with a PR; or
More recent versions of the dependencies are known not to work;
Then:
One or more requirements.txt file(s) should be created for that workflow/branch, recording the version(s) of upstream packages that are known to work; and
The meaning of “known to work”—i.e. expected features of the model outputs—should be explicitly documented.
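As a rough illustration of the "Then" step, the following Python sketch records the installed versions of the packages named above as `name==version` lines. The helper names `pin_lines` and `write_requirements` and the `STACK` list are hypothetical, not part of any package in the stack. Note that `pip freeze` also records the repository URL and commit for packages installed from a branch, which this sketch does not attempt.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages in the stack named in the rule above; extend as needed.
# (Hypothetical constant, for illustration only.)
STACK = ["message-ix-models", "message_ix", "ixmp", "genno", "pandas"]


def pin_lines(packages, get_version=version):
    """Return one 'name==version' line per package in `packages`."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment: say so, rather than guess.
            lines.append(f"# {name}: not installed")
    return lines


def write_requirements(path, packages=STACK):
    """Write the pins for `packages` to a requirements file at `path`."""
    with open(path, "w") as f:
        f.write("\n".join(pin_lines(packages)) + "\n")
```

For example, `write_requirements("requirements.txt")` run in an environment where the workflow is known to work would capture that state for later reuse with `pip install -r requirements.txt`.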
This allows upstream improvements to go forward. Developers working on particular project or model variant code then have a few options:
Continue to use the ‘frozen’ versions recorded in the requirements.txt files they have created.
Check if their code works with newer versions of dependencies; update or remove the requirements.txt.
Add tests so that compatibility of their code and specific outputs can be automatically validated as dependencies update.
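For the second option, a developer first needs to see where the current environment differs from the frozen versions. A minimal sketch, assuming the requirements.txt file contains simple `name==version` lines (other forms, such as VCS references, are skipped); `parse_pins` and `find_drift` are hypothetical helper names:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines; skip blanks, comments and other forms."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_drift(pins, get_version=version):
    """Return {name: (pinned, installed)} for packages whose versions differ.

    `installed` is None if the package is not installed at all.
    """
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift
```

An empty result from `find_drift` means the environment matches the recorded versions. A non-empty one lists exactly which upstream packages have moved, which is where to look first if model outputs change.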
Semi-automated checks
For workflows with tests and/or a defined CLI entry-point, a workflow like transport.yaml can be established.
This workflow can be used in a semi-automated way within the “manual checks” process described above:
The message_ix PR author(s)/reviewer(s) must still identify: This change may impact 1 or more downstream workflow(s), in [specific ways].
They, or someone else, then follows some documented steps to trigger the CI workflow, including providing an input to force it to use message_ix (GAMS) code from the PR branch under review—rather than main or the released version.
The results of the workflow either directly include checks for validity, or there are further documented steps to inspect the outputs for signs of undesired impacts.
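The "input to force it to use message_ix code from the PR branch" could, for instance, be a Git ref that the workflow turns into a pip requirement. The sketch below builds such a requirement string. The function name `message_ix_requirement` is hypothetical, and the repository URL is an assumption that a real workflow would need to adjust for PRs from forks.

```python
import re

# Assumption: the upstream message_ix repository URL.
MESSAGE_IX_REPO = "https://github.com/iiasa/message_ix.git"


def message_ix_requirement(ref="main", repo=MESSAGE_IX_REPO):
    """Return a pip direct reference that installs message_ix from `ref`.

    `ref` may be a branch name, tag, or commit hash.
    """
    # Reject anything that does not look like a plain Git ref, since the
    # value would come from a manually supplied workflow input.
    if not re.fullmatch(r"[A-Za-z0-9._/-]+", ref):
        raise ValueError(f"unexpected characters in ref: {ref!r}")
    return f"message_ix @ git+{repo}@{ref}"
```

The resulting string can be passed to `pip install` in place of the released package, so the downstream workflow runs against the GAMS code under review.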
A CI workflow that accepts "an input to force it to use message_ix code from the branch under review—rather than main"; thus also not "documented steps" to trigger it. Someone would need to develop these.
"Direct checks for validity" from that workflow, or "documented steps to inspect the outputs"
Hence my estimate of the time to gather the necessary info and implement these. I previously tried something similar (and spent similar time) with iiasa/message_data#411, but people did not have bandwidth to validate and review, so it was not merged.
Again, I agree it's important we move in this direction, but I think we shouldn't underestimate the work required, and until there's a concrete decision to prioritize that work we should choose pragmatic alternatives.