New Feature Proposal: compute Kaya identity factors #875

Open
zacharyschmidt opened this issue Aug 8, 2024 · 6 comments
@zacharyschmidt

zacharyschmidt commented Aug 8, 2024

This feature would add methods to the IamDataFrame to compute Kaya identity factors, following the methodology described in Koomey et al. (2019, 2022).

KoomeyExploringBlackBox2022FINAL.pdf
SupplementalinformationKoomeyExploringBlackBox-FINAL.docx

InsidetheblackboxFINAL2019.pdf
AppendicesInsidetheBlackBox-v61.docx

Our idea is to add three methods to the public API of the compute module, returning Kaya variables, Kaya factors, and an LMDI decomposition respectively. Please let me know if the compute module is not the right place for this feature!

Kaya Variables
These are produced by simple transformations of the input data variables, mostly doing arithmetic with the emissions and CCS input variables to get the quantities we're interested in.

Kaya Factors
These are the terms of the Expanded Kaya identity, calculated from the Kaya variables.
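As a rough illustration of how factors could follow from variables, the sketch below assumes the expanded identity is a chain of ratios over the Kaya variables named in the example tests in this issue (Population, GNP, Final Energy, Primary Energy, fossil Primary Energy, TFC, NFC). The exact chain, the dictionary keys, and the helper name `kaya_factors` are assumptions for illustration, not the formulation in Koomey et al. or the proposed pyam API.

```python
from math import prod


def kaya_factors(v):
    """Return ratio factors from a dict of Kaya variables (hypothetical helper)."""
    return {
        "Population": v["Population"],
        "GNP/P": v["GNP"] / v["Population"],
        "FE/GNP": v["FE"] / v["GNP"],
        "PE/FE": v["PE"] / v["FE"],
        "PEFF/PE": v["PEFF"] / v["PE"],
        "TFC/PEFF": v["TFC"] / v["PEFF"],
        "NFC/TFC": v["NFC"] / v["TFC"],
    }


# Illustrative values only (assumed for this sketch).
variables = {
    "Population": 1.0,
    "GNP": 6.6,
    "FE": 8.0,
    "PE": 10.0,
    "PEFF": 9.0,
    "TFC": 12.0,
    "NFC": 10.0,
}
factors = kaya_factors(variables)

# Each denominator cancels the previous numerator, so the product
# of all factors telescopes back to NFC.
reconstructed = prod(factors.values())
```

Writing the factors as a telescoping chain makes a cheap consistency check possible: the product of all factors should reproduce the last variable in the chain, up to floating-point error.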

LMDI Decomposition
The Log-Mean Divisia Index method attributes a portion of the total change in emissions between the reference scenario and the intervention scenario to each Kaya factor.
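A minimal sketch of the additive LMDI form may help here. It assumes emissions are the product of strictly positive factors and uses the classic four-factor Kaya identity with made-up numbers rather than the expanded identity. The names `log_mean` and `lmdi` are hypothetical helpers, not part of pyam or of this proposal.

```python
from math import log, prod


def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if a == b:
        return a
    return (a - b) / (log(a) - log(b))


def lmdi(ref, intervention):
    """Additive LMDI contribution of each factor (hypothetical helper).

    Both arguments map factor names to positive values whose product
    is the total emissions of the respective scenario.
    """
    total_ref = prod(ref.values())
    total_int = prod(intervention.values())
    weight = log_mean(total_int, total_ref)
    return {k: weight * log(intervention[k] / ref[k]) for k in ref}


# Made-up factor values: only energy intensity and carbon intensity differ.
ref = {"P": 8.0, "GDP/P": 10.0, "E/GDP": 5.0, "C/E": 0.07}
intervention = {"P": 8.0, "GDP/P": 10.0, "E/GDP": 4.0, "C/E": 0.05}

contributions = lmdi(ref, intervention)
total_change = prod(intervention.values()) - prod(ref.values())
```

Because the weight is the logarithmic mean of the two totals, the contributions sum exactly to the total change with no residual term, and a factor that is identical in both scenarios receives a contribution of zero.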

Below are example tests for the method to compute Kaya variables. I hope this is enough to get the discussion started. As I progress with the development I'll update this thread with questions that come up.

import pandas as pd
import input_variable_names 
import kaya_variable_names 
import pytest

from pyam import IamDataFrame
from pyam.testing import assert_iamframe_equal
from pyam.utils import IAMC_IDX

TEST_DF = IamDataFrame(
    pd.DataFrame(
        [
            [input_variable_names.POPULATION, "million", 1000],
            [input_variable_names.GDP_PPP, "billion USD_2005/yr", 6],
            [input_variable_names.GDP_MER, "billion USD_2005/yr", 5],
            [input_variable_names.FINAL_ENERGY, "EJ/yr", 8],
            [input_variable_names.PRIMARY_ENERGY, "EJ/yr", 10],
            [input_variable_names.PRIMARY_ENERGY_COAL, "EJ/yr", 5],
            [input_variable_names.PRIMARY_ENERGY_GAS, "EJ/yr", 2],
            [input_variable_names.PRIMARY_ENERGY_OIL, "EJ/yr", 2],
            [input_variable_names.EMISSIONS_CO2_FOSSIL_FUELS_AND_INDUSTRY, "Mt CO2/yr", 10],
            [input_variable_names.EMISSIONS_CO2_INDUSTRIAL_PROCESSES, "Mt CO2/yr", 1],
            [input_variable_names.EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE, "Mt CO2/yr", 4],
            [input_variable_names.EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE_BIOMASS, "Mt CO2/yr", 1],
            [input_variable_names.CCS_FOSSIL_ENERGY, "Mt CO2/yr", 2],
            [input_variable_names.CCS_FOSSIL_INDUSTRY, "Mt CO2/yr", 1],
            [input_variable_names.CCS_BIOMASS_ENERGY, "Mt CO2/yr", 0.5],
            [input_variable_names.CCS_BIOMASS_INDUSTRY, "Mt CO2/yr", 0.5],
        ],
        columns=["variable", "unit", 2010],
    ),
    model="model_a",
    scenario="scen_a",
    region="World", 
)

EXP_DF = IamDataFrame(
    pd.DataFrame(
        [   
            [kaya_variable_names.POPULATION, "billion", 1.0],
            [kaya_variable_names.GNP, "billion USD_2010/yr", 6.6],
            [kaya_variable_names.FINAL_ENERGY, "EJ/yr", 8.0],
            [kaya_variable_names.PRIMARY_ENERGY, "EJ/yr", 10.0],
            [kaya_variable_names.PRIMARY_ENERGY_FF, "EJ/yr", 9.0],
            [kaya_variable_names.TFC, "Mt CO2/yr", 12.0],
            [kaya_variable_names.NFC, "Mt CO2/yr", 10.0],
        ],
        columns=["variable", "unit", 2010],
    ),
    model="model_a",
    scenario="scen_a",
    region="World", 
)


@pytest.mark.parametrize("append", (False, True))
def test_kaya_variables(append):
    """Test computing kaya variables"""

    if append:
        obs = TEST_DF.copy()
        obs.compute.kaya_variables(scenarios=['scen_a'], append=True)
        assert_iamframe_equal(TEST_DF.append(EXP_DF), obs)
    else:
        obs = TEST_DF.compute.kaya_variables(scenarios=['scen_a'])
        assert_iamframe_equal(EXP_DF, obs)


@pytest.mark.parametrize("append", (False, True))
def test_kaya_variables_empty_when_input_variables_missing(append):
    """Assert that computing kaya variables with missing input variables returns empty"""

    # select only a subset of the required input variables
    incomplete_df = TEST_DF.filter(variable=input_variable_names.POPULATION)

    if append:
        obs = incomplete_df.copy()
        obs.compute.kaya_variables(scenarios=['scen_a'], append=True)
        assert_iamframe_equal(incomplete_df, obs)  # assert that no data was added
    else:
        obs = incomplete_df.compute.kaya_variables(scenarios=['scen_a'])
        assert obs.empty

@danielhuppmann
Member

Thanks @zacharyschmidt for the proposal! I took the liberty of editing your issue-description to 1) add the model, scenario and region dimensions directly when initializing the IamDataFrame (not as data columns), and 2) format the code as python, both to improve readability. I'll follow up with more comments later.

@zacharyschmidt
Author

Thanks @danielhuppmann! Glad you could take a first look at it.

@danielhuppmann
Member

Now with a bit more time to think this through...

  1. I don't think it's a good idea to change the units associated with variables (looking at population) - better to keep the IAMC convention and do the conversion only in the methods (you can easily use convert_unit() to do that on the fly).
  2. For implementation, you can use
    df.aggregate("Primary Energy|Fossil", ["Primary Energy|Coal", "Primary Energy|Oil", "Primary Energy|Gas"])
    to do the aggregation.
  3. You can also do mathematical operations to do the computations, see this tutorial. Basically it works like
    df.<method>(a, b, c) => a <op> b = c
    where a, b and c are variables (or other dimensions if you use the dimension argument). And pyam will make sure that this works with multiple models/scenarios/regions in one go, and even keeping the units correct...
  4. You can use require_data() to check whether a scenario has all relevant information before even starting the processing...
  5. Making the variable names configurable is a nice feature, but I suggest defaulting to the common IAMC variables, see https://github.com/iamconsortium/common-definitions - all there except for TFC and NFC, which can quickly be added to the common-definitions repo.
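The precondition check in point 4 can be pictured with a plain-Python stand-in. This is only a sketch of the idea and not how require_data() is implemented in pyam. The `REQUIRED` tuple is an abbreviated, illustrative list, and `missing_inputs` is a hypothetical helper.

```python
# Abbreviated, illustrative list of required input variables (an assumption).
REQUIRED = ("Population", "GDP|PPP", "Final Energy", "Primary Energy")


def missing_inputs(reported, required=REQUIRED):
    """Map each incomplete scenario to the required variables it lacks."""
    return {
        scenario: sorted(set(required) - set(variables))
        for scenario, variables in reported.items()
        if not set(required) <= set(variables)
    }


# Variables reported per scenario (made-up example data).
reported = {
    "scen_a": ["Population", "GDP|PPP", "Final Energy", "Primary Energy"],
    "scen_b": ["Population"],
}

# Only scenarios with gaps appear in the result.
incomplete = missing_inputs(reported)
```

Running such a check first lets the Kaya methods skip incomplete scenarios up front instead of failing partway through the arithmetic.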

@zacharyschmidt
Author

Thanks for those recommendations! I am using all of them for the implementation.

For point 5 I have a few questions. My intention with the input_variable_names module was simply to avoid repeating string literals throughout the source code. I defined the variable names as constants so I can use autocomplete instead of copy/paste. Here's the input_variable_names module.

POPULATION = "Population"
GDP_MER = "GDP|MER"
GDP_PPP = "GDP|PPP"
FINAL_ENERGY = "Final Energy"
PRIMARY_ENERGY = "Primary Energy"
PRIMARY_ENERGY_FF = "Primary Energy (fossil fuels)"
PRIMARY_ENERGY_COAL = "Primary Energy|Coal"
PRIMARY_ENERGY_OIL = "Primary Energy|Oil"
PRIMARY_ENERGY_GAS = "Primary Energy|Gas"
EMISSIONS_CO2_INDUSTRIAL_PROCESSES = "Emissions|CO2|Industrial Processes"
EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE = "Emissions|CO2|Carbon Capture and Storage"
EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE_BIOMASS = "Emissions|CO2|Carbon Capture and Storage|Biomass"
EMISSIONS_CO2_FOSSIL_FUELS_AND_INDUSTRY = "Emissions|CO2|Fossil Fuels and Industry"
EMISSIONS_CO2_AFOLU = "Emissions|CO2|AFOLU"
CCS_FOSSIL_ENERGY = "Carbon Sequestration|CCS|Fossil|Energy"
CCS_FOSSIL_INDUSTRY = "Carbon Sequestration|CCS|Fossil|Industrial Processes"
CCS_BIOMASS_ENERGY = "Carbon Sequestration|CCS|Biomass|Energy"
CCS_BIOMASS_INDUSTRY = "Carbon Sequestration|CCS|Biomass|Industrial Processes"

I think that specific variable names are not used at all in the existing pyam source code (except for test data), so I don't have an example to look at. Let me know what's preferred in terms of defined constants vs direct use of strings and I'll follow that.

Also, thanks for pointing me to the common-definitions repo. I'll make a pull request there to add TFC and NFC.

@danielhuppmann
Member

Right, we shouldn't hard-code anything in the actual source code - I only meant that the input_variable_names module should be consistent with common-definitions. The one conflict I see is "Primary Energy (fossil fuels)", which is usually "Primary Energy|Fossil" in IAM reporting.

@zacharyschmidt
Author

Do you think I can assume that the common-definitions are standard and will be used for most model outputs in the future?

I want to make sure I'm expecting standard variable names in my input data. For example, a common variable for models from 4-5 years ago is "Emissions|CO2|Fossil Fuels and Industry". I can't find that in the common-definitions--I think the current equivalent would be "Emissions|CO2|Energy and Industrial Processes".

Do you think it's reasonable to require that all input data contain only variables from common-definitions? Then, if a user is working with older or non-standard model outputs, they would be responsible for standardizing the input data before calling the Kaya functions.
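To make the standardization step concrete, here is a small sketch of a user-side mapping from legacy names to common-definitions names. The single entry is the tentative equivalence mentioned above and has not been verified against the common-definitions repo. `LEGACY_TO_COMMON` and `standardize` are hypothetical names.

```python
# Legacy name -> common-definitions name. The one entry below is the
# tentative equivalence from this comment, not a verified mapping.
LEGACY_TO_COMMON = {
    "Emissions|CO2|Fossil Fuels and Industry":
        "Emissions|CO2|Energy and Industrial Processes",
}


def standardize(variables, mapping=LEGACY_TO_COMMON):
    """Replace legacy variable names; unknown names pass through unchanged."""
    return [mapping.get(name, name) for name in variables]


legacy = ["Population", "Emissions|CO2|Fossil Fuels and Industry"]
standardized = standardize(legacy)
```

The same dictionary could presumably be passed to pyam's rename(variable=...) to standardize an IamDataFrame before calling the Kaya methods.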

zacharyschmidt added a commit to zacharyschmidt/pyam that referenced this issue Oct 30, 2024
…e module. Also add the kaya subdirectory that contains the implementation for the kaya methods. (IAMconsortium#875)