add country classifications #158

mabudz · 2024-11-07T15:56:24Z

We already use coco in our bonsai project. :-)
Since we use data from several data providers, each with its own country codes, we would like to extend country_data.tsv with the following columns (classifications):

baci: http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37
prodcom
undata_energy: https://unstats.un.org/unsd/energystats/api/
unido_indstat: https://stat.unido.org/portal/dataset/getDataset/COUNTRY_PROFILE
bonsai (in development)
exiohybrid4
eurostat: https://ec.europa.eu/eurostat/web/metadata

The mappings to ISO2 are already in another format in one of our repos.

If you agree, we would open a branch to implement these classifications.

konstantinstadler · 2024-11-08T13:21:40Z

Hi,
Great, please just go ahead.
For a new classification, you really just need to:

add a new column in the data file (.tsv)
add the classification with some source link to the readme at https://github.com/IndEcol/country_converter/tree/master?tab=readme-ov-file#classification-schemes
add a test case to check that the new classification is present and gives the country in the format you expect. Put these tests in /test_functionality.py (see test_IOC for a minimal example). These might seem trivial, but we had cases where a column shifted by one or disappeared, and these tests would catch that kind of issue.
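
The kind of check described in the last step could be sketched as follows. This is a minimal stand-alone illustration that uses only the standard library, not the project's actual test code: the sample rows, the column name "my_class" and both helper functions are hypothetical, and a real test would read the package's data file instead.

```python
import csv
import io

# Hypothetical stand-in for a few rows of the data file. The real file has
# many more rows and columns; "my_class" is a made-up classification column.
SAMPLE_TSV = (
    "name_short\tISO2\tISO3\tmy_class\n"
    "Armenia\tAM\tARM\tARM_X\n"
    "Belgium\tBE\tBEL\tBEL_X\n"
)


def load_rows(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def check_classification(rows, column, expected):
    """Check that `column` exists and maps each ISO3 code to the expected value."""
    if not rows or column not in rows[0]:
        raise AssertionError(f"column {column!r} is missing")
    by_iso3 = {row["ISO3"]: row[column] for row in rows}
    for iso3, value in expected.items():
        actual = by_iso3[iso3]
        assert actual == value, f"{iso3}: got {actual!r}, expected {value!r}"
    return True
```

Checking a handful of known countries per column is enough to notice a column that was dropped or shifted by one, because the expected values would then no longer line up with the ISO3 keys.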

Just one question: a lot of these data seem to be based on UN numeric and/or ISO2, which are already included. Please make sure not to accidentally add entries which are already in there under another name. If you just need synonyms, you can define them in "_validate_input_para" in the main file. Again, please add tests if you do.

mabudz · 2024-11-08T16:12:16Z

Thanks a lot.

Indeed many of the codes are based on existing classifications. E.g. "unido_indstat" and "baci" use ISOnumeric such as "051" for Armenia.
In this case, should we skip the additional column and just add the names to "_validate_input_para" in the following manner? Note that the code is "051" and not "51".

        alt_valid_names = {
            "ISOnumeric": ["isocode", "unido_indstat", "baci"],
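
To make the synonym idea concrete, here is a minimal sketch of how such an alias table could be resolved to a canonical column name. It is not the actual body of "_validate_input_para": the function `resolve_classification`, the `CANONICAL` tuple and the case-insensitive matching are assumptions made for illustration, and only the "ISOnumeric" entry is taken from the snippet above.

```python
# Alias table in the spirit of the snippet above. Only this one entry comes
# from the discussion; the rest of this sketch is hypothetical.
ALT_VALID_NAMES = {
    "ISOnumeric": ["isocode", "unido_indstat", "baci"],
}

# Assumed small set of canonical column names, for illustration only.
CANONICAL = ("ISO2", "ISO3", "ISOnumeric")


def resolve_classification(name):
    """Map a user-supplied classification name to its canonical column name.

    Matching is case-insensitive. Raises ValueError for unknown names.
    """
    lowered = name.lower()
    for column in CANONICAL:
        if lowered == column.lower():
            return column
    for column, aliases in ALT_VALID_NAMES.items():
        if lowered in (alias.lower() for alias in aliases):
            return column
    raise ValueError(f"unknown classification: {name!r}")
```

With this approach "baci" and "unido_indstat" would both resolve to the existing "ISOnumeric" column, so no duplicate column is needed in the data file.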

Another example is "eurostat", which is based on multiple classifications that already exist in coco:

"ISO2",
"EU27_2007"
something like "EU27_2020" (which is not yet in coco and could be added)
subregion codes that start with the ISO2 code, e.g. "BE234" for Gent in Belgium, which are not countries (probably not to be included in coco?)

Other classifications would be new for coco, such as "prodcom", which uses the codes from Geonomenclature (GEONOM) (so I guess a name other than "prodcom" would be more appropriate), or "hybridexiobase4".

konstantinstadler · 2024-11-12T10:18:38Z

Regarding the EU, there is EU27, which is the "official" name for the new one. We have a section about that in the readme:

The situation for the EU got complicated due to the Brexit process. For the naming, coco follows the Eurostat glossary, thus EU27 refers to the EU without UK, whereas EU27_2007 refers to the EU without Croatia (the status after the 2007 enlargement). The shortcut EU always links to the most recent classification. The EEA agreements for the UK ended by 2021-01-01 (which also affects Guernsey, Isle of Man, Jersey and Gibraltar). Switzerland is not part of the EEA but member of the single market.
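
As a rough illustration of the naming rule in that readme passage, the two 27-member groups can be written as set differences from the 28-member EU of 2013 to 2020. This is a hand-written sketch with plain ISO2 codes, not how coco stores the data (there the groups are columns in the data file), and the names `EU28`, `EU27` and `EU27_2007` are used here only as Python variables.

```python
# The 28 EU member states between 2013 and 2020, as ISO2 codes
# (written out by hand for this sketch).
EU28 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
})

# EU27: the EU without the UK (after Brexit).
EU27 = EU28 - {"GB"}

# EU27_2007: the EU without Croatia (status after the 2007 enlargement).
EU27_2007 = EU28 - {"HR"}

# The two groups have the same size but differ in exactly two members.
DIFFERING_MEMBERS = EU27 ^ EU27_2007
```

Both groups have 27 members, which is why the year suffix is needed to tell them apart.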

Generally, I would like to avoid adding a column per data provider if the provider explicitly says it is using one of the existing ones. UN will probably use UN numeric in most cases (there is just the question of comparing as int or str).
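
The int-versus-str question can be handled by normalizing codes before comparing them. The following is a minimal sketch under the assumption that provider codes arrive either as integers or as digit strings with optional zero padding; both helper functions are hypothetical and not part of coco.

```python
def normalize_numeric_code(code):
    """Return a numeric country code as int, accepting 51, "51" or "051"."""
    if isinstance(code, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("boolean is not a valid country code")
    if isinstance(code, int):
        return code
    text = str(code).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a numeric country code: {code!r}")
    return int(text)


def format_numeric_code(code, width=3):
    """Return the zero-padded string form of a numeric code, e.g. "051"."""
    return f"{normalize_numeric_code(code):0{width}d}"
```

Comparing the normalized integers makes "051" from a provider file match 51 in a numeric column, while `format_numeric_code` gives the padded form when a provider expects strings.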

Hybridexio4 definitely makes sense.

Subregions are tricky. It is a bit more complicated than just adding a row for the subregion. The regular expressions would probably stop working or get exponentially more complicated. Also, linking subregions to countries is not trivial (disputed areas, different classifications across countries, etc.). I think we would be pushing the limit of what is possible with a simple table. This region/subregion relationship seems best handled in some kind of graph database? I would guess something like this must exist already.
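
To illustrate why a graph fits better than a flat table here, a tiny sketch with child-to-parent links is shown below. Everything in it is hypothetical: the `PARENT` mapping, the intermediate codes between "BE234" and "BE", and the `country_of` helper are made up for the example and do not reflect any existing coco feature or an authoritative region list.

```python
# Illustrative child -> parent links. The intermediate codes are assumptions
# for this sketch, not taken from an official classification file.
PARENT = {
    "BE234": "BE23",
    "BE23": "BE2",
    "BE2": "BE",
}


def country_of(region, parent=PARENT):
    """Follow parent links upward until a node without a parent is reached.

    A code that has no entry in `parent` is returned unchanged, so a country
    code maps to itself. Raises ValueError if the links form a cycle.
    """
    seen = set()
    current = region
    while current in parent:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        current = parent[current]
    return current
```

A real solution would also need to cope with regions that have more than one parent (for example disputed areas), which is exactly where a single table column stops being sufficient.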

mabudz · 2024-11-13T12:33:35Z

Alright, I created PR #159.

Regarding the subregions issue, from our point of view, it can be postponed. So the PR does not address it.
