Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

taxonomy: boolean added for european countries + ecobalyse id is adapted for organic ingredients #10929

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

very-smartin
Copy link
Collaborator

What

In order to get more Ecobalyse ids more precisely, a boolean for european countries has been added in the taxonomy.
Furthermore, the ecobalyse id en:ecobalyse_label_organic is retrieved for organic ingredients.

@very-smartin very-smartin requested a review from a team as a code owner October 24, 2024 08:06
@github-actions github-actions bot added 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🥗 Ingredients labels Oct 24, 2024
@very-smartin very-smartin enabled auto-merge (squash) October 24, 2024 08:07
@@ -3332,6 +3332,7 @@ country_code_3:en: AUT
language_codes:en: de
osm_relation:en: 16239
wikidata:en: Q40
ecobalyse_is_part_of_eu: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for boolean, by convention in taxonomies we use properties like "vegan:en: yes"
can you replace "ecobalyse_is_part_of_eu: 1" by "is_part_of_eu:en: yes" ?

Unless Ecobalyse has specific countries it considers to be part of EU, let's make the property name is_part_of_eu:en:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright i'll change this !

Indeed, Ecobalyse considers specific countries to be part of EU sometimes in order to get realistic LCA (as an example the UK is not anymore in the EU but for transport impact it's considered so) so i would recommand to keep 'ecobalyse_is_part_of_eu'.

@@ -3322,6 +3339,86 @@ sub get_missing_ciqual_codes ($ingredients_ref) {
return @ingredients_without_ciqual_codes;
}

=head2 get_missing_ecobalyse_ids ($ingredients_ref)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@very-smartin can you update your branch to merge the current main branch in it, so that we can see only the differences with main?

e.g. here is a test PR where I took the ecobalyse branch, merged the main branch (and resolved the conflict). It's much easier to see the new changes only: https://github.com/openfoodfacts/openfoodfacts-server/pull/10930/files

@github-actions github-actions bot added the 💥 Merge Conflicts 💥 Merge Conflicts label Oct 24, 2024
@very-smartin
Copy link
Collaborator Author

/update_tests_results

@github-actions github-actions bot removed the 💥 Merge Conflicts 💥 Merge Conflicts label Oct 24, 2024
@stephanegigandet
Copy link
Contributor

/lint

if (defined $ecobalyse_code) {
$ingredient_ref->{ecobalyse_code} = $ecobalyse_code;
# If the ingredient is organic, the 'ecobalyse_labels_en_organic:en' id is used
if ($ingredient_ref->{labels} =~ /\ben:organic\b/) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach won't work if we have an organic ingredient for which we have a general ecobalyse match, but no organic ecobalyse match.

e.g. if we have "organic pear", in the ingredients, we don't have it in Ecobalyse:

ecobalyse:en: pear-eu
ecobalyse_origins_en_france:en: pear-fr

but we need to fallback on pear-eu.

So I suggest a different approach:

Based on the labels, origins etc. of the product, construct a list of prioritized ecobalyse suffixes.

e.g. if the product is "organic pear from france", we might want to get:

  1. ecobalyse_origins_france_label_organic
  2. ecobalyse_origins_european-union_label_organic
  3. ecobalyse_label_organic
  4. ecobalyse_origins_france
  5. ecobalyse_origins_european-uinion
  6. ecobalyse

(or maybe a different order, depending on what we consider more important: the origin or the label)

Then we check if we have those as ecobalyse properties for "en:pear", and we stop at the first one we find.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point indeed, i'll change my code consequently

@very-smartin
Copy link
Collaborator Author

@stephanegigandet Note : for now i've only added the code in case of organic ingredients, the code for the origin will be done later

if ($ingredient_ref->{labels} =~ /\ben:organic\b/) {

# Retrieve the correct ecobalyse code
my $ecobalyse_code = get_inherited_property("ingredients", $ingredient_ref->{id}, "ecobalyse_labels_en_organic:en");
Copy link
Contributor

@stephanegigandet stephanegigandet Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's easier to build a list of the possible suffixes, because even if we just want to support organic, we need to check those:

  1. ecobalyse_labels_en_organic:en
  2. ecobalyse_proxy_labels_en_organic:en
  3. ecobalyse_en_organic:en
  4. ecobalyse_proxy_en_organic:en

Here your current code checks 1. and 4.

You could add more if clauses to cover 2 and 3, but it's going to become difficult to read with a lot of duplicate code.

Instead just do something like this (pseudo code):

my @suffixes = ();
if (label organic) : push @suffixes, "_labels_organic"
push @suffixes, ''; # no suffix when we don't have anything

then a foreach loop on suffixes:

foreach my $suffix (@suffixes) {

foreach my $proxy ("", "_proxy") {

my $ecobalyse_code = get_inherited_property("ingredients", $ingredients_ref->{id}, "ecobalyse" . $proxy . $suffix)
if defined $ecobalyse_code {

..
last;
}

˚

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it's a much more better solution, i hadn't seen it like this : new pr incoming !

@very-smartin
Copy link
Collaborator Author

/lint

Copy link

sonarcloud bot commented Nov 12, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ecobalyse 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🥗 Ingredients 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

2 participants