taxonomy: synonyms, stop-words, extended json exports (#10951)
### Taxonomies improvements

* Include {tag}.extended.json files in exported taxonomies (#10742)
* Add more synonyms and stop-words to the Italian translations
* Align the GitHub PR workflow to fetch origin/main (needed for taxonomy linting)
McCio authored Oct 30, 2024
1 parent 2ae1768 commit 36516d8
Showing 10 changed files with 102 additions and 36 deletions.
31 changes: 25 additions & 6 deletions .github/workflows/pull_request.yml
@@ -64,12 +64,18 @@ jobs:
uses: ishworkh/docker-image-artifact-upload@v1
with:
image: "openfoodfacts-server/backend:dev"
- name: Rebuild taxonomies
- name: Fetch origin/main
# we need origin/main to have comparison linting work
run: |
git remote set-branches --add origin main
git fetch --no-tags --prune --progress --no-recurse-submodules --depth=5 origin main
- name: Restore taxonomies dates
# here we first restore dates from git for taxonomies to avoid build them all
# see https://stackoverflow.com/a/60984318/2886726
run: |
git ls-files taxonomies/ | xargs -I{} git log -1 --date=format:%Y%m%d%H%M.%S --format='touch -t %ad "{}"' "{}" | bash
make build_taxonomies GITHUB_TOKEN="${{ secrets.TAXONOMY_CACHE_GITHUB_TOKEN }}"
- name: Rebuild taxonomies
run: make build_taxonomies GITHUB_TOKEN="${{ secrets.TAXONOMY_CACHE_GITHUB_TOKEN }}"

check_perl:
name: 🐪 Check Perl
@@ -80,17 +86,20 @@ jobs:
- uses: actions/checkout@v4
with:
fetch-depth: 1
# we need origin/main to have comparison linting work !
- name: Fetch origin/main
# we need origin/main to have comparison linting work
run: |
git remote set-branches --add origin main
git fetch origin
git fetch --no-tags --prune --progress --no-recurse-submodules --depth=5 origin main
- name: Restore taxonomies dates
# here we first restore dates from git for taxonomies to avoid build them all
# see https://stackoverflow.com/a/60984318/2886726
run: |
git ls-files taxonomies/ | xargs -I{} git log -1 --date=format:%Y%m%d%H%M.%S --format='touch -t %ad "{}"' "{}" | bash
- name: Download backend image from artifacts
uses: ishworkh/docker-image-artifact-download@v1
with:
image: "openfoodfacts-server/backend:dev"
- name: ensure branch origin/main is fetched
run: git fetch --no-tags --prune --progress --no-recurse-submodules --depth=5 origin main
- name: build taxonomies (should use cache)
run: make build_taxonomies GITHUB_TOKEN="${{ secrets.TAXONOMY_CACHE_GITHUB_TOKEN }}"
- name: check taxonomies
@@ -110,6 +119,16 @@ jobs:
- uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Fetch origin/main
# we need origin/main to have comparison linting work
run: |
git remote set-branches --add origin main
git fetch --no-tags --prune --progress --no-recurse-submodules --depth=5 origin main
- name: Restore taxonomies dates
# here we first restore dates from git for taxonomies to avoid build them all
# see https://stackoverflow.com/a/60984318/2886726
run: |
git ls-files taxonomies/ | xargs -I{} git log -1 --date=format:%Y%m%d%H%M.%S --format='touch -t %ad "{}"' "{}" | bash
- name: Download backend image from artifacts
uses: ishworkh/docker-image-artifact-download@v1
with:
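
The "Restore taxonomies dates" step above resets each taxonomy file's modification time to the date of its last commit, which, per the workflow comment, avoids rebuilding every taxonomy after a fresh checkout. A minimal Python sketch of the same idea follows. The helper names (`touch_stamp`, `git_author_epoch`, `restore_mtime`) are hypothetical and not part of the repository; the sketch asks git for a UNIX timestamp (`%at`) rather than a formatted `touch -t` string.

```python
import os
import subprocess
from datetime import datetime


def touch_stamp(moment: datetime) -> str:
    """Format a datetime as the CCYYMMDDhhmm.SS string that `touch -t` accepts."""
    return moment.strftime("%Y%m%d%H%M.%S")


def git_author_epoch(path: str) -> int:
    """Author date of the last commit touching `path`, as a UNIX timestamp.

    Must be run inside a git work tree.
    """
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    return int(out)


def restore_mtime(path: str, epoch: int) -> None:
    """Set both the access and modification time of `path` to `epoch`."""
    os.utime(path, (epoch, epoch))
```

In a checkout, one would call `restore_mtime(path, git_author_epoch(path))` for every path listed by `git ls-files taxonomies/`.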
4 changes: 4 additions & 0 deletions lib/ProductOpener/Tags.pm
@@ -1139,6 +1139,9 @@ sub get_from_cache ($tagtype, @files) {
if ($got_from_cache) {
$got_from_cache = get_file_from_cache("$cache_prefix.full.json", "$tag_www_root.full.json");
}
if ($got_from_cache) {
$got_from_cache = get_file_from_cache("$cache_prefix.extended.json", "$tag_www_root.extended.json");
}
if ($got_from_cache) {
print "obtained taxonomy for $tagtype from " . ('', 'local', 'GitHub')[$got_from_cache] . " cache.\n";
$cache_prefix = '';
@@ -1187,6 +1190,7 @@ sub put_to_cache ($tagtype, $cache_prefix) {

put_file_to_cache("$tag_www_root.json", "$cache_prefix.json");
put_file_to_cache("$tag_www_root.full.json", "$cache_prefix.full.json");
put_file_to_cache("$tag_www_root.extended.json", "$cache_prefix.extended.json");
put_file_to_cache("$tag_data_root.result.txt", "$cache_prefix.result.txt");
put_file_to_cache("$tag_data_root.result.sto", "$cache_prefix.result.sto");
# note: we don't put errors to cache as it is a non sense, errors are to be fixed before
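
As the `get_from_cache` hunk reads, each file is fetched only if every previous fetch succeeded, so a cached build that lacks the new `.extended.json` file counts as a miss. The Python sketch below illustrates that all-or-nothing chain under simplifying assumptions: `get_all_from_cache` and the `fetch` callback are hypothetical names, the suffix list covers only the three JSON exports shown in the hunk, and the 0 / 1 / 2 status values mirror the `('', 'local', 'GitHub')` lookup in the Perl code.

```python
from typing import Callable, List

# JSON exports that must all be present for a cache hit in this sketch.
SUFFIXES: List[str] = [".json", ".full.json", ".extended.json"]


def get_all_from_cache(
    cache_prefix: str,
    tag_www_root: str,
    fetch: Callable[[str, str], int],
) -> int:
    """Fetch every export; return 0 on the first miss, else the last status.

    `fetch(source, destination)` is assumed to return 0 for a miss,
    1 for a local cache hit and 2 for a GitHub cache hit.
    """
    status = 0
    for suffix in SUFFIXES:
        status = fetch(cache_prefix + suffix, tag_www_root + suffix)
        if not status:
            return 0  # one missing file invalidates the whole cached build
    return status
```

The matching `put_to_cache` change stores the same file, so builds cached after this commit can be restored in full.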
2 changes: 1 addition & 1 deletion taxonomies/data_quality.txt
@@ -3495,8 +3495,8 @@ en: low-fat label claim but fat above limitation
description:en: In EU, a claim that a food is low fat may only be made where the product contains no more than 3 g of fat per 100 g for solids or 1,5 g of fat per 100 ml for liquids. 1,8 g of fat per 100 ml for semi-skimmed milk. See: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02006R1924-20141213

< en:Data quality warnings
en: no fat label claim but fat above 0.5
description:en: In EU, a claim that a food is fat free may only be made where the product contains no more than 0,5 g of fat per 100 g or 100 ml. See: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02006R1924-20141213
en:en: no fat label claim but fat above 0.5

< en:Data quality warnings
en: low saturated-fat label claim but fat above limitation
36 changes: 21 additions & 15 deletions taxonomies/food/ingredients.txt
@@ -80,6 +80,12 @@ synonyms:fr: complet, intégral

synonyms:fr: complète, intégrale

synonyms:it: derivati, derivato

synonyms:it: fermentato, fermentati

synonyms:it: lecitina, lecitine

synonyms:lt: ląsteliena, skaidulos

synonyms:lt: pilno grūdo, viso grūdo
@@ -146,7 +152,7 @@ stopwords:hu: tartalmaz, változó arányban, min, zsírtartalom, összetevő,
stopwords:id: mengandung
stopwords:is: úr
# example: COCCO in proporzione variabile
stopwords:it: contiene, nella, in proporzione variabile
stopwords:it: di, del, dal, contiene, nella, in proporzione variabile, prodotto, prodotti, il, la, le, i, gli
# Japanese stopwords are matched without word boundaries, do not put as stopwords characters or words that could be part of an actual ingredient entry
stopwords:ja: 等, その他
stopwords:lt: iš, su, su pridėtiniu, mažiausiai, įskaitant
@@ -501,7 +507,7 @@ he: לציטין סויה
hr: emulgator soja lecitin, emulgator sojin lecitin, soja lecitin, emulgatori sojin lecitin, sojin lecithin, sojin lecitin, sojini lecitini
hu: szójalecitin, szójalecitinek
is: sojalesitin
it: lecitina di soia, lecitine di soia
it: lecitina di soia
lt: sojos lecitinas, sojų lecitinai, sojų pupelių lecitinai, emulsikliai sojos lecitinas
nb: soyalecitin, soyalecitiner
nl: sojalecithine, soja lecithine, sojalecithinen
@@ -550,7 +556,7 @@ fi: auringonkukkalesitiini, auringonkukkalesitiiniä
fr: lécithine de tournesol
hr: suncokretov lecitin, E322 suncokretov lecitin, lecitin suncokreta
hu: napraforgó lecitin
it: lecitina girasole, lecitina di girasole, lecitine di girasole
it: lecitina di girasole
lt: saulėgrąžų lecitinas, saulėgrąžų lecitinai, emulsiklis lecitinai (saulėgrąžų), lecitinai iš saulėgrąžų
lv: lecitinai iš saulègrąžų, saulespuku lecitīns, saulespuķu lecitīns, saulespuķu lecitīni
nb: solsikkelecithin, solsikkelecitin
@@ -570,7 +576,7 @@ wikidata:en: Q57271747
< en:sunflower lecithin
en: non-gmo sunflower lecithin
hr: ne-gmo suncokretov lecitin
it: lecitina di girasole non ogm, lecitina di girasole non-ogm, lecitine di girasole non ogm, lecitine di girasole non-ogm
it: lecitina di girasole non ogm

# <en:emulsifier
< en:sunflower lecithin
@@ -605,7 +611,7 @@ fi: rapsilesitiini, rapsilesitiiniä, rypsilesitiini, rypsilesitiiniä
fr: lécithine de colza, lécithines de colza
hr: lecitin repica, lecitin uljane repice
hu: repce lecitin, lecitini (repica)
it: Lecitina di colza
it: lecitina di colza
lt: rapsų lecitinai, emulsiklis rapsų lecitinas
nb: rapslecitin, rapslecithin
nl: raapzaadlecithine
@@ -2136,7 +2142,7 @@ hu: tejtermék, tejtermékek
hy: Կաթնամթերք
id: Produk susu
is: Mjólkurafurð
it: Latticini, derivati del latte
it: Latticini, derivati del latte, latte e derivati
ja: 乳製品
kab: Ifarisen n yiɣi
kk: Сүт тағамдары
@@ -2188,7 +2194,7 @@ fr: produits laitiers fermentés, produits laitiers de culture, produits issus d
hr: fermentirani mliječni proizvodi
hu: fermentált tejtermékek, savanyított tejtermékek, tejkultúrák
hy: Կաթնաթթվային մթերք
it: prodotti lattici fermentati, derivati del latte fermentati, prodotti derivati del latte fermentati, prodotti derivati da latte fermentato, prodotto derivato da latte fermentato
it: latticini fermentati
ko: 발효유
lv: Skābpiena produkti
ru: Кисломолочные продукты
@@ -2209,7 +2215,7 @@ es: producto lácteo en polvo, producto a base de leche en polvo, productos lác
#fr:produits laitiers en poudre
hr: mliječni proizvodi u prahu
hu: tejpor alapú termékek
it: prodotti a base di latte in polvere, prodotto a base di latte in polvere, a base di latte in polvere
it: prodotti a base di latte in polvere, a base di latte in polvere
nova:en: 4

en: First age baby milk powder, First milk powder, First infant milk formula in powder
@@ -5302,7 +5308,7 @@ fi: heratuote, herapohjainen korvike
fr: produit de lactosérum, produit de petit-lait, produit à base de lactosérum, préparation à base de lactosérum, dérivé de lactosérum
hr: mliječni derivat
hu: tejsavó alapú termék
it: derivati del siero di latte, la preparazione del siero di latte
it: derivati del siero di latte, preparazione del siero di latte
nl: Wei
pl: produkt serwatkowy
sv: vassleprodukt
@@ -16963,7 +16969,7 @@ bg: кокосово масло
de: Kokosbutter
es: manteca de coco
hr: kokosov maslac
it: Burro cocco, Burro cocco puro
it: Burro cocco
pl: masło kokosowe
ro: unt de cocos

@@ -41307,7 +41313,7 @@ fr: agrumes à base de concentré

# description:en:cashew apple is the fruit of the cashew tree which also produces the cashew nut

< en:fruits
< en:fruit
en: cashew apples
bg: плодове на кашу
cs: plodů kešu
@@ -60904,7 +60910,7 @@ de: Junge Markerbsen
fr: farine de petits pois
en: garden peas flour

< en:carrots
< en:carrot
< en:garden peas
de: Junge Erbsen mit Möhren

@@ -63460,7 +63466,7 @@ wikipedia:en: https://en.wikipedia.org/wiki/Cumin
#fi:jauhettu kumina
#it:cumino macinato
#nn:malt kummin
openfoodfacts:en: https://world.openfoodfacts.org/ingredient/fr:cumin-moulu
#openfoodfacts:en: https://world.openfoodfacts.org/ingredient/fr:cumin-moulu
# 52 products in 3 languages @2018-10-07

#<en:seed
@@ -72105,7 +72111,7 @@ it: semi di vaniglia estratti
en: vanilla preparation
hr: pripravak vanilija, krema sa okusom vanilije, puding s okusom vanilije

< en:herbs
< en:herb
en: za'atar
xx: za'atar

@@ -78208,7 +78214,7 @@ agribalyse_food_code:en: 26071
ciqual_food_code:en: 26180
ciqual_food_name:fr: Thon à l'huile de tournesol, entier, appertisé, égoutté

< en:Canned tunas
< en:Canned tuna
en: Tuna chunks
bg: Парченца риба тон
de: Thunfisch Stücke
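
Several Italian entries above lose their plural or article-prefixed variants (for example `lecitine di soia` and `la preparazione del siero di latte`) once the matching `synonyms:it:` and `stopwords:it:` lines are declared. The Python sketch below illustrates why such variants can become redundant. It is a simplified illustration, not Product Opener's actual matching algorithm: `build_synonym_map` and `normalize` are hypothetical helpers, and only single-word synonyms and stopwords are handled (multi-word stopwords such as `in proporzione variabile` are left out).

```python
from typing import Dict, Iterable, List, Set


def build_synonym_map(groups: Iterable[List[str]]) -> Dict[str, str]:
    """Map every word of a synonym group to the first word of that group."""
    mapping: Dict[str, str] = {}
    for group in groups:
        canonical = group[0]
        for word in group:
            mapping[word] = canonical
    return mapping


def normalize(text: str, synonym_map: Dict[str, str], stopwords: Set[str]) -> str:
    """Lowercase, drop stopwords, and replace each word by its canonical synonym."""
    tokens = text.lower().split()
    kept = [synonym_map.get(token, token) for token in tokens if token not in stopwords]
    return " ".join(kept)


# Values copied from the synonyms:it and stopwords:it lines of this diff
# (single-word entries only).
IT_SYNONYMS = build_synonym_map([
    ["derivati", "derivato"],
    ["fermentato", "fermentati"],
    ["lecitina", "lecitine"],
])
IT_STOPWORDS = {
    "di", "del", "dal", "contiene", "nella",
    "prodotto", "prodotti", "il", "la", "le", "i", "gli",
}
```

Under these assumptions, `normalize("Lecitine di soia", IT_SYNONYMS, IT_STOPWORDS)` and `normalize("lecitina di soia", IT_SYNONYMS, IT_STOPWORDS)` yield the same key, so only one spelling needs to be listed in the taxonomy entry.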
13 changes: 9 additions & 4 deletions taxonomies/labels.txt
@@ -26,6 +26,7 @@ stopwords:es: por, la, las, los, el, apto, para, con, de, verificado, certificad
stopwords:fi: ainesosat, vahvistettu, sertifioitu, hyväksytty, 100%, puhdas, tuote, standardi, valvottu, valvottua, tuotettu
stopwords:fr: ingrédient, ingrédients, agricole, agricoles, issu, issue, issus, issues, d', du, de, l', le, la, convient, peut convenir, au, aux, régime, régimes, conforme, validé, vérifié, certifié, par, un, une, approuvé, 100%, pur, garantie, garanti, produit garanti, produit certifié, sas, en, durant, pendant, phase, période, dès, label, marque, logo, produit issu, produits issus, tous, les, sont, est, issus de la production
stopwords:hu: összetevők, mezőgazdaság, ellenőrzött, tanúsított, 100%, természetes, termék, garantált
stopwords:it: il, la, i, gli, le, di, del, dei, della, delle, nel, nei, nella, nelle, con, da, dal, dalla, dai, dagli, dalle
stopwords:pl: z, certyfikowane, certyfikowany, certyfikowana, gwarancja
stopwords:pt: ingredientes, agrícola, de, com, por, a, o, as, os, apto, para, verificado, certificado, conforme, validado, verificado, aprovado, controlado, 100%, puro, garantia, garantido, produto garantido, produto certificado, etiqueta, marca
stopwords:ru: ингредиенты, из, с, проверено, сертифицировано, подтверждено, 100%, чистый, продукт, гарантия, гарантировано, стандарт
@@ -49,14 +50,15 @@ synonyms:es: OGMs, Organismos genéticamente modificados, transgénicos, OGM
synonyms:bg: ГМО, Генетично модифицирани организми
synonyms:fi: GMO:ita, muuntogeenisiä organismeja, muuntogeeninen, GMO, GM
synonyms:fr: OGMs, Organismes génétiquement modifiés, OGM
synonyms:it: OGM, OGMs, Organismo geneticamente modificato, Organismi geneticamente modificati
synonyms:pt: OGMs, Organismos geneticamente modificados, transgénicos, transgênicos, OGM
synonyms:ru: ГМО, Генетически Модифицированные Организмы

synonyms:en: Rich in, High in
synonyms:es: Rico en, Alto contenido de, Alto contenido en
synonyms:fi: runsaasti, paljon
synonyms:fr: Riche en, Fort en
synonyms:it: Ricco in, Alto contenuto di, Ricchi in
synonyms:it: Ricco, ricchi, alto contenuto
synonyms:pt: Rico em, Alto conteúdo de, Alto teor em, Alto teor de, Enriquecido com
synonyms:ru: Обогащено

@@ -137,8 +139,11 @@ synonyms:fr: dès la fin, après
synonyms:pt: do fim de, depois, depois de

synonyms:fr: antibiotique, anti-biotique, anti-biotiques, anti-biotiques, traitement antibiotique, utilisation d'antibiotique
synonyms:it: antibiotici, antibiotico, uso di antibiotici, trattamento con antibiotici
synonyms:pt: antibiótico, anti-biótico, antibióticos, anti-bióticos, antibiotico, anti-biotico, antibioticos, anti-bioticos

synonyms:it: allevamento, allevamenti, allevate, allevata, allevati, allevato

synonyms:fr: phase de ponte, période de ponte, la ponte

synonyms:fr: premier, 1er
@@ -276,7 +281,7 @@ fi: vapaana laiduntava, ulkoilma
fr: Plein air
he: גידול חופשי
hu: Szabad tartású
it: Allevamento a terra, allevate a terra, allevata a terra, allevati a terra, allevato a terra
it: Allevamento a terra
nl: Vrije uitloop
pt: Em liberdade, ao ar livre
ru: свободный выгул, открытый воздух
@@ -309,7 +314,7 @@ fr: Œufs de poules élevées en plein air, Œufs de plein air, Œufs plein air,
he: ביצי חופש
hr: iz slobodnog uzgoja, iz štalskog uzgoja, iz štalskog (podnog) uzgoja, iz podnog uzgoja
hu: Szabad tartású tojás
it: Uova da allevamento all'aperto
it: Uova da allevamento all'aperto, uova da galline allevate all'aperto
nl: Vrije uitloop eieren
pl: jaja z chowu na wolnym wybiegu, 1 - jajka z chowu na wolnym wybiegu, 1 - jaja z chowu na wolnym wybiegu
pt: Ovos de galinhas criadas em liberdade, ovos de galinhas em liberdade, ovos de galinhas livres de gaiolas, ovos de galinhas criadas ao ar livre, ovos de galinhas ao ar livre
@@ -358,7 +363,7 @@ en: Free range chicken
de: Freilandhühnchen
fi: ulkokana, vapaana elänyt kana
hu: Szabad tartású csirke, Szabad tartású tyúk
it: Polli allevati a terra
it: Polli allevati a terra, galline allevate a terra
pt: Galinhas criadas em liberdade, galinhas em liberdade, galinhas livres de gaiolas, galinhas criadas ao ar livre, galinhas ao ar livre

< en:Free range
2 changes: 1 addition & 1 deletion taxonomies/nutrients.txt
@@ -750,7 +750,7 @@ fi: Psikoosi
fr: Psicose
fy: Psikoaze
gl: Psicosa
it: Allulosio
it: Allulosio, D-allulosio, psicosio, D-psicosio
ja: プシコース
ko: 프시코스
pt: Psicose
16 changes: 16 additions & 0 deletions taxonomies/origins.txt
@@ -5409,6 +5409,22 @@ nl: Sicilië
pl: Sycylia
tr: Sicilya

< en:italy
en: Apulia
xx: Puglia

< en:italy
en: Lazio
xx: Lazio

< en:lazio
en: Rome
xx: Roma

< en:italy
en: Tuscany
xx: Toscana

############################### Netherlands

< en: Netherlands
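
The new origins entries above use the taxonomy text layout seen throughout this diff: an optional `< parent` line, then one `language: name, synonym, ...` line per language, with a blank line between entries. The Python sketch below parses only that subset to show how `Rome` ends up under `en:lazio`, which itself sits under `en:italy`. `parse_entries` is a hypothetical helper, not the project's parser, and it ignores property lines, `synonyms:` and `stopwords:` declarations, and everything else the real format supports.

```python
from typing import Dict, List

SAMPLE = """\
< en:italy
en: Lazio
xx: Lazio

< en:lazio
en: Rome
xx: Roma
"""


def parse_entries(text: str) -> List[Dict[str, object]]:
    """Parse blank-line-separated entries into parents and per-language names."""
    entries: List[Dict[str, object]] = []
    for block in text.strip().split("\n\n"):
        parents: List[str] = []
        names: Dict[str, List[str]] = {}
        for raw_line in block.splitlines():
            line = raw_line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("<"):
                parents.append(line[1:].strip())
            else:
                lang, _, value = line.partition(":")
                names[lang.strip()] = [name.strip() for name in value.split(",")]
        if names:
            entries.append({"parents": parents, "names": names})
    return entries
```

Calling `parse_entries(SAMPLE)` returns two entries, the second of which lists `en:lazio` as its only parent.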
13 changes: 11 additions & 2 deletions taxonomies/packaging_materials.txt
@@ -7,6 +7,14 @@ synonyms:fr: transparent, transparente

synonyms:fr: coloré, colorée

synonyms:it: materiale, materiali

synonyms:it: mista, misto, miste, misti

synonyms:it: riciclabili, riciclabile

synonyms:it: riciclato, riciclata, riciclati, riciclate

stopwords:en: 100%
stopwords:fr: 100%
stopwords:xx: 100%, RC, Recycling Code
@@ -80,7 +88,7 @@ he: פלסטיק
hr: plastika
hu: Műanyag
is: plast
it: Plastica
it: Plastica, plastiche, plastici
ja: プラスチック
ko: 플라스틱
lt: plastikas
@@ -377,7 +385,7 @@ de: Biobasierter Kunststoff
es: Plástico de base biológica
fr: Plastique biosourcé, Plastiques / Autres plastiques Biosourcés (base amidon)
hu: Biobázisú műanyag
it: Plastica biobased
it: Plastica biologica, bioplastica
pt: Plástico biobaseado

< xx:PET
@@ -386,6 +394,7 @@ xx: bPET, bio-PET
de: bPET - Biobasiertes PET, Biobasiertes PET
fr: bPET - PET biosourcé, PET biosourcé
hu: bPET - Biobázisú PET, Biobázisú PET
it: bPET - PET biologico, bioPET
nl: bPET - Biobased PET, Biobased PET
pt: bPET - Bioplástico PET, Bioplástico PET
description:en: Biobased PET is plastic partially produced using annually renewable biomass like different sugars obtained from agricultural activities.