Dutch/Estonian: dash is not always a separator #6122
Comments
It's strange: if I remove the extra space in "gemengde EU - en niet-EU-honing" (before the first EU), it's added back.
Another example:
In Estonian the same language construction is used:
See also https://et.wiktionary.org/wiki/flavoring
Both mean flavouring of some sort.
This issue is stale because it has been open 90 days with no activity.
Another interesting product: https://nl.openfoodfacts.org/product/8712800025665/brownie-mona
Oops, sorry: I inadvertently removed the ingredients of the above product, but restored them afterwards.
Describe the bug
The dash should not always be interpreted as a separator. In Dutch it is a way to avoid repeating a shared word part.
Other examples:
mono- en diglyceriden van vetzuren veresterd met mono- en diacetylwijnsteenzuur
To Reproduce
See: https://nl.openfoodfacts.org/product/8718907369589/bloemenhoning-albert-heijn
Expected behavior
For instance, "EU- en niet-EU-honing" should not be expanded to "EU, niet-EU-honing", but should be left untouched. In Dutch this is interpreted as "EU-honing, niet-EU-honing".
Additional context
A parse rule could be based on the surroundings of the dash:
en
;
Number of products impacted
Happens quite often.
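To make the rule suggested under "Additional context" concrete, here is a minimal sketch in Python. It is not the project's actual parser. It assumes a simplified separator rule (a dash followed by whitespace splits the list) and a hypothetical conjunction list ("en", "of"). The names `naive_split`, `split_on_dashes`, and `SUSPENDED_HYPHEN` are made up for illustration.

```python
import re
from typing import List

# Assumption: conjunctions that may follow a suspended hyphen in Dutch.
CONJUNCTIONS = ("en", "of")

# A dash attached directly to a word and followed by whitespace plus a
# conjunction, as in "EU- en" or "mono- en".
SUSPENDED_HYPHEN = re.compile(
    r"(?<=\w)-(?=\s+(?:" + "|".join(CONJUNCTIONS) + r")\b)",
    re.IGNORECASE,
)

# Assumption: simplified separator rule, a dash followed by whitespace.
DASH_SEPARATOR = re.compile(r"\s*-\s+")

# Private-use character used to hide suspended hyphens while splitting.
PLACEHOLDER = "\uE000"


def naive_split(text: str) -> List[str]:
    """Split on every dash that is followed by whitespace."""
    return [part.strip() for part in DASH_SEPARATOR.split(text) if part.strip()]


def split_on_dashes(text: str) -> List[str]:
    """Split on dash separators, leaving suspended hyphens untouched."""
    protected = SUSPENDED_HYPHEN.sub(PLACEHOLDER, text)
    parts = naive_split(protected)
    return [part.replace(PLACEHOLDER, "-") for part in parts]


if __name__ == "__main__":
    print(naive_split("EU- en niet-EU-honing"))
    print(split_on_dashes("EU- en niet-EU-honing"))
    print(split_on_dashes("suiker - mono- en diglyceriden van vetzuren"))
```

With this rule "EU- en niet-EU-honing" stays one ingredient, while a spaced dash between two ingredients still splits. The sketch only looks at what follows the dash, so a variant that already has a space inserted before the dash ("EU - en ...") would still be treated as a separator. Estonian would need its own conjunction list.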