German "ein" ("one") used as a numeral #1061
Comments
It seems to me that there will be some cases where one tag or the other is more intuitive, but there may be a lot of gray area in between. Do other German treebanks make a distinction, and if so, what tests do they give? (I don't know if an analogy to English "one" is helpful because it cannot be an indefinite article, but there are 3 different tags that can apply.)
GSD has one occurrence of "ein" tagged as NUM (in an unambiguous context as described above) but also several validation errors because of numeral "ein" tagged as DET. The other two have no "ein" as NUM.
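Counts like these can be checked with a short script. The following is only a minimal sketch: it assumes the treebank is in standard CoNLL-U format (10 tab-separated columns, with the lemma in column 3 and UPOS in column 4) and that the relevant tokens are lemmatized as "ein". The function name `count_upos_for_lemma`, the helper `_row`, and the toy data in `SAMPLE` are made up for illustration and are not taken from any treebank.

```python
from collections import Counter


def count_upos_for_lemma(conllu_text, lemma="ein"):
    """Count the UPOS tags of all tokens with the given lemma in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, tok_lemma, upos = cols[0], cols[2], cols[3]
        # Skip multiword token ranges (e.g. "3-4") and empty nodes (e.g. "5.1").
        if "-" in tok_id or "." in tok_id:
            continue
        if tok_lemma.lower() == lemma.lower():
            counts[upos] += 1
    return counts


def _row(tok_id, form, lemma, upos):
    # Hypothetical helper: fill the columns we do not need with "_".
    return "\t".join([tok_id, form, lemma, upos, "_", "_", "_", "_", "_", "_"])


# Toy data for illustration only, not taken from any treebank.
SAMPLE = "\n".join([
    "# toy sentence 1",
    _row("1", "eine", "ein", "DET"),
    _row("2", "Minute", "Minute", "NOUN"),
    "",
    "# toy sentence 2",
    _row("1", "ein", "ein", "NUM"),
    _row("2", "Jahr", "Jahr", "NOUN"),
])

print(count_upos_for_lemma(SAMPLE))
```

To run this over a real treebank, read the `.conllu` file into a string and pass it in place of `SAMPLE`. If a treebank lemmatizes the article differently, the `lemma` argument would need to be adjusted.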
The other German treebanks follow the language-specific guidelines as well, with the one exception Leonie pointed out: GSD sentence train-s4486 "Die Behaarung besteht aus ein - oder vielzelligen und nichtdrüsigen oder aber mit einem ein - oder mehrzelligen Drüsenkopf versehenen Trichomen." ("The coat of hair consists of uni- or multicellular and non-glandular trichomes or trichomes with a uni- or multicellular glandular head."). Curiously enough, the first "ein" is treated as a NUM and the second one as a DET, although the contexts look basically identical. I don't think there is a difference between "vielzellig" and "mehrzellig" (both "multicellular", literally "many-celled" and "multiple/several-celled"), but I can't say for sure.
+1 for distinguishing NUM from DET in unambiguous environments, if it's possible to implement... I guess when it's modified like that it's a clear indication.
In German, the numeral "one" can have the same form as the indefinite article (including its inflected forms). The German UD guidelines say about this:
This causes several inconsistencies and a validator complaint:
(HDT also contains extremely similar structures that are clearly marked as numerals, e.g. “Ihm droht nun eine Gefängnisstrafe von bis zu fünf Jahren [...]” “He is now facing a prison sentence of up to five years” -- annotated with the same tree structure, but “fünf”/“five” is a NUM/nummod.)
It’s even possible to think of sentences where a DET vs NUM analysis makes a difference in meaning: “Es dauert nicht nur eine_NUM Minute (sondern zwei Minuten) / Es dauert nicht nur eine_DET Minute (sondern eine Stunde).” (“It doesn’t take only one minute (but two minutes). / It doesn’t take only a minute (but an hour).”)
As a side note, both Dutch treebanks have plenty of entries where “een” is tagged as NUM, and all three Swedish treebanks have instances of “en” or “ett” as NUM.
Can we relax the strong requirement of “ein(e)” needing to be a determiner in German UD analyses?