Checking the source dictionary #5
Separating the dict in Done:
@jaumeortola There are entries in the src dictionary that seem controversial to me: I'm looking at the
This comes from src-pending.txt, but the entry will create just two verb forms for this word. That is not a good idea: if the verb exists, we should tag it; otherwise we should drop it from the dictionary. In this case it is likely a typo, since no clean corpus or dictionary has it (I checked the Common Corpus from PleIAs: https://huggingface.co/datasets/PleIAs/common_corpus/viewer?sql_console=true&sql=SELECT+*%0AFROM+train%0AWHERE+text+LIKE+%27%25formulaize%25%27%0ALIMIT+10%3B&views%5B%5D=train ). Such lines should not be admissible, as they will create incorrect entries. There are plenty of such verbs in this file. I already have some checks of my own (in another project), written as a Python unittest, that highlight inconsistencies and gaps.
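For illustration, a check of this kind could look roughly like the sketch below. It is not the unittest mentioned above. It assumes that every source line has the form lemma=pos=spec (as in recharge=verb=all) and that a partial entry lists its forms explicitly in the third field; the thread does not confirm that second assumption. The function name find_partial_verbs and the sample entry someverb are hypothetical.

```python
from typing import Iterable, List


def find_partial_verbs(lines: Iterable[str]) -> List[str]:
    """Return the lemmas of verb entries that do not request the full paradigm.

    Assumption: entries look like 'lemma=pos=spec', and 'all' in the third
    field means "generate every form". Any other spec is treated as partial.
    """
    partial = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines (comment syntax is an assumption).
        if not line or line.startswith("#"):
            continue
        parts = line.split("=", 2)
        # Ignore lines that do not follow the assumed three-field layout.
        if len(parts) != 3:
            continue
        lemma, pos, spec = parts
        if pos == "verb" and spec != "all":
            partial.append(lemma)
    return partial


# Hypothetical sample data; 'someverb=verb=VBD,VBN' is an invented partial entry.
sample = ["recharge=verb=all", "someverb=verb=VBD,VBN", "table=noun=all"]
flagged = find_partial_verbs(sample)
```

A unittest could then assert that the flagged list is empty for src-pending.txt, so that any partial verb entry fails the build until it is tagged or dropped.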
The first version of the source dictionary is here: https://github.com/languagetool-org/english-pos-dict/tree/main/src-dict
I will be adding some comments and ideas here. We can open new issues for some parts of the work.
survivorshipably... survivorshipry
).us-large
come from a Hunspell US dictionary that we didn't use until now. It is mentioned in Explore differences between en-US and en-US-large #2.
recharge=verb=all
. (We use a few rules to cover more cases of regular verbs. See here.) It would be useful to have something similar for nouns: a simple and quick way to tag a noun. We would need to define the format and ways to write exceptions.
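To make the noun idea concrete, here is one possible shape for it, modeled on the recharge=verb=all line. This is only a sketch of a format that has not been agreed on: it assumes a noun line such as table=noun=all expands to a singular and a regular plural, and that an exception is written by putting the plural in the third field (child=noun=children). The NN/NNS tags follow the Penn Treebank convention. The function names and the exception syntax are hypothetical.

```python
from typing import List, Tuple

# Endings that take "-es" in the regular English plural.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")
VOWELS = "aeiou"


def regular_plural(lemma: str) -> str:
    """Build the regular plural of a noun lemma with three simple rules."""
    if lemma.endswith(SIBILANT_ENDINGS):
        return lemma + "es"
    # Consonant + "y" becomes "ies" (city -> cities); vowel + "y" just adds "s".
    if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS:
        return lemma[:-1] + "ies"
    return lemma + "s"


def expand_noun(line: str) -> List[Tuple[str, str, str]]:
    """Expand a hypothetical 'lemma=noun=spec' line into (form, lemma, tag) rows.

    spec == 'all' means "use the regular plural"; any other value is taken
    to be the explicit plural form, which is how exceptions are written here.
    """
    lemma, pos, spec = line.strip().split("=", 2)
    if pos != "noun":
        raise ValueError(f"not a noun entry: {line!r}")
    plural = regular_plural(lemma) if spec == "all" else spec
    return [(lemma, lemma, "NN"), (plural, lemma, "NNS")]


rows = expand_noun("table=noun=all")
# rows == [("table", "table", "NN"), ("tables", "table", "NNS")]
```

Uncountable nouns, invariant plurals, and doubled consonants (quiz, quizzes) are not covered by these rules, so the real format would need more than one kind of exception.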