A simplified format makes it easier to edit and maintain the dictionary.
We need:
to define a format
rules to expand the inflected forms from the simplified format (inflected forms for regular verbs, plurals for nouns, etc.)
ways to write the exceptions (everything that doesn't fit the regular inflected forms).
To be sure that everything works as expected, we need scripts to convert from simplified format to expanded format, and vice versa. The results must be identical.
Verbs
simplified format: recharge=verb=all
expanded format: recharge=recharge/VB,recharged/VBD,recharging/VBG,recharged/VBN,recharge/VBP,recharges/VBZ=all
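As an illustration of the expansion step, here is a minimal Python sketch. It is not the project's actual rule set (the linked rules are not reproduced here); it assumes only the simplest regular patterns (final silent "e", sibilant endings) and ignores consonant doubling and "-y" verbs. The function names `expand_verb` and `expand_line` are hypothetical, and the third field (`all`) is passed through untouched.

```python
def expand_verb(lemma: str) -> str:
    """Build the six tagged forms of a regular verb (assumed, incomplete rules)."""
    # Past tense / past participle: add "d" after a final "e", otherwise "ed".
    past = lemma + "d" if lemma.endswith("e") else lemma + "ed"
    # Gerund: drop a silent final "e" (but keep "ee", "oe", "ye" as in "agreeing").
    if lemma.endswith("e") and not lemma.endswith(("ee", "oe", "ye")):
        gerund = lemma[:-1] + "ing"
    else:
        gerund = lemma + "ing"
    # Third person singular: "es" after sibilant endings, otherwise "s".
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        third = lemma + "es"
    else:
        third = lemma + "s"
    forms = [
        (lemma, "VB"), (past, "VBD"), (gerund, "VBG"),
        (past, "VBN"), (lemma, "VBP"), (third, "VBZ"),
    ]
    return ",".join(f"{word}/{tag}" for word, tag in forms)


def expand_line(line: str) -> str:
    """Expand 'lemma=verb=scope'; leave any other line unchanged."""
    lemma, label, scope = line.split("=")
    if label != "verb":
        return line
    return f"{lemma}={expand_verb(lemma)}={scope}"


print(expand_line("recharge=verb=all"))
```

Under these assumptions, the call reproduces the expanded `recharge` line shown above. Anything the rules cannot derive would have to be written out in full as an exception.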
The rules are defined here (I will re-write and improve those rules)
Nouns
All tagging possibilities for nouns are here: NN-counted.txt
If we come up with a format for the 8 most common cases, we cover 99% of the nouns in the dictionary. [But only of those that are regular, or that can be derived with simple rules.]
For nouns with only one form and one tag (lines 2, 5 and 6), we can use just the actual tag
NN Noun, singular count noun: bicycle, earthquake, zipper
NNS Noun, plural: bicycles, earthquakes, zippers
NN:U Noun, always uncountable (new tag, deviation from Penn): admiration, Afrikaans
NN:UN Noun that might be used in the plural form and with an indefinite article, depending on its meaning (new tag, deviation from Penn): establishment, wax, afternoon
NNP Proper noun, singular: Denver, DORAN, Alexandra
NNPS Proper noun, plural: Buddhists, Englishmen
=noun= nouns NN with a regular plural NNS
=noun_UN= nouns NN:UN with a regular plural NNS
=noun_U= nouns NN:U with a regular plural NNS (this is contradictory?: U means always uncountable)
For lemmas with only one form, use just the POS tag (most of these words are usually adjectives, tagged as nouns as well):
=NN=
=NN:U=
=NN:UN=
For all other cases (irregular plurals, more than one plural, etc.) use the full inflected forms with tags:
addendum=addendum/NN,addenda/NNS,addendums/NNS=all
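The noun labels proposed here, together with the round-trip requirement (simplified to expanded and back must give identical results), could be sketched roughly as follows. This is only an illustration: the pluralization rules in `regular_plural` are assumptions (plain "s", "es" after sibilants, "ies" after consonant + "y"), the function names are hypothetical, and the third field (`all`) is treated as opaque.

```python
# Labels that stand for "singular tag + regular NNS plural" (from the proposal).
NOUN_LABELS = {"noun": "NN", "noun_UN": "NN:UN", "noun_U": "NN:U"}
# Bare tags used for lemmas that have only one form.
SINGLE_TAGS = ("NN", "NN:U", "NN:UN")


def regular_plural(lemma: str) -> str:
    """Assumed regular plural rules; real rules may differ."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    if lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ies"
    return lemma + "s"


def expand_noun(line: str) -> str:
    """Simplified -> expanded. Fully written-out lines pass through unchanged."""
    lemma, middle, scope = line.split("=")
    if middle in NOUN_LABELS:
        forms = f"{lemma}/{NOUN_LABELS[middle]},{regular_plural(lemma)}/NNS"
    elif middle in SINGLE_TAGS:
        forms = f"{lemma}/{middle}"
    else:
        return line
    return f"{lemma}={forms}={scope}"


def collapse_noun(line: str) -> str:
    """Expanded -> simplified: pick the label whose expansion reproduces the line."""
    lemma, _, scope = line.split("=")
    for label in list(NOUN_LABELS) + list(SINGLE_TAGS):
        candidate = f"{lemma}={label}={scope}"
        if expand_noun(candidate) == line:
            return candidate
    return line  # exception: keep the full inflected forms


simplified = [
    "bicycle=noun=all",
    "wax=noun_UN=all",
    "admiration=NN:U=all",
    "addendum=addendum/NN,addenda/NNS,addendums/NNS=all",
]
for entry in simplified:
    # Round trip: collapsing the expansion must give back the original line.
    assert collapse_noun(expand_noun(entry)) == entry
```

Because `collapse_noun` is defined in terms of `expand_noun`, a line is only ever simplified when expanding it again yields exactly the same text, which is one way to keep the two directions consistent.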
=noun_U= nouns NN:U with a regular plural NNS (this is contradictory?: U means always uncountable)
It is contradictory. By the book, anything tagged with NN:U should not have a plural form, otherwise it's NN:UN.
But in reality, our dictionary has plenty of NN:U/NNS pairs, so having a label for that makes sense.
Is there a label for proper nouns (NNP/NNPS), or do those fall under =noun=?
The distinction between NN and NNP, NNS and NNPS has always been very useful.
We would need to distinguish the proper nouns in some way. Would this be coherent? =proper_noun= nouns NNP with a regular plural NNPS
For lemmas with only one form, use just the POS tag: =NNP= =NNPS=
But if NNP+NNPS pairs are not very frequent, a dedicated label may be misleading. In that case, just write out both forms.