Update British Dictionary to V3.3.5 (1-AUG-2024) #28

Open
marcoagpinto opened this issue Aug 1, 2024 · 5 comments

Comments

@marcoagpinto
Member

Heya,

Could someone update the British dictionary for the next release of LanguageTool?

I still haven't uploaded the extensions for Mozilla, LibreOffice, etc., but I have attached here the zip with the files.

There is no need to unmunch it, since the full wordlist is already included as a .txt file (UTF-8 with BOM).

Thanks!
2024-08-01gb.zip
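
For anyone loading the attached wordlist in a script: a UTF-8 file with a BOM starts with three extra bytes that can end up glued to the first word. The sketch below shows one way to read such a file in Python. It assumes one word per line; the sample bytes and the function name `read_wordlist` are made up for illustration and are not taken from the zip.

```python
def read_wordlist(raw_bytes):
    """Decode a UTF-8 wordlist, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips the BOM when it exists and behaves like "utf-8" otherwise.
    text = raw_bytes.decode("utf-8-sig")
    # One word per line is an assumption about the file layout.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical sample: BOM followed by two words with Windows line endings.
sample = b"\xef\xbb\xbfcolour\r\nfavour\r\n"
words = read_wordlist(sample)
print(words)  # ['colour', 'favour']
```

For a file on disk, `open(path, encoding="utf-8-sig")` gives the same BOM handling.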

@marcoagpinto
Member Author

Heya,

Two weeks have gone by. Is there any feedback?

The unzipped dictionary files are in this GitHub repository: https://github.com/marcoagpinto/aoo-mozilla-en-dict

@marcoagpinto
Member Author

@jaumeortola @danielnaber

Is anyone willing to update the dictionary?

😛 😛 😛 😛 😛 😛 😛 😛

@jaumeortola
Member

Hello @marcoagpinto.
This project has been paused for some time. Sorry for taking so long to respond.

Now all new words have to be added to a source file common to all language variants, as explained in this README file. They have to be properly tagged and labelled with language variants.

A problem with your British English dictionary is that you are generating many word forms (hundreds of thousands of forms) that are probably wrong or nonsense. This seems to happen because of suffixes. Most of the word forms in src-discarded.txt come from the British spelling dict. There are still many wrong word forms in src-pending.txt.

To add new words from the British spelling dict, we'll need a difference between the current version and the previous one. Do this with the lemmas (not the word forms with all the suffixes). Then we will check, tag and label them.
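
The difference between two versions could be computed roughly as follows. This is only a sketch, assuming each version is available as a plain list of lemmas, one per line; the sample words and the helper names `load_words` and `diff_wordlists` are hypothetical, not part of any LanguageTool tooling.

```python
import io


def load_words(stream):
    """Return the set of non-empty, stripped lines from a text stream."""
    return {line.strip() for line in stream if line.strip()}


def diff_wordlists(previous, current):
    """Return (added, removed) lemmas between two versions, each sorted."""
    added = sorted(current - previous)
    removed = sorted(previous - current)
    return added, removed


# Hypothetical sample data standing in for the two lemma lists.
previous = load_words(io.StringIO("colour\nfavour\nhonour\n"))
current = load_words(io.StringIO("colour\nfavour\nrewild\n"))

added, removed = diff_wordlists(previous, current)
print(added)    # ['rewild']
print(removed)  # ['honour']
```

Only the `added` list would then need to be checked, tagged and labelled.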

@jaumeortola
Member

@marcoagpinto I have already extracted the diffs from the en-GB dictionary. Now I need to think about ways of tagging these words.

@marcoagpinto
Member Author

marcoagpinto commented Jan 14, 2025

@jaumeortola

You could have downloaded the wordlist directly from my GitHub folder:
wordlist_marcoagpinto_20250101_277695w.txt

This avoids having to decode suffixes and prefixes for LanguageTool.

😛 😛 😛

Also note that I am now the maintainer of the ZA (South African) dictionary, since the two previous maintainers vanished off the map (their website and e-mail no longer exist).

Before 2026 I will try to implement features in my software (Proofing Tool GUI) that will allow converting the GB dictionary to US (also CA and AU) and selecting whether to remove the -ise or the -ize forms. Once that is done, I will “take over” US+CA+AU, since Kevin Atkinson takes years just to add a dozen words.
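
To make the -ise/-ize choice concrete, here is a rough sketch of one possible filter over a flat wordlist. It is not how Proofing Tool GUI works or will work; the function name `drop_spelling_variant` and the sample words are hypothetical. The heuristic drops a word only when its counterpart spelling is also in the list, so words like "size" survive.

```python
def drop_spelling_variant(words, drop="ize"):
    """Drop words in the `drop` spelling when the other spelling is also listed."""
    if drop not in ("ise", "ize"):
        raise ValueError("drop must be 'ise' or 'ize'")
    old, new = ("iz", "is") if drop == "ize" else ("is", "iz")
    wordset = set(words)
    kept = []
    for word in words:
        # Swap the last "iz"/"is" in the word to build the counterpart spelling.
        head, sep, tail = word.rpartition(old)
        counterpart = head + new + tail if sep else None
        if counterpart in wordset:
            continue  # the other spelling exists, so this variant is dropped
        kept.append(word)
    return kept


# Hypothetical sample wordlist.
sample = ["organise", "organize", "organised", "organized", "size", "wise"]
without_ize = drop_spelling_variant(sample, drop="ize")
without_ise = drop_spelling_variant(sample, drop="ise")
print(without_ize)  # ['organise', 'organised', 'size', 'wise']
print(without_ise)  # ['organize', 'organized', 'size', 'wise']
```

A known limitation of this heuristic: genuine word pairs such as "prise" and "prize" would be treated as variants of each other, so a real implementation would need an exception list.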
