Cb changes 20241121 #11051

tiff · 2024-11-21T15:54:07Z

Summary by CodeRabbit

Release Notes

New Features
- Introduced a new rule for ignoring spelling errors related to WPA wireless security protocols.
- Expanded vocabulary with new scientific names, proper nouns, and contemporary terms in multiple languages.
- Enhanced German language support with new compound words and updated spelling rules according to the 2024 reform.
Bug Fixes
- Corrected spelling of various names and terms to comply with updated German spelling standards.
Improvements
- Updated ignore lists to accommodate new terms and acronyms, enhancing spellcheck accuracy.
- Enhanced multitoken suggestions for better language processing.

coderabbitai · 2024-11-21T15:54:15Z

Walkthrough

The pull request introduces several updates across various files in the LanguageTool project. A new disambiguation rule "WPA2" is added to ignore spelling errors for specific wireless security protocol identifiers. Additionally, numerous entries are added to spelling and ignore lists, enhancing the vocabulary for both English and German language modules. Changes also include updates to compound words, multitoken suggestions, and spelling corrections according to recent reforms. The overall structure of the files remains intact, with a focus on expanding the language model's knowledge base.

Changes

File	Change Summary
`languagetool-core/src/main/resources/org/languagetool/resource/disambiguation-global.xml`	New rule added: `<rule name="WPA2" id="WPA2">` to ignore spelling for tokens matching `WPA[1-3]`.
`languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt`	Added numerous scientific names, proper nouns, and terms related to biology, geography, and popular culture.
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/compounds.txt`	Extensive additions of new German compound words with formatting symbols for suggestion behavior.
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt`	New entries added to ignore list: `Abijith/S`, `Ekitiké/S`, `WPA #abk`, `Embedded #eng`, `VAT #abk`.
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt`	Added multiple new adjectives, nouns, and verb forms to enhance the German lexicon.
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt`	New entries added and existing ones updated with grammatical suffixes; some entries removed.
`languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt`	Updates to replacement rules for names and terms according to the 2024 German spelling reform.
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt`	New entries added, including proper nouns and terms related to technology and culture.
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt`	Extensive additions to the ignore list, including contemporary terms and proper nouns.
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt`	New entries added across various categories, including scientific terminology and colloquial expressions.
`languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/diacritics.txt`	Updates to include correct diacritical marks for various terms and proper nouns.
`languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml`	Updated antipattern definition for "thesis" to include multiple related terms.

Possibly related PRs

Cb changes 20240830 #10856: Adds new entries to spelling_global.txt, potentially related to the new rule for ignoring specific terms.
Cb changes 20240905 #10865: Similar to Cb changes 20240830 #10856, this PR adds entries to spelling_global.txt, relevant to the main PR changes.
[pt] Added words to added.txt and spelling.txt #10917: Introduces new entries to added.txt and spelling.txt, which may be impacted by the new rule.
[de] add words + gGEC AP + removed currencies #10961: Adds new entries to ignore.txt, potentially connected to the new rule for ignoring specific terms.
Jdk19 regexp fix #10972: Modifies regex patterns, which may relate to handling specific terms in the main PR.
[pt] Fix in Premium — 2024-11-06 #10994: Adds new entries to spelling_global.txt, potentially related to the new rule for ignoring specific terms.
Cb changes 20241111 #11010: Introduces a large number of new words to spelling_global.txt, possibly relevant to the new rule.
Cb changes 20241113 #11021: Adds numerous new entries to spelling_global.txt, which may relate to the new rule.
[pt] Added words to added.txt and spelling.txt #11026: Adds new entries to added.txt and spelling.txt, which could be affected by the new rule.
[nl] add rules/spelling #11045: Introduces new entries to multiwords.txt, potentially relevant to the spelling rule changes.


coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1)
Line range hint 544-549: Resolve inconsistent hyphenation and remove duplicate entries.

The degradable compounds show inconsistent hyphenation patterns and contain duplicate entries:

Inconsistent hyphenation: "UV-degradable", "oxy-degradable", "bio-degradable" vs "biodegradable"

Duplicate entries: "bio-degradable"/"biodegradable" and "bio-degradables"/"biodegradables"

Apply this diff to maintain consistency and remove duplicates:
 UV-degradable
 oxy-degradable
-bio-degradable
-bio-degradables
 biodegradable
 biodegradables
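The duplicate check described above can be sketched in a few lines. This is only an illustration, assuming the spelling list is a plain sequence of words; the helper name `hyphen_variant_pairs` is hypothetical and not part of LanguageTool.

```python
def hyphen_variant_pairs(words):
    """Return (hyphenated, closed) pairs where both spellings are present."""
    present = set(words)
    pairs = []
    for word in words:
        if "-" in word:
            closed = word.replace("-", "")
            # Report the entry only if the unhyphenated form is also listed.
            if closed in present:
                pairs.append((word, closed))
    return pairs


# The six entries quoted in the review comment.
entries = [
    "UV-degradable",
    "oxy-degradable",
    "bio-degradable",
    "bio-degradables",
    "biodegradable",
    "biodegradables",
]
pairs = hyphen_variant_pairs(entries)
print(pairs)
```

For the quoted entries, only the two "bio-" forms are reported, because "UVdegradable" and "oxydegradable" are not in the list.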
🧰 Tools

🪛 LanguageTool

[misspelling] ~545-~545: This word is normally spelled as one.
Context: ...ntrusively UV-degradable oxy-degradable bio-degradable bio-degradables biodegradable biodegrad...

(EN_COMPOUNDS_BIO_DEGRADABLE)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/diacritics.txt (1)
Diacritic mappings need alignment between language modules

The verification revealed some inconsistencies that need attention:

The German replace.txt contains the same diacritic mappings as the English module, plus one additional form: "Hugo Ekitikes=Hugo Ekitikés"

Spanish module uses "Morata" in different contexts (place names) without diacritics

The same names appear in different contexts: place names (Spanish) vs. person names (English/German)

Recommendations:

Align the diacritic mappings between English and German modules, particularly for "Hugo Ekitike" forms

Consider whether place names with "Morata" in Spanish should retain their non-diacritic form to match local usage

Document the context-specific handling of diacritics for names that can refer to both places and people

🔗 Analysis chain

Line range hint 1397-1401: Verify consistency with other language modules.

Please ensure these new diacritic mappings are consistent with other language modules where these terms might appear.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for potential conflicts or inconsistencies in other language modules

# Search for these terms in other language modules
echo "Checking for potential conflicts in other language modules..."
rg -l "Ekitike|Morata|Bollore|Defense Arena|Calhanoglu|Mulino" "languagetool-language-modules/*/src/main/resources/"
Length of output: 343

Script:
#!/bin/bash
# Let's try a different approach using fd to find the correct paths first
echo "Finding language module resource directories..."
fd -t d "resources" "languagetool-language-modules"

echo -e "\nChecking for these terms in all language modules..."
fd -t d "resources" "languagetool-language-modules" -x rg -l "Ekitike|Morata|Bollore|Defense Arena|Calhanoglu"
Length of output: 2906

Script:
#!/bin/bash
# Let's check the actual content in these files to see if there are any conflicts
echo "Content in Spanish multiwords.txt:"
cat "languagetool-language-modules/es/src/main/resources/org/languagetool/resource/es/multiwords.txt" | grep -E "Ekitike|Morata|Bollore|Defense Arena|Calhanoglu"

echo -e "\nContent in German replace.txt:"
cat "languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt" | grep -E "Ekitike|Morata|Bollore|Defense Arena|Calhanoglu"

echo -e "\nContent in German ignore.txt:"
cat "languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt" | grep -E "Ekitike|Morata|Bollore|Defense Arena|Calhanoglu"
Length of output: 1085
languagetool-core/src/main/resources/org/languagetool/resource/disambiguation-global.xml (2)
400-405: Consider renaming the rule to better reflect its coverage.

While the implementation is correct, the rule name "WPA2" is slightly misleading as it handles WPA1 and WPA3 as well. Consider renaming it to "WPA_VERSIONS" or similar to better reflect its actual coverage.
-    <rule name="WPA2" id="WPA2">
+    <rule name="WPA_VERSIONS" id="WPA_VERSIONS">
399-406: Consider grouping technology-related rules together.

The WPA rule seems out of place between chemical formulas and function names. Consider:

Creating a new rulegroup for technology standards/protocols

Moving this rule into that group
+    <rulegroup id="GLOBAL_TECHNOLOGY_STANDARDS" name="Technology standards and protocols">
+        <rule name="WPA_VERSIONS" id="WPA_VERSIONS">
+            <pattern>
+                <token regexp="yes" case_sensitive="yes">WPA[1-3]</token>
+            </pattern>
+            <disambig action="ignore_spelling"/>
+        </rule>
+    </rulegroup>
+
     <rulegroup id="GLOBAL_IGNORE_FUNCTION_NAMES" name="ignore function names">

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7061254 and ccd8620.

📒 Files selected for processing (12)

languagetool-core/src/main/resources/org/languagetool/resource/disambiguation-global.xml (1 hunks)
languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (2 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/compounds.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/diacritics.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1 hunks)

✅ Files skipped from review due to trivial changes (2)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt

🧰 Additional context used

🪛 LanguageTool

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt

[duplication] ~1067-~1067: Possible typo: you repeated a word
Context: ...hijith Abhijith NNP Abijith Abijith NNP Ting-Lu Ting-Lu NNP Chi-Yu Chi-Yu NNP Koraidon Koraidon...

(ENGLISH_WORD_REPEAT_RULE)

🔇 Additional comments (18)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (3)

65275-65281: LGTM: Adjective entries are well-formed

The new adjective entries follow proper German compound word formation rules and use consistent /A suffix annotations.

Also applies to: 65283-65284, 65290-65291

65282-65282: LGTM: Noun entries are properly annotated

The noun entries are correctly formed with:

Proper capitalization (German nouns)
Correct /S suffix for plural forms
Appropriate #eng annotation for English loanword

Also applies to: 65285-65287
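To make the annotation layout concrete, here is a minimal sketch that splits an entry such as `Downsizing/S #eng` into its word, its affix flags, and its trailing tag. It assumes the layout `word[/FLAGS] [#tag]` seen in the entries quoted in this PR; the function name `parse_entry` is hypothetical, and the sketch does not interpret what a flag like `S` or `A` means.

```python
def parse_entry(line):
    """Split 'word[/FLAGS] [#tag]' into its parts (assumed layout)."""
    body, _, tag = line.partition("#")
    body = body.strip()
    if "/" in body:
        # Split on the last slash so the flags are whatever follows it.
        text, flags = body.rsplit("/", 1)
    else:
        text, flags = body, ""
    return {"text": text, "flags": flags, "tag": tag.strip()}


for sample in ["Downsizing/S #eng", "WPA #abk", "Abijith/S"]:
    print(parse_entry(sample))
```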

65288-65289: LGTM: Separable verb entries are correctly formatted

The separable verb entries use proper underscore notation to mark separable prefixes (über_, unter_, rein_).

Also applies to: 65292-65292

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (2)

72202-72202: LGTM! Enhanced academic context detection.

The expanded token pattern provides better coverage for academic-related terms while maintaining semantic consistency.

72202-72202: Verify impact on related grammar rules.

The broadened pattern might affect other grammar rules that rely on these academic terms.

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (2)

541-543: LGTM!

The following entries are correctly formed and consistent with existing patterns in the file:

"culturewise" follows the pattern of other "-wise" compounds
"unintrusive" and "unintrusively" are valid "un-" prefix forms
"outflux" and "outfluxes" are valid singular and plural forms

Also applies to: 551-552

Line range hint 550-550: Verify capitalization of proper adjective.

Please verify if "Braudelian" (derived from historian Fernand Braudel) should always be capitalized.

✅ Verification successful

Capitalization of "Braudelian" is correct

The capitalization is verified to be correct based on the following evidence:

The word is derived from a proper name (Fernand Braudel)
Other proper adjectives and name-derived terms in the spelling dictionary consistently follow capitalization (e.g., Oaxacan, Lamanite)
This follows standard English rules where adjectives derived from proper nouns are capitalized

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for existing usage patterns of "Braudelian" in the codebase
# and academic sources to confirm capitalization rules

# Search for any variations of the word in the codebase
rg -i 'braudelian'

# Search for other proper adjectives in the spelling list for consistency
rg '^\w+ian$' "languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt"

Length of output: 272

Script:

#!/bin/bash
# Let's check for other similar proper adjectives derived from names
# to establish a consistent pattern in the spelling dictionary

rg -i '^[A-Z][a-z]+ian$' "languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt"

# Also check for other capitalized words in the spelling dictionary
# to understand the general capitalization rules
rg '^[A-Z]' "languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt" | head -n 20

Length of output: 490


languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt (3)

Line range hint 1-3: LGTM! Clear and informative header comments.

The file header effectively communicates the purpose and format of the dictionary entries.

🧰 Tools

🪛 LanguageTool

[duplication] ~1064-~1064: Possible typo: you repeated a word
Context: ...ngo Gholdengo NNP Wo-Chien Wo-Chien NNP Chien-Pao Chien-Pao NNP Abhijith Abhijith NNP Abijith Abiji...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1065-~1065: Possible typo: you repeated a word
Context: ...en Wo-Chien NNP Chien-Pao Chien-Pao NNP Abhijith Abhijith NNP Abijith Abijith NNP Ting-Lu Ting-Lu...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1066-~1066: Possible typo: you repeated a word
Context: ...Pao Chien-Pao NNP Abhijith Abhijith NNP Abijith Abijith NNP Ting-Lu Ting-Lu NNP Chi-Yu Chi-Yu N...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1067-~1067: Possible typo: you repeated a word
Context: ...hijith Abhijith NNP Abijith Abijith NNP Ting-Lu Ting-Lu NNP Chi-Yu Chi-Yu NNP Koraidon Koraidon...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1068-~1068: Possible typo: you repeated a word
Context: ...Abijith Abijith NNP Ting-Lu Ting-Lu NNP Chi-Yu Chi-Yu NNP Koraidon Koraidon NNP Dipplin Dippl...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1069-~1069: Possible typo: you repeated a word
Context: ...P Ting-Lu Ting-Lu NNP Chi-Yu Chi-Yu NNP Koraidon Koraidon NNP Dipplin Dipplin NNP Poltchageist Po...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~1070-~1070: Possible typo: you repeated a word
Context: ...Chi-Yu Chi-Yu NNP Koraidon Koraidon NNP Dipplin Dipplin NNP Poltchageist Poltchageist NNP Sinis...

(ENGLISH_WORD_REPEAT_RULE)

Line range hint 4-1066: LGTM! Consistent and well-structured dictionary entries.

The new entries maintain consistent formatting and use appropriate part-of-speech tags. The additions significantly enhance the dictionary's coverage of:

Proper nouns (Pokémon names, geographic locations)
Technical terms
Common nouns and adjectives


1067-1067: Verify the intentional inclusion of both name variants.

The addition of "Abijith" alongside "Abhijith" appears to be intentional to support variant spellings of the same name. Both are correctly tagged as proper nouns (NNP).

✅ Verification successful

Both name variants are consistently handled across dictionaries

The addition is intentional and consistent with the dictionary ecosystem:

Both "Abhijith" and "Abijith" variants are properly recognized in:
- English added.txt (as proper nouns)
- English hunspell ignore list
- German hunspell ignore list (marked as names)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage and context of both name variants

# Check if these variants are used in other dictionaries or ignore lists
rg -l "Abh?ijith" --type-add 'dict:*.{txt,dic}' -t dict

Length of output: 350

Script:

#!/bin/bash
# Check the actual entries in these files to understand the context
echo "=== English added.txt ==="
rg "Abh?ijith" languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt

echo -e "\n=== English ignore.txt ==="
rg "Abh?ijith" languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt

echo -e "\n=== German ignore.txt ==="
rg "Abh?ijith" languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt

Length of output: 632
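The search pattern `Abh?ijith` used in the scripts above makes the "h" optional, so one expression covers both spellings. A small sketch of the same pattern with Python's `re` module, using substring search to mirror how `rg` scans each line (the sample lines are illustrative, not copied from the files):

```python
import re

# "h?" makes the h optional, so both name variants match.
NAME_VARIANTS = re.compile(r"Abh?ijith")


def mentions_name(line):
    """True if the line contains either spelling of the name."""
    return NAME_VARIANTS.search(line) is not None


for line in ["Abhijith", "Abijith/S", "Abhhijith", "Ting-Lu"]:
    print(line, mentions_name(line))
```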


languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt (2)

Line range hint 1-4: Well-documented format and clear organization.

The file header clearly explains:

The purpose of the rules
The expected UTF-8 format
The structure of entries (phrase=replacement[tab]message)
Usage guidelines
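As a rough illustration of the entry structure named above (phrase=replacement[tab]message), the sketch below splits one line into its parts. It assumes that the message is optional and that lines starting with `#` are comments; both are assumptions, and `parse_replace_line` is a hypothetical helper rather than LanguageTool code.

```python
def parse_replace_line(line):
    """Split 'phrase=replacement[TAB]message' into (phrase, replacement, message)."""
    line = line.rstrip("\n")
    # Assumption: blank lines and '#' lines carry no rule.
    if not line or line.startswith("#"):
        return None
    rule, _, message = line.partition("\t")
    phrase, sep, replacement = rule.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in entry: {line!r}")
    return phrase, replacement, message or None


print(parse_replace_line("Hugo Ekitike=Hugo Ekitiké"))
# The second sample is made up to show the optional message part.
print(parse_replace_line("foo=bar\tUse bar"))
```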

815-816: LGTM! The new entries follow the established format.

The rules for "Hugo Ekitike" → "Hugo Ekitiké" are correctly formatted and handle both forms appropriately.

Let's verify the consistency with related files:

✅ Verification successful

The name correction entries are properly formatted and consistent with similar patterns

The changes are well-integrated into the replace.txt file, following the same pattern as other name corrections with diacritics (e.g., "Gérald Darmanin", "Ousmane Dembélé"). The entries handle both the base form and possessive form consistently.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for related entries in diacritics.txt and multitoken-suggest.txt
# as mentioned in the AI summary

# Search for related entries in other files
rg -l "Hugo Ekitik[ée]" "languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/"

Length of output: 194

Script:

#!/bin/bash
# Let's check the content of replace.txt to see if there are any other related entries
# or similar patterns for name corrections with diacritics

rg -B2 -A2 "Hugo Ekitik[ée]" "languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt"

# Also check if there are any other files that might need similar updates
fd ".*\.txt$" "languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/" --exec rg -l "[ée]" {}

Length of output: 1175

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/diacritics.txt (2)

Line range hint 1-1397: Format consistency maintained.

The new entries follow the established pattern of incorrect_form=correct_form and maintain consistent formatting throughout the file.

1397-1398: Confirmed changes from summary.

The specific changes mentioned in the summary are correctly implemented:

Hugo Ekitike=Hugo Ekitiké
Alvaro Morata=Álvaro Morata
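One way to sanity-check entries of this shape is to confirm that the two sides differ only in diacritics. The sketch below does that with Unicode decomposition; it is an assumption-laden illustration (the helper names are hypothetical), and letters that have no decomposed form, such as "ø", would need separate handling.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_pure_diacritic_fix(entry):
    """True if 'incorrect=correct' differs only by diacritical marks."""
    incorrect, _, correct = entry.partition("=")
    return incorrect != correct and strip_diacritics(correct) == incorrect


for entry in ["Hugo Ekitike=Hugo Ekitiké", "Alvaro Morata=Álvaro Morata"]:
    print(entry, is_pure_diacritic_fix(entry))
```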

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1)

3120-3123: LGTM! The additions follow the correct format.

The new entries are properly formatted with appropriate suffixes and tags:

"Sexiest Man/Woman Alive" entries use /S suffix for plural forms and #eng tag
"Hugo Ekitiké" entry uses /S suffix and #name tag
"Embedded Systems" entry uses #eng tag

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/compounds.txt (2)

11606-11606: LGTM! New entries follow the documented format.

The new compound words have been added with appropriate formatting markers that align with the file's documentation.

Line range hint 1-11606: LGTM! File maintains consistent formatting and structure.

The file maintains:

Consistent use of special markers ($, *, ?, +)
Proper alphabetical ordering
Consistent hyphenation patterns

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (1)

26764-26764: LGTM: WPA2 entry aligns with disambiguation rule

The addition of "WPA2" complements the new disambiguation rule in disambiguation-global.xml for handling wireless security protocol identifiers.

Let's verify the related disambiguation rule:

✅ Verification successful

Verified: WPA2 entry correctly aligns with existing disambiguation rule

The disambiguation rule for WPA2 is present in disambiguation-global.xml with pattern WPA[1-3] which covers WPA2 along with WPA1 and WPA3 variants. The spelling entry and disambiguation rule are properly synchronized.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the WPA2 disambiguation rule exists
rg -A 5 "WPA2" "**/disambiguation-global.xml"

Length of output: 114

Script:

#!/bin/bash
# Let's first find the correct path for disambiguation files
fd "disambiguation.*\.xml"

Length of output: 2402

Script:

#!/bin/bash
# Now let's check the global disambiguation file for WPA2 rules
cat languagetool-core/src/main/resources/org/languagetool/resource/disambiguation-global.xml | grep -A 5 -B 5 "WPA2"

Length of output: 457
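To show which tokens the `WPA[1-3]` pattern covers, here is a small sketch using Python's `re` module. It assumes the pattern has to match the whole token and is case-sensitive, as the `case_sensitive="yes"` attribute in the rule suggests; `is_ignored_wpa_token` is a hypothetical name, and this is not how LanguageTool evaluates the rule internally.

```python
import re

# Same character class as the rule's token pattern.
WPA_TOKEN = re.compile(r"WPA[1-3]")


def is_ignored_wpa_token(token):
    """True if the whole token is WPA1, WPA2, or WPA3 (case-sensitive)."""
    return WPA_TOKEN.fullmatch(token) is not None


for token in ["WPA1", "WPA2", "WPA3", "WPA", "WPA4", "wpa2"]:
    print(token, is_ignored_wpa_token(token))
```

Under these assumptions the bare acronym "WPA" is not covered by the pattern; in this PR it is added separately to the German ignore list as `WPA #abk`.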

coderabbitai · 2024-11-21T15:58:12Z

...ol-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt

+superpragmatisch/A
+regierungsoffiziell/A
+umsatzfördernd/A
+gemeinschaftsfördernd/A
+projektfördernd/A
+muskelfördernd/A
+muskelaufbaufördernd/A
+Downsizing/S #eng
+sanktionsstranguliert/A
+klimawahnsinnig/A
+Expresszugverbindung
+Expresszugverbindungen
+Expresszug/S
+über_interpretieren
+unter_interpretieren
+teilzufrieden/A
+dividendenlos/A
+rein_trainieren


💡 Codebase verification

Alphabetical ordering is not maintained in this section

The entries are not in alphabetical order. For example, "superpragmatisch" should come after "regierungsoffiziell", not before it. Current order:

SIM-Lock/S
superpragmatisch/A
regierungsoffiziell/A
umsatzfördernd/A
gemeinschaftsfördernd/A
projektfördernd/A
muskelfördernd/A

Moving that one entry gives the order below. The entries after "umsatzfördernd" would still need sorting for the section to be fully alphabetical:

SIM-Lock/S
regierungsoffiziell/A
superpragmatisch/A
umsatzfördernd/A
gemeinschaftsfördernd/A
projektfördernd/A
muskelfördernd/A
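An ordering check like this one can be automated. The sketch below lists every adjacent pair that is out of order, comparing only the word before the `/` flags and ignoring case. It uses plain code-point comparison rather than German collation, and the file's actual ordering convention is not stated in this thread, so treat both as assumptions; the helper names are hypothetical.

```python
def sort_key(entry):
    """Word part of an entry, without '/FLAGS' or '#tag', case-folded."""
    word = entry.split("#", 1)[0].split("/", 1)[0].strip()
    return word.casefold()


def out_of_order_pairs(entries):
    """Adjacent pairs (a, b) where a sorts after b."""
    return [
        (a, b)
        for a, b in zip(entries, entries[1:])
        if sort_key(a) > sort_key(b)
    ]


current = [
    "SIM-Lock/S",
    "superpragmatisch/A",
    "regierungsoffiziell/A",
    "umsatzfördernd/A",
    "gemeinschaftsfördernd/A",
    "projektfördernd/A",
    "muskelfördernd/A",
]
for pair in out_of_order_pairs(current):
    print(pair)
```

On the quoted entries this reports three pairs, starting with "superpragmatisch/A" before "regierungsoffiziell/A".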

🔗 Analysis chain

Verify alphabetical ordering with surrounding entries

While the new entries appear to be alphabetically ordered among themselves, we should verify they maintain proper ordering with surrounding entries in the file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the new entries maintain alphabetical ordering with surrounding context

# Get 5 lines before and after the changed section for context
rg -B 5 -A 5 "^superpragmatisch/A" "languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt"

Length of output: 337

coderabbitai · 2024-11-21T15:58:12Z

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt

+Mini Cooper E
+MINI Cooper E


🛠️ Refactor suggestion

Standardize MINI brand capitalization

There's an inconsistency in the capitalization of the MINI brand name:

Line 26762: "Mini Cooper E"

Line 26763: "MINI Cooper E"

According to the brand guidelines, MINI should be capitalized.

-Mini Cooper E
+MINI Cooper E
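Entries that differ only in capitalization can be found mechanically. The following is a minimal sketch, assuming the list is a plain sequence of phrases; `case_variant_groups` is a hypothetical helper, and whether such variants are intentional is a judgment the script cannot make.

```python
from collections import defaultdict


def case_variant_groups(entries):
    """Group entries that are identical except for letter case."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.casefold()].append(entry)
    # Keep only groups that contain more than one distinct spelling.
    return [group for group in groups.values() if len(set(group)) > 1]


sample = ["Mini Cooper E", "MINI Cooper E", "WPA2"]
print(case_variant_groups(sample))
```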

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
Mini Cooper E
MINI Cooper E

After:
MINI Cooper E
MINI Cooper E

tiff added 4 commits November 21, 2024 16:49

[core] add global spelling

c65cc0e

[de] improve rules

1fde36e

[de] add spelling

9fcd35e

[en] improve rules

ccd8620

coderabbitai bot reviewed Nov 21, 2024


tiff merged commit 953d1e6 into master Nov 21, 2024
5 checks passed

tiff deleted the cb-changes-20241121 branch November 21, 2024 20:01

This was referenced Nov 26, 2024

Optimizations, part 2 #11064

Merged

Cb changes 20241128 #11069

Merged

This was referenced Dec 4, 2024

[nl] add spelling, multiwords #11079

Merged

Cb changes 20241203 #11081

Merged

coderabbitai bot mentioned this pull request Jan 6, 2025

Cb changes 20240106 #11164

Merged
