[fr] Ignore words starting with capital letter in "VERBE_SUIVI_D_UN_NOM" ruleset #10422
The disambiguation processor seems to have many flaws with this kind of structure. In any case, this rule simply ignores the error when the disambiguation is wrong, and once the processor improves, the rule will catch these cases correctly. For the record, an example of a disambiguation problem: "Il prend café"
There is no way that "café" could be an adjective here. There is not even a single noun in the sentence... "Il prend pelle"
How can a conjugated verb follow a conjugated verb that is not an auxiliary? What could be the subject of "V sub pres 1 s"? That doesn't seem to make any sense.
This is the continuation of PR #10385. I'll keep tracking down the diffs, but we should be close to the end.
@jaumeortola This is ready to merge.
@jaumeortola @LucieSteib Can we merge this, so I can check whether every case is 100% handled?
@@ -116344,7 +116351,7 @@ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
<suggestion>un \2</suggestion>
<suggestion>une \2</suggestion>
<suggestion>des \2</suggestion>
<example correction="un rhume|une rhume|des rhume">J'attrape <marker>rhume</marker>.</example>
<example correction="un lapin|une lapin|des lapin">J'attrape <marker>lapin</marker>.</example>
question: I still don't get why it's not possible to get the right suggestion here (feminine or plural forms...)
I would love it if it was possible. Do you know how? I thought that LanguageTool had no solution for that.
Basically, we have a verb followed by a noun. The easiest and one of the likeliest fixes is to add the missing article. But we need to know the gender and number of the noun in order to generate the correct article, and I don't think that suggestions can be made conditional on the gender or number of that noun. At least I found nothing of the sort in the documentation, and ChatGPT didn't know of a way either...
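One possible workaround, shown here only as an untested sketch, is to split the rule into one sub-rule per gender/number so that each sub-rule offers a single article. The rulegroup id, the message, and the examples below are hypothetical, and the POS tags ("N m s", "N f s", "N m p") are assumed to follow the same space-separated style as the "V sub pres 1 s" tag quoted above. This would only help when the disambiguator tags the noun correctly, which is exactly the weak point described in this PR.

```xml
<!-- Sketch only: id, name, message and examples are hypothetical, not part of this PR -->
<rulegroup id="VERBE_SUIVI_D_UN_NOM_SKETCH" name="Article manquant (esquisse)">
  <rule>
    <pattern>
      <token postag="V .*" postag_regexp="yes"/>
      <marker>
        <!-- masculine singular noun: only "un" is suggested -->
        <token postag="N m s"/>
      </marker>
    </pattern>
    <message>Il manque peut-être un article.</message>
    <suggestion>un \2</suggestion>
    <example correction="un rhume">J'attrape <marker>rhume</marker>.</example>
  </rule>
  <rule>
    <pattern>
      <token postag="V .*" postag_regexp="yes"/>
      <marker>
        <!-- feminine singular noun: only "une" is suggested -->
        <token postag="N f s"/>
      </marker>
    </pattern>
    <message>Il manque peut-être un article.</message>
    <suggestion>une \2</suggestion>
    <example correction="une pelle">Il prend <marker>pelle</marker>.</example>
  </rule>
  <rule>
    <pattern>
      <token postag="V .*" postag_regexp="yes"/>
      <marker>
        <!-- masculine plural noun: only "des" is suggested -->
        <token postag="N m p"/>
      </marker>
    </pattern>
    <message>Il manque peut-être un article.</message>
    <suggestion>des \2</suggestion>
    <example correction="des lapins">J'attrape <marker>lapins</marker>.</example>
  </rule>
</rulegroup>
```

The trade-off is duplication: the exceptions from the original rule would have to be repeated in each sub-rule, and nouns tagged with several genders could match more than one sub-rule.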
Ok, the original problem I was trying to solve was with the sentence:
And indeed you are right, both these sentences don't get corrected properly yet: The "rules" you see starting with
The original rule CONFUSION_ER_E_PAR is wrong and is supposed to be replaced with CONFUSION_ER_E_PAR2. But you're right, this is not related to this PR and we can discuss it in the other one. About the fixes made by the AI, I understand that this part might not be available in the open source project, but in that case, there is no rule for those errors in the current project, and thus this PR is important.
The model's operations are available to non-Premium users, in the Editor and the web extension.
I remember that you want to replace CONFUSION_ER_E_PAR with CONFUSION_ER_E_PAR2 entirely, yes.
@LucieSteib About CONFUSION_ER_E_PAR2, I'm confident I can reach a fairly good result, and I promise to analyze the diffs very carefully until I get something satisfying. However, I had to give up some detections to work around errors made in the disambiguation process, as mentioned here. In the future, I might try to improve the disambiguation to achieve even better detection, but the current version should already bring quite an improvement. I'm not sure how to replace one rule with the other and see the resulting diff; I only know how to see the diffs introduced by newly added rules. Maybe, when the rule is ready, I should open a PR that does the replacement?
I don't think that this is available for self-hosted instances, is it? We use LanguageTool offline. If it's not available, then I still insist that we need a rule in the project to cover those problems. I have no problem trying to improve the rule, though, if you don't like the way it is now.
No, you're right, the model would be far too heavy for a self-hosted instance, it's not accessible like that. About
About CONFUSION_ER_E_PAR2: I've nothing against the rule :) as long as it's at least as good as the current one, see the process we could try: About the actual replacement, when the rule is ready (meaning:
<exception>godot</exception>
<exception postag="[^N].*" postag_regexp="yes"/>
<exception regexp="yes" case_sensitive="yes">[A-ZÉÈÀÙÂÊÎÔÛÄËÏÖÜÇ].*</exception>
<exception regexp="yes">réparation|confirmation|famille|godot|lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche|janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre|début|mi-.*|fin</exception>
suggestion(format): maybe here you could use an ENTITY like mois_annee and jours_semaine (but maybe also unites_temps, parties_journee...). All the ENTITIES are at the top of the grammar.xml file.
You can call them in the rule with structures like:
<token regexp="yes">&mois_annee;</token>
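Applied to the exception in this diff, the suggestion could look roughly like the fragment below. The entity names come from the comment above; their values are assumed here to be plain alternations of day and month names, so the actual definitions at the top of grammar.xml should be checked before relying on this.

```xml
<!-- Assumed shape of the declarations in the DOCTYPE at the top of grammar.xml -->
<!ENTITY jours_semaine "lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche">
<!ENTITY mois_annee "janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre">

<!-- In the rule: the same exception with the literal day and month lists replaced by entity references -->
<exception regexp="yes">réparation|confirmation|famille|godot|&jours_semaine;|&mois_annee;|début|mi-.*|fin</exception>
```

Because entities are expanded before the regular expression is compiled, the resulting pattern is equivalent to the hand-written list as long as the entity values contain the same alternatives.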
Fixup for VERBE_SUIVI_D_UN_NOM based on the last diff.
We will fix 2 things:
I noticed that the rule still triggers when a newline character separates the two tokens. I don't know how to ignore that case. For now, the capital-letter check should do the trick, but that's not ideal.