QE for HTML input doesn't work #355
Comments
Because I translated |
Ah, I totally missed that. I was under the impression that QualityEstimation scores use those same offsets that are stored in the I'll see whether I can make the |
QE uses words (not continuous subwords), so I don't expect the above approach to fare well. I have always wanted QE to use Annotation after relaxing the continuity constraints. Then we should be able to uniformly update the QE "annotation" (based on words, no continuity) and the plaintext vocab "annotation" (based on subwords, with continuity guaranteed by SentencePiece). This will break the negotiated API between QE and Mozilla, but I strongly believe it is the proper way to go, especially since #298 also points towards this. In effect, this would mean pushing down some of the "insert" tag (opening or closing) before token I do not expect the above to be trivial. |
QE is slated to be implemented in W5 (March 7-11). HTML is expected to be implemented in W3 (February 21-25), which is next week. I guess it's reasonable to keep this soft at W4 here and prioritize based on how HTML completes or not by W3. |
My idea was to remove the
That call just turns the token ranges that QualityEstimator already uses internally into byte ranges: bergamot-translator/src/translator/quality_estimator.cpp Lines 270 to 285 in 2844ced
That step is something a client can easily do itself, it has access to the AnnotatedText. Hell, I could even integrate it in a way similar to 83e869c at that point, and have quality estimation scores be part of the output HTML. Then the client wouldn't need to do any work at all. But that all depends on the scenario in which we need both HTML and quality scores at the same time. |
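To illustrate the kind of client-side conversion meant here, below is a minimal sketch, not the code in quality_estimator.cpp. The data shapes are assumptions made for the example: `subwordRanges` is a hypothetical array of UTF-8 byte ranges (`end` exclusive), and `wordGroups` is a hypothetical list of subword indices per word. It also assumes a word runs from its first subword to its last, which may not hold if the continuity constraint is relaxed.

```javascript
// Hypothetical shapes: subwordRanges[i] = { begin, end } in UTF-8 bytes
// (end exclusive); wordGroups[j] = indices of the subwords forming word j.
function wordByteRanges(subwordRanges, wordGroups) {
  return wordGroups.map((indices) => {
    const first = subwordRanges[indices[0]];
    const last = subwordRanges[indices[indices.length - 1]];
    // Assumes the word spans everything from its first to its last subword.
    return { begin: first.begin, end: last.end };
  });
}

// Byte ranges index UTF-8 bytes, not JavaScript (UTF-16) string positions,
// so encode the text, slice the bytes, and decode the slice.
function sliceBytes(text, range) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder().decode(bytes.subarray(range.begin, range.end));
}
```

Slicing through the encoded bytes matters as soon as the text contains non-ASCII characters, because `String.prototype.slice` counts UTF-16 code units rather than bytes.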
The whole idea of filing this issue was based on @kpu's suggestion (btw @kpu thanks a lot for this suggestion 👍🏾) in the last plenary to start integrating and find bugs quickly, so that technical support from the partners involved could be solicited asap. The QE feature is definitely going to be part of the final delivery, and so is HTML. In my opinion, this fix doesn't have to wait till W4. We need to test QE again after this bug is fixed, and that might surface new issues which might in turn take some time to fix. All of this applies in general to anything that is becoming part of the final delivery. |
@jelmervdl is working on it. |
@jelmervdl Thanks for working on it. I would add one thing. If I remember correctly, the translation models (and perhaps supervised QE models as well) provide scores of the "subwords" and then some post-processing is done to convert the "subword" scores to "word" scores in the engine. We discussed in the beginning with the QE team and @jerinphilip that it is better for the client to receive the quality scores at "word" level so that client doesn't need to do this post-processing. It is more efficient to do it in the engine as it has greater control and access to more information, plus avoiding replicating the same logic for every client. The client was kind of made oblivious to the concept of "subword" when the html translation came inside the engine. Please correct me if I am wrong somewhere. Open for discussion here if keeping it the same way for html text poses a technical challenge 👍🏾 |
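For readers unfamiliar with this post-processing, here is a minimal sketch of what turning subword scores into word scores could look like. The aggregation rule is an assumption: the thread does not say how the engine combines subword scores, so a plain mean is used purely for illustration. `wordScores` and its arguments are hypothetical names.

```javascript
// Hypothetical aggregation: the score of a word is the mean of the scores
// of its subwords. The engine's actual rule is not stated in this thread.
function wordScores(subwordScores, wordGroups) {
  return wordGroups.map((indices) => {
    const total = indices.reduce((sum, i) => sum + subwordScores[i], 0);
    return total / indices.length;
  });
}
```

Doing this once in the engine, as argued above, spares every client from reimplementing the same grouping logic.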
@abhi-agg @andrenatal Please take a look at #357 and #358 to express an opinion about API. Of course QE words and HTML tags are unsynchronized i.e. a QE word can span over an HTML open or closing tag. So #358 repeats the QE score tag as necessary to cover the QE word. |
I agree with the reasoning here. In both #357 I effectively moved the In #358 you get this behaviour for free since I assign the same |
#358 looks promising. To carry forward, we need to reach synchronization on the following:
|
|
I found the following underline-based proposal, and a few variants in Section 5 relating to QE. From meetings, I think it's important to discuss the following. I think @jelmervdl's render might have to do the thresholding (in C++) that was earlier meant for @abhi-agg to do, to map the probability values to classes (Section 5: Major/Critical, Major/Minor/Critical, Poor/Ok). Right now we're exporting real values as data. I'm not sure if the decision rule can be applied in CSS (I've lost track of how capable it has become); @jelmervdl is currently using continuous colouring. The alternative is to read the data from the attributes and attach the class to the element in JS. |
All mock-ups seem to assume plain text (e.g. a form input), not HTML. In that case, what is currently in main will be sufficient. #357 won't hurt, but the offsets weren't broken to begin with in this scenario. To clarify a bit further: #357 just fixes the byte offsets in case the input is marked as HTML. If the input was text, there was no issue. So for form translation (which I assume are always plain text unless you're going to attempt auto translation in #358 adds the quality scores as HTML to HTML output, but this only works if the input was HTML to begin with. Of course, it could also be used with plain text if you'd encode all entities in the text before sending it to the translator (and mark it as HTML). The output can easily be rendered like the mock-ups. Thresholding can easily be done in the extension using Javascript or CSS. The added tags that don't meet the threshold won't affect rendering. You might need to be a bit creative with horizontal padding to fill in the gaps between consecutive words that do meet the threshold though. The way I insert the tags, it will leave the spaces that are not part of the word outside any of the score tags. But that's not insurmountable. |
I have written to @abhi-agg and @andrenatal seeking comment on #357 and #358, and I am coming to the view that #358 is the better option. Blocked awaiting their feedback. |
I have a few queries here:
Am I right? |
@abhi-agg Plain text remains as it was already working. However, what I would recommend as the easiest thing for you is to follow the suggestion in: #355 (comment) specifically: HTML: pick #358 and you get the tags back. Then you don't have to handle byte ranges in either case. |
Using this approach means always setting ResponseOptions::HTML flag to |
I suppose one could always have HTML mode on then implement text/plain translation by encode -> pass as HTML -> decode. That would be less efficient though. For QE, to render the colors, you are presumably doing HTML rendering in all cases anyway, hence the suggestion of having us generate the HTML output even if the input is (escaped) text/plain. |
A little JavaScript example of how to encode plain text as HTML, for when you want to submit plain text to the engine but want HTML as output:
function encodePlainTextAsHTML(text) {
  const div = document.createElement('div');
  div.appendChild(document.createTextNode(text));
  return div.innerHTML;
}
And then also set responseOptions.html = true. |
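Where no DOM is available (for example inside a worker), the same idea can be approximated by hand. The sketch below is an assumption-laden illustration: `escapePlainText` and `unescapePlainText` are hypothetical helper names, and the decoder only reverses the three entities the encoder produces, so it is not a general HTML entity decoder.

```javascript
// Hypothetical DOM-free helpers. Escapes only the characters that would
// otherwise be read as markup when the text is submitted as HTML.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
const UNESCAPES = { amp: '&', lt: '<', gt: '>' };

function escapePlainText(text) {
  return text.replace(/[&<>]/g, (ch) => ESCAPES[ch]);
}

// Reverses escapePlainText in a single pass, so an escaped '&amp;lt;'
// decodes to the literal text '&lt;' and is not decoded twice.
function unescapePlainText(html) {
  return html.replace(/&(amp|lt|gt);/g, (_, name) => UNESCAPES[name]);
}
```

The decoder is only suitable for the encode, translate as HTML, decode round trip on text that was plain to begin with; output that contains tags or other entities still needs a real HTML parser.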
If plaintext (textarea, document) is being translated, I don't think there's a way to annotate it with reds or colors (https://stackoverflow.com/a/12831155/4565794) without converting to HTML first. Suggest taking a look at Grammarly for an example. It's essentially the same problem, highlight poor confidence tokens and leave the others be. The textarea or input field is kept hidden, a |
@jerinphilip I assume that people type in the translation input field, and the quality is shown in a separate translation output field (which isn't editable when the UI is in this state…). Only once the translation is accepted you'd need to convert the translation output field to plain text to be inserted in the actual form field. |
Oh. Thanks for correcting 👍🏾
@jelmervdl With |
@abhi-agg Yep! With |
@jelmervdl Thanks for clarifying. Please correct me if I am wrong but with the approach #358 (comment), extension will still have to parse the html translation result to show color-coded words based on thresholds. |
@abhi-agg How exactly are you showing the color-coded words? Wouldn't the natural way to do that be rendering the HTML with some CSS? |
I've added an example of how the output of #358 can be used to the demo page in that branch. That example treats the input field as plain text, escapes it, and renders the translation output as HTML. Then with a little bit of Javascript for thresholding and CSS I render sentence and word level quality indicators. (Based on thresholds that have no meaning, I just wanted some thresholds that are met often enough for screenshot purposes.) |
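As a rough idea of what such thresholding could look like, here is a minimal sketch; it is not the code from the demo page. The thresholds, class names, and the assumption that a higher score means better quality are all made up for illustration. The only name taken from this thread is the `x-bergamot-word-score` attribute from #358.

```javascript
// Hypothetical decision rule, ordered from best to worst; first match wins.
// Threshold values and class names are placeholders with no real meaning.
const EXAMPLE_RULES = [
  { min: -0.5, label: 'qe-ok' },
  { min: -1.5, label: 'qe-minor' },
];

function classifyScore(score, rules = EXAMPLE_RULES, fallback = 'qe-major') {
  for (const rule of rules) {
    if (score >= rule.min) return rule.label;
  }
  return fallback;
}

// Browser-only helper: give each scored element a class that CSS can style.
function applyScoreClasses(root) {
  for (const el of root.querySelectorAll('[x-bergamot-word-score]')) {
    const score = parseFloat(el.getAttribute('x-bergamot-word-score'));
    if (!Number.isNaN(score)) el.classList.add(classifyScore(score));
  }
}
```

In a page, `applyScoreClasses` would be called on the container holding the translated HTML, with the visual treatment (underline, background colour) left to a stylesheet keyed on the added classes.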
Quality scores for HTML translation exposed as <font x-bergamot-sentence-score=""> and <font x-bergamot-word-score=""> tags in the HTML output. While this increases the size of the HTML returned, the resulting rendered HTML can easily be styled to show the scores. With Javascript or CSS, developers can easily have some interface based on these extra attributes. Also includes updates to the test page to show a proof-of-concept demonstration. Fixes: #355
A quick experiment with the wasm test page for HTML text translation shows weird byte ranges for words.
An example:
It looks like QE ignores tags completely (treating them as if they were non-existent) in the translated sentences when it computes the byte ranges of the words. Is this something that needs to be fixed, or am I doing something wrong on my end?
Attaching the image for detailed results.
cc @kpu @jerinphilip @abarbosa94 @felipesantosk @mfomicheva