-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed quality-scores as HTML tag attributes #358
Conversation
The bindings still expose the byteranges as they previously did, because otherwise I would also have to expose AnnotatedText etc. which is apparently not something we're doing right now. Because the bindings create a new score vector on the fly, you'll have to delete it when you no longer use it, unfortunately.
Wraps sentences and words in `<font>` tags to annotate them with x-bergamot-sentence-score and x-bergamot-word-score
I'm using |
Note that this does not work for HTML as the closing tags of the previous word will be placed before the spaces, so the spaces won't be skipped unfortunately.
# Conflicts: # src/translator/html.cpp # src/translator/html.h
I was experimenting a bit with this. If you're planning on making a popup highlight the full word or sentence on hover, i.e. like Google Translate does, it's difficult to find all It's not an issue if you just want to highlight good or sentences or words, non-interactively. Then just the scores are sufficient information. All the mock-ups I've seen show this type of output. |
@jelmervdl Just confirming as I had missed it before. Does this PR build on top of #357? Meaning, apart from exposing quality scores for "word" byte ranges for html text, this PR additionally provides quality scores embedded into the translated result (using font attribute) as well? |
@abhi-agg yes, when processing a translation request in HTML mode, the scores are available both in the HTML itself as well as through byte offsets. (And when not in HTML mode byte offsets are the only way they're available.) |
Thanks for the clarification 👍🏾 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As was the general agreement in the plenary, I believe the current proposal (based on HTML) is different but solves the problem better reducing work at the extension.
Leaving the below review based on what I understood from the plenary. These are towards not creating an unnecessary failure mode (broken HTML as byte-range).
Correction. There was no final agreement from Mozilla side in the Plenary regarding which of the 2 approaches (#357 or #358) will the extension use. We did say that we are evaluating #358 option as this looks promising. However, @andrenatal confirmed few hours later via email that the extension will use this approach. In response to #358 (comment): I agree about the maintenance overhead but even though we have a proof of concept available from @jelmervdl, we could have deferred this removal once QE integration is complete in the extension. However, I am not adamant about it. @jelmervdl This PR is in draft state. Is it up for review? |
@abhi-agg I kept it in a draft state waiting for your and @andrenatal's approval to take this approach. It should be ready for review now. |
@jelmervdl I did a quick test and overall it looks good to me. I can approve it after you rebase to latest main and resolve the #358 (comment). Thanks for the work 👍🏾 EDIT: |
Is this a variable naming nit or are there performance implications of this somewhere? |
What are they too long for? I intentionally made them this verbose to make sure they'd never overlap with any attribute that exists (or will exist) in a website. I think with just |
An obvious implication is the increased memory footprint which is evident in the example used in the description of this PR. Plus, all of this data will be copied from wasm to JS side during runtime. However, please feel free to ignore it as it was just a suggestion after reading #358 (comment). I can approve the PR as soon as the rest of the comments are resolved 👍🏾 |
Extended fix for #355.
This is #357 but with the quality scores also exposed as
<font x-bergamot-sentence-score="">
and<font x-bergamot-word-score="">
tags in the HTML output. It makes the HTML output itself look pretty horrific, but the resulting rendered HTML can easily be styled to show the scores.Example:
Have you seen my <strong>car</strong> recently? I lost it.
Italian:
<font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.273196">Hai</font> <font x-bergamot-word-score="-0.0245442">visto</font> <font x-bergamot-word-score="-0.193084">la</font> <font x-bergamot-word-score="-0.00968757">mia</font></font> <strong><font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.131326">auto</font></font></strong> <font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.157974">di</font> <font x-bergamot-word-score="-0.000199795">recente?</font></font> <font x-bergamot-sentence-score="-0.18783"><font x-bergamot-word-score="-0.224375">L'ho</font> <font x-bergamot-word-score="-0.151286">perso.</font></font>
With Javascript or CSS you could pretty easily have some interface based on those extra attributes. Or we could change the attributes to something like
style="--x-bergamot-word-score: -0.4"
and you can use CSS to directly have it control a background colour or something. But that would be a bit harder to parse with Javascript.Example: