Embed quality-scores as HTML tag attributes #358

jelmervdl · 2022-02-17T23:48:38Z

Extended fix for #355.

This is #357 but with the quality scores also exposed as `<font x-bergamot-sentence-score>` and `<font x-bergamot-word-score>` tags in the HTML output. It makes the HTML output itself look pretty horrific, but the resulting rendered HTML can easily be styled to show the scores.

Example: Have you seen my car recently? I lost it.

Italian: Hai visto la mia auto di recente? L'ho perso.

With JavaScript or CSS you could pretty easily build some interface based on those extra attributes. Alternatively, we could change the attributes to something like `style="--x-bergamot-word-score: -0.4"` so that CSS can use the value directly to control a background colour or similar. That would be a bit harder to parse with JavaScript, though.
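A minimal sketch of how a consumer might read the word scores and turn them into CSS class names. The markup is a hand-written stand-in rather than real translator output, and both the quoting of attribute values and the `-0.5` cut-off are assumptions made for illustration only.

```javascript
// Hand-written stand-in for annotated output; not actual translator output.
const html =
  '<font x-bergamot-sentence-score="-0.3">' +
  '<font x-bergamot-word-score="-0.1">Hai</font> ' +
  '<font x-bergamot-word-score="-0.9">visto</font>' +
  '</font>';

// Collect every word score found in an HTML string. A regex is enough for
// this sketch; in a browser, document.querySelectorAll('[x-bergamot-word-score]')
// together with getAttribute() would be the more natural route.
function wordScores(markup) {
  const pattern = /x-bergamot-word-score="([^"]*)"/g;
  return Array.from(markup.matchAll(pattern), (m) => parseFloat(m[1]));
}

// Assumption: scores are <= 0 and values closer to 0 mean higher confidence.
// The default threshold is arbitrary.
function scoreToClass(score, threshold = -0.5) {
  return score < threshold ? 'low-quality' : 'ok';
}

const classes = wordScores(html).map((s) => scoreToClass(s));
console.log(classes);
```

A stylesheet could then target the resulting class names, or target the attributes directly with an attribute selector.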

Example:

The bindings still expose the byte ranges as they did previously, because otherwise I would also have to expose `AnnotatedText` etc., which is apparently not something we're doing right now. Unfortunately, because the bindings create a new score vector on the fly, you'll have to delete it yourself once you no longer need it.
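One way to make the manual clean-up harder to forget is to copy the vector into a plain array and free it in a `finally` block. This is only a sketch: it assumes the vector follows the usual embind shape (`size()`, `get(i)`, `delete()`), and `fakeVector` is a hypothetical stand-in so the snippet runs outside the WASM build.

```javascript
// Copy an embind-style vector into a plain JS array, then release the
// WASM-side object. The finally block runs even if reading throws.
function drainVector(vector) {
  try {
    const items = [];
    for (let i = 0; i < vector.size(); i++) {
      items.push(vector.get(i));
    }
    return items;
  } finally {
    vector.delete();
  }
}

// Hypothetical stand-in that mimics the size()/get()/delete() shape.
function fakeVector(values) {
  let deleted = false;
  return {
    size: () => values.length,
    get: (i) => values[i],
    delete: () => { deleted = true; },
    isDeleted: () => deleted,
  };
}

const vec = fakeVector([-0.1, -0.9]);
const scores = drainVector(vec);
console.log(scores, vec.isDeleted());
```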

jelmervdl · 2022-02-18T10:14:07Z

I'm using `<font>` here as an element (even though `<span>` would be the semantically correct one) because I'd expect that nobody in their right mind would have added rules to their stylesheet to style those by default. But I'm open to better suggestions, also regarding the attribute names.

src/translator/html.cpp

jelmervdl · 2022-02-23T09:46:40Z

I was experimenting a bit with this. If you're planning on making a popup highlight the full word or sentence on hover, i.e. like Google Translate does, it's difficult to find all `<font>` tags that together form a word or sentence. They have the same score, but there is no unique ID or anything to find them. That's something that can be added though.

It's not an issue if you just want to highlight good or bad sentences or words non-interactively. In that case the scores alone are sufficient information. All the mock-ups I've seen show this type of output.
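To illustrate why an identifier helps for hover highlighting, here is a small sketch that groups fragments by a shared sentence id. The `sentenceId` field is hypothetical; in a browser it would be read from whatever id or index attribute the tags end up carrying.

```javascript
// Each record stands in for one annotated tag. "sentenceId" is a
// hypothetical field, not an attribute name taken from this PR.
const fragments = [
  { sentenceId: '0', text: 'Hai visto ' },
  { sentenceId: '0', text: 'la mia auto di recente?' },
  { sentenceId: '1', text: "L'ho perso." },
];

// Build a lookup from sentence id to all fragments that belong to it.
function groupBySentence(items) {
  const groups = new Map();
  for (const item of items) {
    if (!groups.has(item.sentenceId)) {
      groups.set(item.sentenceId, []);
    }
    groups.get(item.sentenceId).push(item);
  }
  return groups;
}

const sentenceGroups = groupBySentence(fragments);
// On hover over any fragment, every entry in
// sentenceGroups.get(fragment.sentenceId) can be highlighted together.
console.log(sentenceGroups.get('0').map((f) => f.text).join(''));
```

Grouping by score alone would not be reliable, since two different sentences can end up with the same score.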

src/translator/html.cpp

abhi-agg · 2022-02-23T15:32:01Z

> Extended fix for #355.
>
> This is #357 but with the quality scores also exposed as `<font x-bergamot-sentence-score>` and `<font x-bergamot-word-score>` tags in the HTML output.

@jelmervdl Just confirming, as I had missed it before: does this PR build on top of #357? Meaning, apart from exposing quality scores for "word" byte ranges for HTML text, does this PR additionally provide quality scores embedded in the translated result (using attributes on the `<font>` tag)?

jelmervdl · 2022-02-23T15:38:55Z

@abhi-agg yes, when processing a translation request in HTML mode, the scores are available both in the HTML itself as well as through byte offsets. (And when not in HTML mode byte offsets are the only way they're available.)
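For the byte-offset route, note that byte offsets and JavaScript string indices diverge as soon as the text contains non-ASCII characters. The sketch below assumes the offsets are half-open `[begin, end)` ranges over UTF-8 bytes; that is an assumption, so check it against the bindings before relying on it.

```javascript
// Slice a string by UTF-8 byte offsets rather than by UTF-16 code units.
// Assumes [begin, end) falls on character boundaries.
function sliceByBytes(text, begin, end) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder().decode(bytes.subarray(begin, end));
}

const sample = 'Città bella';
// "à" takes two bytes in UTF-8, so "bella" spans bytes 7..12,
// while plain string indices 7..12 are shifted by one character.
console.log(sliceByBytes(sample, 7, 12));
console.log(sample.slice(7, 12));
```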

abhi-agg · 2022-02-23T16:00:42Z

Thanks for the clarification 👍🏾

jerinphilip

As was the general agreement in the plenary, I believe the current proposal (based on HTML) is different but solves the problem better, reducing work in the extension.

I'm leaving the review below based on what I understood from the plenary. The comments are aimed at not creating an unnecessary failure mode (broken HTML as byte range).

wasm/bindings/response_bindings.cpp

wasm/test_page/js/worker.js

src/translator/response.h

src/translator/response.cpp

src/translator/quality_estimator.cpp

abhi-agg · 2022-02-25T09:39:20Z

> As was the general agreement in the plenary, I believe the current proposal (based on HTML) is different but solves the problem better, reducing work in the extension.

Correction: there was no final agreement from Mozilla's side in the plenary regarding which of the two approaches (#357 or #358) the extension will use. We did say that we are evaluating the #358 option, as it looks promising.

However, @andrenatal confirmed a few hours later via email that the extension will use this approach.

In response to #358 (comment): I agree about the maintenance overhead, but even though we have a proof of concept available from @jelmervdl, we could have deferred this removal until QE integration is complete in the extension. However, I am not adamant about it.

@jelmervdl This PR is in draft state. Is it up for review?

jelmervdl · 2022-02-25T09:45:03Z

@abhi-agg I kept it in a draft state waiting for your and @andrenatal's approval to take this approach. It should be ready for review now.

wasm/test_page/js/index.js

abhi-agg · 2022-02-25T13:31:00Z

@jelmervdl I did a quick test and overall it looks good to me. I can approve it after you rebase onto the latest main and resolve #358 (comment). Thanks for the work 👍🏾

EDIT:
x-bergamot-sentence-score and x-bergamot-word-score strings are too big. Do we need x- at the beginning of each of them? Plus, how about getting rid of bergamot altogether or using a shorter string for it?

jerinphilip · 2022-02-25T14:25:02Z

> x-bergamot-sentence-score and x-bergamot-word-score strings are too big.

Is this a variable naming nit or are there performance implications of this somewhere?

jelmervdl · 2022-02-25T14:30:14Z

> x-bergamot-sentence-score and x-bergamot-word-score strings are too big.

What are they too long for? I intentionally made them this verbose to make sure they'd never overlap with any attribute that exists (or will exist) in a website. I think with just sentence-score and word-score that might occur. (It's a common string on GitHub.) Adding the x-bergamot makes it pretty unique. The x- prefix comes from https://stackoverflow.com/a/17902387 (the SO post summarises it better than the docs do).

abhi-agg · 2022-02-25T16:12:46Z

> x-bergamot-sentence-score and x-bergamot-word-score strings are too big.
>
> Is this a variable naming nit or are there performance implications of this somewhere?
>
> What are they too long for? I intentionally made them this verbose to make sure they'd never overlap with any attribute that exists (or will exist) in a website. I think with just sentence-score and word-score that might occur. (It's a common string on GitHub.) Adding the x-bergamot makes it pretty unique. The x- prefix comes from https://stackoverflow.com/a/17902387 (the SO post summarises it better than the docs do)

An obvious implication is the increased memory footprint, which is evident in the example used in the description of this PR. Plus, all of this data will be copied from the WASM side to the JS side at runtime. However, please feel free to ignore it, as it was just a suggestion after reading #358 (comment).
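As a rough way to put a number on the footprint concern, the sketch below compares the encoded length of the current attribute names with the shorter `sentence-score` and `word-score` variants mentioned earlier in the thread. It only counts the attribute names themselves, not the values or the surrounding tag.

```javascript
const currentNames = ['x-bergamot-sentence-score', 'x-bergamot-word-score'];
const shorterNames = ['sentence-score', 'word-score'];

// Difference in UTF-8 bytes between two attribute names.
function extraBytes(longName, shortName) {
  const encoder = new TextEncoder();
  return encoder.encode(longName).length - encoder.encode(shortName).length;
}

// Extra bytes per annotated tag for each attribute name.
const overhead = currentNames.map((name, i) => extraBytes(name, shorterNames[i]));
console.log(overhead);
```

Multiplying the per-tag figure by the number of annotated words and sentences on a page gives an estimate of the additional data copied across the WASM boundary.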

I can approve the PR as soon as the rest of the comments are resolved 👍🏾

jelmervdl added 3 commits February 17, 2022 22:16

Fix tests implementation

bce30cf

Embed input sentence and quality scores in translation output

baeb396

Wraps sentences and words in `<font>` tags to annotate them with x-bergamot-sentence-score and x-bergamot-word-score

jelmervdl added the "experimental" label (Experimental stuff, might make it in, might not) Feb 17, 2022

This was referenced Feb 18, 2022

QE for HTML input doesn't work #355

Closed

Store subword indices for quality scores in Response #357

Closed

jelmervdl linked an issue Feb 18, 2022 that may be closed by this pull request

QE for HTML input doesn't work #355

Closed

jerinphilip reviewed Feb 18, 2022

src/translator/html.cpp (outdated)

jelmervdl self-assigned this Feb 18, 2022

jelmervdl added 6 commits February 21, 2022 10:46

Match tight wrapping around words of original implementation

f760e0f

Note that this does not work for HTML as the closing tags of the previous word will be placed before the spaces, so the spaces won't be skipped unfortunately.

Use SubwordRange -> ByteRange conversion for both bindings and tests

d4d625d

Merge branch 'qe-for-html' into qe-in-html

f3da34c

Add example of sentence and word quality output to test page

922c2d1

Merge branch 'main' into qe-in-html

5392e33

# Conflicts: # src/translator/html.cpp # src/translator/html.h

No need to escape quotes if we're never inserting text into attributes

d888198

jerinphilip reviewed Feb 23, 2022

src/translator/html.cpp (outdated)

jerinphilip reviewed Feb 24, 2022

jelmervdl added 3 commits February 25, 2022 09:43

Add sentence and word index to quality score tags

228ca33

Remove byterange based sentence quality score bindings from WASM bindings

e30d080

Highlight sentence on hover

a416fe3

jerinphilip marked this pull request as ready for review February 25, 2022 09:41

abhi-agg reviewed Feb 25, 2022

wasm/test_page/js/index.js (outdated)

abhi-agg reviewed Feb 25, 2022

wasm/test_page/js/index.js

Restore \n for batch splitting

06dd4ff

Merge remote-tracking branch 'origin/main' into qe-in-html

6be8902

jerinphilip approved these changes Feb 25, 2022

abhi-agg self-requested a review February 25, 2022 17:07

abhi-agg approved these changes Feb 25, 2022

jerinphilip changed the title ~~Return quality scores in HTML translation~~ Embed quality-scores as HTML tag attributes Feb 25, 2022

jerinphilip merged commit fe3f398 into main Feb 25, 2022

jerinphilip deleted the qe-in-html branch February 25, 2022 22:01

This was referenced Mar 2, 2022

Integrate Basic Quality Estimation mozilla/firefox-translations#132

Closed

[meta] Implement Basic Quality Estimation mozilla/firefox-translations#26

Closed

Bump version to 0.4.2 #371

Merged

abhi-agg mentioned this pull request Mar 11, 2022

Integrate Basic QE feature mozilla/firefox-translations#144

Merged

jerinphilip mentioned this pull request Mar 23, 2022

JS: Fix swap button on test-page #388

Merged

abhi-agg mentioned this pull request Mar 25, 2022

Show colors for in-page translation from the quality scores returned by the engine mozilla/firefox-translations#179

Closed
