Feature/replace-get-highlighted-text #313

RobHelgeson · 2024-12-11T18:50:52Z

This pr is completely experimental and if it never goes further than a couple of people taking a look, I'm all good with that :)

(This PR also includes all changes in #308. If/when that has merged, this is a very small PR.)

What is the current behavior?

get_highlighted_text is currently implemented with some nice hand-written looping methods. It does, however, have a small issue with integrating the morph highlighting and HTML rubies, as outlined in the docs: https://mortii.github.io/anki-morphs/user_guide/known-problems.html under the heading "Ruby characters (furigana, etc.) are displayed wrong in am-highlighted".

What is the new behavior?

The get_highlighted_text process has been completely rewritten to address this issue.

Contrasting the implementations:

This one takes more bytes to get the job done, because the ruby and span tags have to be worked together.
This one allows correct display of rubies in languages that use them.
Perf testing indicates that this impl is faster than main:

46.7K cards SpaCy japanese large model (Mac M1)

New:
Recalc duration: 18.992 seconds
Recalc duration: 14.782 seconds
Recalc duration: 14.563 seconds
Recalc duration: 14.792 seconds
Recalc duration: 14.476 seconds

Main:
Recalc duration: 19.884 seconds
Recalc duration: 15.912 seconds
Recalc duration: 17.181 seconds
Recalc duration: 17.442 seconds
Recalc duration: 16.230 seconds
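
For a rough comparison, the runs above can be averaged. This is only a sketch over the five listed durations per branch, so it is not a rigorous benchmark:

```python
from statistics import mean

# Recalc durations in seconds, copied from the runs listed above.
new_runs = [18.992, 14.782, 14.563, 14.792, 14.476]
main_runs = [19.884, 15.912, 17.181, 17.442, 16.230]

new_mean = mean(new_runs)
main_mean = mean(main_runs)

# Relative reduction in mean recalc time compared with main.
reduction = (main_mean - new_mean) / main_mean

print(f"new:  {new_mean:.3f} s")
print(f"main: {main_mean:.3f} s")
print(f"reduction: {reduction:.1%}")
```

On these numbers the mean drops from about 17.3 s to about 15.5 s, roughly a 10% reduction.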

What kind of changes does this PR introduce?

Bug fix (non-breaking change which fixes an issue)
[] New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code successfully passes pre-commit
DOES NOT PASS TESTS THAT USE THE TEST DECKS: THEY NEED TO BE REGENERATED AND I DON'T KNOW HOW. I fully anticipate that once regenerated, they'll all pass.
I have written tests for all of the new code, based on the tests for the old code. All of these pass.
I have commented my code, particularly in hard-to-understand areas
I am willing to help create documentation/guides for the changes made in this PR
This PR can be rebased onto main without conflicts (create a backup branch before attempting a rebase)
I have squashed my commits into sensible portions

mortii · 2024-12-12T17:49:07Z

When evaluating morphs based on lemmas, it looks like only the lemmas are wrapped, and not the whole word like the current implementation, is that intended?

main:

pr:

Perf testing indicates that this impl is faster than main

amazing

ankimorphs/text_highlighting.py

ankimorphs/highlight_morphs_jit.py

ankimorphs/text_highlighting.py

mortii · 2025-01-04T15:14:32Z

Nr. 7 is essentially 5 + 6. Since 6 -> undefined, then 7 -> undefined.

RobHelgeson · 2025-01-06T15:23:25Z

Worked on the different filters.

This template results in ....

<div class="Reading ruby">{{am-highlight:Reading}}</div>
<div class="Reading ruby">{{am-highlight-kanji:kanji:Reading}}</div>
<div class="Reading ruby">{{am-highlight-kana:kana:Reading}}</div>
<div class="Reading ruby">{{am-highlight-furigana:furigana:Reading}}</div>

<div class="Reading ruby">{{^am-highlighted}}{{am-highlight:furigana:Reading}}{{/am-highlighted}}{{#am-highlighted}}{{furigana:am-highlighted}}{{/am-highlighted}}</div>

... this output ...

I think it works well. I'm interested in your thoughts on the impl, since I ship styles to the template to accomplish it. This allows the user to add any local overrides they want in their styles.

The first should be exactly equivalent to the path that the bulk processor outputs, in case they want that.
The rest should be self-explanatory. (Do note, however, that we precede each with the built-in, just in case AM is not available.)

The final one is just there to show how I use it on my cards -- If there is no data in the am-highlighted field, use the jit processor.

let's keep these active for the remainder of this pr

mortii · 2025-01-07T15:30:16Z

ankimorphs/highlight_morphs_jit.py

+    return (
+        f"<span class='{filter_name}'>"
+        + styles
+        + " "


I don't think you can inline styles like this... It doesn't work for me at least.

Odd, this worked for me.

I must have some other things allowing it to work. I'll double check.

Can you send me either a screenshot, or the html, produced by (something like):

<div>{{am-highlight:Reading}}</div> <div>{{am-highlight-furigana:Reading}}</div> <div>{{am-highlight-kana:Reading}}</div> <div>{{am-highlight-kanji:Reading}}</div>

I'm trying to break it over here but it looks like it works for me.

Does the font size change if you do this?

    if filter_name == "am-highlight-kana":
        return """<style>
        .am-highlight-kana {
            & ruby {
                display: inline-block;
                visibility: hidden;
                & rt {
                    font-size: 999;
                    margin-top: -1em;
                    visibility: visible;
                }
            }
        }
        </style>"""

do you have a css-injector add-on activated?

font-size: 999; the 999 needs a measurement like rem, px or %.

Edit: yes changing font size does affect the output.

font-size: 999; the 999 needs a measurement like rem, px or %

right, does a valid change like that work for you?

I get the same result if I remove the styles:

    highlighted_jit_text = (
        f"testinggg<span class='{filter_name}'>"
        + " "
        + text_highlighting.get_highlighted_text(
            am_config=am_config,
            morphemes=card_morphs,
            text=_dehtml(field_text),
            use_html_rubies=filter_name != "am-highlight",
        )
        + "</span>"
    )

If I change the font size nothing happens.

😮

that's... odd.

is it possible that you may have some of the css that we were experimenting with before, still active somewhere?

I'll attach some screenshots, if you can try this, we can see where your styles are coming from vs where mine are coming from.

grab: AnkiWebView Inspector https://ankiweb.net/shared/info/31746032

inspect kana element by right clicking and inspecting:

click Computed in the right panel of the inspector:

click the circled arrow icon on the property you want to see where it "came from" (I cannot screenshot the circled icon because it's on mouseover only, but it's a grey circle next to the value of the property):

this will take you to the styles tab, where you can see what styles are loaded and applied:

and by clicking on the top right <style> here, it will highlight where that style got loaded in.

for me, that is this block:

I know that's a HUGE pain in the butt. :D but I don't know why this would not be working for you but is for me (so far)

Aha, I realize why.

When using {{am-highlight-kana:kana:Front}} I don't get any ruby/rt elements at all:

but I do get them when only using {{am-highlight-kana:Front}}

When using {{am-highlight-kana:kana:Front}} I don't get any ruby/rt elements at all:

aha, this makes sense. The built-in filter removes all kanji from the string before we even "see it" so we process a "kana only" string.

This has the negative side effect of not allowing us to use {{am-highlight-kana:kana:Front}} for backward compatibility (i.e. when on mobile).

The built in furigana and kanji filters will still work because the morphs match on kanji (most of the time) with kana readings removed from the parsed string.
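
A minimal sketch of why the stacked kana form misses morphs. `find_known_morphs` and `known_morphs` are hypothetical stand-ins for the morph lookup, not the add-on's code, and the kana-only string is written out by hand as an assumption about what the built-in kana filter passes on:

```python
def find_known_morphs(text: str, known_morphs: set[str]) -> list[str]:
    """Hypothetical lookup: return the known morphs that occur in the text."""
    return sorted(morph for morph in known_morphs if morph in text)


# Morphs are built from the kanji-based string.
known_morphs = {"漢字"}

raw_field = "漢字[かんじ]"  # what {{am-highlight-kana:Front}} receives
kana_only = "かんじ"  # assumed input after the built-in kana filter has run

print(find_known_morphs(raw_field, known_morphs))  # the kanji morph is found
print(find_known_morphs(kana_only, known_morphs))  # nothing left to match
```

With the kanji already removed, the custom filter has nothing to match the stored morph against.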

I can't speak for you or the project, but I'd suggest that treating this as a known issue may be the appropriate action, with the recommendation to either:

set your card with {{am-highlight-kana:furigana:Front}} and have kana locally and furigana on mobile.

set your card template to be "mobile aware" and use {{am-highlight-kana:Front}} locally and {{kana:Front}} on mobile (I do not know how to do this at the moment)

mortii · 2025-01-08T17:28:20Z

I think the current kana/kanji approach is a little off right now. If we emulate the native filters i.e.:

2. Furigana

Converts brackets into html (current implementation), used by am-highlight-furigana:

3. Kanji

Strips the brackets and its content, used by am-highlight-kanji

4. Kana

Strips the base, just leave the ruby text, used by am-highlight-kana:

Originally posted by @mortii in #313 (comment)

then we wouldn't need any special stylings, because there wouldn't be any ruby elements to manipulate in place.
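
The three native behaviours listed above can be sketched as plain string transforms. This is an illustration under assumptions, not the add-on's code: the regex only approximates the `base[reading]` syntax, and the function names are made up for the example.

```python
import re

# Approximate furigana syntax: optional leading space, base text, then the
# reading in square brackets, e.g. " 漢字[かんじ]".
RUBY_SYNTAX = re.compile(r" ?([^ >]+?)\[(.+?)\]")


def as_furigana(text: str) -> str:
    """Convert the brackets into html ruby markup."""
    return RUBY_SYNTAX.sub(r"<ruby>\1<rt>\2</rt></ruby>", text)


def as_kanji(text: str) -> str:
    """Strip the brackets and their content, keeping the base."""
    return RUBY_SYNTAX.sub(r"\1", text)


def as_kana(text: str) -> str:
    """Strip the base, keeping only the ruby text."""
    return RUBY_SYNTAX.sub(r"\2", text)


sample = "漢字[かんじ]を 書[か]く"
print(as_furigana(sample))
print(as_kanji(sample))
print(as_kana(sample))
```

Only the furigana form produces ruby elements; the kanji and kana forms are plain text, so no styling is needed to hide one half.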

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

RobHelgeson · 2025-01-08T17:48:40Z

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

Not exactly. It would not solve the issue at hand (but I agree that it would be a cleaner implementation).

If the user chooses {{am-highlight-kana:kana:Front}}, so that they get kana on desktop and kana on mobile, am-highlight-kana will not have access to the string with kanji, only the post-processed string, after the built-in kana filter. am-highlight-kana would "technically work" but I think not be what the user expected, because morphs in the database are built from the kanji-based string, so found morphs would be hit or miss.

So as it stands now, all filters work as expected by themselves (I do agree with your point that we could just fully pre-process them on the "back end" and not have to mess with styling on the "front end"). And all filters except kana work stacked together for mobile and desktop support. But both of these solutions will suffer from the "kana does not work on mobile" issue.

This is why I recommend taking it as a "known issue". We could also consider pulling am-highlight-kana out.

RobHelgeson · 2025-01-08T17:50:45Z

just sayin..... this issue makes me want to understand why built-in filters cannot run after custom ones, and raise a PR against Anki itself. that would make this SO easy.....

mortii · 2025-01-08T19:14:52Z

If the user chooses {{am-highlight-kana:kana:Front}}, so that they get kana on desktop and kana on mobile, am-highlight-kana will not have access to the string with kanji, only the post-processed string, after the built-in kana filter. am-highlight-kana would "technically work" but I think not be what the user expected, because morphs in the database are built from the kanji-based string, so found morphs would be hit or miss.

Ah, that's true, I didn't even consider that... I was just focused on the problem of injecting styles.

So as it stands now, all filters work as expected by themselves (I do agree with your point that we could just fully pre-process them on the "back end" and not have to mess with styling on the "front end").

Yeah, injecting styles would be a completely hidden side effect, so we would either have to complicate the guide by explaining it, or risk users encountering very mysterious bugs. So I think this is a worthwhile change even though it doesn't fix the filter chaining issue.

This is why I recommend taking it as a "known issue". We could also consider pulling am-highlight-kana out.

I agree, let's note it as a known problem. Yesterday I found myself being overly reliant on kanji for understanding some words, and I realized why a kana filter could be very useful, so I definitely don't want to drop it completely.

just sayin..... this issue makes me want to understand why built-in filters cannot run after custom ones, and raise a PR against Anki itself. that would make this SO easy.....

Yeah, it does feel somewhat arbitrary...

RobHelgeson · 2025-01-10T14:21:02Z

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

I have an implementation for this, I'll do some manual testing tonight and then start on unit tests.

RobHelgeson · 2025-01-13T14:08:28Z

ankimorphs/text_highlighting.py

+class FuriganaRubyRange(RubyRange):
+    """Represents an html ruby and its range in parent string."""
+
+    def open(self) -> str:
+        return "<ruby>"
+
+    def close(self) -> str:
+        return "</ruby>"
+
+    def rt(self) -> str:
+        return f"<rt>{self.ruby}</rt>"
+
+
+class KanjiRubyRange(RubyRange):
+    """Represents a kanji ruby and its range in parent string."""
+
+    def open(self) -> str:
+        return ""
+
+    def close(self) -> str:
+        return ""
+
+    def rt(self) -> str:
+        return ""
+
+
+class KanaRubyRange(RubyRange):
+    """Represents a kana ruby and its range in parent string."""
+
+    def open(self) -> str:
+        return ""
+
+    def close(self) -> str:
+        return ""
+
+    def rt(self) -> str:
+        return ""
+
+    def __str__(self) -> str:
+        return self.ruby
+
+
+class TextRubyRange(RubyRange):
+    """Represents a text ruby and its range in parent string."""
+
+    def open(self) -> str:
+        return " "
+
+    def close(self) -> str:
+        return ""
+
+    def rt(self) -> str:
+        return f"[{self.ruby}]"


New subclasses for each formatting type.
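
To see what each subclass emits, here is a sketch with a hypothetical `RubyRange` base. The base class, its `base`/`ruby` fields, and the `__str__` composition are assumptions made for this example (the diff only shows the subclasses); the subclass methods mirror the diff.

```python
from dataclasses import dataclass


@dataclass
class RubyRange:
    """Hypothetical base; the real class also tracks a range in the parent string."""

    base: str  # text the ruby annotates, e.g. "漢字"
    ruby: str  # the reading, e.g. "かんじ"

    def open(self) -> str:
        raise NotImplementedError

    def close(self) -> str:
        raise NotImplementedError

    def rt(self) -> str:
        raise NotImplementedError

    def __str__(self) -> str:
        # Assumed composition order: opener, base text, reading, closer.
        return self.open() + self.base + self.rt() + self.close()


class FuriganaRubyRange(RubyRange):
    def open(self) -> str:
        return "<ruby>"

    def close(self) -> str:
        return "</ruby>"

    def rt(self) -> str:
        return f"<rt>{self.ruby}</rt>"


class KanjiRubyRange(RubyRange):
    def open(self) -> str:
        return ""

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return ""


class KanaRubyRange(KanjiRubyRange):
    # Same empty wrappers as the kanji variant, but render the reading only.
    def __str__(self) -> str:
        return self.ruby


class TextRubyRange(RubyRange):
    def open(self) -> str:
        return " "

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return f"[{self.ruby}]"


for cls in (FuriganaRubyRange, KanjiRubyRange, KanaRubyRange, TextRubyRange):
    print(f"{cls.__name__}: {str(cls('漢字', 'かんじ'))!r}")
```

Under these assumptions, only the furigana variant emits html; the text variant round-trips back to the bracket syntax.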

RobHelgeson · 2025-01-13T14:09:39Z

ankimorphs/text_highlighting.py

+                debug_utils.dev_print(
+                    "Scenario 6: The status is completely inside the ruby."
+                )
+                if isinstance(ruby, FuriganaRubyRange):


Now furigana is the "special case" and all others are text-like (i.e. they do not need special processing for <ruby> html tags).

RobHelgeson · 2025-01-13T14:10:30Z

ankimorphs/text_highlighting.py

-            span_elements.append(
-                SpanElement(morph_match.group(), morph_status, start_index, end_index)
-            )
+    ruby_range_type: type[RubyRange] = TextRubyRange


Default to text, in case we get garbage input.

RobHelgeson · 2025-01-13T14:12:10Z

test/tests/text_highlighting_test.py

 ) -> None:
    am_config = AnkiMorphsConfig()
    highlighted_text: str = text_highlighting.get_highlighted_text(
-        am_config, card_morphs, input_text
+        am_config, card_morphs, input_text, ruby_type


Added test cases so that every test runs against all 4 formatting options (even non-Japanese, just to ensure no regressions there!).

RobHelgeson · 2025-01-13T14:14:33Z

With the exception of the debug flag that I left on (and some vulture errors re: some of the debugging utils), I think this is ready for your review (again ;) )

mortii · 2025-01-13T18:30:24Z

I'm sick so it might take a couple of days 🙏

RobHelgeson · 2025-01-13T18:33:56Z

I'm sick so it might take a couple of days 🙏

ugh!! Take your time! Feel better soon! 🤞🏻

mortii · 2025-01-16T13:07:52Z

test/tests/text_highlighting_test.py

+##############################################################################################
+#                                    CASE: Morph and ruby interaction
+##############################################################################################
+# This third example sets all the cases where morphs and rubies coexist.
+# |-mmm---| |--mmm--| |---mmm-| |-mmm---| |-mmmmm-| |-mmmmm-| |--mmm--| |--m-m--|
+# |----rrr| |--rrr--| |--rrr--| |--rrr--| |--rrr--| |--r-r--| |-rrrrr-| |-rrrrr-|
+# Collection choice is arbitrary.
+# Database choice is arbitrary.
+##############################################################################################


This is only intended to test the 7 path variations right? I just want to make absolutely sure before I update these tests since it could be interpreted that the morph and ruby length has significance, i.e.

|-mmmmm-| != |-mmmm-| != |-mmmmmm-|
|--rrr--|    |--rr--|    |---rr---|

I want to move away from this particular abstraction because of its ambiguity--iq tests generally don't make for good documentation, hehe.

yes. This is a set of synthetic test cases that provide full coverage of all 7 paths.

the only relevant pieces of information in the ascii "diagrams" are stop and start positions of morphs and rubies in the same string. i.e. the lengths themselves are not relevant
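
One way to make that explicit is to reduce each diagram to its start/stop indices. A small sketch; the `spans` helper is hypothetical and not part of the test suite:

```python
import re


def spans(diagram: str, marker: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for each run of `marker`, end exclusive.

    Indices are relative to the text between the outer "|" characters.
    """
    inner = diagram.strip("|")
    return [m.span() for m in re.finditer(f"{re.escape(marker)}+", inner)]


morphs = "|-mmmmm-|"
rubies = "|--r-r--|"

print(spans(morphs, "m"))  # [(1, 6)]
print(spans(rubies, "r"))  # [(2, 3), (4, 5)]
```

Written this way, a test case is defined by where the morph and ruby ranges begin and end relative to each other, not by how many characters are drawn.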

RobHelgeson added 29 commits November 27, 2024 09:08

Add JIT highlighting of morphs

663b04e

Add Rubification of sting - built in filters cannot run after custom …

05ff0b6

…filters

Allow filter to run after built-in filter

2df17f9

Add documentation

9056db7

Fix am-highlight-morphs parse html issue

d84799b

Allow cards where you want to use am-highlight-morphs to not have the…

1117111

… read flag

Address PR comments

3401040

Update documentation

bd73b95

Update documentation

d2e2cf4

Updates for code review

0e2573a

Fix bad morph highlighting in ruby html

fdeb76e

Fix bad morph highlighting in ruby html - WIP

832a255

experemental

b23ebd8

Reimpl of highlight text using regexes.

005f2dd

Update highlight comments.

eb34b3a

Update documentation.

ea14846

WIP - WIP

6072576

Update jit formatter with changes from experimental branch

fb5ce61

Checkpoint commit - WIP

7189712

Checkpoint speeeeedy! - WIP

7a3ae5f

Checkpoint - WIP

f7841ee

Checkpoint - WIP

c8e8883

Checkpoint - WIP

994355f

Checkpoint - WIP

fdc6a07

Checkpoint - feature complete

c16ba70

Add JIT highlighting

78aee2e

Documentation and dead code

eb57cba

Cleanup

f2b702f

Replace get_highlighted_text

47f7c14