Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/replace-get-highlighted-text #313

Open
wants to merge 42 commits into
base: v4
Choose a base branch
from

Conversation

RobHelgeson
Copy link

This pr is completely experimental and if it never goes further than a couple of people taking a look, I'm all good with that :)

(This pr also includes all changes in #308. if / when that has merged this is a very small pr.

What is the current behavior?

get_highlighted_text is currently implemented with some nice hand written looping methods. It does however have a small issue with integrating the morph highlighting and html rubies, as outlined in the docs: https://mortii.github.io/anki-morphs/user_guide/known-problems.html under the heading Ruby characters (furigana, etc.) are displayed wrong in am-highlighted

What is the new behavior?

the get_highlighted_text process has been completely rewritten to address this issue.

Contrasting the implementations:

  • This one takes more bytes to get the job done. -- we need to work ruby and span tags together so there are more bytes
  • This one allows correct display of rubies in languages that use them.
  • Perf testing indicates that this impl is faster than main:

46.7K cards SpaCy japanese large model (Mac M1)

New:
Recalc duration: 18.992 seconds
Recalc duration: 14.782 seconds
Recalc duration: 14.563 seconds
Recalc duration: 14.792 seconds
Recalc duration: 14.476 seconds

Main:
Recalc duration: 19.884 seconds
Recalc duration: 15.912 seconds
Recalc duration: 17.181 seconds
Recalc duration: 17.442 seconds
Recalc duration: 16.230 seconds

What kind of changes does this PR introduce?

  • Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code successfully passes pre-commit
    DOES NOT PASS TESTS THAT USE THE TEST DECKS THEY NEED TO BE REGENERATED AND I DONT KNOW HOW -- I fully anticipate that once regenerated, they'll all pass.
    I have written tests for all of the new code, based on the tests for the old code. all of these pass.
  • I have commented my code, particularly in hard-to-understand areas
  • I am willing to help create documentation/guides for the changes made in this PR
  • This PR can be rebased onto main without conflicts (create a backup branch before attempting a rebase)
  • I have squashed my commits into sensible portions

@mortii
Copy link
Owner

mortii commented Dec 12, 2024

When evaluating morphs based on lemmas, it looks like only the lemmas are wrapped, and not the whole word like the current implementation, is that intended?

main:
Screenshot from 2024-12-12 18-36-56

pr:
Screenshot from 2024-12-12 18-37-46

Perf testing indicates that this impl is faster than main

amazing

@mortii
Copy link
Owner

mortii commented Jan 4, 2025

nr. 7 is a essentially 5 + 6. Since 6 -> undefined, then 7 -> undefined.

@RobHelgeson
Copy link
Author

Worked on the different filters.

This template results in ....

<div class="Reading ruby">{{am-highlight:Reading}}</div>
<div class="Reading ruby">{{am-highlight-kanji:kanji:Reading}}</div>
<div class="Reading ruby">{{am-highlight-kana:kana:Reading}}</div>
<div class="Reading ruby">{{am-highlight-furigana:furigana:Reading}}</div>

<div class="Reading ruby">{{^am-highlighted}}{{am-highlight:furigana:Reading}}{{/am-highlighted}}{{#am-highlighted}}{{furigana:am-highlighted}}{{/am-highlighted}}</div>

... this output ...

image

I think it works well. I'm interested in your thoughts on the impl, since I ship styles to the template to accomplish it. This allows the user to add any local overrides they want in their styles.

The first should be exactly equvelent to the path that the bulk processor outputs, in case they want that.
The rest should be self explanatory. (do note however that we precede each with the built-in, just in case AM is not available.

The final one is just there to show how I use it on my cards -- If there is no data in the am-highlighted field, use the jit processor.

let's keep these active for the remainder of this pr
Comment on lines 70 to 73
return (
f"<span class='{filter_name}'>"
+ styles
+ " "
Copy link
Owner

@mortii mortii Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can inline styles like this... It doesn't work for me at least.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd, this worked for me.

I must have some other things allowing it to work. I'll double check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you send me either a screenshot, or the html, produced by (something like):

<div>{{am-highlight:Reading}}</div>
<div>{{am-highlight-furigana:Reading}}</div>
<div>{{am-highlight-kana:Reading}}</div>
<div>{{am-highlight-kanji:Reading}}</div>

I'm trying to break it over here but it looks like it works for me.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot from 2025-01-07 17-07-50

Does the font size change if you do this?

    if filter_name == "am-highlight-kana":
        return """<style>
.am-highlight-kana {
    & ruby {
        display: inline-block;
        visibility: hidden;

        & rt {
            font-size: 999;
            margin-top: -1em;
            visibility: visible;
        }
    }
}
</style>"""

do you have a css-injector add-on activated?

Copy link
Author

@RobHelgeson RobHelgeson Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

font-size: 999; the 999 needs a measurement like rem, px or %.

Edit: yes changing font size does affect the output.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

font-size: 999; the 999 needs a measurement like rem, px or %

right, does a valid change like that work for you?

Copy link
Owner

@mortii mortii Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the same result if I remove the styles:

    highlighted_jit_text = (
        f"testinggg<span class='{filter_name}'>"
        + " "
        + text_highlighting.get_highlighted_text(
            am_config=am_config,
            morphemes=card_morphs,
            text=_dehtml(field_text),
            use_html_rubies=filter_name != "am-highlight",
        )
        + "</span>"
    )

Screenshot from 2025-01-07 17-31-55

If i change the font size nothing happens.

Copy link
Author

@RobHelgeson RobHelgeson Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮

that's... odd.

is it possible that you may have some of the css that we were experimenting with before, still active somewhere?


I'll attach some screenshots, if you can try this, we can see where your styles are coming from vs where mine are coming from.

grab: AnkiWebView Inspector https://ankiweb.net/shared/info/31746032

inspect kana element by right clicking and inspecting:
image

click Computed in the right panel of the inspector:
image

click the circled arrow icon on the property you want to see where it "came from" (I cannot screenshot the circled icon because it's on mouseover only, but its a grey circle next to the value of the property):
image

this will take you to the styles tab, where you can see what styles are loaded and applied:
image

and by clicking on the top right <style> here, it will highlight where that style got loaded in.
image

for me, that is this block:
image

I know that's a HUGE pain in the butt. :D but I don't know why this would not be working for you but is for me (so far)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I realize why.

When using {{am-highlight-kana:kana:Front}} I don't get any ruby/rt elements at all:

Screenshot from 2025-01-07 19-23-58

but I do get them when only using {{am-highlight-kana:Front}}
Screenshot from 2025-01-07 19-26-52

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using {{am-highlight-kana:kana:Front}} I don't get any ruby/rt elements at all:

aha, this makes sense. The built-in filter removes all kanji from the string before we even "see it" so we process a "kana only" string.

This has the negative side effect of not allowing us to use {{am-highlight-kana:kana:Front}} for backward compatibility (i.e. when on mobile).

The built in furigana and kanji filters will still work because the morphs match on kanji (most of the time) with kana readings removed from the parsed string.

I can't speak for you the project, but I'll suggest that having that as a known issue may be the appropriate action, with the recommendation to either:

  • set your card with {{am-highlight-kana:furigana:Front}} and have kana locally and furigana on mobile.
  • set your card template to be "mobile aware" and use {{am-highlight-kana:Front}} locally and {{kana:Front}} on mobile (I do not know how to do this at the moment)

@mortii
Copy link
Owner

mortii commented Jan 8, 2025

I think the current kana/kanji approach is a little off right now. If we emulate the native filters i.e.:

2. Furigana

Converts brackets into html (current implementation), used by am-highlight-furigana:

3. Kanji

Strips the brackets and its content, used by am-highlight-kanji

4. Kana

Strips the base, just leave the ruby text, used by am-highlight-kana:

Originally posted by @mortii in #313 (comment)

then we wouldn't need any special stylings, because there wouldn't be any ruby elements to manipulate in place.

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

@RobHelgeson
Copy link
Author

RobHelgeson commented Jan 8, 2025

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

not exactly. it would not solve the issue at hand. (but agree that it would be a cleaner implementation).

If the user chooses {{am-highlight-kana:kana:Front}}, so that they get kana on desktop and kana on mobile, am-highlight-kana will not have access to the string with kanji, only the post-processed string, after the built-in kana filter. am-highlight-kana would "technically work" but I think not be what the user expected, because morphs in the database are built from the kanji-based string, so found morphs would be hit or miss.

So as it stands now, all filters work as expected by themselves (I do agree with your point that we could just fully pre-process them on the "back end" and not have to mess with styling on the "front end"). And all filters except kana work stacked together for mobile and desktop support. But both of these solutions will suffer from the "kana does not work on mobile" issue.

This is why I recommend taking it as a "known issue". We could also consider pulling am-highlight-kana out.

@RobHelgeson
Copy link
Author

just sayin..... this issue makes me want to understand why build in filters cannot run after custom ones, and raise a PR against Anki itself. that would make this SO easy.....

@mortii
Copy link
Owner

mortii commented Jan 8, 2025

If the user chooses {{am-highlight-kana:kana:Front}}, so that they get kana on desktop and kana on mobile, am-highlight-kana will not have access to the string with kanji, only the post-processed string, after the built-in kana filter. am-highlight-kana would "technically work" but I think not be what the user expected, because morphs in the database are built from the kanji-based string, so found morphs would be hit or miss.

Ah, that's true, I didn't even consider that... I was just focused on the problem of injecting styles.

So as it stands now, all filters work as expected by themselves (I do agree with your point that we could just fully pre-process them on the "back end" and not have to mess with styling on the "front end").

Yeah, injecting styles would be a completely hidden side effect, so we would either have to complicate the guide by explaining it, or risk users encountering very mysterious bugs. So I think this is a worthwhile change even though it doesn't fix the filter chaining issue.

This is why I recommend taking it as a "known issue". We could also consider pulling am-highlight-kana out.

I agree, let's note it as a known problem. Yesterday I found myself being overly reliant on kanji for understanding some words, and I realized why a kana filter could be very useful, so I definitely don't want to drop it completely.

just sayin..... this issue makes me want to understand why build in filters cannot run after custom ones, and raise a PR against Anki itself. that would make this SO easy.....

Yeah, it does feel somewhat arbitrary...

@RobHelgeson
Copy link
Author

Could we solve this by making {Kanji|Kana}RubyRanges that inject something different from HtmlRubyRange?

I have an implementation for this, I'll do some manual testing tonight and then start on unit tests.

Comment on lines +52 to +104
class FuriganaRubyRange(RubyRange):
"""Represents an html ruby and its range in parent string."""

def open(self) -> str:
return "<ruby>"

def close(self) -> str:
return "</ruby>"

def rt(self) -> str:
return f"<rt>{self.ruby}</rt>"


class KanjiRubyRange(RubyRange):
"""Represents a kanji ruby and its range in parent string."""

def open(self) -> str:
return ""

def close(self) -> str:
return ""

def rt(self) -> str:
return ""


class KanaRubyRange(RubyRange):
"""Represents a kana ruby and its range in parent string."""

def open(self) -> str:
return ""

def close(self) -> str:
return ""

def rt(self) -> str:
return ""

def __str__(self) -> str:
return self.ruby


class TextRubyRange(RubyRange):
"""Represents a text ruby and its range in parent string."""

def open(self) -> str:
return " "

def close(self) -> str:
return ""

def rt(self) -> str:
return f"[{self.ruby}]"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New subclasses for each formatting type.

debug_utils.dev_print(
"Scenario 6: The status is completely inside the ruby."
)
if isinstance(ruby, FuriganaRubyRange):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now furigana is the "special case" and all others are text-like. (i.e. they do not need special processing for <ruby> html tags.

span_elements.append(
SpanElement(morph_match.group(), morph_status, start_index, end_index)
)
ruby_range_type: type[RubyRange] = TextRubyRange
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default to text, incase we get garbage input.

) -> None:
am_config = AnkiMorphsConfig()
highlighted_text: str = text_highlighting.get_highlighted_text(
am_config, card_morphs, input_text
am_config, card_morphs, input_text, ruby_type
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases such that all tests are tested against all 4 formatting options. (even non-Japanese, just to ensure no regressions there!)

@RobHelgeson RobHelgeson requested a review from mortii January 13, 2025 14:13
@RobHelgeson
Copy link
Author

With the exception of the debug flag that I left on (and some vulture errors re: some of the debugging utils). I think this is ready for your review (again ;) )

@mortii
Copy link
Owner

mortii commented Jan 13, 2025

I'm sick so it might take a couple of days 🙏

@RobHelgeson
Copy link
Author

I'm sick so it might take a couple of days 🙏

ugh!! Take your time! Feel better soon! 🤞🏻

Comment on lines +129 to +137
##############################################################################################
# CASE: Morph and ruby interaction
##############################################################################################
# This third example sets all the cases where morphs and rubies coexist.
# |-mmm---| |--mmm--| |---mmm-| |-mmm---| |-mmmmm-| |-mmmmm-| |--mmm--| |--m-m--|
# |----rrr| |--rrr--| |--rrr--| |--rrr--| |--rrr--| |--r-r--| |-rrrrr-| |-rrrrr-|
# Collection choice is arbitrary.
# Database choice is arbitrary.
##############################################################################################
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only intended to test the 7 path variations right? I just want to make absolutely sure before I update these tests since it could be interpreted that the morph and ruby length has significance, i.e.

|-mmmmm-| != |-mmmm-| != |-mmmmmm-|
|--rrr--|    |--rr--|    |---rr---|

I want to move away from this particular abstraction because of its ambiguity--iq tests generally don't make for good documentation, hehe.

Copy link
Author

@RobHelgeson RobHelgeson Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. This is a set of synthetic test cases that provide full coverage of all 7 paths.

the only relevant pieces of information in the ascii "diagrams" are stop and start positions of morphs and rubies in the same string. i.e. the lengths themselves are not relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants