Feature/replace-get-highlighted-text #313
base: v4
Nr. 7 is essentially 5 + 6. Since 6 -> undefined, then 7 -> undefined.
Worked on the different filters. This template results in ....
... this output ... I think it works well. I'm interested in your thoughts on the implementation, since I ship styles to the template to accomplish it. This allows users to add any local overrides they want in their styles. The first should be exactly equivalent to what the bulk processor outputs, in case they want that. The final one is just there to show how I use it on my cards: if there is no data in the am-highlighted field, use the JIT processor.
let's keep these active for the remainder of this pr
ankimorphs/highlight_morphs_jit.py
    return (
        f"<span class='{filter_name}'>"
        + styles
        + " "
I don't think you can inline styles like this... It doesn't work for me at least.
Odd, this worked for me.
I must have some other things allowing it to work. I'll double check.
Can you send me either a screenshot, or the html, produced by (something like):
<div>{{am-highlight:Reading}}</div>
<div>{{am-highlight-furigana:Reading}}</div>
<div>{{am-highlight-kana:Reading}}</div>
<div>{{am-highlight-kanji:Reading}}</div>
I'm trying to break it over here but it looks like it works for me.
font-size: 999;
The 999 needs a unit like rem, px, or %.
Edit: yes, changing font size does affect the output.
font-size: 999;
the 999 needs a measurement like rem, px, or %
Right, does a valid change like that work for you?
I get the same result if I remove the styles:
highlighted_jit_text = (
f"testinggg<span class='{filter_name}'>"
+ " "
+ text_highlighting.get_highlighted_text(
am_config=am_config,
morphemes=card_morphs,
text=_dehtml(field_text),
use_html_rubies=filter_name != "am-highlight",
)
+ "</span>"
)
If I change the font size, nothing happens.
😮
that's... odd.
is it possible that you may have some of the css that we were experimenting with before, still active somewhere?
I'll attach some screenshots, if you can try this, we can see where your styles are coming from vs where mine are coming from.
grab: AnkiWebView Inspector https://ankiweb.net/shared/info/31746032
Inspect the kana element by right-clicking it and choosing inspect:
Click Computed in the right panel of the inspector:
Click the circled arrow icon on the property whose origin you want to trace (I cannot screenshot the circled icon because it appears on mouseover only, but it's a grey circle next to the value of the property):
This will take you to the Styles tab, where you can see what styles are loaded and applied:
And by clicking on the top right <style> here, it will highlight where that style got loaded in.
I know that's a HUGE pain in the butt. :D but I don't know why this would not be working for you but is for me (so far)
When using {{am-highlight-kana:kana:Front}} I don't get any ruby/rt elements at all:
aha, this makes sense. The built-in filter removes all kanji from the string before we even "see it" so we process a "kana only" string.
This has the negative side effect of not allowing us to use {{am-highlight-kana:kana:Front}} for backward compatibility (i.e. when on mobile).
The built-in furigana and kanji filters will still work because the morphs match on kanji (most of the time) with kana readings removed from the parsed string.
I can't speak for you or the project, but I'd suggest that listing this as a known issue may be the appropriate action, with the recommendation to either:
- set your card with {{am-highlight-kana:furigana:Front}} and have kana locally and furigana on mobile.
- set your card template to be "mobile aware" and use {{am-highlight-kana:Front}} locally and {{kana:Front}} on mobile (I do not know how to do this at the moment)
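To illustrate why chaining after the built-in kana filter leaves nothing for the custom filter to work with, here is a minimal sketch. It assumes the Anki-style `base[reading]` ruby notation and uses a simplified regex; the function names are hypothetical and this is not the add-on's or Anki's actual code.

```python
import re

# Simplified pattern for Anki-style ruby notation such as "漢字[かんじ]".
# Assumption: a leading space separates the ruby base from preceding text.
RUBY = re.compile(r" ?([^ >\[\]]+?)\[(.+?)\]")


def kana_filter(text: str) -> str:
    """Rough stand-in for the built-in kana filter: keep only the readings."""
    return RUBY.sub(r"\2", text)


def kanji_filter(text: str) -> str:
    """Rough stand-in for the built-in kanji filter: keep only the base text."""
    return RUBY.sub(r"\1", text)


def furigana_filter(text: str) -> str:
    """Rough stand-in for the built-in furigana filter: emit html rubies."""
    return RUBY.sub(r"<ruby>\1<rt>\2</rt></ruby>", text)


def has_ruby_markup(text: str) -> bool:
    """True if the text still carries base[reading] pairs to process."""
    return RUBY.search(text) is not None


field = "私[わたし]は 学生[がくせい]です"

# With {{am-highlight-kana:kana:Front}} the built-in filter runs first,
# so a custom filter would receive a string with no kanji and no brackets.
seen_by_custom_filter = kana_filter(field)
```

In this sketch `has_ruby_markup(seen_by_custom_filter)` is false, which matches the observation above that the custom filter only ever processes a "kana only" string in that chain.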
I think the current kana/kanji approach is a little off right now. If we emulate the native filters i.e.:
Originally posted by @mortii in #313 (comment) then we wouldn't need any special stylings, because there wouldn't be any ruby elements to manipulate in place. Could we solve this by making |
Not exactly. It would not solve the issue at hand (but I agree that it would be a cleaner implementation). If the user chooses So as it stands now, all filters work as expected by themselves (I do agree with your point that we could just fully pre-process them on the "back end" and not have to mess with styling on the "front end"). And all filters except This is why I recommend taking it as a "known issue". We could also consider pulling
Just saying... this issue makes me want to understand why built-in filters cannot run after custom ones, and raise a PR against Anki itself. That would make this SO easy...
Ah, that's true, I didn't even consider that... I was just focused on the problem of injecting styles.
Yeah, injecting styles would be a completely hidden side effect, so we would either have to complicate the guide by explaining it, or risk users encountering very mysterious bugs. So I think this is a worthwhile change even though it doesn't fix the filter chaining issue.
I agree, let's note it as a known problem. Yesterday I found myself being overly reliant on kanji for understanding some words, and I realized why a kana filter could be very useful, so I definitely don't want to drop it completely.
Yeah, it does feel somewhat arbitrary...
I have an implementation for this, I'll do some manual testing tonight and then start on unit tests.
class FuriganaRubyRange(RubyRange):
    """Represents an html ruby and its range in parent string."""

    def open(self) -> str:
        return "<ruby>"

    def close(self) -> str:
        return "</ruby>"

    def rt(self) -> str:
        return f"<rt>{self.ruby}</rt>"


class KanjiRubyRange(RubyRange):
    """Represents a kanji ruby and its range in parent string."""

    def open(self) -> str:
        return ""

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return ""


class KanaRubyRange(RubyRange):
    """Represents a kana ruby and its range in parent string."""

    def open(self) -> str:
        return ""

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return ""

    def __str__(self) -> str:
        return self.ruby


class TextRubyRange(RubyRange):
    """Represents a text ruby and its range in parent string."""

    def open(self) -> str:
        return " "

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return f"[{self.ruby}]"
New subclasses for each formatting type.
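As a rough illustration of how subclasses like these could produce the four output formats, here is a self-contained sketch. The base class is not shown in this diff, so the `RubyRange` below is a hypothetical stand-in: the `base` field and the way `__str__` composes `open()`, `rt()` and `close()` are assumptions, and the index range tracking is omitted.

```python
from dataclasses import dataclass


@dataclass
class RubyRange:
    """Hypothetical stand-in for the real base class (assumed shape)."""

    base: str  # assumed name for the annotated text, e.g. the kanji
    ruby: str  # the reading, as referenced by self.ruby in the diff

    def open(self) -> str:
        raise NotImplementedError

    def close(self) -> str:
        raise NotImplementedError

    def rt(self) -> str:
        raise NotImplementedError

    def __str__(self) -> str:
        # Assumed composition order: opener, base text, reading, closer.
        return self.open() + self.base + self.rt() + self.close()


class FuriganaRubyRange(RubyRange):
    def open(self) -> str:
        return "<ruby>"

    def close(self) -> str:
        return "</ruby>"

    def rt(self) -> str:
        return f"<rt>{self.ruby}</rt>"


class KanjiRubyRange(RubyRange):
    def open(self) -> str:
        return ""

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return ""


class KanaRubyRange(KanjiRubyRange):
    # Reuses the empty markers, but renders the reading instead of the base.
    def __str__(self) -> str:
        return self.ruby


class TextRubyRange(RubyRange):
    def open(self) -> str:
        return " "

    def close(self) -> str:
        return ""

    def rt(self) -> str:
        return f"[{self.ruby}]"


rendered = {
    cls.__name__: str(cls("漢字", "かんじ"))
    for cls in (FuriganaRubyRange, KanjiRubyRange, KanaRubyRange, TextRubyRange)
}
```

Under these assumptions only the furigana variant emits html tags, while the other three render as plain text.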
debug_utils.dev_print(
    "Scenario 6: The status is completely inside the ruby."
)
if isinstance(ruby, FuriganaRubyRange):
Now furigana is the "special case" and all others are text-like (i.e. they do not need special processing for <ruby> html tags).
span_elements.append(
    SpanElement(morph_match.group(), morph_status, start_index, end_index)
)
ruby_range_type: type[RubyRange] = TextRubyRange
Default to text, in case we get garbage input.
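One way to picture a text default is a lookup that falls back to `TextRubyRange` for any unrecognized value. This is only a sketch: the mapping from filter names to ruby types and the helper `ruby_type_for` are hypothetical, and the classes are empty stubs so the example runs on its own.

```python
class RubyRange:
    """Empty stub standing in for the real base class."""


class TextRubyRange(RubyRange):
    pass


class FuriganaRubyRange(RubyRange):
    pass


class KanjiRubyRange(RubyRange):
    pass


class KanaRubyRange(RubyRange):
    pass


# Assumed mapping from template filter name to formatting type.
FILTER_TO_RUBY_TYPE: dict[str, type[RubyRange]] = {
    "am-highlight": TextRubyRange,
    "am-highlight-furigana": FuriganaRubyRange,
    "am-highlight-kanji": KanjiRubyRange,
    "am-highlight-kana": KanaRubyRange,
}


def ruby_type_for(filter_name: str) -> type[RubyRange]:
    """Return the formatting type, defaulting to text for unknown names."""
    return FILTER_TO_RUBY_TYPE.get(filter_name, TextRubyRange)
```

Falling back to the text type means unexpected input still renders as plain bracketed text rather than raising or emitting malformed html.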
  ) -> None:
      am_config = AnkiMorphsConfig()
      highlighted_text: str = text_highlighting.get_highlighted_text(
-         am_config, card_morphs, input_text
+         am_config, card_morphs, input_text, ruby_type
Added test cases so that every test runs against all 4 formatting options (even non-Japanese, just to ensure no regressions there!).
With the exception of the debug flag that I left on (and some vulture errors re: some of the debugging utils), I think this is ready for your review (again ;) )
I'm sick so it might take a couple of days 🙏
Ugh!! Take your time! Feel better soon! 🤞🏻
##############################################################################################
# CASE: Morph and ruby interaction
##############################################################################################
# This third example sets all the cases where morphs and rubies coexist.
# |-mmm---| |--mmm--| |---mmm-| |-mmm---| |-mmmmm-| |-mmmmm-| |--mmm--| |--m-m--|
# |----rrr| |--rrr--| |--rrr--| |--rrr--| |--rrr--| |--r-r--| |-rrrrr-| |-rrrrr-|
# Collection choice is arbitrary.
# Database choice is arbitrary.
##############################################################################################
This is only intended to test the 7 path variations right? I just want to make absolutely sure before I update these tests since it could be interpreted that the morph and ruby length has significance, i.e.
|-mmmmm-| != |-mmmm-| != |-mmmmmm-|
|--rrr--| |--rr--| |---rr---|
I want to move away from this particular abstraction because of its ambiguity; IQ tests generally don't make for good documentation, hehe.
Yes. This is a set of synthetic test cases that provides full coverage of all 7 paths.
The only relevant pieces of information in the ASCII "diagrams" are the start and stop positions of morphs and rubies in the same string, i.e. the lengths themselves are not relevant.
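To make the "only start and stop positions matter" point concrete, here is a small sketch that classifies how a morph span relates to a ruby span using half-open `[start, end)` indices. The categories and names are illustrative assumptions; they are not the 7 paths implemented in this PR.

```python
from enum import Enum


class Overlap(Enum):
    DISJOINT = "disjoint"
    SAME_SPAN = "same span"
    MORPH_INSIDE_RUBY = "morph inside ruby"
    RUBY_INSIDE_MORPH = "ruby inside morph"
    MORPH_STARTS_FIRST = "partial overlap, morph starts first"
    RUBY_STARTS_FIRST = "partial overlap, ruby starts first"


def classify(m_start: int, m_end: int, r_start: int, r_end: int) -> Overlap:
    """Classify a morph span against a ruby span by boundaries alone."""
    if m_end <= r_start or r_end <= m_start:
        return Overlap.DISJOINT
    if (m_start, m_end) == (r_start, r_end):
        return Overlap.SAME_SPAN
    if r_start <= m_start and m_end <= r_end:
        return Overlap.MORPH_INSIDE_RUBY
    if m_start <= r_start and r_end <= m_end:
        return Overlap.RUBY_INSIDE_MORPH
    if m_start < r_start:
        return Overlap.MORPH_STARTS_FIRST
    return Overlap.RUBY_STARTS_FIRST


# Spans of different lengths fall into the same category as long as the
# relative order of their boundaries is the same.
wide = classify(1, 6, 2, 5)
wider = classify(1, 7, 2, 4)
```

Both calls above yield the same category, which is the sense in which the lengths drawn in the diagrams carry no meaning.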
This PR is completely experimental, and if it never goes further than a couple of people taking a look, I'm all good with that :)
(This PR also includes all changes in #308. If/when that has merged, this is a very small PR.)
What is the current behavior?
get_highlighted_text is currently implemented with some nice hand-written looping methods. It does, however, have a small issue with integrating the morph highlighting and html rubies, as outlined in the docs: https://mortii.github.io/anki-morphs/user_guide/known-problems.html under the heading "Ruby characters (furigana, etc.) are displayed wrong in am-highlighted".
What is the new behavior?
The get_highlighted_text process has been completely rewritten to address this issue.
Contrasting the implementations:
46.7K cards SpaCy japanese large model (Mac M1)
New:
Recalc duration: 18.992 seconds
Recalc duration: 14.782 seconds
Recalc duration: 14.563 seconds
Recalc duration: 14.792 seconds
Recalc duration: 14.476 seconds
Main:
Recalc duration: 19.884 seconds
Recalc duration: 15.912 seconds
Recalc duration: 17.181 seconds
Recalc duration: 17.442 seconds
Recalc duration: 16.230 seconds
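For a quick side-by-side of the timings listed above, the runs can be averaged with a short script. This only restates the numbers from this description; five runs per branch is a small sample, so treat the comparison as indicative.

```python
from statistics import mean

# Recalc durations in seconds, copied from the lists above.
new_runs = [18.992, 14.782, 14.563, 14.792, 14.476]
main_runs = [19.884, 15.912, 17.181, 17.442, 16.230]

new_mean = mean(new_runs)
main_mean = mean(main_runs)

print(f"New:  {new_mean:.3f} s average over {len(new_runs)} runs")
print(f"Main: {main_mean:.3f} s average over {len(main_runs)} runs")
print(f"Difference: {main_mean - new_mean:.3f} s")
```

The first run in each series is noticeably slower than the rest, so comparing the remaining four runs separately may also be worthwhile.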
What kind of changes does this PR introduce?
Checklist:
DOES NOT PASS TESTS THAT USE THE TEST DECKS THEY NEED TO BE REGENERATED AND I DONT KNOW HOW -- I fully anticipate that once regenerated, they'll all pass.
I have written tests for all of the new code, based on the tests for the old code. All of these pass.