Use HarfBuzz's advances instead of FreeType's #639

TriangulumDesire · 2024-07-23T06:24:40Z

This pull request alters the HarfBuzz sample to use the shaped advances instead of FreeTypes. The main reason to do this is because it provides more accurate kerning.

OpenType has two main ways of providing kerning information: either with a simple kern table or with the more modern GPOS feature. GPOS is much more sophisticated and can contain context-dependent information (such as position in a string or locale data). Some fonts might support both, and some will only support GPOS.

FreeType can only retrieve kerning from the kern table, whereäs HarfBuzz can retrieve kerning from either option (which it then adds to the advances), so it is more correct to use HarfBuzz's advances where possible.

This PR also removes all the old FreeType kerning functions from the sample (which are no longer needed).

TriangulumDesire · 2024-07-23T07:15:57Z

Samples/basic/harfbuzz/src/FontFaceHandleHarfBuzz.cpp

@@ -98,7 +96,7 @@ const FallbackFontGlyphMap& FontFaceHandleHarfBuzz::GetFallbackGlyphs() const
 }

 int FontFaceHandleHarfBuzz::GetStringWidth(StringView string, const TextShapingContext& text_shaping_context,
-	const LanguageDataMap& registered_languages, Character prior_character)
+	const LanguageDataMap& registered_languages, Character /*prior_character*/)


One thing to note is that this change will make the prior_character parameter unused.

Originally, this character was used to calculate any extra kerning that may have existed prior to this string being rendered (for example, in cases where a much larger, continuous string was split into several renders).

One possible way to reïmplement this would be to create a new string that is a combination of the prior character and the first character of the current string. We could shape this new string and then query the offsets of the second shaped glyph (if it exists) and use those offsets when positioning the first character. I'll leave it up to you if this is something you want to implement.

This might cause some issues in the text inputs especially. When we select text there, we break a sentence up into pre-selection, selection, and post-selection parts. If the kerning between these parts are not the same when changing the selection, then the text will move around a bit.

This may also affect where we decide to place the text cursor from a mouse position. And also for breaking up text in normal text flow, there might be some differences.

When glancing over the harfbuzz API, I think I saw some of them taking whole strings for prior text. This is probably something that would be a lot more accurate if we used this with harfbuzz, since harfbuzz can combine several characters (code points) into single glyphs.

I think this change as it stands is a bit problematic for the above reasons. Maybe it is better to use Freetype kerning until we have solved this issue properly? Maybe you could test it with a text area and a font with kerning, to see how problematic it is?

I spent some time trying to resolve this but to no avail. I couldn't find any HarfBuzz functions that allow you to provide prior text or anything similar.

Firstly, I added the prior character to the shaping buffer before the rest of the text (and then ignored its advance when measuring). However, it became difficult to determine which shaped glyphs were part of the prior character and which were not. There was also the issue of ligatures. The font supports the ti ligature (where they become a single glyph). If I were to have a prior character of t and a first character of i, HarfBuzz would shape them into the ti ligature. Then, if I were to ignore measuring the first shaped glyph (which would normally correspond to the prior character), it would skip both the t and i entirely (since they are now a single glyph).

I also tried creating a second, smaller shaping buffer that only consisted of the prior character and first character of the string but ran into similar problems listed above.

I also reïnstated the FreeType kerning function and used that. It worked fine for the “LatoLatin” font (since it supports the kern table), but this method won't work for fonts that do not include a kern table (EB Garamond is one such font). It also has the same aforementioned ligature issues.

I did come across hb-font-get-glyph-kerning-for-direction, but I looked at its source and it only returns legacy kerning values (such as those from the kern table), making it functionally similar to FreeType's kerning functions.

I found this old issue that was posted by someone solving a similar problem. However, it seems that this is a much more complicated issue, and one that HarfBuzz doesn't seem to solve—some shaping results can require multiple characters of context.

This is the one I believe I was thinking of: hb_buffer_add_codepoints() (looks like this also applies to _add_utf8):

When shaping part of a larger text (e.g. a run of text from a paragraph), instead of passing just the substring corresponding to the run, it is preferable to pass the whole paragraph and specify the run start and length as item_offset and item_length , respectively, to give HarfBuzz the full context to be able, for example, to do cross-run Arabic shaping or properly handle combining marks at stat of run.

I'm not sure if this is actually useful to help solve this problem though. It does indeed sound like a complicated problem. I came to remember this blog post about text shaping and ligatures, which seems like a similar problem: https://faultlore.com/blah/text-hates-you/#style-can-change-mid-ligature

Once we get to text editing, there is also the problem of placing the text cursor, and moving them properly between glyphs and such: https://lord.io/text-editing-hates-you-too/

Ah, yeah. That function is for adding text to a buffer. When it comes to shaping, you process the entire buffer in one go. One method to try could be to do this and then get the shaped glyphs for the segment we care about, but the main problem is actually identifying the start and end of said segment. Regardless, it seems as if we'll need more than just the prior character to assist herewith.

In the meantime, what would you like me to do with this pull request? I could reïmplement the original FreeType kerning functions and use the value it returns. It won't be perfect, but it'll provide a decent enough estimate for now. I have tried selecting text in input elements, and whilst the text does jump around a bit if you displace kerning or cut a ligature, it is only a few pixels, and I wouldn't say it's that noticeable unless you're specifically looking for it.

Yeah, on English the movement is relatively subtle, although not something I would use in production. I also tested on Hindi and Arabic, and they are very janky. I also tested this on master, and it's pretty much the same there. We don't really do anything to handle right-to-left, so selection on Arabic does weird things. And while Hindi at least works, it's moving about a lot. I guess the main reason is that we don't consider more than one character back. And also, we're allowed to move between code points that become merged glyphs, which allows you to corrupt these glyphs. So yeah, a lot of things to work on before this becomes very stable.

This PR definitely is the right direction. And considering it's not considerably worse, and it does look a lot better in normal rendering, I'm okay with merging this if you agree with that. I think people in the mean time will just have to be careful with input fields if they want to use this font engine. Basically, one would need to use a font without any kerning here.

Oh, right. I should have tested languages other than English in the input fields. I tried selecting Arabic and Hindi, and I can see what you mean by how janky they are. I also tried selecting these languages on the master branch, and these problems still occur—so this issue definitely transcends this pull request.

I'm fine with it being merged if you are, since most of these issues were happening before this PR. I'm now just thinking of ways to resolve this issue. The first option that comes to mind is to give the entire line to the font engine and then determine whole characters for selection using HarfBuzz cluster indices. However, determining position within a string using cluster indices is difficult, and this still has the issue of splitting ligatures (since even though they are two characters, they are a single cluster) and would require a lot of changes to RmlUi's internal text processing. We might end up requiring both the Unicode Text Segmentation algorithm and the Unicode Bidirectional Algorithm.

mikke89 · 2024-07-23T20:52:23Z

Thanks a lot for the pull request!

It certainly makes a lot of sense to use harfbuzz for advances. So the contribution is very much welcome.

I took some screenshots to compare the old kerning with Freetype (master) and Harfbuzz (this PR). Both with the harfbuzz font engine.

See screenshots here

English

Harfbuzz:

Freetype:

Hindi

Harfbuzz:

Freetype:

The new kerning certainly looks a lot more compact. Aside from that, I think the overall spacing looks a bit better generally. But I wonder if it is supposed to be that compact? Maybe it is, it certainly looks better the more I look at it :)

TriangulumDesire · 2024-07-24T10:39:01Z

I think the more-compacted text from HarfBuzz is correct. I have compared it to a few browsers, and it matches their renders. In the FreeType render, you can see a few issues. For example, the spacing between the letters in the word “consectetur” on the first line is irregular. Likewise, the word “magna” at the end of the second line has its ‘g’ and ‘n‘ almost touching each other (which definitely shouldn't be happening).

I looked at the internal tables of the “Poppins” font, and it turns out its kern table is empty, so it originally wasn't even rendering with kerning (which is why it looks very spaced out in the FreeType render).

mikke89 · 2024-07-24T20:44:53Z

Oh I see. Yeah, those two examples you bring up definitely look better on harfbuzz, and overall too. Not having any kerning on freetype at all explains the big difference, thanks for looking in to it.

Use HarfBuzz kerning instead of FreeType's

cb651cf

TriangulumDesire changed the title ~~Use HarfBuzz advance instead of FreeType's~~ Use HarfBuzz's advances instead of FreeType's Jul 23, 2024

TriangulumDesire commented Jul 23, 2024

View reviewed changes

mikke89 added samples internationalization labels Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use HarfBuzz's advances instead of FreeType's #639

Use HarfBuzz's advances instead of FreeType's #639

TriangulumDesire commented Jul 23, 2024

TriangulumDesire Jul 23, 2024 •

edited

Loading

mikke89 Jul 23, 2024

TriangulumDesire Jul 24, 2024 •

edited

Loading

mikke89 Jul 24, 2024

TriangulumDesire Jul 25, 2024

mikke89 Jul 25, 2024

TriangulumDesire Jul 26, 2024 •

edited

Loading

mikke89 commented Jul 23, 2024

TriangulumDesire commented Jul 24, 2024

mikke89 commented Jul 24, 2024

Use HarfBuzz's advances instead of FreeType's #639

Are you sure you want to change the base?

Use HarfBuzz's advances instead of FreeType's #639

Conversation

TriangulumDesire commented Jul 23, 2024

TriangulumDesire Jul 23, 2024 • edited Loading

Choose a reason for hiding this comment

mikke89 Jul 23, 2024

Choose a reason for hiding this comment

TriangulumDesire Jul 24, 2024 • edited Loading

Choose a reason for hiding this comment

mikke89 Jul 24, 2024

Choose a reason for hiding this comment

TriangulumDesire Jul 25, 2024

Choose a reason for hiding this comment

mikke89 Jul 25, 2024

Choose a reason for hiding this comment

TriangulumDesire Jul 26, 2024 • edited Loading

Choose a reason for hiding this comment

mikke89 commented Jul 23, 2024

TriangulumDesire commented Jul 24, 2024

mikke89 commented Jul 24, 2024

TriangulumDesire Jul 23, 2024 •

edited

Loading

TriangulumDesire Jul 24, 2024 •

edited

Loading

TriangulumDesire Jul 26, 2024 •

edited

Loading