Some text characters in the content stream are present in a different format when the content stream is extracted using doc.xref_stream().decode() #1368
Some fonts support so-called ligatures: single glyphs that represent more than one character. MuPDF supports these 7:

```python
>>> for i in range(7):
...     print(chr(0xfb00 + i))
...
ﬀ
ﬁ
ﬂ
ﬃ
ﬄ
ﬅ
ﬆ
>>>
```

So in your font, 0xFB01 = "ﬁ" is represented by the glyph id 0o37 (an octal number, i.e. decimal 31).
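To check outside MuPDF what each of these code points stands for, Python's standard `unicodedata` module is enough. The sketch below is only an illustration, not what MuPDF does internally: it assumes Unicode NFKC compatibility normalization is an acceptable way to expand the ligatures (note that NFKC also rewrites other compatibility characters, not just ligatures). The helper name `expand_ligatures` is hypothetical.

```python
import unicodedata

# The seven Latin ligatures listed above occupy U+FB00..U+FB06
# (Unicode "Alphabetic Presentation Forms" block).
LIGATURES = [chr(0xFB00 + i) for i in range(7)]


def expand_ligatures(text: str) -> str:
    """Hypothetical helper: replace ligature code points by the characters
    they stand for, using NFKC compatibility normalization.

    Caveat: NFKC also rewrites any other compatibility characters in `text`.
    """
    return unicodedata.normalize("NFKC", text)


for lig in LIGATURES:
    # Show code point, the ligature itself, its expansion and its Unicode name.
    print(f"U+{ord(lig):04X} {lig} -> {expand_ligatures(lig)!r} ({unicodedata.name(lig)})")

# The glyph id from the font is octal 0o37, which is the same value as decimal 31.
print(0o37, int("37", 8))
```

Running it lists each ligature next to its multi-character expansion, e.g. U+FB01 expands to `'fi'`.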
Strictly speaking, many languages have ligatures as part of their normal alphabet: for example the German umlauts ä, ö, ü, which came from ae, oe, ue, or ß, which stands for ss. Think also of the many special Scandinavian letters.