Some text characters in the content stream are present in a different format when the content stream is extracted using doc.xref_stream().decode() #1368

Answered by JorjMcKie
meghanaviyyapu asked this question in Q&A

Some fonts support so-called ligatures. These are single glyphs that represent more than one character. MuPDF supports the following seven, at code points 0xFB00 through 0xFB06:

>>> for i in range(7):
	print(chr(0xfb00 + i))

	
ﬀ
ﬁ
ﬂ
ﬃ
ﬄ
ﬅ
ﬆ
>>> 

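So in your font, the character 0xFB01 (the single-glyph ligature "ﬁ", standing for the two letters "fi") is represented by the glyph id 0o37. That is octal notation: 37 octal equals 31 decimal, or 0x1F.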
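
If the goal is to get plain letters back from text that contains these ligatures, something along the following lines may help. This is only a sketch using Python's standard library: `expand_ligatures` is a hypothetical helper name, not a PyMuPDF function, and it assumes the text has already been extracted as Unicode (for example by the library's text extraction) rather than read as raw glyph ids from the content stream.

```python
import unicodedata

# The seven ligature code points 0xFB00..0xFB06 listed above.
LIGATURES = [chr(0xFB00 + i) for i in range(7)]


def expand_ligatures(text: str) -> str:
    """Replace each of the seven ligatures by its component letters.

    Hypothetical helper: uses Unicode compatibility decomposition (NFKD),
    applied only to the ligature characters so other text is untouched.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ch in LIGATURES else ch
        for ch in text
    )


# The glyph id from the answer, written in octal, is 31 in decimal.
glyph_id = 0o37

if __name__ == "__main__":
    print(expand_ligatures("of\ufb01ce"))  # ligature "fi" becomes two letters
    print(glyph_id, oct(glyph_id), hex(glyph_id))
```

Note that this works on Unicode text only. The glyph id in the content stream is font-specific, so mapping it back to 0xFB01 depends on the font's own encoding information.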

Answer selected by meghanaviyyapu

This discussion was converted from issue #1367 on November 04, 2021 08:30.