Greater than one pixel accuracy for page.get_textbox() method #2445

Answered by JorjMcKie
coldter asked this question in Q&A
Ultimately, we would need an example file. But as a general comment:
What you describe is likely caused by (1) a font with a fairly large "natural" line height, and (2) text lines written closer together than this line height.
This is the "fault" of the document creator.
You can locate your text after first setting the global flag fitz.TOOLS.set_small_glyph_heights(True). This causes subsequent text searches and extractions to use line heights equal to the font size, which might be sufficient to restrict extractions to the wanted text.

Otherwise, try using an extremely small bbox height (e.g. 20% of the font size). In any case, only characters intersecting th…
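Both suggestions can be combined in a small sketch. It is only an illustration, not the author's code: the helper names `thin_band` and `extract_line` are made up here, and centring the band on the vertical midpoint of the line bbox is an assumption that may need adjusting for a given font. It uses `fitz.TOOLS.set_small_glyph_heights()` and `page.get_textbox()` from PyMuPDF.

```python
def thin_band(line_bbox, fontsize, fraction=0.2):
    """Shrink a line bbox (x0, y0, x1, y1) to a thin horizontal band.

    Hypothetical helper: keeps the x-extent, centres the band on the
    bbox's vertical midpoint, and makes it fraction * fontsize high.
    """
    x0, y0, x1, y1 = line_bbox
    mid = (y0 + y1) / 2
    half = fraction * fontsize / 2
    return (x0, mid - half, x1, mid + half)


def extract_line(page, line_bbox, fontsize, fraction=0.2):
    """Extract text from a thin band inside line_bbox on a PyMuPDF page."""
    import fitz  # PyMuPDF; imported lazily so thin_band() works without it

    # Global switch: later searches and extractions use line heights
    # equal to the font size instead of the font's natural line height.
    fitz.TOOLS.set_small_glyph_heights(True)

    # Assumption: the midpoint band intersects the glyphs of the wanted
    # line only; tune `fraction` or the band position if it does not.
    return page.get_textbox(fitz.Rect(thin_band(line_bbox, fontsize, fraction)))
```

A typical call would take `line_bbox` from `page.search_for(...)` or from the line entries of `page.get_text("dict")`, ideally obtained after the flag has been set so that the bbox and the band are computed consistently.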

Answer selected by coldter
This discussion was converted from issue #2444 on June 01, 2023 21:53.