Is your feature request related to a problem? Please describe.
I'm currently working with a very dense PDF. When I use `get_textbox` with a rectangle, it returns unwanted surrounding text even though my coordinates are accurate. I believe this is because the rectangle only has one-pixel accuracy. Is there any way to get a more accurate result?
Answered by JorjMcKie, Jun 1, 2023
A typical Discussions item ... transferring
Ultimately, we would need an example file. But as a general comment:
What you describe is likely caused by (1) a font with a fairly large "natural" line height, and (2) text lines written closer together than this line height.
This is the "fault" of the document creator.
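A quick way to check whether cause (2) applies is to look for text lines whose bounding boxes overlap. The sketch below is a hypothetical helper (`overlapping_lines` is not part of PyMuPDF); it assumes the dictionary layout returned by `page.get_text("dict")` (blocks containing lines, each line with a `bbox` and a list of `spans`) and reports only lines that overlap both vertically and horizontally.

```python
def overlapping_lines(page_dict):
    """Return (upper_text, lower_text) pairs of text lines whose bboxes
    overlap, given a dict shaped like PyMuPDF's page.get_text("dict")."""
    lines = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            text = "".join(span["text"] for span in line["spans"])
            lines.append((line["bbox"], text))

    lines.sort(key=lambda item: item[0][1])  # order by top edge (y0)

    pairs = []
    for i, (a, text_a) in enumerate(lines):
        for b, text_b in lines[i + 1:]:
            if b[1] >= a[3]:
                break  # b starts below a's bottom; later lines start even lower
            if b[0] < a[2] and a[0] < b[2]:  # horizontal overlap as well
                pairs.append((text_a, text_b))
    return pairs
```

On a real page this would be called as `overlapping_lines(page.get_text("dict"))`; any pairs it returns are lines whose boxes reach into each other, which is where a rectangle aimed at one line can pick up its neighbour.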
You can identify (locate) your text after first setting the global flag `fitz.TOOLS.set_small_glyph_heights(True)`. This causes subsequent text searches and extractions to use line heights equal to the font size, which might be sufficient to restrict extractions to the wanted text. Otherwise, try using an extremely small bbox height (e.g. 20% of the font size). In any case, only characters intersecting th…
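The two suggestions above could be combined roughly as follows. This is a minimal sketch, not an official recipe: `thin_rect` and `extract_text_in` are hypothetical helpers, the 20% default comes from the suggestion above, and centring the strip on the middle of each search hit (using the hit height as the font size) is an assumption that only holds once small glyph heights are switched on. Note that the flag is global and stays on for all later calls in the process.

```python
def thin_rect(bbox, fontsize, fraction=0.2):
    """Shrink bbox vertically to fraction * fontsize, centred on the
    vertical middle of the original box. Returns (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    mid = (y0 + y1) / 2
    half = fontsize * fraction / 2
    return (x0, mid - half, x1, mid + half)


def extract_text_in(pdf_path, page_number, needle, fraction=0.2):
    """Locate `needle` on one page and re-extract the text under a thin
    horizontal strip of each hit. Requires PyMuPDF."""
    import fitz  # imported here so thin_rect() is usable without PyMuPDF

    # Global switch: character boxes become font-size high instead of
    # using the font's (possibly large) natural line height.
    fitz.TOOLS.set_small_glyph_heights(True)

    results = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for hit in page.search_for(needle):
            # Assumption: with small glyph heights on, hit.height ~ font size.
            strip = fitz.Rect(thin_rect(tuple(hit), hit.height, fraction))
            results.append(page.get_textbox(strip))
    return results
```

For example, `extract_text_in("dense.pdf", 0, "Total")` would return one string per occurrence of "Total" on the first page (file name and search term are placeholders). If a 20% strip still catches neighbouring lines, lowering `fraction` further is the obvious knob to try.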