How To OCR #4200

Sure it does. It calls page.get_pixmap(dpi=DPI) with a DPI value that you provide. It then internally uses Pixmap.pdfocr_tobytes(), which creates an in-memory, one-page PDF containing the OCR-ed text layer. A normal TextPage is populated from this PDF.
This TextPage can then be used for all the usual text extraction variants.
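For illustration, the steps described above can be approximated by hand. This is only a sketch, not PyMuPDF's internal code: the function name `ocr_page_text_manually` and its defaults are hypothetical, and it assumes a recent PyMuPDF (importable as `pymupdf`; older releases use `import fitz`) with Tesseract language data available. Coordinates in the temporary OCR page may not match those of the original page.

```python
def ocr_page_text_manually(page, dpi=300, language="eng"):
    """Hand-rolled approximation of the OCR steps (hypothetical helper)."""
    # Imported lazily so the function can be defined without PyMuPDF loaded.
    import pymupdf

    # 1. Render the page to an image at the requested resolution.
    pix = page.get_pixmap(dpi=dpi)
    # 2. Let Tesseract produce an in-memory 1-page PDF with an OCR text layer.
    pdf_bytes = pix.pdfocr_tobytes(language=language)
    # 3. Open that PDF from memory and populate a normal TextPage from it.
    ocr_doc = pymupdf.open(stream=pdf_bytes, filetype="pdf")
    try:
        ocr_page = ocr_doc[0]
        textpage = ocr_page.get_textpage()
        # 4. Any of the usual extraction variants can use the TextPage.
        return ocr_page.get_text("text", textpage=textpage)
    finally:
        ocr_doc.close()
```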
We deliberately do not offer TextPage creation as an option directly in get_text() (which would have been possible), because OCR is a slow process. Instead, we require you to create a separate TextPage, which can then be reused multiple times.
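A minimal sketch of that reuse pattern follows. It assumes the method under discussion is PyMuPDF's `Page.get_textpage_ocr()`, which the text above does not name explicitly. The helper name `extract_with_one_ocr_pass` and the file name in the usage comment are hypothetical.

```python
def extract_with_one_ocr_pass(page, dpi=300, language="eng"):
    """Run OCR once, then reuse the TextPage for several extractions."""
    # The expensive step: OCR the full page exactly once.
    textpage = page.get_textpage_ocr(dpi=dpi, full=True, language=language)
    # The cheap steps: every get_text() call reuses the same TextPage.
    return {
        "text": page.get_text("text", textpage=textpage),
        "words": page.get_text("words", textpage=textpage),
        "blocks": page.get_text("blocks", textpage=textpage),
    }

# Possible usage (requires PyMuPDF and Tesseract language data):
#   import pymupdf
#   doc = pymupdf.open("scanned.pdf")  # hypothetical input file
#   result = extract_with_one_ocr_pass(doc[0])
#   print(result["text"])
```

Passing `textpage=` to each `get_text()` call is what avoids repeating the OCR step for every extraction variant.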

Answer selected by JorjMcKie