PyMuPDF-1.22.2: extractText shows unreadable output from specific PDF #2378

Answered by JorjMcKie
zdenop asked this question in Q&A

The PDF uses a non-standard encoding, which makes it impossible to extract the text, not only for (Py-)MuPDF but also for Adobe Acrobat, Nitro 5, and other PDF viewers.
You can confirm this by selecting some text in a viewer and pasting it into a word processor document.

Please be aware that displaying text and extracting it are features that are not necessarily connected, as is the case here.
So all you can do is OCR the pages.
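
For illustration, here is a minimal sketch of an OCR fallback with PyMuPDF. It assumes Tesseract and its language data are installed and visible to PyMuPDF. The helper names `looks_garbled` and `extract_text_with_ocr_fallback`, the 0.3 threshold, and the 300 dpi setting are made up for this example and are not part of PyMuPDF. The heuristic only catches text made up largely of replacement, private-use, or control characters; a broken encoding that maps glyphs to ordinary but wrong letters would pass it, in which case you would OCR every page unconditionally.

```python
import unicodedata


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Rough heuristic (an assumption, not a PyMuPDF feature).

    Returns True when the page has no extractable text, or when more than
    `threshold` of its non-whitespace characters are replacement characters,
    private-use code points, control characters, or unassigned code points.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True  # nothing extractable at all
    suspicious = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cc", "Cn")
    )
    return suspicious / len(chars) > threshold


def extract_text_with_ocr_fallback(pdf_path, language="eng", dpi=300):
    """Return one text string per page, using OCR where extraction looks broken."""
    import fitz  # PyMuPDF; imported here so the heuristic works without it

    doc = fitz.open(pdf_path)
    try:
        pages = []
        for page in doc:
            text = page.get_text("text")
            if looks_garbled(text):
                # full=True renders the whole page to an image and OCRs it,
                # ignoring the unusable text layer stored in the PDF.
                textpage = page.get_textpage_ocr(
                    language=language, dpi=dpi, full=True
                )
                text = page.get_text("text", textpage=textpage)
            pages.append(text)
        return pages
    finally:
        doc.close()
```

Calling `extract_text_with_ocr_fallback("input.pdf")` (a placeholder file name) would return a list with one string per page. OCR output is only as good as the rendering and the Tesseract language model, so expect some recognition errors.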

Answer selected by JorjMcKie
Category
Q&A
Labels
not a bug (not a bug / user error / unable to reproduce)
2 participants
Converted from issue

This discussion was converted from issue #2377 on April 28, 2023 07:49.