Replies: 1 comment
-
The problem here is that the font uses a non-standard encoding, which leads to incorrect back-translation of the glyphs (the character shapes shown in viewers) to the Unicode values of the original characters.
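To make the mechanism concrete, here is a toy model in plain Python. It is only a sketch and does not use any PDF library: the `font_glyphs` table, the code values, and the helper names are all hypothetical. It shows why the viewer can display the right text while extraction returns valid but wrong characters, with no ? or tofu. The correction table at the end is an assumption-laden workaround that only helps if the font's substitution is consistent and one-to-one, which may or may not hold for the attached file.

```python
# Toy model: a PDF stores character codes; the font decides which glyph
# (visible shape) each code draws. All values below are made up.
font_glyphs = {0x41: "H", 0x42: "i", 0x43: "!"}  # hypothetical custom encoding


def rendered(codes):
    """What a viewer shows: the font's own glyph for each code."""
    return "".join(font_glyphs[c] for c in codes)


def extracted(codes):
    """What an extractor returns if it assumes a standard encoding."""
    return bytes(codes).decode("latin-1")


def build_fix_table(wrong_sample, correct_sample):
    """Derive a per-character correction table from one known-good sample.

    Assumes the mismatch is a consistent one-to-one substitution.
    """
    if len(wrong_sample) != len(correct_sample):
        raise ValueError("samples must have the same length")
    return str.maketrans(wrong_sample, correct_sample)


page_codes = [0x41, 0x42, 0x43]
seen = rendered(page_codes)   # text as displayed in the viewer
got = extracted(page_codes)   # text as returned by extraction

# Learn the substitution from the sample, then repair other extracted text.
fix = build_fix_table(got, seen)
repaired = extracted([0x42, 0x41]).translate(fix)

print(seen, got, repaired)
```

The extracted string consists of ordinary characters, which is why nothing in the output looks obviously broken. If the substitution is not consistent across the document, a table like this will not be enough.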
-
Hello.
I am trying to parse a large PDF document into a single Excel table for further processing.
A page from one of those documents is attached: fragment.pdf
The minimal code I use to extract text from a single page:
and then I get this:
The problem is that the extracted text does not match the text in the PDF, and the output does not contain placeholder characters such as ? or tofu symbols either.
Does anyone know why this happens, and is there a solution for it?