-
Describe the bug (mandatory): In one PDF, PyMuPDF extracts some Chinese characters incorrectly. Example PDF: code
Many characters are wrong. How can I recognize and fix this?
-
This is a "Discussions" item, so I will first convert it.
-
This problem is caused by the fonts themselves. Text extraction software depends on the font's back-translation information, which is contained in the
Displaying this PDF is not the problem. You asked about text extraction. If you create a page pixmap with PyMuPDF, you will get the correct picture.
But if you open the file in e.g. Adobe Acrobat or any other PDF viewer and select the text with the cursor, you will get the same wrong result.
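
To check this on your side, something like the following might help. It is only a rough sketch: `suspicious_ratio` and `check_pdf` are hypothetical helper names, not part of PyMuPDF, and the heuristic rests on the assumption that a font without usable back-translation data often produces U+FFFD or private-use code points. It will not notice characters that are wrong but still valid Chinese, so comparing the saved PNG with the extracted text by eye remains the reliable test.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like failed back-translation.

    Heuristic only (an assumption, not a PyMuPDF feature): counts U+FFFD plus
    private-use (Co) and unassigned (Cn) code points.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cn")
    )
    return bad / len(chars)


def check_pdf(path: str, png_prefix: str = "page") -> list[float]:
    """Render each page to PNG and return the suspicious ratio of its text."""
    import fitz  # PyMuPDF; imported here so suspicious_ratio works without it

    ratios = []
    with fitz.open(path) as doc:
        for page in doc:
            # The rendered image uses the glyph outlines, so it should look right.
            page.get_pixmap(dpi=150).save(f"{png_prefix}-{page.number}.png")
            # The extracted text depends on the font's back-translation data.
            ratios.append(suspicious_ratio(page.get_text()))
    return ratios
```

Calling `check_pdf("example.pdf")` (the file name is a placeholder) writes one PNG per page and returns one ratio per page. Because the rendered image is correct, running OCR on it is one possible way to recover the text for pages where the extraction is unusable.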