Replies: 2 comments 3 replies
-
Additional data point: with
-
I'd need to see evidence that this affects all viewers, not just qpdfview.
-
Something is strange and I'm not sure what the problem could possibly be.
I'm processing files with `--redo-ocr --sidecar`. A typical file has 1300 pages, 54MB before, 59MB after processing. The sidecar is a 1.9MB txt file.

The original PDF does have some OCR already; if I extract the text with `pdf2txt.py` from pdfminer, it produces a 1.7MB text file. So the amount of text is roughly similar, by that metric.

When I open the original PDF in qpdfview, a search takes less than 2 seconds. In the processed PDF, searching for the same text takes 28-30 seconds!

Is there something in the PDF/A structure that would explain this? And is there some way to improve the situation? This effect was seen in 16.4.2 and 16.6.0, so it is probably not a random bug.
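
One way to check whether the slowdown is specific to qpdfview would be to time text extraction on both files outside any viewer. The following is only a rough sketch, assuming pdfminer.six is installed; `time_call` and `time_text_search` are hypothetical helper names, and pdfminer's extraction is not the same code path a viewer uses for search, so the numbers are a proxy at best.

```python
import time
from typing import Callable, Tuple


def time_call(func: Callable[[], str]) -> Tuple[str, float]:
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def time_text_search(pdf_path: str, needle: str) -> Tuple[bool, float]:
    """Extract all text from pdf_path and report whether needle occurs in it.

    The returned time covers the extraction only; the substring check on the
    extracted text is negligible by comparison.
    """
    # Imported lazily so time_call can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text, elapsed = time_call(lambda: extract_text(pdf_path))
    return needle in text, elapsed
```

Calling `time_text_search("original.pdf", "some text")` and then the same on the processed file (both file names are placeholders) would give two timings to compare. If they differ by a similar factor as the qpdfview search, the text layer itself is probably the cause; if not, the viewer is the more likely suspect.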