How to remove the contents of the images? #1346
-
Hi, is there a method that can remove all the contents of an image in a PDF file? PyMuPDF is sometimes too powerful and picks up some words from images, but my goal is to analyze only the text that is not in an image. Thanks
-
If bit
I think I don't understand your problem.
There are several ways to ignore images and extract only text: the page method `get_text` takes an option string as its first argument and a keyword parameter `flags`, which together control this. Omitting `TEXT_PRESERVE_IMAGES` from `flags` will extract only text for the other option values ("blocks", "dict", etc.).
Please be more specific.
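A minimal sketch of the `flags` approach, assuming PyMuPDF is installed and imported under its classic name `fitz`. The helper names (`text_blocks_only`, `overlaps`, `words_outside`, `extract_non_image_text`) are hypothetical, not part of PyMuPDF. As an extra assumption beyond the reply above, it also drops words whose bounding box overlaps an image rectangle reported by `page.get_image_info()`. The reason is that text drawn on top of an image (for example an OCR layer on a scanned page) is ordinary page text and is not affected by `TEXT_PRESERVE_IMAGES`.

```python
from typing import Iterable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def text_blocks_only(page_dict: dict) -> List[dict]:
    """Keep only text blocks (type 0) from a page.get_text("dict") result.

    Image blocks have type 1; they only appear when TEXT_PRESERVE_IMAGES is set.
    """
    return [b for b in page_dict.get("blocks", []) if b.get("type") == 0]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two rectangles share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def words_outside(words: Iterable[Sequence], image_rects: Iterable[Rect]) -> List[str]:
    """Return the words (from get_text("words")) that overlap no image rectangle.

    Each word tuple starts with (x0, y0, x1, y1, "word", ...).
    """
    rects = list(image_rects)
    return [w[4] for w in words if not any(overlaps(tuple(w[:4]), r) for r in rects)]


def extract_non_image_text(path: str) -> str:
    """Hypothetical helper: page text with image areas excluded."""
    import fitz  # PyMuPDF; imported lazily so the pure helpers above work without it

    # Assumption: plain-text-like flags, deliberately WITHOUT fitz.TEXT_PRESERVE_IMAGES.
    flags = fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE
    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            # Bounding boxes of the images shown on this page.
            image_rects = [tuple(info["bbox"]) for info in page.get_image_info()]
            words = page.get_text("words", flags=flags)
            pages.append(" ".join(words_outside(words, image_rects)))
    return "\n".join(pages)
```

Usage would be something like `extract_non_image_text("input.pdf")`, where the file name is a placeholder. The overlap test is a rough heuristic: it also discards captions or labels that sit inside an image's area, so it may need tuning for your documents.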