Inline images extraction #1231
-
There are some small pictures in my PDFs that I am not able to extract with `page.getImageList()` (it returns an empty list). Can you recommend what I can do to extract those images, maybe with another library? I really need those images.
-
If this happens, you …

Drawings can be extracted via … Inline images are only contained in the internal page command source (the …
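Since inline images sit directly in the page's content stream instead of being stored as separate image objects, a page image list does not report them. To check whether a page has any at all, one could scan the stream for the `BI` … `ID` … `EI` operator sequence that PDF uses for inline images. This is a heuristic sketch, not a PyMuPDF feature; the function name is hypothetical, and it does not look inside Form XObjects.

```python
import re

# An inline image in a content stream reads: BI <key/value pairs> ID <binary data> EI
_BI = re.compile(rb"(?<![^\s])BI(?=\s)")   # "BI" at start or after whitespace
_ID = re.compile(rb"\sID\s")               # start of the binary image data
_EI = re.compile(rb"\sEI(?![^\s])")        # end marker, followed by whitespace or end


def count_inline_images(content: bytes) -> int:
    """Heuristically count BI ... ID ... EI sequences in a decompressed content stream.

    After each ID, the binary data is skipped up to the next EI, so bytes inside
    the data are not mistaken for a new BI. Data that happens to contain
    whitespace + "EI" + whitespace can still confuse this scan.
    """
    count, pos = 0, 0
    while (m_bi := _BI.search(content, pos)) is not None:
        m_id = _ID.search(content, m_bi.end())
        if m_id is None:
            break
        m_ei = _EI.search(content, m_id.end())
        if m_ei is None:
            break
        count += 1
        pos = m_ei.end()  # continue after this image's EI
    return count
```

With PyMuPDF, the input could be `page.read_contents()`, which returns the page's concatenated content streams; a non-zero count suggests the images should be looked for in the image blocks of `page.get_text("dict")` rather than in the image list.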
-
`page.get_text("dict")["blocks"]` is a list of blocks on the page. Each one is either a text block or an image block; see the documentation for `TextPage`. Image blocks have `block["type"] = 1`. The image binary is contained in `block["image"]`.
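A minimal sketch of that approach: walk the blocks of a `page.get_text("dict")` result and write each image block's bytes to a file. Besides the layout described above, it assumes the `ext` key of an image block (the image type as a file extension); the function name, output folder and file naming are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def save_image_blocks(page_dict: Dict[str, Any], out_dir: str, page_no: int = 0) -> List[Path]:
    """Write every image block of a page.get_text("dict") result to out_dir.

    Returns the written paths in block order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, block in enumerate(page_dict.get("blocks", [])):
        if block.get("type") != 1:  # 0 = text block; only type 1 holds an image
            continue
        ext = block.get("ext") or "bin"  # assumed fallback if no extension is given
        path = out / f"page{page_no}-block{i}.{ext}"
        path.write_bytes(block["image"])  # raw image bytes from the block
        written.append(path)
    return written
```

With PyMuPDF this might be called once per page, e.g. `save_image_blocks(page.get_text("dict"), "images", page.number)` for each `page` of an opened document. Because it reads the text page rather than the image list, it also picks up inline images.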
More information is contained in the other keys of each block dict.

The list of drawing dicts returned by `page.get_drawings()` can be used to re-draw each path on some other page; see the documentation. Each "path" dict therein has a `path["rect"]`, which is the rectangle containing all the elementary draws in it.

You could also do a `page.get_pixmap(..., clip=path["rect"])` to create an image of the path. Of course, there is the risk that other things (not belonging to the path) are also part of that …
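A small picture drawn with vector graphics often consists of several paths, so clipping each `path["rect"]` on its own may cut it into pieces. The sketch below swaps in a simple greedy bounding-box merge, assuming that the paths of one picture touch or overlap: rectangles within `tol` points of each other are merged into one group rectangle. The function names and the `tol` default are illustrative, not part of PyMuPDF.

```python
from typing import Iterable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def _near(a: Rect, b: Rect, tol: float) -> bool:
    """True if a and b overlap or their gap is at most tol on both axes."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol
            and a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def _union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def cluster_path_rects(rects: Iterable[Sequence[float]], tol: float = 1.0) -> List[Rect]:
    """Merge path rectangles that touch (within tol) into group bounding boxes."""
    clusters: List[Rect] = []
    for r in rects:
        cur: Rect = (float(r[0]), float(r[1]), float(r[2]), float(r[3]))
        # Keep absorbing existing groups until cur is near none of them;
        # a grown cur may reach groups it did not touch before.
        merged = True
        while merged:
            merged = False
            for i, c in enumerate(clusters):
                if _near(cur, c, tol):
                    cur = _union(cur, c)
                    del clusters[i]
                    merged = True
                    break
        clusters.append(cur)
    return clusters
```

With PyMuPDF, the input could be `p["rect"] for p in page.get_drawings()` (a `Rect` can be indexed like a 4-tuple), and each resulting rectangle rendered with `page.get_pixmap(clip=fitz.Rect(r), dpi=150)`. Merging only reduces the cut-off problem; unrelated content that falls inside a group's rectangle still ends up in the image.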