Inline images extraction #1231

Answered by JorjMcKie
maiiabocharova asked this question in Q&A
Aug 25, 2021 · 2 comments · 11 replies
page.get_text("dict")["blocks"] is a list of blocks on the page. Each one is either a text block or an image block - see documentation for TextPage. Image blocks have block["type"] = 1. The image binary is contained in block["image"]. More info is contained in the other dict keys.
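As a minimal sketch of the above (not a definitive implementation): the helper below filters the block list for image blocks and writes each image's bytes to a file. The function names, the output naming scheme and the `out_prefix` parameter are made up for illustration; `block["ext"]` is assumed to be among the "other dict keys" and to hold the image format.

```python
def image_blocks(blocks):
    """Return only the image blocks (type == 1) from a page's block list."""
    return [b for b in blocks if b.get("type") == 1]


def save_inline_images(pdf_path, out_prefix="img"):
    """Write every image block of every page to its own file (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so image_blocks() can be used without it

    saved = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            blocks = page.get_text("dict")["blocks"]
            for i, block in enumerate(image_blocks(blocks)):
                # block["image"] holds the binary content, block["ext"] its format
                name = f"{out_prefix}-p{page.number}-{i}.{block['ext']}"
                with open(name, "wb") as fh:
                    fh.write(block["image"])
                saved.append(name)
    return saved
```

Calling `save_inline_images("input.pdf")` (the file name is a placeholder) would return the list of files written.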

The list of drawing dicts returned by page.get_drawings() can be used to redraw each one on some other page - see the documentation.
Each "path" dict in that list has a path["rect"], which is the rectangle containing all the elementary draw operations in it.
You could also call page.get_pixmap(..., clip=path["rect"]) to create an image of the path. Of course, there is the risk that other things (not belonging to the path) are also part of that …
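A rough sketch of the pixmap approach, under a few assumptions: the function names, `min_size`, `zoom` and the output file names are hypothetical, and the size filter is my own addition to skip degenerate rectangles (e.g. a perfectly horizontal line), which cannot be rendered sensibly. As noted above, anything else that overlaps a path's rectangle ends up in the image too.

```python
def drawable_paths(paths, min_size=1.0):
    """Keep paths whose bounding rectangle is at least min_size wide and high."""
    keep = []
    for path in paths:
        x0, y0, x1, y1 = path["rect"]  # a Rect unpacks like a 4-tuple
        if x1 - x0 >= min_size and y1 - y0 >= min_size:
            keep.append(path)
    return keep


def render_drawings(pdf_path, page_number=0, out_prefix="drawing", zoom=2):
    """Save one PNG per vector path on the given page (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so drawable_paths() can be used without it

    saved = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for i, path in enumerate(drawable_paths(page.get_drawings())):
            # clip restricts rendering to the path's rectangle; zoom raises resolution
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=path["rect"])
            name = f"{out_prefix}-{i}.png"
            pix.save(name)
            saved.append(name)
    return saved
```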

Answer selected by JorjMcKie
This discussion was converted from issue #1230 on August 25, 2021 12:09.