-
My use case is simple: I need to remove some blocks from some pages of a PDF file. It's not difficult to get the JSON string and manipulate the blocks in the dict, but after that I want to recreate the PDF file. Is there any API available for this?
Replies: 3 comments 13 replies
-
Don't take the JSON string, take the "dict" format instead. JSON is derived from it. You cannot write back to the same page. Instead, make a new page in a new PDF and write (a subset of) the contents of the dictionary to it.
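A minimal sketch of that approach, in case it helps. The helper names `keep_blocks`, `srgb_to_rgb` and `rebuild_page` are made up for illustration, as are the file names in the usage comment. It assumes a reasonably recent PyMuPDF (snake_case method names, importable as `fitz`) and falls back to the built-in Helvetica font ("helv") instead of reproducing the original fonts. Vector graphics and rotated text are not handled, so treat it as a starting point rather than a faithful copy of the page.

```python
def keep_blocks(page_dict, drop):
    """Return a shallow copy of a "dict"-format page without the blocks
    for which drop(block) is true. The input dict is left unchanged."""
    kept = [b for b in page_dict["blocks"] if not drop(b)]
    return {**page_dict, "blocks": kept}


def srgb_to_rgb(srgb):
    """Convert an integer sRGB colour (0xRRGGBB, as stored in a span)
    to the 0..1 float triple that insert_text expects."""
    return ((srgb >> 16) & 255) / 255, ((srgb >> 8) & 255) / 255, (srgb & 255) / 255


def rebuild_page(src_page, out_doc, drop):
    """Write the surviving blocks of src_page onto a new page of out_doc."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it

    page_dict = keep_blocks(src_page.get_text("dict"), drop)
    new_page = out_doc.new_page(width=page_dict["width"], height=page_dict["height"])
    for block in page_dict["blocks"]:
        if block["type"] == 1:  # image block: raw image bytes are in block["image"]
            new_page.insert_image(fitz.Rect(block["bbox"]), stream=block["image"])
            continue
        for line in block["lines"]:  # text block
            for span in line["spans"]:
                new_page.insert_text(
                    span["origin"],      # baseline start of the span
                    span["text"],
                    fontsize=span["size"],
                    fontname="helv",     # assumption: Base-14 fallback font
                    color=srgb_to_rgb(span["color"]),
                )
    return new_page


# Usage (hypothetical file names), here dropping all image blocks:
#   src, out = fitz.open("input.pdf"), fitz.open()
#   for page in src:
#       rebuild_page(page, out, drop=lambda b: b["type"] == 1)
#   out.save("output.pdf")
```

The `drop` callback keeps the filtering rule separate from the writing step, so the same loop works whether you remove blocks by type, by bounding box, or by their text.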
-
I have another question, out of curiosity: how can I convert an image block we get into a PIL image instance, like the one returned by PIL.Image.open?
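One possible way, sketched under some assumptions: in the "dict" output, image blocks have `type` 1 and carry the encoded image bytes under the `image` key, so they can be handed to Pillow through an in-memory buffer. The helper names `image_blocks` and `block_to_pil` are made up for illustration, and Pillow may not be able to decode every image format that can occur in a PDF.

```python
import io


def image_blocks(page_dict):
    """Return the image blocks (type 1) of a "dict"-format page."""
    return [b for b in page_dict["blocks"] if b["type"] == 1]


def block_to_pil(block):
    """Open the encoded image bytes of an image block as a PIL image."""
    from PIL import Image  # Pillow, imported lazily

    return Image.open(io.BytesIO(block["image"]))


# Usage, given a PyMuPDF page object named `page`:
#   for block in image_blocks(page.get_text("dict")):
#       img = block_to_pil(block)
#       print(img.size, block["ext"])
```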
-
When MuPDF builds its TextPage object (which I wrap with the same-named Python object), it applies some heuristics when it creates its block-line-span-character hierarchy.
So, in the generated dictionary I also have |
For writing dictionary contents to a new page, there is also an example set of scripts for font replacement, from which you can probably learn a lot.