Text extraction works for all document types. What happens if you extract the text of the XML page?
-
What would be the best way to insert data from an XML file into a new PDF file?
So far I have parsed the XML file with the xmltodict library, flattened the nested dictionary, iterated through it, and written each entry with the insert_text function. This works, but I am wondering whether there is a more appropriate method.
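For reference, the approach described above can be sketched roughly as follows. This is a minimal sketch, not the exact code in use: `flatten` and `write_pdf` are hypothetical helper names, the dotted key format and page layout values are assumptions, and `data` is assumed to be the nested dictionary returned by `xmltodict.parse`.

```python
def flatten(node, prefix=""):
    """Yield (dotted_key, value) pairs from nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def write_pdf(data, out_path, fontsize=11, margin=50):
    """Write one 'key: value' line per flattened entry into a new PDF."""
    import fitz  # PyMuPDF; imported here so flatten() is usable without it

    doc = fitz.open()  # calling open() with no arguments creates an empty PDF
    page = doc.new_page()
    line_height = fontsize * 1.5  # assumed line spacing
    y = margin
    for key, value in flatten(data):
        # Start a new page once the current one is full.
        if y > page.rect.height - margin:
            page = doc.new_page()
            y = margin
        page.insert_text((margin, y), f"{key}: {value}", fontsize=fontsize)
        y += line_height
    doc.save(out_path)
    doc.close()
```

A call would look like `write_pdf(xmltodict.parse(xml_text), "out.pdf")`. Japanese text would probably also need a CJK-capable font passed to `insert_text` through its `fontname` or `fontfile` parameter, since the default font does not cover those characters.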
I also tried
import fitz  # PyMuPDF

doc = fitz.open(xml)  # xml: path to the XML file
pdf_bytes = doc.convert_to_pdf()
pdf = fitz.open("pdf", pdf_bytes)
for page in pdf:
    text = page.get_text()
This would work, but the XML file is in Shift-JIS encoding and I don't get the correct output.
Does anyone have any suggestions?
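One direction that might be worth trying, sketched here under the assumption that the wrong output comes from the Shift-JIS bytes being interpreted in a different encoding: decode the file explicitly before parsing it, so that every later step works on ordinary Python strings. The function name `parse_shift_jis_xml` is hypothetical, and the standard-library parser stands in for whichever XML library is actually used.

```python
import re
import xml.etree.ElementTree as ET


def parse_shift_jis_xml(raw: bytes) -> ET.Element:
    """Decode Shift-JIS bytes to str, then parse the resulting XML."""
    text = raw.decode("shift_jis")
    # Drop the XML declaration: its encoding attribute no longer
    # describes the data once it has been decoded to a str.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    return ET.fromstring(text)
```

Usage would be along the lines of `root = parse_shift_jis_xml(open(path, "rb").read())`. Files produced on Windows are sometimes really in the `cp932` superset, so that codec name may be needed instead of `shift_jis`.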