I opened a Word document in Office and saved the .docx file as a PDF. I then open this PDF with PyMuPDF, extract the embedded fonts, and use those fonts to write text into a new PDF. The code is as follows:

```python
import fitz  # PyMuPDF

doc = fitz.open("a.pdf")
pdf = fitz.open()  # new, empty PDF
for page in doc:
    fonts = page.get_fonts()
    pos = 90
    for font in fonts:
        new_page = pdf.new_page()
        # font[0] is the font's xref; extract_font returns (name, extension, type, content)
        font_name, _ext, _type, buffer = doc.extract_font(font[0])
        new_page.insert_font(fontname=font_name, fontbuffer=buffer)
        print(font_name)
        new_page.insert_text(fitz.Point(pos, pos), text="hello, 你好", fontname=font_name)
pdf.save("new.pdf")
```

When I run this code, the content on each page of the new PDF appears to be wrong. I am confused: is it possible to use the fonts embedded in one PDF file to write text into another PDF?
This is not an issue, but a typical post for the "Discussions" tab.
Not as easily as it may seem:
The fonts that Word's PDF export embeds are only font subsets: they contain exactly the characters that are used in the docx, no more. So you cannot write the word "hello" with them unless every one of its characters happens to occur in the document in that font.
To put it in simple words: you can probably only write the same text again, with each character in its original font.
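As a rough way to spot such subsets: PDF subset fonts conventionally carry a tag of six uppercase letters followed by `+` in front of the font name (for example `FNUUTH+Calibri-Bold`). The helper below is a minimal sketch of that check. The function name `is_subset_font_name` is made up for this illustration, and it assumes the name you pass in (for instance the base font name reported by `page.get_fonts()`) still has its prefix intact.

```python
import re

# Subset fonts are named "<six uppercase letters>+<original font name>".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def is_subset_font_name(name: str) -> bool:
    """Return True if `name` starts with a subset tag such as 'FNUUTH+'."""
    return bool(_SUBSET_TAG.match(name))
```

A name without the tag does not prove that the full font is embedded, so treat a `False` result as "unknown" rather than "complete".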
But even that does not work, because of the way Word stores its font subsets: when creating a subset, Word also changes the mapping from Unicode code point to glyph number (the glyph being the character's visual appearance).
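To see how little an extracted subset covers, you could ask each font which characters of your text it has a glyph for. This is only a sketch: `missing_characters` and `report_missing` are names invented here, and the code assumes the extracted buffer can be loaded with `fitz.Font(fontbuffer=...)`, which may fail for some embedded fonts. Because of the remapping described above, a character reported as present is no guarantee that it will render as the right glyph; a character reported as missing, however, definitely cannot be written.

```python
def missing_characters(font, text):
    """Return the characters of `text` that `font` has no glyph for.

    `font` is any object with a PyMuPDF-style `has_glyph(codepoint)` method
    that returns 0 when the glyph is absent.
    """
    return sorted({ch for ch in text if not font.has_glyph(ord(ch))})


def report_missing(pdf_path, text):
    """Print, for each embedded font in `pdf_path`, the characters of `text` it lacks."""
    import fitz  # PyMuPDF, imported here so missing_characters has no dependency

    doc = fitz.open(pdf_path)
    seen = set()  # a font's xref can appear on several pages
    for page in doc:
        for xref, *_ in page.get_fonts():
            if xref in seen:
                continue
            seen.add(xref)
            name, _ext, _type, buffer = doc.extract_font(xref)
            if not buffer:  # font is not embedded, nothing to inspect
                continue
            try:
                font = fitz.Font(fontbuffer=buffer)
            except Exception:
                print(name, "could not be loaded")
                continue
            print(name, missing_characters(font, text))
```

Calling `report_missing("a.pdf", "hello, 你好")` on the exported file would then list, per font, which characters of the test string are absent from that subset.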
So if you wan…