is it possible to use the embedded fonts in a pdf file to write another pdf? #1829

leyiwang · 2022-07-22T07:13:14Z

I have a Word document open in Office and saved the .docx file as a PDF. I open this PDF with PyMuPDF, extract the embedded fonts, and then write text to a new PDF using these fonts. The code is as follows:

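import fitz  # PyMuPDF

doc = fitz.open("a.pdf")  # PDF exported from Word
pdf = fitz.open()  # new, empty output PDF
for page in doc:
    fonts = page.get_fonts()  # one tuple per font on the page; item 0 is the xref
    pos = 90
    for font in fonts:
        new_page = pdf.new_page()  # one output page per extracted font
        # extract_font returns (name, extension, type, content)
        font_name, ext, _, buffer = doc.extract_font(font[0])
        new_page.insert_font(fontname=font_name, fontbuffer=buffer)
        print(font_name)
        new_page.insert_text(fitz.Point(pos, pos), text="hello, 你好", fontname=font_name)
pdf.save("new.pdf")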

When I run this code, the content on each page of the new PDF appears to be wrong. I am very confused: is it possible to use the fonts embedded in one PDF file to write another PDF?

a.pdf
new.pdf

JorjMcKie · 2022-07-22T08:14:49Z

Maintainer

This is not an issue; it is a typical post for the "Discussions" tab.

JorjMcKie · 2022-07-22T08:42:40Z

Maintainer

is it possible to use the embedded fonts in a pdf file to write another pdf?

Not as easily as it may seem.
The fonts created by Word's PDF export are only font subsets: they contain exactly the characters that are used in the docx, and no more. So you cannot write the word "hello" with them, because this word's characters are not contained in the font subset.

Put simply: you can probably only write the same text again, with each character in its original font.
But even that fails because of the way Word stores its font subsets: when creating a subset, Word also changes the mapping from Unicode to glyph number (the glyph being the visual appearance).
So if you want to write "实" again using the extracted subset font, the glyph number it gives you for "实" may be wrong (and probably will be), and garbage (or nothing) comes out.
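
One quick way to spot such subsets before trying to reuse a font is to look at its name. The PDF specification gives subset fonts a tag in front of the real name: six uppercase letters followed by a plus sign. The sketch below checks for that tag with the standard library only. The helper names and the sample font names are made up for illustration; with PyMuPDF, the base font name should be the fourth item of each tuple returned by page.get_fonts(), but please verify that against your version. A missing tag does not prove that a font is complete, so treat the result as a hint only.

```python
import re

# Subset tag per the PDF specification: six uppercase letters, then "+".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def is_subset_font_name(basefont: str) -> bool:
    """Return True if the base font name starts with a subset tag."""
    return bool(_SUBSET_TAG.match(basefont))


def split_subset_tag(basefont: str):
    """Split a base font name into (tag, real_name); tag is None if absent."""
    if is_subset_font_name(basefont):
        return basefont[:6], basefont[7:]
    return None, basefont


# Hypothetical font names, for illustration only.
for name in ["ABCDEF+SimSun", "Helvetica", "FNUUTH+Calibri-Bold"]:
    tag, real_name = split_subset_tag(name)
    status = "subset" if tag else "no subset tag"
    print(f"{name}: {status}, real name {real_name}")
```

Fonts flagged as subsets this way are the ones that will most likely fail when used for text that was not in the original document.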

leyiwang Aug 9, 2022
Author

I see, thank you for the reply.
