Use TextWriter with specified font #2273

hgdhot · 2023-03-01T07:02:16Z

hgdhot
Mar 1, 2023

Hi,

I get the fonts and flags of each span through the page.get_text("dict") method.

(1) Can I use TextWriter to write new content with a specified font like 'CMMI7'? How can I get the "CMMI7" Font object?
(2) Is there any way to write new text with bold and italic characters?

JorjMcKie · 2023-03-01T12:27:37Z

JorjMcKie
Mar 1, 2023
Maintainer

Answering directly here - then I will transfer this to "Discussions", where we may handle any additional questions.
This is not an issue / bug.

Ad (1)

The font name as you show it is probably incomplete, because most creator software builds font subsets, resulting in an extended name like "ABCDEF+CMMI7". The first 6 characters are random, so you must first set a global parameter that makes text extraction return the full subset name, and then extract the text. Once you know this name, inspect the output of page.get_fonts() to find the font xref:

In [4]: page.get_fonts()
Out[4]:
[(12, 'cff', 'Type1', 'EAMKOQ+FrutigerLT-Roman', 'T1_0', 'WinAnsiEncoding'),
 (14, 'cff', 'Type1', 'DWXEAS+FrutigerLT-Bold', 'T1_1', 'WinAnsiEncoding'),
 (15,
  'cff',
  'Type1',
  'CXIYMU+AGaramondPro-Regular',
  'T1_2',
  'WinAnsiEncoding')]
In [5]: # access font buffer of xref 12 and make a font of it
In [6]: ff = doc.extract_font(12)
In [7]: fontbuffer = ff[-1]
In [8]: myfont = fitz.Font(fontbuffer=fontbuffer)
In [9]: myfont
Out[9]: Font('FrutigerLT-Roman Regular')
In [10]: myfont.flags
Out[10]:
{'mono': False,
 'serif': True,
 'bold': False,
 'italic': False,
 'substitute': False,
 'stretch': False,
 'fake-bold': False,
 'fake-italic': False,
 'opentype': False,
 'invalid-bbox': False,
 'cjk': False,
 'cjk-lang': None,
 'embed': True,
 'never-embed': False}

To make text extraction return the full subset font name, set this global option first: fitz.TOOLS.set_subset_fontnames(True).
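The lookup step can be sketched in plain Python. The helper names `strip_subset_prefix` and `find_font_xref` are hypothetical and not part of PyMuPDF. The sample list is copied from the `page.get_fonts()` output in this answer, and the sketch assumes the font name sits at index 3 of each tuple, as it does there.

```python
import re

# Sample data copied from the page.get_fonts() output in this answer.
FONTS = [
    (12, "cff", "Type1", "EAMKOQ+FrutigerLT-Roman", "T1_0", "WinAnsiEncoding"),
    (14, "cff", "Type1", "DWXEAS+FrutigerLT-Bold", "T1_1", "WinAnsiEncoding"),
    (15, "cff", "Type1", "CXIYMU+AGaramondPro-Regular", "T1_2", "WinAnsiEncoding"),
]

# A subset tag is six uppercase letters followed by "+".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(name):
    """Return the font name without a leading 'ABCDEF+' subset tag."""
    return SUBSET_PREFIX.sub("", name)


def find_font_xref(fonts, wanted):
    """Return the xref of the first entry whose font name matches `wanted`.

    `wanted` may be given with or without the subset prefix.
    Returns None if no entry matches.
    """
    for entry in fonts:
        xref, basefont = entry[0], entry[3]
        if wanted in (basefont, strip_subset_prefix(basefont)):
            return xref
    return None
```

With a real page, pass `page.get_fonts()` instead of `FONTS`; the returned xref is the number to hand to `doc.extract_font()`. If the same base font was subsetted more than once, only the first match is returned.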

Then you can use myfont as is in TextWriter output. If you use insert_text() / insert_textbox() instead, you must first install the font on the page: page.insert_font(..., fontname="myname", fontbuffer=myfont.buffer). Then use page.insert_text(..., fontname="myname").

BUT beware that these are subset fonts: not all characters of the full font will be present - that is the whole motivation behind subsets, after all. So if your output text contains unsupported characters, you will see spaces there.
It may be better to get hold of the full font, which is often possible for popular fonts.

Ad (2)

Font weights (bold) and styles (italic) are generally provided in separate font files. If your text contains a mixture of regular and these other variants, you must split it into separate text pieces yourself, each with its own font (variant). This is tedious with insert_text and a little easier with TextWriter.

But there is an elegant solution available now in PyMuPDF: use the new Story class!
This allows you to use HTML source to express desired font variants with tags such as <b> and <i>, using syntax like "<b>Good morning</b>, <i>if it is a good morning</i>, which I doubt.". If you then use HTML styling (e.g. via CSS), the Story will automatically switch between the different fonts to output the regular, bold and italic text parts.
Please read the respective documentation to see if this is an option for you.

11 replies

JorjMcKie Mar 2, 2023
Maintainer

No, Story also supports fonts given via fontfile. Let's look at the CSS source that function generates:

import fitz
arch = fitz.Archive()
CSS = fitz.css_for_pymupdf_font("ubuntu",archive=arch, name="sans-serif")
print(CSS)

@font-face {font-family: sans-serif; src: url(ubuntu);}

@font-face {font-family: sans-serif; src: url(ubuntubo);font-weight: bold;}

@font-face {font-family: sans-serif; src: url(ubuntubi);font-weight: bold;font-style: italic;}

@font-face {font-family: sans-serif; src: url(ubuntuit);font-style: italic;}

The same sort of CSS source can be used for your own font:

- Choose a name for font-family (here: "sans-serif"); this is arbitrary.
- Under "src", provide the names of your font files: one file for each combination of regular, bold, etc.
- Provide font-weight / font-style as appropriate for the respective font file.

If your HTML then contains "<b>", "<i>" etc., the right font will automatically be selected from your CSS spec.
You must include the folder of your font files in the fitz.Archive that you give to your Story.
You can also provide font buffers instead of font files. This is what you see in the example: I am putting the ubuntu font buffers as single members in the archive, so Story will find them there. Instead, you can provide font file paths - look up the CSS syntax specifications to be sure.
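The recipe above can be sketched as a small helper that emits the same kind of @font-face rules as the generated CSS. The function `font_face_css` and the file names are hypothetical; only the rule layout follows the printed output.

```python
def font_face_css(family, regular, bold=None, italic=None, bold_italic=None):
    """Build @font-face rules for up to four font files of one family."""
    variants = [
        (regular, ""),
        (bold, "font-weight: bold;"),
        (italic, "font-style: italic;"),
        (bold_italic, "font-weight: bold;font-style: italic;"),
    ]
    rules = []
    for filename, style in variants:
        if filename:  # skip variants for which no font file was given
            rules.append(
                "@font-face {font-family: %s; src: url(%s);%s}"
                % (family, filename, style)
            )
    return "\n".join(rules)


CSS = font_face_css(
    "myfamily",
    regular="MyFont-Regular.ttf",  # hypothetical file names
    bold="MyFont-Bold.ttf",
    italic="MyFont-Italic.ttf",
    bold_italic="MyFont-BoldItalic.ttf",
)
```

The resulting string would be passed as `user_css`, together with an archive that contains the folder holding these files.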

hgdhot Mar 2, 2023
Author

I tried the font files simsun.ttc and simsunb.ttf to test bold Song (SimSun) text. The font files are located in /home/chinese/:

def test_story():
    CSS = """
        @font-face {font-family: test; src: url(simsun.ttc);}

        @font-face {font-family: test; src: url(simsunb.ttf);font-weight: bold;}

        @font-face {font-family: test; src: url(simsunb.ttf);font-weight: bold;font-style: italic;}

        @font-face {font-family: test; src: url(simsunb.ttf);font-style: italic;}
    """
    
    HTML = """
    <p style="font-family: test;color: blue"><b>你好</b></p>
    <p style="font-family: test;color: blue">你好</p>
    """
    
    MEDIABOX = fitz.paper_rect("letter")
    WHERE = MEDIABOX + (36, 36, -36, -36)
    WHERE = MEDIABOX / 2
    arch = fitz.Archive('/home/chinese', 'test')
    story = fitz.Story(user_css=CSS, archive=arch, html=HTML)  
    

    writer = fitz.DocumentWriter("output.pdf")

    more = 1

    while more:
        device = writer.begin_page(MEDIABOX)
        more, _ = story.place(WHERE)
        story.draw(device)
        writer.end_page()

    writer.close()

It doesn't seem to work. I guess I did something wrong, but I can't find where the problem is.

JorjMcKie Mar 2, 2023
Maintainer

This works based on font files:

import os
import pathlib

import fitz

thisdir = os.path.dirname(os.path.abspath(__file__))
arch = fitz.Archive([".", "C:/Users/haral/OneDrive/Desktop/extra-fonts/pragmaticaC"])
# folder "." contains the image, the other folder has the fonts
# Use font-family based on files
# (Oblique is the italic file, BoldOblique the bold italic file)
CSS = """
@font-face {font-family: myfamily; src: url(pragmaticaC.otf);}
@font-face {font-family: myfamily; src: url(PragmaticaC-Bold.otf);font-weight: bold;}
@font-face {font-family: myfamily; src: url(PragmaticaC-BoldOblique.otf);font-weight: bold;font-style: italic;}
@font-face {font-family: myfamily; src: url(PragmaticaC-Oblique.otf);font-style: italic;}
"""
docname = __file__.replace(".py", ".pdf")  # output PDF file name
# read the HTML source from here
HTML = pathlib.Path(os.path.join(thisdir, "springer.html")).read_bytes().decode()

# make the Story object
story = fitz.Story(HTML, user_css=CSS, archive=arch)
...

hgdhot Mar 3, 2023
Author

Thanks for the reply! I modified my code according to yours (the Archive creation part). The code didn't throw any exception, but the output PDF is empty (its size is zero bytes).

def test_story():
    CSS = """
        @font-face {font-family: test; src: url(new-simsun.ttc);}
        @font-face {font-family: test; src: url(new-simsun.ttc);font-weight: bold;}
        @font-face {font-family: test; src: url(new-simsun.ttc);font-weight: bold;font-style: italic;}
        @font-face {font-family: test; src: url(new-simsun.ttc);font-style: italic;}
    """
    
    HTML = """
    <p style="font-family: test;color: blue"><b>你好</b></p>
    <p style="font-family: test;color: blue">你好</p>
    """
    
    MEDIABOX = fitz.paper_rect("letter")
    WHERE = MEDIABOX + (36, 36, -36, -36)
    # the font files are located in /home/chinese
    arch = fitz.Archive(['/home/chinese'])
    # if user_css is not specified, the output PDF has content
    story = fitz.Story(HTML, user_css=CSS, archive=arch)  
    

    writer = fitz.DocumentWriter("output.pdf")

    more = 1

    while more:
        device = writer.begin_page(MEDIABOX)
        more, _ = story.place(WHERE)
        story.draw(device)
        writer.end_page()

    writer.close()

How should I debug this problem?

JorjMcKie Mar 3, 2023
Maintainer

Any error messages?
There may be some font support problem ...
I have successfully tested your script on Windows using font C:/Windows/Fonts/simsun.ttc.
You may want to try a different version of that font - preferably a TTF.
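One cheap first check, offered as a sketch and not as a diagnosis of this case: confirm that every file named in the CSS actually exists in one of the folders given to the Archive. The helper `missing_font_files` is a hypothetical name and uses only the standard library. It does not detect font-format problems of the kind suspected above.

```python
import os
import re

# Captures whatever stands between "url(" and the next ")".
URL_PATTERN = re.compile(r"url\(([^)]+)\)")


def missing_font_files(css, folders):
    """Return file names referenced via url(...) in `css` that exist in none of `folders`."""
    missing = []
    for raw in URL_PATTERN.findall(css):
        name = raw.strip().strip("'\"")  # tolerate spaces and optional quotes
        found = any(
            os.path.isfile(os.path.join(folder, name)) for folder in folders
        )
        if not found and name not in missing:
            missing.append(name)
    return missing
```

For the script above this would be called as `missing_font_files(CSS, ["/home/chinese"])`; an empty list means all referenced files were found, so the cause has to be looked for elsewhere.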
