-
Notifications
You must be signed in to change notification settings - Fork 563
How to Extract Fonts from a PDF
Jorj X. McKie edited this page Jan 2, 2018
·
4 revisions
This script can be used to extract all fonts referenced by some page of a PDF.
from __future__ import print_function
import fitz
doc = fitz.open("some.pdf")
xref_visited = [] # memorize already processed font xrefs here
for page in doc:
fl = page.getFontList() # list of fonts of page
for f in fl:
xref = f[0] # xref of font
if xref in xref_visited:
continue # skip if already processed
xref_visited.append(xref) # do not process a second time
# extract font buffer
basename, ext, _, buffer = doc.extractFont(xref)
if ext != "n/a": # is the font extractable?
foutname = "%s-%i.%s" % (basename, xref, ext) # build the filename
fout = open(foutname, "wb") # and output the font
fout.write(buffer)
fout.close()
print("extracted", foutname)
footer = "extracted %i font files from %s." % (len(xref_visited), doc.name)
footer_line = "-".ljust(len(footer), "-")
# output some protocol
print(footer_line)
print("extracted", len(xref_visited), "font files from", doc.name)
print(footer_line)
HOWTO Button annots with JavaScript
HOWTO work with PDF embedded files
HOWTO extract text from inside rectangles
HOWTO extract text in natural reading order
HOWTO create or extract graphics
HOWTO create your own PDF Drawing
Rectangle inclusion & intersection
Metadata & bookmark maintenance