-
I haven't looked at the code, but good job getting it working at all. :) I don't think you need to encode a format 14 cmap at all, at least for Prawn.

You see, PDF is a funny little format. It has a bunch of layers when it comes to text. The base layer, rendering, only needs glyphs. TTFunk subsetting creates a bunch of small fonts that use a single-byte encoding regardless of how big those characters are in Unicode. Be it a ligature for "ffi" or the most elaborate emoji gluing together 15 codepoints, if it's a single glyph, TTFunk will give it a one-byte code in the subset. This is encoded in the cmap tables of the embedded fonts.

Text meaning is a completely different layer. This is what lets you search for text in the document, or copy selected text and paste it into another app and have it look right. This is basically the inverse encoding: this code in the subset font maps to this Unicode string. This is done in the PDF itself. I'm simplifying here for the TTFunk/Prawn case.

So back to your question. TTFunk subsetting will produce a bunch of fonts with simple encodings. They are all implemented already. No complex cmap encoding is required for Prawn to work. What you might need instead is to teach Prawn to understand Unicode Variation Sequences, so that it can put them into the PDF and copied text looks right.

One thing I'm not sure about is how glyph selection is supposed to work. Usually the cmap table provides a one-codepoint-to-one-glyph mapping, and all the fancy stuff happens in other tables like mort/morx, GDEF, and GSUB. I'm not sure why there are two mechanisms for looking up glyphs for multi-codepoint sequences. That said, Prawn/TTFunk doesn't support either of them at the moment.
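To make the two layers concrete, here is a rough Ruby sketch. It is not TTFunk or Prawn code: the class, its methods, and the glyph IDs are all made up for illustration. It only shows the idea that one glyph gets one single-byte code for rendering, while a separate reverse map remembers which Unicode string that code stands for.

```ruby
# Hypothetical illustration only: none of these names come from TTFunk or Prawn.
class SubsetSketch
  attr_reader :code_to_glyph, :code_to_text

  def initialize
    @glyph_to_code = {} # glyph ID in the original font => one-byte code
    @code_to_glyph = {} # rendering layer: one-byte code => glyph ID
    @code_to_text = {}  # text-meaning layer: one-byte code => Unicode string
  end

  # Registers a glyph together with the Unicode string it stands for and
  # returns its one-byte code. A glyph that is already known keeps its code.
  def use(glyph_id, text)
    @glyph_to_code.fetch(glyph_id) do
      code = @glyph_to_code.size + 1
      raise ArgumentError, 'subset is full, start a new one' if code > 255

      @glyph_to_code[glyph_id] = code
      @code_to_glyph[code] = glyph_id
      @code_to_text[code] = text
      code
    end
  end
end

subset = SubsetSketch.new
# Glyph IDs 1201 and 3456 are invented for this example.
ffi = subset.use(1201, 'ffi')               # ligature: three codepoints, one glyph
ivs = subset.use(3456, "\u{8FBB}\u{E0100}") # base character + variation selector

# Bytes that would go into the content stream (rendering layer).
encoded = [ffi, ivs, ffi].pack('C*')

# What search and copy/paste should recover (text-meaning layer).
recovered = encoded.bytes.map { |code| subset.code_to_text[code] }.join
```

The variation sequence takes up one byte in `encoded`, exactly like the ligature, and the selector only reappears when the reverse map is consulted.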
-
Hello,
I am currently working on a project that involves generating PDFs with Prawn, which requires support for Unicode variation sequences. This necessitated support for Format 14 in the cmap table. To achieve this, I have made some modifications to both TTFunk and Prawn.
Here is the repository with my changes:
In the TTFunk repository, I wrote a short example (test/output_ivs.rb), and it seems to produce the correct output. Here are the before and after results:
before
after
I tested the implementation and it seems to correctly output the desired PDF. However, there are still some uncertainties and issues like:
When creating the cmap for Format 14, it seems necessary to use other formats to assist in the process. Currently, I am using a global variable within the Format 12 implementation to achieve this. Is there a more appropriate way to handle this?
Thank you for taking the time to read this. Is there a better way to enable TTFunk to support Format 14 effectively?
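On the global variable question, one possible direction is to hand the base mapping to the Format 14 code explicitly. As I understand the OpenType spec, a format 14 subtable keeps two kinds of records per variation selector: "default" ranges, whose glyph comes from the regular cmap subtable, and "non-default" entries that name a glyph directly. That would explain why another format has to be involved. The sketch below is only an illustration under that assumption; the class and parameter names are hypothetical and are not TTFunk's API.

```ruby
# Hypothetical sketch: these names are illustrative, not TTFunk's API.
class UvsLookupSketch
  # base_cmap:       { codepoint => glyph_id }, e.g. decoded from a format 4 or 12 subtable
  # default_uvs:     { selector => [Range of codepoints, ...] }
  # non_default_uvs: { selector => { codepoint => glyph_id } }
  def initialize(base_cmap:, default_uvs: {}, non_default_uvs: {})
    @base_cmap = base_cmap
    @default_uvs = default_uvs
    @non_default_uvs = non_default_uvs
  end

  # Returns the glyph ID for a codepoint, optionally followed by a variation
  # selector, or nil when this sketch has no mapping for the sequence.
  def glyph_for(codepoint, selector = nil)
    return @base_cmap[codepoint] if selector.nil?

    # Non-default sequences name their glyph explicitly.
    explicit = @non_default_uvs.dig(selector, codepoint)
    return explicit if explicit

    # Default sequences reuse whatever glyph the base cmap already assigns.
    ranges = @default_uvs.fetch(selector, [])
    return @base_cmap[codepoint] if ranges.any? { |range| range.cover?(codepoint) }

    nil
  end
end

# All codepoints, selectors, and glyph IDs below are example data.
lookup = UvsLookupSketch.new(
  base_cmap: { 0x8FBB => 100 },
  default_uvs: { 0xE0100 => [0x8FBB..0x8FBB] },
  non_default_uvs: { 0xE0101 => { 0x8FBB => 250 } }
)

plain = lookup.glyph_for(0x8FBB)
default_variant = lookup.glyph_for(0x8FBB, 0xE0100)
explicit_variant = lookup.glyph_for(0x8FBB, 0xE0101)
unknown_variant = lookup.glyph_for(0x8FBB, 0xE0102)
```

Because the base mapping arrives as a constructor argument, nothing has to be shared through global state, and the same object could be built from whichever base subtable format the font provides.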