Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote small_caps check (and a minor case_mapping improvement) #4721

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yanone
Copy link
Collaborator

@yanone yanone commented May 15, 2024

Description

(Issue #4713)

Rewrote the entirely useless small_caps check (0 FAILs on GF library) from scratch. It now yields for GF:

    ERROR: 0
    FATAL: 0
    FAIL: 464
    WARN: 0
    INFO: 0
    SKIP: 2853
    PASS: 12

Indeed, missing smcp or c2sc substitutions are very common. Only 2.5% of fonts with those features are completely defined. I manually cross-checked some of the fonts. In many cases, the glyphs are present but missing from the feature code.

Still, marked the check as experimental for good measure (especially since we may need a few more hard-coded exceptions here and there).

Also, moved the check from GF to universal profile.

Since I'm now excluding incomplete legacy Greek (mu, Ohm, etc.) dynamically, I implemented the same for the case_mapping check. Hard-coding exceptions for Greek will automatically exclude them from being checked for fonts that actually support Greek. The basic Greek alphabet is 24+24 characters, so as soon as Greek support is below 24, it's now being excluded dynamically (= whatever Greek is in the font).

Checklist

  • update CHANGELOG.md
  • wait for the tests to pass
  • request a review

Closes #4713

@yanone yanone marked this pull request as ready for review May 15, 2024 15:03
@yanone yanone requested a review from felipesanches May 15, 2024 15:03
@dscorbett
Copy link
Member

Anything using characters_per_script should have the network condition, because it uses youseedee, which downloads Unicode files at runtime.

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

@dscorbett Excellent catch; I added that.

@simoncozens Getting extended character metadata is probably the reason you implemented youseedee in the first place, but what do you think of https://github.com/iwsfutcmd/unicodedataplus (on par with Unicode 15) or even https://github.com/fonttools/unicodedata2 (says Unicode 13 in the README) for getting a character's script?

Update: looking at unicodedata2's code it appears as not supporting 'script'.

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

Since unicodedataplus has no external dependencies, I've refactored characters_per_script() to use that instead so it can work offline. Let me know what you guys think...

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

The failed check says it can't import unicodedataplus.script, but I can't see why, especially when the majority of tests pass. Need an adult here.

@simoncozens
Copy link
Collaborator

it uses youseedee, which downloads Unicode files at runtime.

Once, and then stores them.

I'm not convinced of the need to add another Unicode library to avoid a single network access.

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

I'm not convinced of the need to add another

FWIW, youseedee was previously not part of the FB package. I had to add that explicitly.

Unicode library to avoid a single network access.

Yeah, that's the question. Aren't there maybe scenarios in wasm environments or so where network access is generally impossible? I don't have the overview. I would generally try to avoid network access, but if it would work in all major environments, I don't see a usability issue of downloading it once. It's more a technical question. For instance, it would be sad if such a check would not be available in fontbakery.com.

@simoncozens
Copy link
Collaborator

simoncozens commented May 15, 2024

FWIW, youseedee was previously not part of the FB package.

OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.

Aren't there maybe scenarios in wasm environments or so where network access is generally impossible?

This is true, and more to the point the filesystem isn't available either. So maybe it's OK.

@dscorbett
Copy link
Member

pip install unicodedataplus failed for me on macOS 14.4.1 with “error: incompatible pointer to integer conversion returning 'void *' from a function with result type 'int' [-Wint-conversion]” while the installation script was running clang. I had to run CPPFLAGS=-Wno-int-conversion pip install unicodedataplus instead.

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

I had to run CPPFLAGS=-Wno-int-conversion pip install unicodedataplus instead.

Is that a deal-breaker? I'm not a Python pro

@yanone
Copy link
Collaborator Author

yanone commented May 15, 2024

What do you guys think about bundling the Unicode file with Fontbakery for youseedee?

My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?

@felipesanches
Copy link
Collaborator

felipesanches commented May 15, 2024

note: I have posted the comment below by mistake as an edit of the original post by @simoncozens. I apologize for that. I have already edited back the post to its original contents and here's my reply:


FWIW, youseedee was previously not part of the FB package.

OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.

That's an interesting aspect of dependency management. I would say that if our code uses the dep's API, then we should explicitly list the dependency, even if it would already be installed as a "dep of a dep".

My reasoning is that one does not necessarily know what are the deps of the deps, as this can be considered an implementation detail of the dependency, and it would be more robust to treat the deps as black-boxes, from the point of view of dependency management.

@felipesanches
Copy link
Collaborator

What do you guys think about bundling the Unicode file with Fontbakery for youseedee?

My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?

I'm also interested in hearing what @simoncozens thinks about this.

@simoncozens
Copy link
Collaborator

Yes, my idea was to have a Python package which would always install the latest Unicode version. I genuinely didn't have WASM on my radar when I wrote it. :-) Now I see that for that environment, something which bundles the data might be better.

@dscorbett
Copy link
Member

Some lowercase letters don’t have uppercase counterparts or vice versa, in which case 'smcp' and 'c2sc' can’t be assumed to be meaningful. Instead of checking a character’s general category, you should check that lowercasing or uppercasing it produces a different string.

I only brought up the CPPFLAGS issue because it is a surprising requirement that should be documented. The workaround is easy, so I wouldn’t consider it a deal-breaker if you choose to use that package.

@yanone yanone marked this pull request as draft May 24, 2024 15:45
felipesanches pushed a commit that referenced this pull request Sep 9, 2024
'case_mapping' check on Universal profile.

(PR #4721)
felipesanches pushed a commit that referenced this pull request Sep 9, 2024
'case_mapping' check on Universal profile.

(PR #4721)
felipesanches pushed a commit that referenced this pull request Sep 9, 2024
'case_mapping' check on Universal profile.

(PR #4721)
felipesanches pushed a commit that referenced this pull request Sep 9, 2024
'case_mapping' check on Universal profile.

(PR #4721)
felipesanches pushed a commit that referenced this pull request Sep 9, 2024
'case_mapping' check on Universal profile.

(PR #4721)
On the Universal profile:
  - missing_small_caps_glyphs: Rewrote it from scratch, marked it as **experimental**.

(issue #4713)
'case_mapping' check on Universal profile.

(PR #4721)
@felipesanches
Copy link
Collaborator

@yanone, I've just squashed it all into a couple commits, rebased it to latest main branch and fixed all conflicts.

The unicodedataplus problem with the CI job not finding the script sub-module was related to the lack of type hints for this library. So I placed a comment deactivating type checks on that import line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[com.google.fonts/check/missing_small_caps_glyphs] Check passes for font missing Cyrillic small caps
4 participants