Proposal: Simplify spec's whitespace handling #360

Open
sanbox-irl opened this issue Mar 26, 2023 · 2 comments
@sanbox-irl

The Yarn Spinner spec currently defines whitespace as follows:

> Whitespace is any non-visible character with a width greater than 0. Common whitespace encountered include the space and the tab.

This allows valid, meaningful non-zero-width characters, especially bare accent markers, to be considered "whitespace". The Arabic Letter Mark (U+061C), for example, is whitespace according to this definition but not according to anyone else.

Instead, since Yarn already requires its input to be valid UTF-8, I suggest we punt to Unicode's own definition of whitespace (the `White_Space` property), found here: https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt.

This is precisely what C#'s `Char.IsWhiteSpace` (https://learn.microsoft.com/en-us/dotnet/api/system.char.iswhitespace?view=net-7.0#remarks) already does, so we're in good company.
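
To make the proposed definition concrete, here is a minimal Rust sketch of a membership check for the `White_Space` property. The function name `is_unicode_whitespace` is hypothetical, and the code point list is my own transcription of the `White_Space` entries in PropList.txt at the time of writing, so treat it as an illustration and check it against the current file rather than relying on it as a reference.

```rust
/// Hypothetical helper: returns true if `c` has the Unicode `White_Space`
/// property. The ranges below are transcribed by hand from PropList.txt
/// (an assumption to verify against the current UCD release).
fn is_unicode_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}'       // tab, LF, VT, FF, CR
            | '\u{0020}'              // space
            | '\u{0085}'              // next line
            | '\u{00A0}'              // no-break space
            | '\u{1680}'              // ogham space mark
            | '\u{2000}'..='\u{200A}' // en quad through hair space
            | '\u{2028}'              // line separator
            | '\u{2029}'              // paragraph separator
            | '\u{202F}'              // narrow no-break space
            | '\u{205F}'              // medium mathematical space
            | '\u{3000}'              // ideographic space
    )
}

fn main() {
    // A few sample characters, including the Arabic Letter Mark (U+061C)
    // mentioned above and the zero width space (U+200B).
    for c in [' ', '\t', '\u{00A0}', '\u{3000}', '\u{061C}', '\u{200B}'] {
        println!("U+{:04X}: {}", c as u32, is_unicode_whitespace(c));
    }
}
```

Under this definition the Arabic Letter Mark is not whitespace, since it does not appear in the `White_Space` list. In practice an implementation would probably not hand-roll this table: Rust's standard `char::is_whitespace`, for example, is documented as using the same property.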

Therefore, I would suggest that the line be changed to:

> Whitespace is, generally, any non-visible character with a width greater than 0. Common whitespace encountered include the space and the tab. Specifically, whitespace is any character with the `White_Space` property, as specified in the [Unicode Character Database](https://www.unicode.org/reports/tr44/) [PropList.txt](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt).
@McJones
Collaborator

McJones commented Mar 28, 2023

I am totally in favour of this. The Unicode Consortium has already done the hard work, and we should take advantage of it.

@sanbox-irl
Author

I'll make a PR tonight then
