
We should have APIs that accept potentially-invalid UTF8 #135

Open
Manishearth opened this issue Apr 30, 2024 · 1 comment

Comments

@Manishearth
Member

Our UTF16 API can handle invalid UTF16 by treating unpaired surrogates as U+FFFD REPLACEMENT CHARACTERs.
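For reference, the substitution described above can be sketched with the standard library alone. This is a minimal sketch, not the crate's actual UTF16 code: `utf16_char_indices_lossy` is a hypothetical helper that uses `char::decode_utf16` and reports indices in `u16` code units, so each unpaired surrogate becomes one U+FFFD at the index of that code unit.

```rust
/// Hypothetical helper (not part of unicode-bidi): decodes possibly-invalid
/// UTF-16 into (code-unit index, char) pairs, mapping each unpaired
/// surrogate to U+FFFD.
fn utf16_char_indices_lossy(units: &[u16]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut idx = 0; // offset in u16 code units
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            Ok(c) => {
                out.push((idx, c));
                idx += c.len_utf16(); // 1 or 2 code units
            }
            Err(_) => {
                // An unpaired surrogate is always a single code unit.
                out.push((idx, char::REPLACEMENT_CHARACTER));
                idx += 1;
            }
        }
    }
    out
}

fn main() {
    // 'a', a lone high surrogate, 'b', then a valid surrogate pair (U+1F600).
    let units: [u16; 5] = [0x0061, 0xD800, 0x0062, 0xD83D, 0xDE00];
    for (i, c) in utf16_char_indices_lossy(&units) {
        println!("{i}: {c:?}");
    }
}
```

Because indices stay in the input's own code units, levels computed from the decoded chars can still be mapped back onto the original buffer.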

It would be nice to be able to do the same for the UTF8 API.

We could implement TextSource for [u8] and have a whole other set of BidiInfo copies for unvalidated UTF8.
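A `TextSource` impl for `[u8]` would need a lossy decoder that still reports byte indices. A minimal sketch, assuming each invalid sequence maps to one U+FFFD like the UTF16 path: `utf8_char_indices_lossy` is a hypothetical name, and the sketch does not implement the crate's actual `TextSource` trait methods. It uses only `std::str::from_utf8` and `Utf8Error::valid_up_to`/`error_len`, and it resumes after each invalid sequence as the `error_len` documentation describes for lossy decoding.

```rust
/// Hypothetical helper (not part of unicode-bidi): decodes possibly-invalid
/// UTF-8 into (byte index, char) pairs, inserting one U+FFFD per invalid
/// sequence.
fn utf8_char_indices_lossy(input: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut rest = input;
    let mut base = 0; // byte offset of `rest` within `input`
    loop {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                out.extend(s.char_indices().map(|(i, c)| (base + i, c)));
                return out;
            }
            Err(e) => {
                let valid = e.valid_up_to();
                // The bytes before valid_up_to() are guaranteed valid UTF-8.
                let s = std::str::from_utf8(&rest[..valid]).unwrap();
                out.extend(s.char_indices().map(|(i, c)| (base + i, c)));
                out.push((base + valid, char::REPLACEMENT_CHARACTER));
                // error_len() is None when the input ends mid-sequence;
                // in that case all remaining bytes are consumed.
                let bad = e.error_len().unwrap_or(rest.len() - valid);
                rest = &rest[valid + bad..];
                base += valid + bad;
            }
        }
    }
}

fn main() {
    // 'a', a stray 0xFF byte, 'b', then a truncated 3-byte sequence.
    let bytes = b"a\xFFb\xE2\x82";
    for (i, c) in utf8_char_indices_lossy(bytes) {
        println!("{i}: {c:?}");
    }
}
```

The cost of this route is the one mentioned above: every UTF8-facing type would need a `[u8]` counterpart that runs on this decoder instead of `str::char_indices`.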

Alternatively, we could have .char_indices() bail on the first invalid UTF8 byte, and reuse the same BidiInfo through a separate constructor that is documented to accept invalid UTF8 and truncates the returned levels at the point where decoding stopped.
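The "bail on the first invalid byte" option only needs the longest valid prefix. A minimal sketch, assuming a hypothetical `valid_utf8_prefix` helper (not a crate API): the prefix is an ordinary `&str` that the existing UTF8 code path could accept, so levels computed for it cover only `prefix.len()` bytes, and anything after that point is left without levels.

```rust
/// Hypothetical helper (not part of unicode-bidi): returns the longest
/// prefix of `bytes` that is valid UTF-8, stopping at the first invalid byte.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        // valid_up_to() marks the end of the longest valid prefix,
        // so re-validating that slice cannot fail.
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    }
}

fn main() {
    let bytes = b"abc\xFFdef";
    let prefix = valid_utf8_prefix(bytes);
    // Levels computed over `prefix` would cover bytes[..prefix.len()] only.
    println!("valid: {:?}, bytes without levels: {}", prefix, bytes.len() - prefix.len());
}
```

The second `from_utf8` call re-scans the prefix; `std::str::from_utf8_unchecked` would avoid that pass at the price of an `unsafe` block.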

@Manishearth
Member Author

cc @robertbastian

I'm inclined to do the "bail on first non-UTF8" thing for now since it's a smaller change.

If we ever do a 2.0, we should make this code generic over encodings.
