-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Create OOB package for Rune, System.Text.Unicode.Utf8, and related APIs #52947
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
The naming pattern we have established for netstandard2.0 polyfills like this is |
Thanks for the tip! I've updated the proposal to read Microsoft.Bcl.Unicode. |
@GrabYourPitchforks is your idea to ship this new package in the 6.0 wave? I ask just to set the milestone. @terrajobst what do you think? This is not really an API Proposal but just a new package proposal so not sure what the right process is for approving something like this. |
@joperezr 6.0 would be ideal, yes, so that our existing packages could remove their copies of this code and take the dependency appropriately. |
@GrabYourPitchforks have made any progress? Is this still planned for 6.0? |
@krwq I made significant progress on this (see here), but since we're so late in the process I don't think we can hit 6.0. Spoke with Jeff about this as well, and will spend a little time looking through other code bases to see if they can benefit from these types or whether we should change which APIs are available in these packages. |
@GrabYourPitchforks I'll move this to 7.0.0, please let us know if there are any updates here. |
There hasn't been much demand for this posted on this issue, but we're not quite ready to close it out. We'd like to invite input and requests on this issue. If you have scenarios where these APIs would be valuable in downlevel scenarios, please share your needs here. If we don't hear of substantial needs over the next couple months, we will close this out. |
Azure SDK does not need the Rune APIs, but we do need the UTF8 buffer-based APIs that Levi worked on. (including but not only the Utf8 static class above) It really would be good to have them in the platform. |
I came across this issue while working on #90352 -- briefly contemplated writing yet another Rune polyfill in STJ. |
This would be helpful for me too as I have my own polyfill for Rune as well. |
Let's discuss this for .NET 10. Looks like we have some internal dependencies on this now, which would be a good driver to make progress here. |
namespace System.Text.Unicode
{
public static partial class Utf8
{
public static OperationStatus FromUtf16(ReadOnlySpan<char> source, Span<byte> destination, out int charsRead, out int bytesWritten, bool replaceInvalidSequences = true, bool isFinalBlock = true) { throw null; }
public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesRead, out int charsWritten, bool replaceInvalidSequences = true, bool isFinalBlock = true) { throw null; }
public static bool IsValid(ReadOnlySpan<byte> value);
}
} |
Problem summary
While working on the System.Text.Encodings.Web refactoring I noticed that we have several duplicates of the
System.Text.Rune
type (or its backing logic) throughout our projects.The official implementation exposed by System.Runtime:
A copy used by the Utf8String OOB:
Multiple copies used by System.Text.Encodings.Web:
Duplicated logic in System.Text.Json:
runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/JsonReaderHelper.Unescaping.cs
Lines 378 to 479 in 79ae74f
System.Text.Json also has some duplicated logic from what turned into the
System.Text.Unicode.Utf8
helper APIs:Proposal
I propose to create a standalone Microsoft.Bcl.Unicode package which includes the functionality of the
Rune
type plus select helper methods. The exposed API surface would be as follows.The package should target the frameworks net461, netstandard2.0, and netcoreapp3.1. When running on .NET Core 3.1+, the implementations are largely removed and the package type-forwards into the SDK.
Consumers like System.Text.Json and System.Text.Encodings.Web do not need to include references to this package as part of their ref set. However, when compiling for targets before .NET Core 3.1, the implementation assemblies would have a reference to this pacakge.
Q&A
Can we use an existing package instead of creating a new Microsoft.Bcl.Unicode package? Probably, if that's desired. If we wanted to use an existing package, System.Runtime.Extensions and System.Text.Encoding.Extensions would probably be the obvious candidates. But these assemblies have existing sometimes-OOB-sometimes-inbox behavior, which also affects things like versioning. As part of the SDK, there may be implicit references to the APIs contained within these packages, which could cause problems for
MemoryExtensions
andStringExtensions
(see below). In my ideal world, Microsoft.Bcl.Unicode would be a fully-OOB package that would never be included in the inbox SDK, as it's quite literally a polyfill / point-in-time release.If we did add these APIs to System.Text.Encoding.Extensions or System.Runtime.Extensions, we would need to add a reference from those packages to System.Memory, which contains
OperationStatus
and[ReadOnly]Span<T>
, since these are exposed via the public API surface.Are
MemoryExtensions
andStringExtensions
a good idea? The proposal above introduces them into the System namespace so that they look like the shipped .NET Core 3.1+ APIs. The type nameMemoryExtensions
is clearly already taken, but since most callers should invoke the APIs via extension method syntax rather than typical static method invocation syntax, I don't believe this will cause conflicts in practice. The only time it could cause conflicts is if a caller is targeting .NET Core 3.1+ and also has referenced this package explicitly.If desired, we can also [Obsolete] the APIs in the .NET Core 3.1+ ref, which signals to library authors that they should only be pulling in this package when compiling for downlevel targets. If they're cross-compiling for multiple targets, they should configure their environment not to include this package reference in later APIs. We can even delete the APIs from the ref but leave them in the implementation, which forbids compiling against them but won't block existing compiled code from binding against it during load.
What about new APIs being introduced? For example, there may be desire for downlevel customers to call APIs introduced in #28230, which is slated for introduction in the 6.0 timeframe. We can take these on a case-by-case basis; but generally speaking, there's nothing stopping us from utilizing the same techniques here. Implement the APIs to the best of our ability downlevel - which may involve leaving off some of the API surface - and type-forward on compatible runtimes.
What about the APIs added under System.Globalization? APIs like
CompareInfo.IndexOf(ReadOnlySpan<char>, Rune, ...)
cannot be implemented out-of-band without taking a significant performance hit. As it is, the OOB implementation (not the inbox implementation) ofSystem.Text.Rune
may already allocate for scenarios likeRune.GetUnicodeCategory(new Rune(0x10000))
, but we would try to avoid allocations in the common case. For Rune-enlightened APIs onCompareInfo
, we would not be able to avoid this allocation under even the most basic of scenarios. I don't think we should attempt to implement such APIs out-of-band.The text was updated successfully, but these errors were encountered: