Typedbuffer and low level parsers #45
Split ParseContext and ParseContext.Untyped
Isn't there a way to keep TextSpan : BufferSpan<char>? And have dedicated methods for string in here, used by the parsers that deal with char/string?
@@ -45,7 +45,7 @@ public static bool IsWhiteSpaceOrNewLine(char ch)
 public static bool IsNewLine(char ch)
     => (ch == '\n') || (ch == '\r') || (ch == '\v');

-public static char ScanHexEscape(string text, int index, out int length)
+public static char ScanHexEscape(char[] text, int index, out int length)
Why not ScanHexEscape(BufferSpan<char> text, int index, out int length), so it doesn't have to allocate a char[]?
It will have to allocate a char[] as soon as there is an escape.
But here it has to allocate two of them: the argument and the returned value.
Not really: ScanHexEscape just returns a char, so no allocation is done. And the two places it is called from are where the unescaping happens.
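To make the allocation point in this thread concrete, here is a minimal sketch of a hex-escape scanner that reads from a span and returns a single char. It is not Parlot's ScanHexEscape: the class name HexEscapeSketch, the four-digit limit, and the use of ReadOnlySpan<char> as a stand-in for BufferSpan<char> are all assumptions made for illustration.

```csharp
using System;

public static class HexEscapeSketch
{
    // Hypothetical helper, not Parlot's implementation.
    // `index` points at the escape marker (for example the 'u' in "\u0041");
    // up to four hex digits following it are decoded into one char.
    public static char ScanHexEscape(ReadOnlySpan<char> text, int index, out int length)
    {
        int code = 0;
        length = 0; // number of hex digits consumed

        for (int i = index + 1; i < text.Length && length < 4; i++)
        {
            int digit = HexValue(text[i]);
            if (digit < 0)
            {
                break; // stop at the first non-hex character
            }

            code = code * 16 + digit;
            length++;
        }

        // A char is a value type, so returning it allocates nothing.
        return (char)code;
    }

    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Under these assumptions, a caller could pass "\\u0041".AsSpan() with index 1 and no char[] would be created for the argument; any buffer for the unescaped result would still be the caller's responsibility, as noted above.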
@@ -68,75 +68,69 @@ public static char ScanHexEscape(string text, int index, out int length)
     return (char)code;
 }

-public static TextSpan DecodeString(string s) => DecodeString(new TextSpan(s));
+public static BufferSpan<char> DecodeString(string s) => DecodeString(s.ToCharArray());
-public static BufferSpan<char> DecodeString(string s) => DecodeString(s.ToCharArray());
+public static BufferSpan<char> DecodeString(string s) => DecodeString(new BufferSpan<char>(s));
I thought so initially, but eventually you need to build a char[], since you might be removing some characters if there are any escapes.
We may improve it once there is Span support, to avoid this ToCharArray call, though.
src/Parlot/Fluent/Separated.cs
Outdated
@@ -5,18 +5,20 @@

 namespace Parlot.Fluent
 {
-    public sealed class Separated<U, T> : Parser<List<T>>, ICompilable
+    public sealed class Separated<U, T, TParseContext, TChar> : Parser<List<T>, TParseContext, TChar>, ICompilable<TParseContext, TChar>
+        where TParseContext : ParseContextWithScanner<Scanner<TChar>, TChar>
ParseContextWithScanner should only be necessary in low level parsers that have to deal with chars (Literals/Terms).
I thought about it too, but it needs to have a scanner to reset the position.
And getting rid of the type makes it hard to apply Then/And/Or/... with "really scanner dependent" parsers.
Can't there be a base abstract type-less Scanner that can reset the position?
I was then worried about the dispatch performance (as I faced it for interfaces earlier)
On the scanner, that is hardly doable. You may want to do it on ParseContext but it also needs to get the Cursor position.
Not if you intend to keep BufferSpan a struct. The way I did it was using some extension methods for BufferSpan.
I think we should also think about how we could pass a stream or PipeReader (https://docs.microsoft.com/en-us/dotnet/api/system.io.pipelines.pipereader?view=dotnet-plat-ext-5.0) as a source.
We should first start using SequenceReader (https://docs.microsoft.com/fr-fr/dotnet/api/system.buffers.sequencereader-1?view=net-5.0) instead of the Cursor.
Please look at https://github.com/ToCSharp/paspan. It is a fork of Parlot, based on Spans.
@ToCSharp interesting, some questions:
I just realized that you took some code from Utf8JsonReader.
This PR depends on #35. It basically allows for a typed buffer. Along with some TextSpan (renamed to BufferSpan) improvements, it allows using byte instead of char, opening the possibility of low-level parsers such as network/serial protocol parsers.
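As a rough illustration of what a typed buffer span could look like, the sketch below defines a generic struct over an array segment that works for both char and byte. It is an assumption-laden example, not the BufferSpan<T> from this PR: the name BufferSpanSketch<T>, its members, and the StartsWith helper are all hypothetical.

```csharp
using System;

// Hypothetical sketch, not Parlot's actual BufferSpan<T>.
// A struct view over a segment of an array, generic over the element type
// so the same code can serve text (char) and binary protocols (byte).
public readonly struct BufferSpanSketch<T> where T : IEquatable<T>
{
    public readonly T[] Buffer;
    public readonly int Offset;
    public readonly int Length;

    public BufferSpanSketch(T[] buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || length < 0 || offset + length > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    // Element access relative to the start of the segment.
    public T this[int i]
    {
        get
        {
            if (i < 0 || i >= Length) throw new IndexOutOfRangeException();
            return Buffer[Offset + i];
        }
    }

    // True when the segment begins with the given elements.
    public bool StartsWith(T[] prefix)
    {
        if (prefix.Length > Length) return false;

        for (int i = 0; i < prefix.Length; i++)
        {
            if (!Buffer[Offset + i].Equals(prefix[i])) return false;
        }
        return true;
    }
}
```

With such a type, a byte-oriented parser might check a frame marker via new BufferSpanSketch<byte>(frame, 0, frame.Length).StartsWith(new byte[] { 0x7E }), while a text parser would use BufferSpanSketch<char> the same way; the 0x7E marker is only an example value.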