Typedbuffer and low level parsers #45

npenin · 2021-05-09T19:09:01Z

This PR depends on #35. It allows for typed buffers. Along with some TextSpan improvements (TextSpan is renamed to BufferSpan), it allows byte to be used instead of char, opening the possibility for low-level parsers such as network or serial protocol parsers.

sebastienros

Isn't there a way to keep TextSpan : BufferSpan<char>, and have dedicated methods for strings here, used by the parsers that deal with char/string?

sebastienros · 2021-05-09T21:27:53Z

src/Parlot/Character.cs

@@ -45,7 +45,7 @@ public static bool IsWhiteSpaceOrNewLine(char ch)
        public static bool IsNewLine(char ch)
            => (ch == '\n') || (ch == '\r') || (ch == '\v');

-        public static char ScanHexEscape(string text, int index, out int length)
+        public static char ScanHexEscape(char[] text, int index, out int length)


Why not ScanHexEscape(BufferSpan<char> text, int index, out int length), so that it doesn't have to allocate a char[]?

It will have to allocate a char[] anyway as soon as there is an escape.

But here it has to allocate two of them: the argument and the returned value.

Not really: ScanHexEscape just returns a char, so no allocation is done. And the two places it is called from are where the unescaping happens.
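To illustrate the allocation point, here is a sketch of a ScanHexEscape that takes a ReadOnlySpan<char> instead of a string or char[]. This is not Parlot's code: the class name HexEscapeSketch, the convention that index points at the first hex digit, and the limit of 4 digits are assumptions for illustration. A slice of either a string or a char[] can be passed without copying, and the result is a single char, so the call itself allocates nothing.

```csharp
using System;

public static class HexEscapeSketch
{
    // Hypothetical helper: reads up to 4 hex digits starting at `index` and
    // returns the char they encode. `length` receives the number of digits
    // consumed. Taking a ReadOnlySpan<char> means callers can pass a string
    // or a char[] (or a slice of either) without allocating a copy.
    public static char ScanHexEscape(ReadOnlySpan<char> text, int index, out int length)
    {
        var code = 0;
        length = 0;

        for (var i = index; i < text.Length && length < 4; i++)
        {
            var digit = HexValue(text[i]);
            if (digit < 0)
            {
                break; // stop at the first non-hex character
            }

            code = code * 16 + digit;
            length++;
        }

        return (char)code;
    }

    private static int HexValue(char ch)
    {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }
}
```

The allocation that remains is the decoded output itself, which matches the point made in the thread.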

sebastienros · 2021-05-09T21:28:34Z

src/Parlot/Character.cs

@@ -68,75 +68,69 @@ public static char ScanHexEscape(string text, int index, out int length)
            return (char)code;
        }

-        public static TextSpan DecodeString(string s) => DecodeString(new TextSpan(s));
+        public static BufferSpan<char> DecodeString(string s) => DecodeString(s.ToCharArray());


Suggested change

public static BufferSpan<char> DecodeString(string s) => DecodeString(s.ToCharArray());

public static BufferSpan<char> DecodeString(string s) => DecodeString(new BufferSpan<char>(s));

I thought so initially, but eventually you need to build a char[], since some characters may be removed if there are any escapes.

We may improve it with Span support to avoid this ToCharArray call, though.
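The trade-off above can be sketched as follows. This is hypothetical C#, not Parlot's DecodeString. When the input contains no backslash, the string is wrapped without copying. A char[] is allocated only when an escape is present. Because unescaping only ever removes characters, s.Length is an upper bound on the output size. The set of supported escapes is an assumption, and hex escapes (the job of ScanHexEscape) are omitted for brevity.

```csharp
using System;

public static class DecodeSketch
{
    // Hypothetical sketch: only allocates when the input actually contains
    // an escape sequence.
    public static ReadOnlyMemory<char> DecodeString(string s)
    {
        // Fast path: nothing to unescape, so wrap the string without copying.
        if (s.IndexOf('\\') < 0)
        {
            return s.AsMemory();
        }

        // Decoded output is never longer than the input.
        var buffer = new char[s.Length];
        var written = 0;

        for (var i = 0; i < s.Length; i++)
        {
            var ch = s[i];
            if (ch != '\\' || i + 1 == s.Length)
            {
                buffer[written++] = ch; // ordinary char, or a trailing backslash
                continue;
            }

            var next = s[++i];
            buffer[written++] = next switch
            {
                'n' => '\n',
                't' => '\t',
                'r' => '\r',
                '0' => '\0',
                _ => next, // \\, \", \' and unknown escapes keep the escaped char
            };
        }

        // Escapes removed characters, so only expose the part that was written.
        return new ReadOnlyMemory<char>(buffer, 0, written);
    }
}
```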

sebastienros · 2021-05-09T21:46:17Z

src/Parlot/Fluent/Separated.cs

@@ -5,18 +5,20 @@

 namespace Parlot.Fluent
 {
-    public sealed class Separated<U, T> : Parser<List<T>>, ICompilable
+    public sealed class Separated<U, T, TParseContext, TChar> : Parser<List<T>, TParseContext, TChar>, ICompilable<TParseContext, TChar>
+    where TParseContext : ParseContextWithScanner<Scanner<TChar>, TChar>


ParseContextWithScanner should only be necessary in low-level parsers that have to deal with chars (Literals/Terms).

I thought about it too, but it needs a scanner to reset the position.

And getting rid of the type makes it hard to apply Then/And/Or/... with parsers that really depend on the scanner.

Can't there be a base abstract type-less Scanner that can reset the position?

I was worried about the dispatch performance (as I ran into it with interfaces earlier).

On the scanner, that is hardly doable. You could do it on ParseContext, but it would also need access to the Cursor position.
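A minimal sketch of the type-less base scanner idea, under stated assumptions. The names ScannerBase, Scanner<TChar> and SeparatedSketch are hypothetical, and the position is a plain int offset rather than Parlot's Cursor. Because Offset and ResetPosition live on the base class and are not virtual, a Separated-style combinator can save and restore the position without knowing TChar and without paying for virtual or interface dispatch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type-less base: backtracking only needs to save and restore
// an offset, which does not depend on the element type. The members are
// non-virtual, so using them involves no virtual dispatch.
public abstract class ScannerBase
{
    public int Offset { get; protected set; }

    public void ResetPosition(int offset) => Offset = offset;
}

// Hypothetical element-typed scanner: TChar can be char or byte.
public sealed class Scanner<TChar> : ScannerBase
{
    private readonly TChar[] _buffer;

    public Scanner(TChar[] buffer) => _buffer = buffer;

    public bool Eof => Offset >= _buffer.Length;

    public bool TryRead(out TChar value)
    {
        if (Eof)
        {
            value = default;
            return false;
        }

        value = _buffer[Offset++];
        return true;
    }
}

public static class SeparatedSketch
{
    // Parses `item (separator item)*`. Only the delegates know the element
    // type; the backtracking logic here depends on ScannerBase alone.
    public static bool TryParse<T>(
        ScannerBase scanner,
        Func<bool> separator,
        Func<(bool Success, T Value)> item,
        out List<T> results)
    {
        results = new List<T>();

        var first = item();
        if (!first.Success)
        {
            return false;
        }
        results.Add(first.Value);

        while (true)
        {
            var beforeSeparator = scanner.Offset;

            if (!separator())
            {
                break;
            }

            var next = item();
            if (!next.Success)
            {
                // A trailing separator is not part of the list: backtrack over it.
                scanner.ResetPosition(beforeSeparator);
                break;
            }

            results.Add(next.Value);
        }

        return true;
    }
}
```

Whether this fits Parlot depends on any extra state the real Cursor tracks; the sketch only shows that resetting the position does not itself require the element type.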

src/Parlot/Fluent/DecimalLiteral.cs

npenin · 2021-05-10T05:52:54Z

> Isn't there a way to keep TextSpan : BufferSpan<char>? And have dedicated methods for string in here, used by the parser that deal with char/string?

Not if you intend to keep BufferSpan a struct, since a struct cannot be inherited from. The way I did it was to use extension methods on BufferSpan.
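The extension-method approach can be sketched like this. This BufferSpan<T> is a minimal hypothetical stand-in, not the PR's type. Helpers declared on a closed generic such as BufferSpan<char> or BufferSpan<byte> only appear on that instantiation. This gives string-oriented parsers TextSpan-like methods without a type hierarchy, and byte-oriented parsers their own.

```csharp
using System;
using System.Text;

// Hypothetical minimal BufferSpan<T>. Because it is a struct, TextSpan
// cannot derive from it, so type-specific helpers are extension methods.
public readonly struct BufferSpan<T>
{
    public BufferSpan(T[] buffer, int offset, int length)
    {
        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    public T[] Buffer { get; }
    public int Offset { get; }
    public int Length { get; }

    // View over the slice, without copying.
    public ReadOnlySpan<T> Span => new ReadOnlySpan<T>(Buffer, Offset, Length);
}

public static class BufferSpanExtensions
{
    // Only callable on BufferSpan<char>: the role TextSpan used to play.
    public static string GetText(this BufferSpan<char> span) => new string(span.Span);

    public static bool StartsWithOrdinal(this BufferSpan<char> span, string value)
        => span.Span.StartsWith(value.AsSpan(), StringComparison.Ordinal);

    // Only callable on BufferSpan<byte>, e.g. in a protocol parser.
    public static string DecodeAscii(this BufferSpan<byte> span)
        => Encoding.ASCII.GetString(span.Span);
}
```

The struct stays a plain value type with no virtual members, which keeps it cheap to copy and pass around.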

sebastienros · 2021-05-10T17:29:43Z

I think we should also think about how we could pass a stream or PipeReader (https://docs.microsoft.com/en-us/dotnet/api/system.io.pipelines.pipereader?view=dotnet-plat-ext-5.0) as a source.

npenin · 2021-05-10T18:51:11Z

We should first start using SequenceReader (https://docs.microsoft.com/fr-fr/dotnet/api/system.buffers.sequencereader-1?view=net-5.0) instead of the Cursor.
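For reference, here is a sketch of cursor-style parsing on top of System.Buffers.SequenceReader<T> (.NET Core 3.0 and later). The name=value; framing and the TryReadField helper are hypothetical. The members used (Consumed, TryReadTo, Rewind) are part of the real API. Backtracking works by recording Consumed and rewinding by the difference, which is the operation a Cursor replacement would need to support.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SequenceReaderSketch
{
    // Hypothetical protocol: fields are encoded as `name=value;`.
    // Returns false and restores the reader position if the field is incomplete.
    public static bool TryReadField(ref SequenceReader<byte> reader, out string name, out string value)
    {
        var start = reader.Consumed;
        name = value = string.Empty;

        if (!reader.TryReadTo(out ReadOnlySequence<byte> nameBytes, (byte)'=') ||
            !reader.TryReadTo(out ReadOnlySequence<byte> valueBytes, (byte)';'))
        {
            // Backtrack to where this field started. Rewind takes a count
            // of elements to move back, not an absolute position.
            reader.Rewind(reader.Consumed - start);
            return false;
        }

        name = Encoding.ASCII.GetString(nameBytes.ToArray());
        value = Encoding.ASCII.GetString(valueBytes.ToArray());
        return true;
    }
}
```

One design constraint: SequenceReader<T> is a ref struct. It can be passed by ref, as above, but it cannot be stored as a field of a class such as a parse context.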

ToCSharp · 2021-12-04T11:35:31Z

> I think we should also think about how we could pass a stream or PipeReader (https://docs.microsoft.com/en-us/dotnet/api/system.io.pipelines.pipereader?view=dotnet-plat-ext-5.0) as a source.

Please take a look at https://github.com/ToCSharp/paspan. It is a fork of Parlot, based on Spans.
I tried to speed up parsing of huge logs. Using a PipeReader is much slower than reading the whole file and then parsing it, because it is async and it takes time for the reader to tell the writer to read more. It may be useful for socket streams.
Another way, parsing the file with a FileStream and PaspanMultiSegment (SequenceReader), is on the TODO list.

sebastienros · 2021-12-05T17:35:01Z

@ToCSharp interesting, some questions:

Aren't pipe readers async by nature? That would make everything async in Parlot.
How do you handle back-tracking? A PipeReader helps with back pressure, but when it has to revert to a previous point, you need the buffer in memory, so I don't see how to avoid allocating the whole buffer in memory. An option would be to define a custom (cyclic) buffer length and throw an exception if it's not sufficient. Maybe the solution is simply to abstract the Reader, so that any strategy can be used (FileStream, PipeReader, Buffer).
I was thinking that maybe Parlot should come in two flavors, bytes or chars; I see you went "bytes only". Would that be a viable alternative? It would require some allocations for sources that come as strings, though. Or maybe it could use a reader that converts chars to bytes on the fly. I find it interesting because it allows dealing with only one type (byte), so we wouldn't need a TChar like in this PR, or to split Parlot in two (bytes vs chars). A SkipWhitespace, for instance, can work with bytes. We could still split some parsers for chars, bytes, or logic flow (separated, zero or many, ...).
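The bounded back-tracking option mentioned above could look roughly like this. WindowedReader is hypothetical and not part of Parlot or System.IO. It reads a Stream forward and keeps only the last capacity bytes in a ring buffer. A parser can rewind anywhere inside that window, and rewinding further back throws, so memory use stays fixed instead of growing with the input.

```csharp
using System;
using System.IO;

public sealed class WindowedReader
{
    private readonly Stream _stream;
    private readonly byte[] _ring;
    private long _filled;   // total number of bytes pulled from the stream
    private long _position; // logical position of the parser

    public WindowedReader(Stream stream, int capacity)
    {
        _stream = stream;
        _ring = new byte[capacity];
    }

    public long Position => _position;

    public bool TryRead(out byte value)
    {
        if (_position == _filled)
        {
            // At the edge of what has been read: pull one more byte.
            var next = _stream.ReadByte();
            if (next < 0)
            {
                value = 0;
                return false;
            }

            _ring[(int)(_filled % _ring.Length)] = (byte)next;
            _filled++;
        }

        // Either the byte just read, or one replayed after a rewind.
        value = _ring[(int)(_position % _ring.Length)];
        _position++;
        return true;
    }

    public void Rewind(long position)
    {
        // Only backward moves within the last _ring.Length bytes are possible.
        if (position < 0 || position > _position || position < _filled - _ring.Length)
        {
            throw new InvalidOperationException("Position is outside the backtracking window.");
        }

        _position = position;
    }
}
```

Reading one byte at a time through Stream.ReadByte is chosen for clarity. A practical version would refill the ring in blocks.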

I just realized that you took some code from Utf8JsonReader.
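As a sketch of the bytes-only option, here is a SkipWhiteSpace over UTF-8 bytes. The ByteScanningSketch class is hypothetical. It relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so ASCII whitespace can be matched byte by byte without decoding. Sources that come as strings still have to be transcoded first, which is the allocation mentioned in the question.

```csharp
using System;
using System.Text;

public static class ByteScanningSketch
{
    // Returns the index of the first non-whitespace byte at or after `index`.
    // Safe on UTF-8 input: bytes belonging to multi-byte sequences are all
    // >= 0x80, so they can never be mistaken for these ASCII whitespace bytes.
    public static int SkipWhiteSpace(ReadOnlySpan<byte> utf8, int index)
    {
        while (index < utf8.Length)
        {
            switch (utf8[index])
            {
                case (byte)' ':
                case (byte)'\t':
                case (byte)'\r':
                case (byte)'\n':
                    index++;
                    break;
                default:
                    return index;
            }
        }

        return index;
    }

    // A string source has to be transcoded to bytes first (one allocation).
    public static byte[] FromString(string text) => Encoding.UTF8.GetBytes(text);
}
```

Only ASCII whitespace is handled here. Recognizing Unicode whitespace such as U+00A0 would require decoding the multi-byte sequence.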

npenin and others added 15 commits March 16, 2021 19:52

added scoping mechanism to parsers

f63fa58

code formatting

4f76cea

implemented strongly typed parsecontext

ec9d71d

splitted ParseContext and ParseContext.Untyped

Merge branch 'main' of https://github.com/sebastienros/parlot into sc…

795ed1e

…ope2

Merge main branch

0aaa2a6

Merge branch 'main' of https://github.com/sebastienros/parlot into sc…

c691276

…ope2

fixed compilation

8cdf179

Fixing benchmarks compilation

358143d

removed interface to restore performances

f18fd9a

improved scoping usage

1d8c924

Merge branch 'main' into scope2

0eb62bd

typed scanner and cursors (to allow for byte parsing)

3d70885

fixed compilation after merge from main

42dafd4

Merge branch 'scope2' of https://github.com/npenin/parlot into typedb…

a2d11e2

…uffer

fixed benchmark compilation

00ceccb

npenin changed the title ~~Typedbuffer~~ Typedbuffer and low level parsers May 9, 2021

npenin added 2 commits May 9, 2021 21:17

normalized tchar generics

be8c387

simplified parsecontextwithscanner

a0dafb3

sebastienros reviewed May 9, 2021


npenin added 4 commits May 10, 2021 08:49

renamed TextSpan to bufferspan

4787c8b

renamed ParseContext<> to ScopeParseContext

0ec9a68

Merge branch 'scope2' of https://github.com/npenin/parlot into typedb…

b253671

…uffer

merge from scope branch

0e3d03f

removing useless ToString

improved compilation

ceac2e6

remove code duplication and improve number parsing

63fa49d
