Make literal and regex more flexible wrt whitespace #25

inkytonik · 2014-07-25T03:16:27Z

At the moment the decision of whether or not to skip whitespace in literal and regex (RegexParsers.scala) is made on the basis of whether the skipWhitespace method returns true or not. Thus, the same decision is used for all occurrences of literal and regex in a parsing module.

Sometimes it is convenient to skip white space on most occasions, but in one or two places, not to skip it. The current design of literal and regex makes this impossible, I think, without duplicating some internal details of RegexParsers in the client code.

E.g., suppose we are parsing names that consist of an alphabetic part, followed by a numeric part, and we want to use the two parts separately after parsing. We could parse a whole name with one regex parser ("[A-Z]+[0-9]+".r) and then post-process the whole name to extract the two parts.
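To make the post-processing variant concrete, here is a minimal sketch. It assumes the scala-parser-combinators module is on the classpath; the object name `WholeNameParser` and the choice of `span` for splitting are illustrative only.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object, not part of the library.
object WholeNameParser extends RegexParsers {
  // Recognise the whole name with one regex, then split it afterwards.
  val name: Parser[(String, String)] =
    "[A-Z]+[0-9]+".r ^^ { s =>
      // span splits at the first character that is not a letter.
      val (alpha, digits) = s.span(_.isLetter)
      (alpha, digits)
    }
}
```

Here `WholeNameParser.parseAll(WholeNameParser.name, "ABC123")` should succeed with the pair `("ABC", "123")`, at the cost of a second pass over the matched text.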

However, it is conceptually cleaner to recognise the two pieces with separate regexes so the two parts are delivered separately in one step without post-processing being required. Unfortunately, the obvious parser

"[A-Z]+".r ~ "[0-9]+".r

does not work because the second parser will skip whitespace (assuming that we haven't altered the default behaviour), so input such as "ABC 123" is accepted as a single name. We can't just turn off whitespace processing in the module since we want the first parser here (and probably many others) to skip white space.
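A small sketch of the problem, assuming the default `whiteSpace` setting of RegexParsers; the object name `NaiveNameParser` is made up for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object showing the default behaviour.
object NaiveNameParser extends RegexParsers {
  // Both regexes are converted implicitly by RegexParsers.regex,
  // so both skip leading whitespace before matching.
  val name: Parser[String ~ String] = "[A-Z]+".r ~ "[0-9]+".r
}
```

With this definition, `NaiveNameParser.parseAll(NaiveNameParser.name, "ABC123")` succeeds as intended, but so does the same call on `"ABC 123"`, because the second parser silently skips the space.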

One solution is to create a version of regex (and similarly, literal) that does not perform whitespace handling and use that for the second parser above. However, this cannot be done in user code easily since we would have to duplicate elements of the RegexParsers implementation such as most of the regex method and the (private) class SubSequence.
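For reference, a rough sketch of what such a user-code version might look like. The names `NoSkipRegexParsers` and `regexNoWS` are hypothetical, the error message is simplified, and `subSequence` on the source stands in for the library's private SubSequence class, so this is not a faithful copy of the library implementation.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Hypothetical trait; regexNoWS is not a library method.
trait NoSkipRegexParsers extends RegexParsers {
  // Like regex, but matches at the current offset without skipping whitespace.
  def regexNoWS(r: Regex): Parser[String] = Parser { in =>
    val source = in.source
    val offset = in.offset
    // Stand-in for the private SubSequence class used by the library.
    r.findPrefixOf(source.subSequence(offset, source.length)) match {
      case Some(matched) => Success(matched, in.drop(matched.length))
      case None          => Failure(s"string matching regex '$r' expected", in)
    }
  }
}

object StrictNameParser extends NoSkipRegexParsers {
  val name: Parser[String ~ String] = "[A-Z]+".r ~ regexNoWS("[0-9]+".r)
}
```

This accepts `"ABC123"` and rejects `"ABC 123"`, but it repeats most of the body of the regex method, which is the duplication described above.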

A better approach would seem to be to extend the existing regex and literal methods to have an optional Boolean argument that specifies for a particular call whether the module-wide whitespace handling should be performed or not. Then the above spec would be something like

"[A-Z]+".r ~ regex ("[0-9]+".r, handleWhiteSpace = false)

which is verbose but does the trick. The verbosity should not be a problem since it can be hidden behind another name, and this situation is rare anyway.
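A sketch of how the flag could behave, written as a separate helper so that it compiles against the current library. The names `FlaggedRegexParsers` and `regexWS` are hypothetical; the actual proposal is to add the parameter to regex and literal themselves, and the failure message here is simplified.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Hypothetical trait; regexWS stands in for the proposed regex signature.
trait FlaggedRegexParsers extends RegexParsers {
  def regexWS(r: Regex, handleWhiteSpace: Boolean = true): Parser[String] =
    Parser { in =>
      val source = in.source
      val offset = in.offset
      // The parameter shadows the method name, hence the explicit this.
      val start =
        if (handleWhiteSpace) this.handleWhiteSpace(source, offset) else offset
      r.findPrefixOf(source.subSequence(start, source.length)) match {
        case Some(matched) =>
          Success(matched, in.drop(start + matched.length - offset))
        case None =>
          Failure(s"string matching regex '$r' expected", in.drop(start - offset))
      }
    }
}

object FlaggedNameParser extends FlaggedRegexParsers {
  // Whitespace is skipped before the letters but not before the digits.
  val name: Parser[String ~ String] =
    "[A-Z]+".r ~ regexWS("[0-9]+".r, handleWhiteSpace = false)
  // With the default argument the helper behaves like the ordinary regex.
  val loose: Parser[String ~ String] = "[A-Z]+".r ~ regexWS("[0-9]+".r)
}
```

The default value keeps existing call sites unchanged, which is why an optional argument seems less disruptive than a second method.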

I'm interested in feedback on whether something like this would be supported in the library, or if there is another approach to handling this issue that I've missed. I can submit a pull request for the actual change if there is support.

gourlaysama · 2014-07-28T13:51:51Z

I agree, whitespace handling could be improved a lot. Not to mention it will skip whitespace only if you call methods (or use implicits) defined in RegexParsers; but if you call methods from the underlying Parser class, suddenly there is no more whitespace handling. See SI-6491 for a sad example of this.
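A small illustration of that inconsistency (not necessarily the case reported in SI-6491): `elem` comes from the underlying Parsers trait and does no whitespace handling, while a string literal goes through RegexParsers. The object name `MixedParser` is made up.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object mixing both kinds of parser.
object MixedParser extends RegexParsers {
  // "a" becomes a whitespace-skipping literal parser; elem('b') does not skip.
  val ab: Parser[String ~ Char] = "a" ~ elem('b')
  // Both sides are literals here, so both skip leading whitespace.
  val abLit: Parser[String ~ String] = "a" ~ "b"
}
```

On the input `"a b"`, `abLit` succeeds while `ab` fails, even though the two definitions look almost interchangeable.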

Thank you for looking into this! I like the idea of being able to specify whitespace handling locally, but then why limit it to regex and literal?
I guess regex, literal (and positioned, the only three methods to use handleWhiteSpace) could all take an optional parameter for this. But I would also like to investigate how we could make it "fit in" better with the whole thing (maybe as a unary combinator?).
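One possible reading of the "unary combinator" idea, sketched in user code. The names `LocalWhitespaceParsers`, `noWS` and `wsEnabled` are hypothetical, and the mutable flag makes this unsuitable for concurrent use of one parser instance; it is only meant to show the shape such a combinator could take.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical trait; noWS is not part of the library.
trait LocalWhitespaceParsers extends RegexParsers {
  // Flag consulted by skipWhitespace; not thread-safe.
  private var wsEnabled = true

  override def skipWhitespace: Boolean = wsEnabled && super.skipWhitespace

  // Runs p with whitespace skipping switched off, then restores the flag.
  def noWS[T](p: => Parser[T]): Parser[T] = Parser { in =>
    val saved = wsEnabled
    wsEnabled = false
    try p(in) finally wsEnabled = saved
  }
}

object LocalNameParser extends LocalWhitespaceParsers {
  val name: Parser[String ~ String] = "[A-Z]+".r ~ noWS("[0-9]+".r)
  // Whitespace skipping is active again after the noWS part.
  val nameThenWord: Parser[String ~ String ~ String] = name ~ "[a-z]+".r
}
```

Because the flag applies to everything run inside `noWS`, this would cover regex, literal and positioned alike without changing their signatures.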

So yes, one way or another, this would be a good addition to the library. If you have any ideas, suggestions, or would like to have a shot at this, please go ahead :)

…la parser combinator code - in the longer term we are trying to have the parser combinator library support more flexile whitespace handling scala/scala-parser-combinators#25

inkytonik · 2015-08-20T20:03:17Z

Just letting you know that I'm unlikely to get much time to properly address this problem (as you might have noticed since the ticket has been here for a while). If anyone else wants to have a go, please go ahead.

gourlaysama added the enhancement label Aug 6, 2014

gourlaysama mentioned this issue Aug 6, 2014: err Not Working as (I) Expected #29 (Closed)

peteraldous mentioned this issue Jun 10, 2022: A proposed new interface for skipping characters automatically #464 (Draft)
