Make literal and regex more flexible wrt whitespace #25
Comments
I agree, whitespace handling could be improved a lot. Not to mention it will skip whitespace only if you call methods (or use implicits) defined in
Thank you for looking into this! I like the idea of being able to specify whitespace handling locally, but then why limit it to
So yes, one way or another, this would be a good addition to the library. If you have any ideas, suggestions, or would like to have a shot at this, please go ahead :)
…la parser combinator code - in the longer term we are trying to have the parser combinator library support more flexible whitespace handling scala/scala-parser-combinators#25
Just letting you know that I'm unlikely to get much time to properly address this problem (as you might have noticed, since the ticket has been here for a while). If anyone else wants to have a go, please go ahead.
At the moment the decision of whether or not to skip whitespace in literal and regex (RegexParsers.scala) is made on the basis of whether the skipWhitespace method returns true or not. Thus, the same decision is used for all occurrences of literal and regex in a parsing module.
Sometimes it is convenient to skip white space on most occasions, but in one or two places, not to skip it. The current design of literal and regex makes this impossible, I think, without duplicating some internal details of RegexParsers in the client code.
E.g., suppose we are parsing names that consist of an alphabetic part, followed by a numeric part, and we want to use the two parts separately after parsing. We could parse a whole name with one regex parser ("[A-Z]+[0-9]+".r) and then post-process the whole name to extract the two parts.
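As a rough illustration of the single-regex approach described above, the sketch below parses the whole name in one step and splits it afterwards. It assumes the scala-parser-combinators library is on the classpath; the object name `WholeNameParser` and the use of `span` for the post-processing step are illustrative choices, not something taken from the original report.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object, not part of the library.
object WholeNameParser extends RegexParsers {
  // One regex recognises the whole name; the semantic action then
  // splits it into its alphabetic and numeric parts.
  val name: Parser[(String, String)] =
    "[A-Z]+[0-9]+".r ^^ { whole => whole.span(_.isLetter) }
}
```

This works, but the split has to be written by hand after parsing, which is the post-processing the report would like to avoid.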
However, it is conceptually cleaner to recognise the two pieces with separate regexes so the two parts are delivered separately in one step without post-processing being required. Unfortunately, the obvious parser
does not work because the second parser will skip whitespace (assuming that we haven't altered the default behaviour). We can't just turn off whitespace processing in the module since we want the first parser here (and probably many others) to skip whitespace.
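The parser discussed above is not shown in this copy of the report. A sketch of what a two-regex parser of this kind might look like, and of the problem it runs into, is given below. The object name `ObviousNameParser` is hypothetical, and the code assumes the scala-parser-combinators library with its default whitespace settings.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object, not part of the library.
object ObviousNameParser extends RegexParsers {
  // skipWhitespace is left at its default, so each regex parser
  // first skips any leading whitespace before trying to match.
  val name: Parser[(String, String)] =
    "[A-Z]+".r ~ "[0-9]+".r ^^ { case letters ~ digits => (letters, digits) }
}

object ObviousNameDemo {
  // Parses as intended.
  val good = ObviousNameParser.parseAll(ObviousNameParser.name, "ABC123")
  // Also succeeds, because the second regex skips the space first;
  // this is the unwanted behaviour described in the report.
  val unwanted = ObviousNameParser.parseAll(ObviousNameParser.name, "ABC 123")
}
```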
One solution is to create a version of regex (and similarly, literal) that does not perform whitespace handling and use that for the second parser above. However, this cannot be done in user code easily since we would have to duplicate elements of the RegexParsers implementation such as most of the regex method and the (private) class SubSequence.
A better approach would seem to be to extend the existing regex and literal methods to have an optional Boolean argument that specifies, for a particular call, whether the module-wide whitespace handling should be performed or not. Then the above spec would be something like
which is verbose but does the trick. The verbosity should not be a problem since it can be hidden behind another name, and this situation is rare anyway.
I'm interested in feedback on whether something like this would be supported in the library, or if there is another approach to handling this issue that I've missed. I can submit a pull request for the actual change if there is support.
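The spec referred to above is not shown in this copy of the report. To make the idea concrete, here is a user-level approximation of a per-call whitespace switch. It is only a sketch: `regexWs` and its `skip` parameter are hypothetical names and not part of the library's API, and the non-skipping branch uses `Regex.findPrefixOf` on the remaining input rather than the library's internal machinery, so it may be less efficient than a change made inside RegexParsers.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object, not part of the library.
object FlexibleNameParser extends RegexParsers {
  // Hypothetical helper: when `skip` is true, defer to the library's
  // regex parser (which honours skipWhitespace); when false, match
  // exactly at the current offset without skipping anything.
  def regexWs(r: Regex, skip: Boolean): Parser[String] =
    if (skip) regex(r)
    else Parser[String] { in =>
      val rest = in.source.subSequence(in.offset, in.source.length)
      r.findPrefixOf(rest) match {
        case Some(matched) => Success(matched, in.drop(matched.length))
        case None          => Failure(s"string matching regex '$r' expected", in)
      }
    }

  // The alphabetic part skips leading whitespace as usual; the numeric
  // part must follow immediately.
  val name: Parser[(String, String)] =
    "[A-Z]+".r ~ regexWs("[0-9]+".r, skip = false) ^^ {
      case letters ~ digits => (letters, digits)
    }
}
```

With this sketch, `parseAll(name, "ABC123")` and `parseAll(name, "  ABC123")` both succeed, while `parseAll(name, "ABC 123")` fails because the digits no longer absorb the intervening space.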