Make literal and regex more flexible wrt whitespace #25

Open
inkytonik opened this issue Jul 25, 2014 · 2 comments

Comments

@inkytonik

At the moment the decision of whether or not to skip whitespace in literal and regex (RegexParsers.scala) is made on the basis of whether the skipWhitespace method returns true or not. Thus, the same decision is used for all occurrences of literal and regex in a parsing module.

Sometimes it is convenient to skip white space on most occasions, but in one or two places, not to skip it. The current design of literal and regex makes this impossible, I think, without duplicating some internal details of RegexParsers in the client code.

E.g., suppose we are parsing names that consist of an alphabetic part, followed by a numeric part, and we want to use the two parts separately after parsing. We could parse a whole name with one regex parser ("[A-Z]+[0-9]+".r) and then post-process the whole name to extract the two parts.

However, it is conceptually cleaner to recognise the two pieces with separate regexes so the two parts are delivered separately in one step without post-processing being required. Unfortunately, the obvious parser

"[A-Z]+".r ~ "[0-9]+".r

does not work because the second parser will skip whitespace (assuming that we haven't altered the default behaviour). We can't just turn off whitespace processing in the module since we want the first parser here (and probably many others) to skip whitespace.
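To make the problem concrete, here is a minimal sketch, assuming the default whitespace settings of RegexParsers. The object and parser names (ObviousNames, name) are made up for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical module that keeps the default whitespace handling.
object ObviousNames extends RegexParsers {
  // The "obvious" parser: both regexes skip leading whitespace.
  val name: Parser[String ~ String] = "[A-Z]+".r ~ "[0-9]+".r
}

object ObviousNamesDemo extends App {
  import ObviousNames._

  // Intended input: the two parts are adjacent.
  println(parseAll(name, "ABC123").successful)  // true

  // Unintended: also accepted, because the second regex skips the space.
  println(parseAll(name, "ABC 123").successful) // true
}
```

The second call is the unwanted case: the input is not a single name, yet the parser accepts it.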

One solution is to create a version of regex (and similarly, literal) that does not perform whitespace handling and use that for the second parser above. However, this cannot be done in user code easily since we would have to duplicate elements of the RegexParsers implementation such as most of the regex method and the (private) class SubSequence.
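For reference, a rough user-code version might look like the sketch below. It is an assumption-laden workaround, not the library's implementation: regexNoWS is a hypothetical name, and it sidesteps the private SubSequence class by calling CharSequence.subSequence, which may copy the remaining input on each call.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

object Names extends RegexParsers {
  // Hypothetical helper (not part of the library): match r at the current
  // offset without skipping any leading whitespace.
  def regexNoWS(r: Regex): Parser[String] = Parser { in =>
    val source = in.source
    val offset = in.offset
    // subSequence stands in for the library's private SubSequence class.
    r.findPrefixOf(source.subSequence(offset, source.length)) match {
      case Some(matched) =>
        Success(matched, in.drop(matched.length))
      case None =>
        Failure("string matching regex '" + r + "' expected", in)
    }
  }

  // The first part skips whitespace as usual; the second must follow directly.
  val name: Parser[String ~ String] = "[A-Z]+".r ~ regexNoWS("[0-9]+".r)
}
```

With this sketch, Names.parseAll(Names.name, "ABC123") succeeds and Names.parseAll(Names.name, "ABC 123") fails, but the body still repeats most of what regex already does, which is the duplication described above.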

A better approach would seem to be to extend the existing regex and literal methods to have an optional Boolean argument that specifies for a particular call whether the module-wide whitespace handling should be performed or not. Then the above spec would be something like

"[A-Z]+".r ~ regex ("[0-9]+".r, handleWhiteSpace = false)

which is verbose but does the trick. The verbosity should not be a problem since it can be hidden behind another name, and this situation is rare anyway.

I'm interested in feedback on whether something like this would be supported in the library, or if there is another approach to handling this issue that I've missed. I can submit a pull request for the actual change if there is support.

@gourlaysama
Contributor

I agree, whitespace handling could be improved a lot. Not to mention it will skip whitespace only if you call methods (or use implicits) defined in RegexParsers; but if you call methods from the underlying Parser class, suddenly there is no more whitespace handling. See SI-6491 for a sad example of this.
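A small sketch of that inconsistency, assuming default whitespace settings. It illustrates the general point only and is not a reproduction of SI-6491; the object and parser names are made up.

```scala
import scala.util.parsing.combinator.RegexParsers

object Mixed extends RegexParsers {
  // literal is defined in RegexParsers and skips leading whitespace.
  val viaLiteral: Parser[String ~ String] = literal("a") ~ literal("b")

  // elem is inherited from Parsers and looks at the very next character.
  val viaElem: Parser[String ~ Char] = literal("a") ~ elem('b')
}

object MixedDemo extends App {
  import Mixed._

  println(parseAll(viaLiteral, "a b").successful) // true
  println(parseAll(viaElem, "a b").successful)    // false: elem sees the space
  println(parseAll(viaElem, "ab").successful)     // true
}
```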

Thank you for looking into this! I like the idea of being able to specify whitespace handling locally, but then why limit it to regex and literal?
I guess regex, literal and positioned (the only three methods that use handleWhiteSpace) could all take an optional parameter for this. But I would also like to investigate how we could make it "fit in" better with the whole thing (maybe as a unary combinator?).
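One way a unary combinator could be approximated in user code today is sketched below. Everything here is hypothetical (the LocalWhitespace trait, the noWS name, the mutable flag); it relies on skipWhitespace being consulted each time a parser runs, and the shared mutable state means it is not thread-safe and may not interact well with memoizing parsers.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical trait, not part of the library.
trait LocalWhitespace extends RegexParsers {
  // Mutable flag consulted whenever a parser decides whether to skip.
  private var skipping = true

  override def skipWhitespace: Boolean = skipping && super.skipWhitespace

  // Run p with whitespace skipping switched off, then restore the flag.
  def noWS[T](p: => Parser[T]): Parser[T] = Parser { in =>
    val saved = skipping
    skipping = false
    try p(in) finally skipping = saved
  }
}

object CombinatorNames extends LocalWhitespace {
  val name: Parser[String ~ String] = "[A-Z]+".r ~ noWS("[0-9]+".r)
}
```

Because noWS wraps any parser, it would cover regex, literal and positioned alike without changing their signatures, at the cost of the hidden state noted above.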

So yes, one way or another, this would be a good addition to the library. If you have any ideas, suggestions, or would like to have a shot at this, please go ahead :)

GoogleCodeExporter pushed a commit to seanjensengrey/kiama that referenced this issue Mar 14, 2015
…la parser combinator code

- in the longer term we are trying to have the parser combinator library support more flexile whitespace handling
scala/scala-parser-combinators#25
@inkytonik
Author

Just letting you know that I'm unlikely to get much time to properly address this problem (as you might have noticed since the ticket has been here for a while). If anyone else wants to have a go, please go ahead.
