Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug with indirect left-recursion #247

Open
LPeter1997 opened this issue Nov 6, 2019 · 2 comments
Open

Potential bug with indirect left-recursion #247

LPeter1997 opened this issue Nov 6, 2019 · 2 comments

Comments

@LPeter1997
Copy link

LPeter1997 commented Nov 6, 2019

I've isolated a 3-rule pattern, where I'd expect the parser to succeed for a given input, but it fails instead. The sample grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleC ::= RuleB
        | RuleA
        .

The implementation:

import scala.util.parsing.combinator._
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object SimpleParser extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List(".", "$")

  lazy val ruleA: PackratParser[String] = ruleB <~ "$"
  lazy val ruleB: PackratParser[String] = (ident
                                       ||| ruleC ~> "." ~> ident)
  lazy val ruleC: PackratParser[String] = (ruleB
                                       ||| ruleA)

  def main(args: Array[String]) = {
    println(ruleA(new PackratReader(new lexical.Scanner("x.x$"))))
  }
}

It fails for input x.x$, telling me that it expects a $ instead of the ..

For me, this seems to be a problem with how indirect left-recursion is handled. I'm not sure if the original algorithm is incapable of handling this pattern, or this is an implementation bug.

Edit:
I've accidentally used | (first matching alt.) instead of ||| (longest matching alt.), I've fixed that in the code, but doesn't change the outcome.

@LPeter1997
Copy link
Author

Perhaps an interesting/helpful finding: Introducing an "alias" rule for RuleB seems to successfully parse x.x$. The grammar:

RuleA ::= RuleB.
RuleB ::= IDENTIFIER
        | RuleC '.' IDENTIFIER
        .
RuleB' ::= IDENTIFIER
         | RuleC '.' IDENTIFIER
         .
RuleC ::= RuleB'
        | RuleA
        .

Implementation:

lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
                                     ||| ruleC ~> "." ~> ident)
lazy val ruleBvar: PackratParser[String] = (ident
                                        ||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleBvar
                                     ||| ruleA)

However, it still rejects a valid input, x.x.x$.
This makes me further believe that the problem is with indirect left-recursion.

@toddobryan
Copy link

I think the problem is the one identified in this paper: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/

That talks about an interaction with left- and right-hand recursion and identifies a problem with the Warth et al algorithm used in the PackratParsers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants