Trailing-context is not working #1088

Closed
michab66 opened this issue Apr 8, 2023 · 3 comments

michab66 commented Apr 8, 2023

I tried to use a trailing context rule like:

_letter = [a-zA-Z]
_letters = _letter*
_end   = "stop"
comment_text  = _letters/_end

This always results in a 'syntax error'. Actually, I was not able to successfully define any rule using a trailing context.

According to the JFlex manual, trailing context is supported, so it currently seems to be broken.

lsf37 (Member) commented Apr 8, 2023

Can you post a spec file with input that doesn't work for you as expected? The rules above are not in the JFlex syntax.

I'd expect the following to work:

%%

%public
%class Test
%debug

letters = [a-zA-Z]+
end = "stop"

%%

{letters} / {end}  { return 0; }

[^] { return 1; }

For input

somethingstop

you should get a single match for something, returning 0, followed by single-character matches for s, t, o, and p, each returning 1.
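As a rough analogy only, the expected behaviour can be sketched in plain Java with a regex lookahead standing in for the trailing context. JFlex itself generates a DFA-based scanner and does not work this way internally; the class name `TrailingContextDemo`, the method `scan`, and the `text:code` output format are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingContextDemo {
    // Stand-in for the rule `{letters} / {end}`: letters, but only if "stop" follows.
    // The lookahead is zero-width, so "stop" is not consumed by the match.
    private static final Pattern LETTERS_BEFORE_STOP =
            Pattern.compile("[a-zA-Z]+(?=stop)");

    /** Scans the input and returns entries of the form "matchedText:returnCode". */
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LETTERS_BEFORE_STOP.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {
                // First rule: letters followed by "stop" -> code 0.
                tokens.add(m.group() + ":0");
                pos = m.end();
            } else {
                // Fallback rule `[^]`: any single character -> code 1.
                tokens.add(input.charAt(pos) + ":1");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("somethingstop"));
    }
}
```

For the input somethingstop this yields something with code 0, then s, t, o, p with code 1 each, matching the description above.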

michab66 (Author) commented Apr 10, 2023

Sorry for being much too brief.
Your example compiles.
Taking this as a starting point, I tried to use trailing context in a macro (see the example below). This is not supported and results in the syntax error I described.

%%

%public
%class Test
%debug

letters = [a-zA-Z]+
end = "stop"

stopped = {letters} / {end}

%%

stopped  { return 0; }

[^] { return 1; }

It seems that, as a special case, trailing context regular expressions cannot be used in macro definitions.

I think complex lexical specs would be simpler to define if this were not a special case, so I propose a new feature for JFlex: support for trailing context in macro definitions.

Background: The lexer I'm working on can be seen here. It implements the lexical structure of the Scheme programming language v5. So far, it was possible to define all scanning expressions as macros, matching only the terminal tokens in the match section. This is no longer the case.

The need for trailing context came with the new Scheme v7 feature of nested comments. Here is the relevant part of the lexical structure spec (note the comment text rule):

<comment> −→ ; all subsequent characters up to a line ending
  | <nested comment>
  | #; <intertoken space> <datum>
<nested comment> −→ #| <comment text> <comment cont>* |#
<comment text> −→ <character sequence not containing #| or |#>
<comment cont> −→ <nested comment> <comment text>
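One way to recognise such nested comments without trailing context is to track the nesting depth explicitly; in a JFlex spec this would typically live in a separate lexical state with a counter in the action code. The plain-Java sketch below shows only the counting logic. `NestedCommentScanner` and `skipNestedComment` are hypothetical names, not part of JFlex or of the lexer mentioned above.

```java
final class NestedCommentScanner {
    /**
     * Returns the index just past the "|#" that closes the nested comment
     * beginning at {@code start}, or -1 if the comment is never closed.
     */
    static int skipNestedComment(String text, int start) {
        if (!text.startsWith("#|", start)) {
            throw new IllegalArgumentException("no nested comment at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            if (text.startsWith("#|", i)) {
                depth++;            // entering a (possibly nested) comment
                i += 2;
            } else if (text.startsWith("|#", i)) {
                depth--;            // leaving one nesting level
                i += 2;
                if (depth == 0) {
                    return i;       // outermost comment is closed
                }
            } else {
                i++;                // ordinary comment text
            }
        }
        return -1;                  // input ended inside the comment
    }
}
```

Because the delimiters are counted rather than matched by a single regular expression, the comment text rule never needs to look ahead for `#|` or `|#`.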

lsf37 (Member) commented Apr 10, 2023

> It seems that, as a special case, trailing context regular expressions cannot be used in macro definitions.

That is indeed the case, and error reporting for it is not exactly great. I've opened #1089 for that.

> I think that it would make it simpler to define complex lexical specs if that would be no special case, so I propose that as a new feature for JFlex: Offer support for trailing context rules in macros.

This is going to be tricky, and I'm not sure yet it's worth it. The underlying problem is that the matching algorithm only works for trailing context at the top level (otherwise we're not matching a regular language any more). If we allow trailing contexts in macros, we can now syntactically get trailing context at any nesting level in an expression.

I guess we could allow it syntactically and have a semantic pass for checking that it is only used at the top level.
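A minimal sketch of what such a semantic pass might look like, assuming macros have already been expanded into a regex tree. The `Node` and `Kind` types here are invented for illustration and are not JFlex's internal AST; the check simply rejects any trailing-context node that is not the root of a rule.

```java
import java.util.Arrays;
import java.util.List;

final class TrailingContextCheck {
    enum Kind { LITERAL, CONCAT, STAR, LOOKAHEAD }

    /** Hypothetical regex tree node; not JFlex's internal representation. */
    static final class Node {
        final Kind kind;
        final List<Node> children;

        Node(Kind kind, Node... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    /** A rule is accepted if LOOKAHEAD occurs only at the root, never nested. */
    static boolean isValidRule(Node root) {
        if (root.kind == Kind.LOOKAHEAD) {
            // Top-level trailing context: neither operand may contain another one.
            return root.children.stream()
                    .noneMatch(TrailingContextCheck::containsLookahead);
        }
        return !containsLookahead(root);
    }

    /** True if the subtree rooted at n contains a LOOKAHEAD node anywhere. */
    static boolean containsLookahead(Node n) {
        if (n.kind == Kind.LOOKAHEAD) {
            return true;
        }
        return n.children.stream().anyMatch(TrailingContextCheck::containsLookahead);
    }
}
```

Under this sketch, a rule that consists solely of a macro such as `stopped` would pass after expansion, while wrapping that macro in a repetition or concatenation would be rejected.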
