Fixes for rewrite_repeat #250
Thinking about this a bit more, it occurs to me that there are more cases that weren't handled, and aren't handled by my PR.
4bf9161 to 6ca0b11
I have updated the PR to handle this and included a comment in the code explaining when and why the transformation is valid.
Hm thank you, I'll try to re-think some of this, too. Maybe we can find a few more situations to cover with tests, although I did think I had covered all combinations there :/
To be pedantic, the PCRE (v2) syntax does not allow
The PCRE spec doesn't clearly state the grammar, so it took a bit of looking, but the section on repetition is quite clear:
The question becomes... does libfsm want to support this as an addition?
Not sure why I was thinking that this applied to PCRE. Using
That's fine! Nothing to apologise about!!
Yeah, that's fine, that's a good observation to make. It still needs to be handled in the AST, but perhaps all the existing PCRE does allow
A brute force search found one edge case that would be mishandled if it hadn't already been handled by earlier code: it is invalid to rewrite
This effectively treats This could be changed to
to fix this, but as it can never come up, and the rewritten logic is harder to understand, I do not know whether it is worth the effort. The brute force search found no cases that my check fails to handle. I searched all possibilities for upper bounds up to 10, plus
Note to myself; I'm just waiting on thinking a bit about some of the AST behaviour here before I go ahead and merge this.
rewrite_repeat, used for rewriting nested repeats, contained a comment that said repeats can only be combined if the range of the result is not more than the sum of the ranges of the inputs, but the assertion actually tested the opposite: it tested that the range of the result is more than the sum of the ranges of the inputs. This assertion does not universally hold either way: a{2}{2} can be rewritten to a{4}, but the calculated range of the latter (0) is equal to the sum of the ranges of the inputs (also 0).

Separate from the assert, the rewrite was not performed in some cases where it is valid to do so. It is valid whenever the inner repeat has a lower bound of 0 or 1, whenever the inner repeat has no upper bound and the outer repeat has a positive lower bound, and whenever the outer repeat has a lower bound equal to its upper bound. The last case was missing. An example is a{2,3}{2}, which would previously be preserved in that form, but is now rewritten as a{4,6}.
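The three conditions in the commit message can be checked against a small model of what a nested repeat matches. The following is a minimal Python sketch, not libfsm's C implementation: the names `nested_counts`, `combined_counts`, `can_combine`, and `check` are hypothetical, `None` stands in for a missing upper bound, and upper bounds of 0 are skipped as a simplification of this sketch (so that 0 times unbounded never has to be defined). It enumerates the repetition counts accepted by a{m,n}{p,q} and by a{m*p,n*q} up to a cutoff and confirms that they agree whenever one of the three conditions holds.

```python
from itertools import product

UNBOUNDED = None  # stands in for "no upper bound", as in a{2,}


def nested_counts(m, n, p, q, limit):
    """Repetition counts (<= limit) accepted by the nested form a{m,n}{p,q}."""
    # With no outer upper bound, iterating up to `limit` outer repetitions
    # is enough to reach every total that does not exceed `limit`.
    k_max = q if q is not UNBOUNDED else max(p, limit)
    counts = set()
    for k in range(p, k_max + 1):
        lo = k * m
        if k == 0:
            hi = 0  # zero outer repetitions match only the empty string
        elif n is UNBOUNDED:
            hi = limit
        else:
            hi = k * n
        counts.update(range(lo, min(hi, limit) + 1))
    return counts


def combined_counts(m, n, p, q, limit):
    """Repetition counts (<= limit) accepted by the flattened form a{m*p,n*q}."""
    lo = m * p
    hi = limit if (n is UNBOUNDED or q is UNBOUNDED) else n * q
    return set(range(lo, min(hi, limit) + 1))


def can_combine(m, n, p, q):
    """The three sufficient conditions listed in the commit message."""
    return (m <= 1                                 # inner lower bound is 0 or 1
            or (n is UNBOUNDED and p > 0)          # inner unbounded, outer lower bound positive
            or (q is not UNBOUNDED and p == q))    # outer lower bound equals outer upper bound


def check(bound=4, limit=40):
    """Return every (m, n, p, q) where can_combine is true but the forms differ."""
    failures = []
    lowers = range(bound + 1)
    for m, p in product(lowers, lowers):
        # Assumption of this sketch: upper bounds of 0 are excluded.
        inner_uppers = list(range(max(m, 1), bound + 1)) + [UNBOUNDED]
        outer_uppers = list(range(max(p, 1), bound + 1)) + [UNBOUNDED]
        for n, q in product(inner_uppers, outer_uppers):
            if (can_combine(m, n, p, q)
                    and nested_counts(m, n, p, q, limit)
                    != combined_counts(m, n, p, q, limit)):
                failures.append((m, n, p, q))
    return failures


print(sorted(nested_counts(2, 3, 2, 2, 40)))    # a{2,3}{2}: [4, 5, 6], same as a{4,6}
print(sorted(nested_counts(3, 3, 1, 2, 40)))    # a{3}{1,2}: [3, 6]
print(sorted(combined_counts(3, 3, 1, 2, 40)))  # a{3,6}:    [3, 4, 5, 6], so no rewrite
print(check())                                  # [] when the conditions are sound
```

The conditions are sufficient rather than exhaustive in this model: a{2,3}{1,2} satisfies none of them, yet it accepts exactly the counts 2 through 6, the same as a{2,6}.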
6ca0b11 to 977f223