Skip to content

Commit

Permalink
Fixes for rewrite_repeat
Browse files Browse the repository at this point in the history
rewrite_repeat, used for rewriting nested repeats contained a comment
that said repeats can only be combined if the range of the result is not
more than the sum of the ranges of the inputs, but the assertion
actually tested the opposite: it tested that the range of the result is
more than the sum of the ranges of the inputs. This assertion does not
universally hold either way: a{2}{2} can be rewritten to a{4}, but the
range as calculated of the latter (0) is equal to the sum of the ranges
of the inputs (also 0).

Separate from the assert, the rewrite was not performed in some cases
where it is valid to do so. It is valid whenever the inner repeat has
a lower bound of 0 or 1, whenever the inner repeat has no upper bound
and the outer repeat has a positive lower bound, and whenever the outer
range has a lower bound equal to the upper bound. The last case was
missing. An example is a{2,3}{2}, which would previously be preserved
in that form, but is now rewritten as a{4,6}.
  • Loading branch information
hvdijk committed Oct 10, 2020
1 parent 901602d commit 977f223
Show file tree
Hide file tree
Showing 3 changed files with 31 additions and 38 deletions.
61 changes: 23 additions & 38 deletions src/libre/ast_rewrite.c
Original file line number Diff line number Diff line change
Expand Up @@ -410,48 +410,33 @@ rewrite_repeat(struct ast_expr *n, enum re_flags flags)
return 1;
}

if (h == 0 || h == 1) {
dead = n->u.repeat.e;

n->u.repeat.min = v;
n->u.repeat.max = w;
n->u.repeat.e = n->u.repeat.e->u.repeat.e;

dead->type = AST_EXPR_EMPTY;
ast_expr_free(dead);

return 1;
}

/*
* a{h,i}{j,k} is equivalent to a{h*j,i*k} if it's possible to combine,
* and it's possible iff the range of the result is not more than
* the sum of the ranges of the two inputs.
*/
if (i != AST_COUNT_UNBOUNDED && k != AST_COUNT_UNBOUNDED) {
/* I don't know why this is true */
assert(w - v > i - h + k - j);
}

/*
* a{h,}{j,}
* a{h,i}{j,}
* a{h,}{j,k}
* i.e. if for all n in [h*j,i*k], a{n} would be included in a{h,i}{j,k}.
* a{h*j} is trivially included, and for any a{n} that is included where
* n is not a multiple of i, a{n+1} is also included: if n is not a
* multiple of i, then one of the a{h,i} instances matched fewer than i
* times and can accept another a. The only case to consider is where n
* is a multiple of i and the rewritten expression would include a{n+1}.
*
* If j == k, this cannot be a problem. The only case where all instances
* of a{h,i} accept i instances of a is where n==i*j==i*k.
*
* If i == AST_COUNT_UNBOUNDED and j != 0, this cannot be a problem. It
* is impossible for any a{h,i} to accept i instances of a, so the problem
* never arises here either. Note the restriction on j != 0 though:
* a{2,}{0,} cannot be rewritten as a{0,}.
*
* If a{n} is included by m instances of a{i}, where j<=m<k, we know that
* a{n+1} is included by m+1 instances of a{h,i} if and only if
* h*(m+1) <= i*m+1, which simplifies to h*(j+1) <= i*j+1.
*
* Note that this last case includes all cases where h <= 1.
*/
if ((i == AST_COUNT_UNBOUNDED || k == AST_COUNT_UNBOUNDED) && h <= 1 && j <= 1) {
dead = n->u.repeat.e;

n->u.repeat.min = v;
n->u.repeat.max = w;
n->u.repeat.e = n->u.repeat.e->u.repeat.e;

dead->type = AST_EXPR_EMPTY;
ast_expr_free(dead);

return 1;
}

if (h > 1 && i == AST_COUNT_UNBOUNDED && j > 0) {
if (j == k
|| (i == AST_COUNT_UNBOUNDED && j > 0)
|| h * (j + 1) <= i * j + 1) {
dead = n->u.repeat.e;

n->u.repeat.min = v;
Expand Down
1 change: 1 addition & 0 deletions tests/pcre-repeat/in55.re
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
^(?:a{2}){2}$
7 changes: 7 additions & 0 deletions tests/pcre-repeat/out55.fsm
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
0 -> 1 'a';
1 -> 2 'a';
2 -> 3 'a';
3 -> 4 'a';

start: 0;
end: 4;

0 comments on commit 977f223

Please sign in to comment.