
match merging optimization #363

Open
wants to merge 6 commits into base: dev

Conversation

TimWhiting (Collaborator)

Takes a match that has common superstructure and reworks it to match on that shared structure first.

Note that the transform expects there to be only a single pattern in the match statement, which I believe is a fine assumption to begin with, since tuple types are used for multiple pattern matches; it rewrites the core from the bottom up. However, it should probably be improved to handle multiple patterns as well. Branches can get tricky fast.

So

match e
    Cons(a, Cons(1, Cons(c, Nil))) -> (a + c).show.println
    Cons(_, Cons(b, _)) -> b.show.println
    Nil -> "Nothing".println
    _ -> implicit error

Turns into

match e
    Cons(a, Cons(b, d)) ->
        match (b, d)
            (1, Cons(c, Nil)) -> (a + c).show.println
            (b, _) -> b.show.println
            _ -> implicit error
    Nil -> "Nothing".println
    _ -> implicit error

In the generated C code, the original results in duplicated Cons tag checks for the first two Conses.
In the new version, those duplicate checks are eliminated.
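To make the duplicated checks concrete, here is a hand-written C sketch of the two compilation schemes. The list layout (`node`, `tag`, `head`, `tail`) is hypothetical and purely illustrative; the actual Koka runtime representation and generated code differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cons-list layout, for illustration only. */
typedef struct node { int tag; int head; struct node *tail; } node;
enum { NIL = 0, CONS = 1 };

/* Original scheme (sketch): each arm re-tests the outer Cons cells,
   so the first two Cons tag checks are duplicated across arms.
   Returns the value each arm would print; -1 stands in for "Nothing". */
int match_original(node *e) {
    if (e->tag == CONS && e->tail->tag == CONS && e->tail->head == 1
        && e->tail->tail->tag == CONS && e->tail->tail->tail->tag == NIL)
        return e->head + e->tail->tail->head;  /* Cons(a, Cons(1, Cons(c, Nil))) */
    if (e->tag == CONS && e->tail->tag == CONS)  /* duplicated Cons checks */
        return e->tail->head;                    /* Cons(_, Cons(b, _)) */
    if (e->tag == NIL)
        return -1;                               /* Nil */
    abort();                                     /* implicit error */
}

/* Merged scheme (sketch): the shared Cons(a, Cons(b, d)) prefix is
   tested exactly once, then the residual patterns are matched. */
int match_merged(node *e) {
    if (e->tag == CONS && e->tail->tag == CONS) {
        int a = e->head, b = e->tail->head;
        node *d = e->tail->tail;
        if (b == 1 && d->tag == CONS && d->tail->tag == NIL)
            return a + d->head;                  /* (1, Cons(c, Nil)) */
        return b;                                /* (b, _) */
    }
    if (e->tag == NIL)
        return -1;
    abort();
}
```

Both functions are observationally equivalent; the merged one simply hoists the common prefix tests, which is what removes the redundant checks from the hot path.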

It is smart enough to push the implicit incomplete-match error branches down into subpatterns, but it doesn't do any exhaustiveness checking to see whether the changes have made any of the resulting matches newly exhaustive.
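A small hypothetical sketch of what pushing the error branch down looks like: for a match covering only `Cons(1, _)` and `Nil`, the incomplete-match error ends up as the failure case inside the `Cons` branch rather than as a separate top-level arm that would re-test the constructor. The names and layout below are illustrative, not the actual generated code.

```c
#include <stdlib.h>

typedef struct node { int tag; int head; struct node *tail; } node;
enum { NIL = 0, CONS = 1 };

/* Match covering Cons(1, _) -> 1 and Nil -> 0 only.  The implicit
   incomplete-match error is pushed inside the Cons branch, so the
   Cons tag is tested exactly once. */
int first_is_one(node *e) {
    if (e->tag == CONS) {
        if (e->head == 1)
            return 1;   /* Cons(1, _) */
        abort();        /* implicit error, pushed into the Cons subpattern */
    }
    return 0;           /* Nil */
}
```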

@TimWhiting TimWhiting changed the base branch from master to dev September 22, 2023 04:32
@TimWhiting TimWhiting marked this pull request as draft October 23, 2023 22:38
@TimWhiting (Collaborator, Author)

This is not ready to merge; I'll add some tests, since it does end up moving things around quite a bit.

skip guards
catch all case
@TimWhiting (Collaborator, Author)

TimWhiting commented Jul 10, 2024

TL;DR: Shaves 3-5% off the runtime of some heavily nested matches that interact with reference counting (e.g. some variants of rbtree); otherwise it doesn't affect the runtime.

Old fip benchmarks:

benchmark variant param elapsed relative stddev rss

kk rbtree fip 100000 0.63 1.000 .0148324 4816
kk rbtree fip-icfp 100000 0.62 .984 .00762203 4800
kk rbtree std-reuse 100000 0.65 1.031 .00461077 4800
kk rbtree std 100000 1.38 2.190 .0317361 4800
kk rbtree fip-clrs 100000 0.8 1.269 .00982963 4816
c rbtree clrs 100000 1.095 1.738 .01602356 10432
c rbtree clrs-mi 100000 0.62 .984 .0116428 5952
c rbtree clrs-full 100000 1.105 1.753 .0211089 10944
c rbtree clrs-full-mi 100000 0.615 .976 .0152768 5952
cpp rbtree stl 100000 0.76 1.206 .0357757 11472
cpp rbtree stl-mi 100000 0.33 .523 .00286459 5968

kk ftree fip 100000 0.75 1.000 .0083666 4928
kk ftree std-reuse 100000 0.73 .973 .00532934 4080
kk ftree std 100000 1.16 1.546 .0263274 4112

kk msort fip 100000 1.03 1.000 .0161245 6368
kk msort std-reuse 100000 0.92 .893 .00564783 10576
kk msort std 100000 1.13 1.097 .0120170 10608

kk qsort fip 100000 1.485 1.000 .0261725 12112
kk qsort std-reuse 100000 1.92 1.292 .0255149 12624
kk qsort std 100000 2.365 1.592 .0529204 12656

kk tmap fip 100000 1.24 1.000 .0130384 7968
kk tmap std-reuse 100000 0.78 .629 .00344517 7984
kk tmap std 100000 0.82 .661 .00512008 7984
c tmap fip 100000 4.64 3.741 .1656211 12600
c tmap fip-mi 100000 0.66 .532 .0044510 7520
c tmap std 100000 4.635 3.737 3.88184 10648
c tmap std-mi 100000 0.68 .548 .00387494 7520

New fip benchmarks:

benchmark variant param elapsed relative stddev rss

kk rbtree fip 100000 0.61 1.000 .0109545 4832
kk rbtree fip-icfp 100000 0.6 .983 .00695086 4832
kk rbtree std-reuse 100000 0.65 1.065 .010 4816
kk rbtree std 100000 1.38 2.262 .0189252 4800
kk rbtree fip-clrs 100000 0.785 1.286 .013177 4816
c rbtree clrs 100000 1.095 1.795 .0200686 12712
c rbtree clrs-mi 100000 0.61 1.000 .0114018 5952
c rbtree clrs-full 100000 1.09 1.786 .0532815 9456
c rbtree clrs-full-mi 100000 0.62 1.016 .00556486 5952
cpp rbtree stl 100000 0.725 1.188 .0266969 7552
cpp rbtree stl-mi 100000 0.32 .524 0 5968

kk ftree fip 100000 0.8 1.000 .00894427 4912
kk ftree std-reuse 100000 0.75 .937 .00662559 4112
kk ftree std 100000 1.1 1.375 .0115040 4112

kk msort fip 100000 0.01 0
kk msort std-reuse 100000 0.94 1.000 .0250998 10576
kk msort std 100000 1.15 1.223 .00386746 10624

kk qsort fip 100000 1.55 1.000 .0457165 12064
kk qsort std-reuse 100000 1.96 1.264 .0828860 12592
kk qsort std 100000 2.465 1.590 .0859924 12640

kk tmap fip 100000 1.235 1.000 .0276586 7984
kk tmap std-reuse 100000 0.78 .631 .00399079 7952
kk tmap std 100000 0.83 .672 .00368069 8000
c tmap fip 100000 4.6 3.724 .2286547 9608
c tmap fip-mi 100000 0.66 .534 .00506596 7520
c tmap std 100000 4.61 3.732 .2087919 10808
c tmap std-mi 100000 0.68 .550 .00301247 7520

@TimWhiting TimWhiting marked this pull request as ready for review July 11, 2024 14:12
@TimWhiting TimWhiting requested a review from anfelor July 11, 2024 14:19
@anfelor (Collaborator) left a comment

Hi Tim! This looks pretty cool, I didn't realize we could make the binary search tree benchmarks even faster with a proper pattern match compiler.

I added two nitpicks on the tests, but didn't really read the code. Is there something you wanted me to look at in particular?

Review threads: test/parc/parc16.kk (outdated, resolved), test/parc/parc2.kk.out (resolved)
@TimWhiting (Collaborator, Author)

TimWhiting commented Jul 13, 2024

@anfelor

Hi Tim! This looks pretty cool, I didn't realize we could make the binary search tree benchmarks even faster with a proper pattern match compiler.

Yes, I originally intended this to clean up the generated C code, so there wouldn't be as many duplicated checks for similar structure, but I expected the C compiler to mostly optimize these redundant checks away. It appears, however, that the pattern is obscure enough that the C compiler cannot optimize it.

Looking at it again, when running the benchmark script I realized I was getting some inconsistent results, especially with -n=3. I noticed that the first run was sometimes considerably slower, so I now skip reporting the first run and run with -n=10. The new results aren't quite as impressive, but considering how fast and optimized these benchmarks already are, it still seems like a decent improvement: up to 5%.

Is there something you wanted me to look at in particular?

You are one of the only other people besides Daan who could give this a review, and I'd appreciate you pointing out any obvious bugs before I take up Daan's time, since he is so busy. The file isn't very big (fewer than 600 lines of heavily commented code). I understand if you are busy as well, so I'd appreciate whatever you are willing to look at.
