Slow compilation of bilinear forms #2544

pbrubeck · 2022-09-02T14:29:22Z

Compilation of these forms is very slow. Could someone take a look to check that the algorithmic complexity is what we expect?

from firedrake import *
from time import time

nx = 1
base = UnitSquareMesh(nx, nx, quadrilateral=True)
mesh = ExtrudedMesh(base, 1, nx)

V = VectorFunctionSpace(mesh, "NCF", degree=3)
u = TrialFunction(V)
v = TestFunction(V)

def time_assemble(form):
    start = time()
    assemble(form)
    print("Time", time()-start)

time_assemble(inner(div(u), div(v))*dx)
time_assemble(inner(div(u.T), div(v.T))*dx)

Here's the output:

Time 32.85061550140381
Time 110.47377705574036

wence- · 2022-09-02T14:41:35Z

I think this is a loopy slowness.

from tsfc import compile_form

k, = compile_form(inner(div(u.T), div(v.T))*dx, coffee=False)

This takes around 2 seconds for me, but the subsequent

import loopy
lp = loopy.generate_code_v2(k.ast).device_code()

takes 16 seconds. I imagine that also inlining it into the PyOP2 kernel would make things slower.

cc: @inducer, @kaushikcfd
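To compare the two stages side by side, a small timing helper can wrap each call. This is only a sketch: `stage` and `timings` are hypothetical names, and the two `sum(...)` workloads are stand-ins for the `compile_form(...)` and `loopy.generate_code_v2(...)` calls above.

```python
from contextlib import contextmanager
from time import perf_counter

# Accumulated wall-clock time per named stage (hypothetical helper state).
timings = {}

@contextmanager
def stage(name):
    """Record the wall-clock time of the enclosed block under `name`."""
    start = perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + perf_counter() - start

# Stand-in workloads; in the real script these would be the
# compile_form(...) and loopy.generate_code_v2(...) calls.
with stage("tsfc"):
    sum(i * i for i in range(10_000))
with stage("loopy"):
    sum(i * i for i in range(100_000))

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Using `perf_counter` rather than `time` avoids jumps from system clock adjustments, which matters little at these durations but costs nothing.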

sv2518 · 2022-09-02T16:36:18Z

I think we are not inlining the local kernels anymore (unless they are vectorised, but this is still not merged to main).

inducer · 2022-09-02T17:18:23Z

Do you have a profile of what routines in loopy are being slow?

wence- · 2022-09-02T17:30:12Z

@pbrubeck, can you run pyinstrument on your test script and upload the (gzipped) result file? Or have you moved on to a different profiler, @inducer?
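If pyinstrument is not available, a rough equivalent can be put together with the standard library's cProfile instead (a deterministic profiler, so the overhead and the report differ from pyinstrument's sampling output). In this sketch `workload` is a hypothetical stand-in for the slow `assemble(...)` call in the test script.

```python
import cProfile
import io
import pstats

def workload():
    # Hypothetical stand-in for the slow assemble(...) call.
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the ten entries with the largest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```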

pbrubeck · 2022-09-05T12:50:35Z

Here's the profile:
slow_compilation_profile.tar.gz

inducer · 2022-09-05T16:44:31Z

Thanks! I had moved on to vmprof from pyinstrument, but that HTML output is lovely. I might be back! At any rate, I'm compatible with whatever.

The dominant cost appears to be _match_caller_callee_argument_dimension_, which IMO doesn't deserve to take much time at all. As it stands, it seems to get bogged down in simplify_via_aff, which is probably solvable with some judicious caching. That should shave ~50s off the loopy time. Then, you somehow seem to be falling into the trial-and-error case of linearization, which costs you another 11s. (@kaushikcfd, any idea why?) The remaining 16s is actual codegen, which is probably harder to shrink. Then there's 75s of gcc(-ish) time, which we likely also won't be able to help with.

So altogether I see one easy target here, which is a cache for simplify_via_aff: inducer/loopy#678. Help is definitely welcome here; I don't know how soon I might be able to get to this.
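The general shape of such a cache is plain memoization. The sketch below is not loopy's code: `simplify` is a hypothetical stand-in for an expensive simplification routine, and it assumes the argument is hashable and compares equal whenever the result would be the same. Whether that holds for the objects passed to simplify_via_aff is not established in this thread.

```python
from functools import lru_cache

# Counts how often the expensive body actually runs.
call_count = 0

@lru_cache(maxsize=None)
def simplify(expr):
    """Hypothetical stand-in for an expensive simplification routine.

    `expr` is a hashable tuple of (coefficient, name) terms.
    """
    global call_count
    call_count += 1
    combined = {}
    for coeff, name in expr:
        combined[name] = combined.get(name, 0) + coeff
    # Drop terms that cancelled and return a canonical ordering.
    return tuple(sorted((n, c) for n, c in combined.items() if c != 0))

expr = ((2, "i"), (3, "j"), (-2, "i"))
for _ in range(1000):
    result = simplify(expr)  # only the first call does real work

print(result, call_count)
```

The trade-off is memory: an unbounded cache keeps every distinct input alive, so a bounded `maxsize` may be preferable if the set of expressions is large.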

pbrubeck changed the title ~~Slow compilation~~ Slow compilation of bilinear forms Sep 2, 2022

inducer mentioned this issue Sep 5, 2022: Cache simplify_via_aff inducer/loopy#678 (Open)

connorjward mentioned this issue Sep 21, 2022: Loopy caching fixes OP2/PyOP2#673 (Merged)
