Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow compilation of bilinear forms #2544

Open
pbrubeck opened this issue Sep 2, 2022 · 6 comments
Open

Slow compilation of bilinear forms #2544

pbrubeck opened this issue Sep 2, 2022 · 6 comments

Comments

@pbrubeck
Copy link
Contributor

pbrubeck commented Sep 2, 2022

Compilation for these forms is just very slow, maybe someone could have a look to ensure the algorithmic complexity is the expected one:

from firedrake import *
from time import time

nx = 1
base = UnitSquareMesh(nx, nx, quadrilateral=True)
mesh = ExtrudedMesh(base, 1, nx)

V = VectorFunctionSpace(mesh, "NCF", degree=3)
u = TrialFunction(V)
v = TestFunction(V)

def time_assemble(form):
    start = time()
    assemble(form)
    print("Time", time()-start)

time_assemble(inner(div(u), div(v))*dx)
time_assemble(inner(div(u.T), div(v.T))*dx)

Here's the output:

Time 32.85061550140381
Time 110.47377705574036
@pbrubeck pbrubeck changed the title Slow compilation Slow compilation of bilinear forms Sep 2, 2022
@wence-
Copy link
Contributor

wence- commented Sep 2, 2022

I think this is a loopy slowness.

from tsfc import compile_form

k, = compile_form(inner(div(u.T), div(v.T))*dx, coffee=False)

Takes around 2 seconds for me, but the subsequent

import loopy
lp = loopy.generate_code_v2(k.ast).device_code()

Takes 16 seconds. I imagine that also inlining it into the pyop2 kernel would make things slower.

cc: @inducer, @kaushikcfd

@sv2518
Copy link
Contributor

sv2518 commented Sep 2, 2022

I think we are not inlining the local kernels anymore (unless they are vectorised, but this is still not merged to main).

@inducer
Copy link

inducer commented Sep 2, 2022

Do you have a profile of what routines in loopy are being slow?

@wence-
Copy link
Contributor

wence- commented Sep 2, 2022

@pbrubeck can you run pyinstrument on your test script and upload the (gzipped) result file? Or have you moved on @inducer ?

@pbrubeck
Copy link
Contributor Author

pbrubeck commented Sep 5, 2022

here's the profile
slow_compilation_profile.tar.gz

@inducer
Copy link

inducer commented Sep 5, 2022

Thanks! I had moved on to vmprof from pyinstrument, but that HTML output is lovely. I might be back! At any rate, I'm compatible with whatever.

The dominant cost appears to be _match_caller_callee_argument_dimension_, which IMO doesn't deserve to take much time at all. As it stands, it seems to get bogged down in simplify_via_aff, which is probably solvable with some judicious caching. That should shave ~50s off the loopy time. Then, you somehow seem to be falling into the trial-and-error case of linearization, which costs you another 11s. (@kaushikcfd, any idea why?) The remaining 16s is actual codegen, which is probably harder to shrink. Then there's 75s of gcc(-ish) time, which we likely also won't be able to help with.

So altogether I see one easy target here, which is a cache for simplify_via_aff: inducer/loopy#678. Help is definitely welcome here; I don't know how soon I might be able to get to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants