perf: reduce python overhead for awkward backend #554
Conversation
…ns in one single broadcasting traversal
In the context of the trijet-mass calculation of the AGC (ref: scikit-hep/awkward#3359), this brings the runtime down further (from ~25 ms):

# 100k events (cpu backend)
In [1]: %timeit calculate_trijet_mass(events)
19.1 ms ± 203 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# typetracer
In [2]: %timeit calculate_trijet_mass(tt)
18.2 ms ± 49.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@pfackeldey - Thanks - this reduces broadcasting overhead as discussed during the meeting! Looks good to me!
Hi @ianna,

It's a good idea to add a performance hint to the documentation. Perhaps even a new section with "good practices" or a "performance guide"? Do you mean something like:

@broadcast_once(axis=0)
def my_func(arr):
    ...
Ok, I'll go ahead and think about how to add a tip/hint for this in awkward's documentation.
If you are done with the PR, I can merge it :-)
Reduce Python overhead of the awkward-array backend by "grouping" multiple operations into a single broadcasting traversal with ak.transform.

Description

This PR improves the performance of the awkward-array backend by fusing the application of multiple operations (each of which would otherwise trigger its own broadcasting pass) into a single broadcast. This is conceptually similar to fusing GPU kernels to avoid multiple kernel launches, where a kernel launch is the equivalent of a broadcasting traversal in awkward-array (of course, fusing GPU kernels is otherwise very different).
This significantly reduces the Python overhead (see the trijet-mass timings in the discussion above).
With this PR, vector.awkward_transform is used internally in vector to speed up all vector._compute methods with the awkward backend (through vector's dispatch mechanism).

Note: vector.awkward_transform currently makes some assumptions about the internal vector._compute methods, e.g. their call and return signatures. This could be written in a more general way, but where should the generality stop? At some point, input and output arguments would need to be flattened and unflattened similarly to JAX PyTrees, and that would be a bit out of scope for this decorator. Thus, it is currently exposed in vector's public API, but only intended for expert usage.

Oh, and if someone has a better name for this function/decorator, I'm happy to change it.
Checklist

- Lint passes ($ pre-commit run --all-files or $ nox -s lint)
- Tests pass ($ pytest or $ nox -s tests)
- Docs build ($ cd docs; make clean; make html or $ nox -s docs)
- Doctests pass ($ pytest --doctest-plus src/vector/ or $ nox -s doctests)