perf: reduce python overhead for awkward backend #554
Conversation
…ns in one single broadcasting traversal
In the context of the trijet-mass calculation of the AGC (ref: scikit-hep/awkward#3359), this brings the runtime down further (from ~25 ms):

# 100k events (cpu backend)
In [1]: %timeit calculate_trijet_mass(events)
19.1 ms ± 203 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# typetracer
In [2]: %timeit calculate_trijet_mass(tt)
18.2 ms ± 49.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@pfackeldey - Thanks - this reduces broadcasting overhead as discussed during the meeting! Looks good to me!
Hi @ianna,

It's a good idea to add a performance hint to the documentation. Perhaps even a new section with "good practices" or a "performance guide"? Do you mean something like:

@broadcast_once(axis=0)
def my_func(arr):
    ...
Ok, I'll go ahead and think about how to add a tip/hint for this in awkward's documentation.
If you are done with the PR, I can merge it :-)
Reduce Python overhead of the awkward-array backend by "grouping" multiple operations into a single broadcasting traversal with ak.transform.

Description

This PR improves the performance of the awkward-array backend by fusing the application of multiple operations (each of which would otherwise trigger its own broadcasting pass) into a single broadcast. This is conceptually similar to fusing GPU kernels to avoid multiple kernel launches, where a kernel launch is the equivalent of a broadcasting traversal in awkward-array (of course, fusing GPU kernels is otherwise very different).
This significantly reduces the Python overhead (see the trijet-mass timings in the discussion above).
With this PR, vector.awkward_transform is used internally in vector to speed up all vector._compute methods with the awkward backend (through vector's dispatch mechanism).

Note: vector.awkward_transform currently makes some assumptions about the internal vector._compute methods, e.g. their call and return signatures. This could be written in a more general way, but where should the generality stop? At some point, input and output arguments would need to be flattened and unflattened similarly to JAX PyTrees, and that would be a bit out of scope for this decorator. Thus, it is currently exposed in vector's public API, but only intended for expert usage.

Oh, and if someone has a better name for this function/decorator, I'm happy to change it.
Checklist

- Lint passes ($ pre-commit run --all-files or $ nox -s lint)
- Tests pass ($ pytest or $ nox -s tests)
- Docs build ($ cd docs; make clean; make html or $ nox -s docs)
- Doctests pass ($ pytest --doctest-plus src/vector/ or $ nox -s doctests)