Add support for annotated axis names #2596

agoose77 · 2023-07-27T14:36:08Z

Description of new feature

At the PyHEP.dev workshop, it was observed that the axis parameter has the potential to harm readability of an analysis. This follows from the idea that the semantic interpretation of the axis parameter requires knowledge of the array structure, which is normally a strong function of the execution history.

The notes from the workshop are below:

Present: Angus, Alex, Ioana, Matthew, Ianna, Jim, Tal, Remco, Jonas, Mason, Clemens

We should be able to label / name axes for readability.

Awkward Array can/should add support for named axes, cf. Xarray with dimension names

No need to add labels because they're much more specific to plotting

New function e.g. array = ak.with_axis_name(array, 0, "events") to permit ak.sum(array, axis="events")

The flatten(array[argmax(..., keepdims=True)]) pattern is complex to understand, despite being a common pattern

We could add an accessor that forces ragged indexing. This would permit a single-index accessor, if need be, that directly consumes the result of argmax(..., keepdims=False), e.g. array.at[...]. We could also have a similar-yet-different array.select[...] that accepts the result of a positional reducer called with keepdims=True, and flattens the result afterwards (we know that keepdims=True produces regular dimensions, which can be identified statically).

Interest in removing repeated array names: event.foo.x[event.foo.y > 2] → event.foo.x[.y > 2] (or better)

Could introduce a "this" proxy that refers to the array being sliced.

Tools:

Xarray

Example from user guide: https://docs.xarray.dev/en/stable/user-guide/indexing.html#

Nathan: JAX also has some (experimental) support for named axes and parallelism

Examples

Alex's example from AGC

Gordon's fantasy rewrite of Alex's cell 3.

My current thinking on this, following from the workshop discussion, is that we should add a mechanism for labelling axes with a name, such that this name can later be used in place of the integer value.

import awkward as ak

array = ak.Array([
    [{'pt': 0.0, ...}],
    ...
])

array = ak.with_axis_name(array, 0, "events")
array = ak.with_axis_name(array, 1, "particles")

total_pt = ak.sum(array.pt, axis="particles")

pfackeldey · 2024-11-21T15:17:41Z

This is added in #3238 👍

agoose77 added the feature New feature or request label Jul 27, 2023

github-project-automation bot added this to Finalization Aug 28, 2024

github-project-automation bot moved this to P5 in Finalization Aug 28, 2024

pfackeldey mentioned this issue Sep 12, 2024

feat: named axis for ak.Array #3238

Merged

pfackeldey self-assigned this Sep 12, 2024

pfackeldey closed this as completed Nov 21, 2024

github-project-automation bot moved this from P5 to Done in Finalization Nov 21, 2024
