Add support for annotated axis names #2596

Closed
agoose77 opened this issue Jul 27, 2023 · 1 comment
Labels
feature New feature or request

@agoose77
Collaborator

Description of new feature

At the PyHEP.dev workshop, it was observed that the axis parameter can harm the readability of an analysis. Interpreting an integer axis requires knowing the structure of the array at that point, and that structure usually depends strongly on the execution history (for example, earlier reductions or flattening).

The notes from the workshop are below:

Present: Angus, Alex, Ioana, Matthew, Ianna, Jim, Tal, Remco, Jonas, Mason, Clemens.

  • We should be able to label / name axes for readability.

    • Awkward Array can/should add support for named axes, cf. XArray with dimension names
    • No need to add axis labels, because they're much more specific to plotting
    • New function e.g. array = ak.with_axis_name(array, 0, "events") to permit ak.sum(array, axis="events")
  • The flatten(array[argmax(..., keepdims=True)]) pattern is hard to understand, despite being a common idiom

    • We could add an accessor that forces ragged indexing. This would permit a single-index accessor, if needed, that directly consumes the result of argmax(..., keepdims=False), e.g. array.at[...]. We could also have a similar-yet-different array.select[...] that accepts a keepdims=True positional reducer result and flattens the result afterwards (we know that keepdims=True produces regular dimensions, which can be identified statically).
  • Interest in removing repeated array names, e.g. event.foo.x[event.foo.y > 2] → event.foo.x[.y > 2] (or better)

    • We could introduce a proxy, e.g. named this, that refers to the array being sliced.
  • Tools:

  • Examples
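The this-proxy idea from the notes can be sketched in plain Python. Everything here is hypothetical: Deferred, Column and Records are toy classes rather than Awkward Array types, and the sketch assumes this refers to the record array being sliced, so that foo[this.y > 2].x stands for foo.x[foo.y > 2]:

```python
import operator


class Deferred:
    """Hypothetical `this` proxy: records attribute access and comparisons,
    to be evaluated later against the array being sliced."""

    def __init__(self, fn=lambda obj: obj):
        self._fn = fn

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        fn = self._fn
        return Deferred(lambda obj: getattr(fn(obj), name))

    def _compare(self, other, op):
        fn = self._fn
        return Deferred(lambda obj: op(fn(obj), other))

    def __gt__(self, other):
        return self._compare(other, operator.gt)

    def __lt__(self, other):
        return self._compare(other, operator.lt)

    def resolve(self, obj):
        return self._fn(obj)


this = Deferred()


class Column(list):
    """Toy 1D column whose comparisons are element-wise."""

    def __gt__(self, other):
        return [value > other for value in self]

    def __lt__(self, other):
        return [value < other for value in self]


class Records:
    """Toy array of records stored as named columns of equal length."""

    def __init__(self, **columns):
        self._columns = columns

    def __getattr__(self, name):
        columns = self.__dict__.get("_columns", {})
        if name not in columns:
            raise AttributeError(name)
        return Column(columns[name])

    def __getitem__(self, mask):
        if isinstance(mask, Deferred):
            mask = mask.resolve(self)  # evaluate `this` against this array
        return Records(**{
            name: [value for value, keep in zip(column, mask) if keep]
            for name, column in self._columns.items()
        })


foo = Records(x=[1, 2, 3], y=[5, 1, 4])
print(foo[this.y > 2].x)  # [1, 3]
```

The condition is only evaluated inside __getitem__, against the object being sliced, which is what lets the repeated array name drop out of the selection.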

My current thinking on this, following from the workshop discussion, is that we should add a mechanism for labelling axes with a name, such that this name can later be used in place of the integer value.
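A minimal sketch of the bookkeeping such a mechanism needs, in plain Python: names are stored per axis position and translated back to an integer before the usual integer-based code runs. The helpers with_axis_name, resolve_axis and names_after_reduction are hypothetical (they act on a bare tuple of names, not on an Awkward Array), and dropping a name when its axis is reduced is an assumed design choice:

```python
from typing import Optional, Tuple, Union

AxisNames = Tuple[Optional[str], ...]


def resolve_axis(names: AxisNames, axis: Union[int, str]) -> int:
    """Translate a name (or a possibly negative integer) into a position."""
    if isinstance(axis, str):
        if axis not in names:
            raise ValueError(f"no axis named {axis!r}; known names: {names}")
        return names.index(axis)
    position = axis + len(names) if axis < 0 else axis
    if not 0 <= position < len(names):
        raise ValueError(f"axis {axis} is out of range for {len(names)} dimensions")
    return position


def with_axis_name(names: AxisNames, axis: int, name: str) -> AxisNames:
    """Return a copy of `names` with position `axis` labelled `name`."""
    position = resolve_axis(names, axis)
    if name in names and names.index(name) != position:
        raise ValueError(f"axis name {name!r} is already in use")
    updated = list(names)
    updated[position] = name
    return tuple(updated)


def names_after_reduction(names: AxisNames, axis: Union[int, str]) -> AxisNames:
    """A reduction removes its axis, so that axis's name is removed too."""
    position = resolve_axis(names, axis)
    return names[:position] + names[position + 1:]


names: AxisNames = (None, None)  # events x particles, initially unnamed
names = with_axis_name(names, 0, "events")
names = with_axis_name(names, 1, "particles")

print(resolve_axis(names, "particles"))           # 1
print(names_after_reduction(names, "particles"))  # ('events',)
```

Because the names travel with the structure, axis="events" would still point at the right level after the particle axis has been reduced away, which is the readability problem raised at the workshop.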

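```python
import awkward as ak

# One list of particle records per event (further fields and events elided)
array = ak.Array([
    [{'pt': 0.0}],  # ...
    # ...
])

# Proposed function: label each axis, identified by its integer position
array = ak.with_axis_name(array, 0, "events")
array = ak.with_axis_name(array, 1, "particles")

# The name can then replace the integer; here it is equivalent to axis=1
total_pt = ak.sum(array.pt, axis="particles")
```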
@pfackeldey
Collaborator

This was added in #3238 👍

github-project-automation bot moved this from P5 to Done in Finalization on Nov 21, 2024