Odd behavior of type tracing on IndexOptionalArray #3264

Closed
yimuchen opened this issue Oct 2, 2024 · 1 comment · Fixed by #3266
Labels
bug The problem described is something that must be fixed


yimuchen commented Oct 2, 2024

Version of Awkward Array

2.6.8

Description and code to reproduce

Running type tracing on an IndexedOptionArray raises an error. This is most noticeable when the array is used with dask-awkward.

import awkward as ak
import dask_awkward as dak

arr = ak.Array([[1], [2, 3], [1, 2, 4, 5]])
print(arr.type) # 3 * var * int64

arr = arr[ak.max(arr, axis=-1) > 0.99]
print(arr.type) # 3 * option[var * int64]

dar = dak.from_awkward(ak.Array([[1], [2, 3], [1, 2, 4, 5]]), npartitions=1)
print(dar.type, dar._meta) # ?? * var * int64 [...]
dar = dar[ak.max(dar, axis=-1) > 0.99]
print(dar.type, dar._meta) # ?? * option[var * int64] [...]
ak.typetracer.length_one_if_typetracer(dar._meta)

The last line raises this error:

Traceback (most recent call last):
  File "/srv/testing/typetracer_behavior.py", line 24, in <module>
    ak.typetracer.length_one_if_typetracer(dar._meta)
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/typetracer.py", line 133, in length_one_if_typetracer
    return _length_0_1_if_typetracer(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/typetracer.py", line 81, in _length_0_1_if_typetracer
    layout = function(layout.form, highlevel=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/forms/form.py", line 638, in length_one_array
    return ak.operations.ak_from_buffers._impl(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/operations/ak_from_buffers.py", line 150, in _impl
    out = _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/operations/ak_from_buffers.py", line 289, in _reconstitute
    content = _reconstitute(
              ^^^^^^^^^^^^^^
  File "/srv/emj-env/lib/python3.11/site-packages/awkward/operations/ak_from_buffers.py", line 335, in _reconstitute
    raw_array1 = container[getkey(form, "starts")]
                 ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'None'

The same behavior can be triggered with just awkward with the following:

import awkward as ak

arr = ak.Array([[1], [2, 3], [1, 2, 4, 5]])
arr = arr[ak.max(arr, axis=-1) > 0.99]

ak.typetracer.length_one_if_typetracer( # Equivalent to type tracing on dar._meta
    arr.layout.to_typetracer(forget_length=True)
)

I'm not sure whether this should be classified as an Awkward issue or as an issue with how dask_awkward generates the _meta array it uses for type tracing. I can forward this to dask_awkward if it would be more appropriate to discuss it there.

yimuchen added the "bug (unverified)" label (The problem described would be a bug, but needs to be triaged) on Oct 2, 2024
jpivarski added the "bug" label (The problem described is something that must be fixed) and removed the "bug (unverified)" label on Oct 2, 2024
jpivarski (Member) commented

Here's a reproducer that doesn't use dask-awkward (thus demonstrating that the bug is in Awkward):

>>> arr = ak.Array([[1], [2, 3], [1, 2, 4, 5]])[[0, None, 2]]
>>> ak.typetracer.length_one_if_typetracer(ak.to_backend(arr, "typetracer"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/src/awkward/typetracer.py", line 133, in length_one_if_typetracer
    return _length_0_1_if_typetracer(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/typetracer.py", line 81, in _length_0_1_if_typetracer
    layout = function(layout.form, highlevel=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/forms/form.py", line 638, in length_one_array
    return ak.operations.ak_from_buffers._impl(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_from_buffers.py", line 154, in _impl
    out = _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_from_buffers.py", line 296, in _reconstitute
    content = _reconstitute(
              ^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_from_buffers.py", line 342, in _reconstitute
    raw_array1 = container[getkey(form, "starts")]
                 ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'None'

This arr ought to have a Form and containers like this:

>>> form, length, container = ak.to_buffers(arr)
>>> print(form)
{
    "class": "IndexedOptionArray",
    "index": "i64",
    "content": {
        "class": "ListArray",
        "starts": "i64",
        "stops": "i64",
        "content": {
            "class": "NumpyArray",
            "primitive": "int64",
            "form_key": "node2"
        },
        "form_key": "node1"
    },
    "form_key": "node0"
}
>>> length
3
>>> for key, value in container.items():
...     print(f"{repr(key):<20s}: {str(value)}")
... 
'node0-index'       : [ 0 -1  1]
'node1-starts'      : [0 3]
'node1-stops'       : [1 7]
'node2-data'        : [1 2 3 1 2 4 5]
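
To make the layout concrete, here is a minimal pure-Python sketch of how the four buffers above combine into the array. It is not Awkward's implementation: the function name `reconstruct` is hypothetical, and the sketch assumes that a negative entry in the index marks a missing value and that each `starts`/`stops` pair slices the flat data.

```python
def reconstruct(index, starts, stops, data):
    """Rebuild nested lists from IndexedOptionArray -> ListArray -> NumpyArray buffers."""
    # ListArray: list i is the slice data[starts[i]:stops[i]]
    lists = [data[a:b] for a, b in zip(starts, stops)]
    # IndexedOptionArray: a negative index means None, otherwise pick that list
    return [None if i < 0 else lists[i] for i in index]


# Buffer contents copied from the container printed above
container = {
    "node0-index": [0, -1, 1],
    "node1-starts": [0, 3],
    "node1-stops": [1, 7],
    "node2-data": [1, 2, 3, 1, 2, 4, 5],
}

result = reconstruct(
    container["node0-index"],
    container["node1-starts"],
    container["node1-stops"],
    container["node2-data"],
)
print(result)  # [[1], None, [1, 2, 4, 5]]
```

Note that the unreferenced list `[2, 3]` is still present in the data buffer; the ListArray simply has no `starts`/`stops` pair pointing at it.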

As a typetracer, the values in the container are expected to differ (they are mocked up to make a length-1 array), but what we actually get is rather different:

{
    "class": "IndexedOptionArray",
    "index": "i64",
    "content": {
        "class": "ListArray",
        "starts": "i64",
        "stops": "i64",
        "content": "int64"
    },
    "form_key": "node-0"
}
length = 1
'node-0'            : b'\xff\xff\xff\xff\xff\xff\xff\xff'
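
As a side note on the single mocked buffer above, here is a small sketch of how those eight bytes decode, assuming they are read as one little-endian int64 index entry (which is what `"index": "i64"` in the form suggests). The variable names are illustrative only.

```python
# The one buffer found in the mocked-up container
mock_index = b"\xff\xff\xff\xff\xff\xff\xff\xff"

# Interpret the 8 bytes as a single signed 64-bit integer
value = int.from_bytes(mock_index, byteorder="little", signed=True)
print(value)  # -1
```

Under that assumption the index is `[-1]`, which would describe a length-1 array whose only entry is missing.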

The IndexedOptionArray has a form_key, but the ListArray and the NumpyArray do not. Why not? I'll dig a little further.
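
The observation above can be checked mechanically. The following is a rough sketch that walks a form, written as a plain Python dict, and lists every node that has no `form_key`. The helper `nodes_missing_form_key` is hypothetical (it is not part of Awkward), it only follows a single `"content"` child, and it assumes that a bare string such as `"int64"` is shorthand for a NumpyArray without a `form_key`.

```python
def nodes_missing_form_key(form, path="root"):
    """Return (path, class name) for every node in the form that lacks a form_key."""
    if isinstance(form, str):
        # Assumption: a bare primitive name stands for a NumpyArray with no form_key
        return [(path, "NumpyArray")]
    missing = []
    if form.get("form_key") is None:
        missing.append((path, form["class"]))
    content = form.get("content")
    if content is not None:
        missing.extend(nodes_missing_form_key(content, path + ".content"))
    return missing


# The form reported for the typetracer case, copied from above
typetracer_form = {
    "class": "IndexedOptionArray",
    "index": "i64",
    "content": {
        "class": "ListArray",
        "starts": "i64",
        "stops": "i64",
        "content": "int64",
    },
    "form_key": "node-0",
}

missing = nodes_missing_form_key(typetracer_form)
for path, class_name in missing:
    print(f"{path}: {class_name} has no form_key")
```

Running the same helper on the form returned by `ak.to_buffers(arr)` above (converted to a dict) should report nothing, since every node there carries a `form_key`.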
