Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tensor asarray support for usm ndarray protocol #1959

Merged

Conversation

oleksandr-pavlyk
Copy link
Collaborator

This PR changes dpctl.tensor.asarray function to recognize objects that implement __usm_ndarray__ protocol and expect this property to return dpctl.tensor.usm_ndarray instance corresponding to the content of the object.

This property is intended to speed-up conversion from dpnp.ndarray to dpt.usm_ndarray in x = dpt.asarray(dpnp_array_obj).
The input object that implements __usm_ndarray__ is recognized as owner of USM allocation that is managed by a smart pointer, and asynchronous deallocation of x need not involve GIL.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

`asarray` supports objects that implement `__sycl_usm_array_interface__`.
It can create a view into USM allocation owned by input object, hence maintains
a reference to it.

Asynchronous deallocation of such objects in dpctl.tensor functions require
manipulating Python object reference counters, and hold GIL. This is a source
of dead-locks, and affects performance.

This PR adds support for ingesting Python objects that implement __usm_ndarray__
attribute (property) that returns dpt.usm_ndarray object with such a view directly.

It is trivial for `dpnp.ndarray` to implement such a property, e.g,.

```
@Property
def __usm_ndarray__(self):
    return self._array_obj
```

With this definition, `dpt.asarray(dpnp_array)` will recognize that the underlying
USM allocation is managed by the smart pointer, and asynchronous deallocation will
not involve Python objects, avoiding dead-locks and improving performance.
@oleksandr-pavlyk oleksandr-pavlyk force-pushed the tensor-asarray-support-for-usm-ndarray-protocol branch from 8d89f85 to e8fe0e0 Compare January 10, 2025 17:27
Copy link

github-actions bot commented Jan 10, 2025

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_428 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

1 similar comment
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_428 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@coveralls
Copy link
Collaborator

coveralls commented Jan 10, 2025

Coverage Status

coverage: 87.681% (-0.03%) from 87.715%
when pulling e8fe0e0 on tensor-asarray-support-for-usm-ndarray-protocol
into 9f8f90b on master.

@oleksandr-pavlyk
Copy link
Collaborator Author

Drop in coverage level has to do with some issue with coverage itself. Running locally, I see

dpctl/tensor/_ctors.py                               708     46    354     37    92%   66->68, 71, 76, 108, 150, 191, 197-198, 214, 224, 240, 292, 303, 315->318, 330, 344->
347, 360, 379-382, 420->424, 428-434, 435->437, 588->597, 657->681, 764, 774, 789-795, 874-875, 877, 1325->1327, 1500->1502, 1503->1505, 1512-1528, 1621, 1624, 1720, 1739->
1743, 1781, 1859

For example, lines 66-68 are newly added lines, but when I add a print statement there, I see the output during test execution.

@ndgrigorian
Copy link
Collaborator

A brief thought: NumPy specifically outlines how the __array__() interface works here: https://numpy.org/doc/2.1/user/basics.interoperability.html#dunder-array-interface
This is especially useful because they expect the copy and dtype arguments to be present, or a warning is thrown.

I'm not sure we need to require these arguments, but we should add some documentation for __usm_ndarray__ to outline our expectations, similarly.

@antonwolfy
Copy link
Collaborator

antonwolfy commented Jan 13, 2025

NumPy specifically outlines how the __array__() interface works

It turns me to the off-topic question if dpctl and dpnp have to provide support of __array__ method?

Since today you can pass usm_ndarray to numpy.asarray and it will return malformed data:

import numpy, dpctl, dpctl.tensor as dpt

a = dpt.ones(10)
numpy.asarray(a)
# Out:
# array([usm_ndarray(1.), usm_ndarray(1.), usm_ndarray(1.), usm_ndarray(1.),
#        usm_ndarray(1.), usm_ndarray(1.), usm_ndarray(1.), usm_ndarray(1.),
#        usm_ndarray(1.), usm_ndarray(1.)], dtype=object)

but if we add support of __array__ it seems it will work:

class A:
     def __init__(self, obj):
         self.obj = obj

     def __array__(self):
         return dpt.asnumpy(self.obj)

b = A(a)

numpy.asarray(b)
# Out: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

or alternatively we can add an exception raising inside __array__ method to clearly notify that implicit conversion to a NumPy array is not allowed.

I can create a separate issue if there is something to do or discuss.

@oleksandr-pavlyk
Copy link
Collaborator Author

@antonwolfy I opened gh-1964 to implement usm_ndarray.__array__ property to disallow the implicit conversion as in your example.

Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional documentation can be saved for a follow-up PR, this LGTM

Copy link
Collaborator

@antonwolfy antonwolfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the changes with dpnp#2261 locally and all dpnp tests passed. It seems everything works properly.
Thank you @oleksandr-pavlyk

@oleksandr-pavlyk oleksandr-pavlyk merged commit 0f3536b into master Jan 14, 2025
60 of 61 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the tensor-asarray-support-for-usm-ndarray-protocol branch January 14, 2025 16:02
antonwolfy added a commit to IntelPython/dpnp that referenced this pull request Jan 20, 2025
The PR is intended to adopt to dpctl changes implemented in
[dpctl#1959](IntelPython/dpctl#1959).

It implements support of `__usm_ndarray__` protocol for `dpnp.ndarray`
and returns a property with `dpctl.tensor.usm_ndarray` instance
corresponding to the content of the array object.

This property is intended to speed-up conversion from `dpnp.ndarray` to
`dpt.usm_ndarray` in `x=dpt.asarray(dpnp_array_obj)`.
The input object that implements `__usm_ndarray__` is recognized as
owner of USM allocation that is managed by a smart pointer, and
asynchronous deallocation of `x` need not involve GIL.
github-actions bot added a commit to IntelPython/dpnp that referenced this pull request Jan 20, 2025
The PR is intended to adopt to dpctl changes implemented in
[dpctl#1959](IntelPython/dpctl#1959).

It implements support of `__usm_ndarray__` protocol for `dpnp.ndarray`
and returns a property with `dpctl.tensor.usm_ndarray` instance
corresponding to the content of the array object.

This property is intended to speed-up conversion from `dpnp.ndarray` to
`dpt.usm_ndarray` in `x=dpt.asarray(dpnp_array_obj)`.
The input object that implements `__usm_ndarray__` is recognized as
owner of USM allocation that is managed by a smart pointer, and
asynchronous deallocation of `x` need not involve GIL. 9ad1bb5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants