
Merge of 0.15.1dev2 tag to gold/2021 #1471

Merged
100 commits merged into gold/2021 on Nov 9, 2023

Conversation

oleksandr-pavlyk
Collaborator

This PR merges development milestone 0.15.1dev2, complete with reduction and statistical functions, to gold/2021 (golden old days) for the purpose of building the milestone and uploading it to the internal package channel.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

oleksandr-pavlyk and others added 30 commits September 27, 2023 11:14
Fix upload_linux/upload_windows steps
* Implements necessary sycl utilities for custom reductions

* Implements dpctl.tensor.max and dpctl.tensor.min (a usage sketch follows this commit message)

* Adds tests for min and max

* Reductions now set max_wg to the minimum of the max work group size and 2048
- This prevents running out of resources when using local memory on CPU

* max and min nan propagation fixed for CPU devices
- drops use of fetch_max/fetch_min for floats, which do not handle nans correctly

* Tweak to test_reduction_kernels

* Implements dpctl.tensor.argmax and argmin

* Tests for argmin and argmax

Also fixes argmin and argmax for scalar inputs

* Argmin and argmax now handle identities correctly

Adds a test for this behavior

Fixed a typo in argmin and argmax that caused the shared local memory variant to be used for more types than expected

* Replaced `std::min` with `idx_reduction_op_`

* Reductions are now well-behaved for size-zero arrays
- comparison and search reductions throw an error in this case
- slips in a change to align the sum signature with the array API spec

* removed unnecessary copies in reduction templates

* Refactors sum to use generic reduction templates

* Sum now uses a generic Python API

* Docstrings added for argmax, argmin, max, and min

* Small reduction clean-ups

Removed unnecessary copies in custom_reduce_over_group

Sequential reduction now casts before calling operator (makes behavior explicit rather than implicit)

* Added test for argmin with keepdims=True

* Added a test for raised errors in reductions

Also removed unused `_usm_types` in `test_tensor_sum`

* Removed `void` overloads from reduction utilities

These were unused by dpctl

* Added a missing include; Identity now uses has_known_identity

The implementation of the Identity trait should call sycl::known_identity
if the trait sycl::has_known_identity is a true_type.

Added IsMultiplies, and an identity value for it, since sycl::known_identity
for multiplies is only defined for real-valued types.

* Adding functor factories for product over axis

* Added Python API for _prod_over_axis

* Common reduction template takes functions to test whether atomics are applicable

Passing these function pointers around allows turning atomics off altogether
if desired.

Use a custom trait to check whether reduce_over_groups can be used. This allows
working around a bug, or switching to custom code for the reduction over a group if desired.

Such a custom trait type works around an issue with an incorrect result returned from
sycl::reduce_over_group for the sycl::multiplies operator for 64-bit integral types.

* Defined dpctl.tensor.prod

Also tweaked docstring for sum.

* Added tests for dpt.prod, removed uses of numpy

* Corrected prod docstring

Small tweaks to sum, min, and max docstrings

---------

Co-authored-by: Oleksandr Pavlyk <[email protected]>
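A minimal usage sketch of the reduction functions named in this commit message; the values in comments are illustrative expectations, not output captured from this build:

```
import dpctl.tensor as dpt

x = dpt.asarray([[1.0, 5.0, 2.0], [7.0, 0.0, 3.0]])

# comparison reductions over the whole array or over a given axis
dpt.max(x)                         # 0d array holding 7.0
dpt.min(x, axis=0)                 # per-column minima: [1.0, 0.0, 2.0]

# search reductions return indices along the reduced axis
dpt.argmax(x, axis=1)              # [1, 0]

# arithmetic reductions; sum follows the array API signature
dpt.sum(x, axis=0, keepdims=True)  # [[8.0, 5.0, 5.0]]
dpt.prod(x, dtype="f4")            # 0.0 (a zero element zeroes the product)

# NaNs propagate through max/min (fetch_max/fetch_min are not used for floats)
y = dpt.asarray([1.0, float("nan"), 3.0])
bool(dpt.isnan(dpt.max(y)))        # True
```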
* Implements flat overload for repeat

Adds tests for new functionality

* repeat's `repeats` parameter relaxed to permit lists and ranges

The docstring has been adjusted to reflect changes to `axis` as well as the new `repeats` types

Corrected a bug in the behavior of `repeat` for size-1 `repeats` Python sequences

* Fixed the error message syntax in repeat for a `repeats` array with ndim > 1
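A short sketch of the repeat behavior described in this commit (the flat overload when no axis is given, and `repeats` accepted as a Python sequence); the comments show the intended results rather than captured output:

```
import dpctl.tensor as dpt

x = dpt.asarray([[1, 2], [3, 4]])

# flat overload: with no axis the array is treated as flattened
dpt.repeat(x, 2)               # [1, 1, 2, 2, 3, 3, 4, 4]

# `repeats` may be a list (or range) with one entry per element along the axis
dpt.repeat(x, [1, 3], axis=0)  # first row once, second row three times
```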
[CI] Fix test_linux step in conda-packages workflow
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md

This includes HW characteristics for Intel Level-Zero GPU devices as well as
access to the PCI device identifier.
Adds a test for the _is_gen9 helper utility, useful for skipping
tests known to fail on Gen9.

Adds a test for the intel_device_info function.
Tests that descriptor names do not have typos.
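A hedged sketch of querying these descriptors, assuming the function is exposed as dpctl.utils.intel_device_info (the set of reported keys depends on the device):

```
import dpctl
import dpctl.utils

dev = dpctl.select_default_device()

# Dict of Intel-specific descriptors, e.g. the PCI device identifier and
# Level-Zero GPU hardware characteristics; empty for unsupported devices.
info = dpctl.utils.intel_device_info(dev)
for name, value in info.items():
    print(f"{name}: {value}")
```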
The 1d variant of repeat was not passed the host task event dependency for allocating shapes and strides on the device.
This caused sporadic segfaults, where the kernel would attempt to access unallocated device data.
Removed cmake/IntelDPCPP.cmake, vendored the cmake/IntelSYCL.cmake script

Changed the project's CMake scripts to use IntelSYCL. Renamed
libsyclinterface/cmake/modules/FindIntelSycl.cmake to
libsyclinterface/cmake/modules/FindIntelSyclCompiler.cmake to
avoid a possible name conflict on OSes with case-insensitive file systems.
Removed -fsycl from the default linking options on Windows. The option
is added as needed.
Set a variable in public CI to override the use of interprocedural optimization,
to avoid insufficient-resources failures during compilation on Windows.
oleksandr-pavlyk and others added 27 commits October 26, 2023 19:56
…ffset-handling

Fix reduction contig impl offset handling
* max and min now use MinMaxAtomicSupportFactory

These functions were using ArithmeticAtomicSupportFactory, which disables atomics for floating point types

* Resolves #1455

This issue was caused by a typo: when the `axis0` kernels
for tree and atomic reductions should have been called, the `axis1` kernel
was called instead

* Adds tests for #1455 resolution
Closes gh-1457

```

In [1]: import dpctl.tensor as dpt

In [2]: dpt.asnumpy([1,2,3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 dpt.asnumpy([1,2,3])

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:185, in asnumpy(usm_ary)
    169 def asnumpy(usm_ary):
    170     """
    171     asnumpy(usm_ary)
    172
   (...)
    183             of `usm_ary`
    184     """
--> 185     return _copy_to_numpy(usm_ary)

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:40, in _copy_to_numpy(ary)
     38 def _copy_to_numpy(ary):
     39     if not isinstance(ary, dpt.usm_ndarray):
---> 40         raise TypeError(
     41             f"Expected dpctl.tensor.usm_ndarray, got {type(ary)}"
     42         )
     43     nb = ary.usm_data.nbytes
     44     hh = dpm.MemoryUSMHost(nb, queue=ary.sycl_queue)

TypeError: Expected dpctl.tensor.usm_ndarray, got <class 'list'>

In [3]: quit
```
#1462)

* Fixes correctness regression in search functions

``py_search_over_axis`` no longer calls the ``axis1`` contiguous variant

``py_search_over_axis`` now only calls the ``axis0`` variant

* Adds tests for fixed search reduction behavior
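A brief sketch exercising the behavior fixed here, namely search reductions dispatching the ``axis0`` path for a 2-D input; expected results are given in comments:

```
import dpctl.tensor as dpt

x = dpt.reshape(dpt.arange(12, dtype="f4"), (3, 4))

# reduce over the leading axis; each result element indexes into axis 0
dpt.argmax(x, axis=0)   # [2, 2, 2, 2]
dpt.argmin(x, axis=0)   # [0, 0, 0, 0]
```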
1. Renamed a misspelled variable
2. If reduction_nelems is small, use the SequentialReductionKernel
   for tree reductions, as is done for atomic reductions
3. Tweaked the scaling-down logic for a moderately-sized number of elements
   to reduce.

   We should also use max_wg if iter_nelems is very small (one),
   since choosing max_wg for large iter_nelems may lead to under-
   utilization of the GPU.
_tensor_impl continues to hold constructors, where, and clip

_tensor_elementwise_impl holds elementwise functions
_tensor_reductions_impl holds reduction functions.
Added a stable API to retrieve the implementation functions from each elementwise
function class instance, allowing `dpnp` to access that information through a
stable interface.
…at types

Added entries for float and double types to TypePairSupportDataForCompReductionAtomic
as spotted by @ndgrigorian in the PR review.

Also moved comments around.
This removes the use of dpnp.matmul from the example, making the example
self-contained.
…ts (#1464)

* Adds SequentialSearchReduction functor to search reductions

* Search reductions use correct branch for float16

constexpr branch logic accounted for floating point types but not sycl::half,
which meant NaNs were not propagating for float16 data
…duction

Improve reduction performance for a small number of elements to reduce, for types where tree reduction is needed
Removed mention of dtype kwarg in usage line
Function _reduce_over_axis promotes the input array to the requested
result data type and carries out the reduction computation in that
data type. This is done in dtype if the implementation supports it.

If the implementation does not support the requested dtype, we reduce
in the default_dtype and cast to the requested dtype afterwards.
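A small sketch of the promotion behavior described above, using `sum` with a requested result dtype; the comment states the expected outcome rather than captured output:

```
import dpctl.tensor as dpt

x = dpt.asarray([1, 2, 3], dtype="i4")

# The input is promoted to the requested result dtype and the reduction is
# carried out in that dtype when a kernel for the combination exists;
# otherwise the reduction runs in a default dtype and is cast afterwards.
s = dpt.sum(x, dtype="f4")
print(s, s.dtype)   # 6.0, float32
```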
Fix for gh-1468 in arithmetic reduction when type promotion is needed
…hape-is-integral-numpy-scalar

Fix usm_ndarray ctor when shape is integral numpy scalar
* Adds __array_namespace_info__ inspection utility

This inspection utility is coming to the array API specification in the near future

* Set __array_api_version__ to "2022.12"

* Remove --ci from array API conformity workflow

* Adds __array_namespace_info__ docstrings

Disallows dtypes for the `kind` kwarg in __array_namespace_info__().dtypes

Removes `float16` from the dtypes listed by __array_namespace_info__, as per spec

Permits dpctl.tensor.Device objects in device keyword arguments in array API inspection utilities (see the sketch after this commit message)

* Adds tests for array API inspection
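A sketch of the inspection utility added in this commit; the method names follow the array API inspection namespace (devices, dtypes, default_dtypes), so treat the exact attributes as assumptions against the specification rather than guaranteed output:

```
import dpctl.tensor as dpt

info = dpt.__array_namespace_info__()

print(dpt.__array_api_version__)   # "2022.12"
print(info.devices())              # available dpctl.tensor.Device objects
print(info.default_dtypes())       # default dtypes on the default device

# `kind` takes strings (or tuples of strings); dtype objects are disallowed,
# and float16 is not listed, per the specification
print(info.dtypes(kind="integral"))
```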
* Resolves gh-1456

Tree reductions now populate the destination with the identity when reducing over
zero-size axes. As a result, the special-case logic for handling zero-size axes was removed.

``argmax``, ``argmin``, ``max``, and ``min`` still raise an error for
zero-size axes.

Reductions now return a copy when provided an empty axis tuple.

Adds additional supported dtype combinations to ``prod`` and ``sum``, specifically for integer inputs and inexact output types

* Implements mean, var, and std (a usage sketch follows this commit message)

* Adds more tests for statistical functions

* Adds docstrings for statistical functions

* Adds more supported types to arithmetic reductions

Permits a `float` accumulation type with 64-bit signed and unsigned integer inputs
to prevent unnecessary copies on devices that don't support double precision

* Changes mean reduction to use output data type as sum accumulation type

Mean in-place division now uses the real type for the denominator
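A minimal sketch of the statistical functions added in this commit, together with the zero-size-axis behavior from the gh-1456 fix; the `correction` keyword follows the array API convention and should be treated as an assumption here:

```
import dpctl.tensor as dpt

x = dpt.asarray([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(dpt.mean(x, axis=1))               # [2.0, 5.0]
print(dpt.std(x, axis=1))                # population standard deviation by default
print(dpt.var(x, axis=1, correction=1))  # sample variance

# Reducing over a zero-size axis fills the result with the identity
e = dpt.ones((0, 3))
print(dpt.sum(e, axis=0))                # [0.0, 0.0, 0.0]
```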
oleksandr-pavlyk merged commit 7957990 into gold/2021 on Nov 9, 2023
47 of 53 checks passed
github-actions bot commented Nov 9, 2023

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
