
CLBlast 1.4.0

Released by CNugteren on 03 Jun 11:27

CLBlast version 1.4.0. Changes since the previous release (version 1.3.0):

  • Added a Python interface to CLBlast: 'PyCLBlast'
  • Added CLBlast to Ubuntu PPA and macOS Homebrew package managers
  • Added an API to run the tuners programmatically without any I/O
  • Improved the performance potential by adding a second tunable GEMM kernel with 2D register tiling
  • Added support for Intel-specific subgroup shuffling extensions for faster GEMM on Intel GPUs
  • Re-added a local memory size constraint to the tuners
  • The routine tuners now automatically pick up the kernel tuners' results from disk
  • Updated and reorganised the CLBlast documentation
  • Added a 'canary' region to check for overflows in the tuner and tests (inspired by clARMOR)
  • Added an option to test against and compare performance with Intel's MKL
  • Fixed an access violation upon releasing the OpenCL program when compiled with Visual Studio
  • Fixed incorrect releasing of the OpenCL program resulting in segfaults / access violations
  • Various minor fixes and enhancements
  • Added tuned parameters for various devices (see doc/tuning.md)
  • Added non-BLAS level-1 routines:
    • SHAD/DHAD/CHAD/ZHAD/HHAD (Hadamard element-wise vector-vector product)
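The new HAD routines compute an element-wise (Hadamard) product of two vectors. Below is a minimal CPU reference sketch in C to illustrate the operation, not CLBlast's OpenCL kernel or host API. It assumes a BLAS-style form z[i] = alpha * x[i] * y[i] + beta * z[i] with strided vectors; the name `hadamard_ref` and its parameter list are hypothetical, so check the CLBlast API documentation for the routine's actual signature.

```c
#include <stddef.h>

/* Hypothetical CPU reference for a single-precision Hadamard product with
 * BLAS-style scaling (an assumption, not CLBlast's API):
 *   z[i] = alpha * x[i] * y[i] + beta * z[i]
 * x_inc, y_inc and z_inc are element strides, as in other level-1 routines. */
void hadamard_ref(size_t n, float alpha,
                  const float *x, size_t x_inc,
                  const float *y, size_t y_inc,
                  float beta, float *z, size_t z_inc) {
  for (size_t i = 0; i < n; ++i) {
    /* Following the usual BLAS convention, z is not read when beta == 0,
     * so an uninitialised output buffer does not propagate NaNs. */
    float prev = (beta == 0.0f) ? 0.0f : beta * z[i * z_inc];
    z[i * z_inc] = alpha * x[i * x_inc] * y[i * y_inc] + prev;
  }
}
```

With alpha = 1 and beta = 0 this reduces to a plain element-wise product z = x ∘ y; the half- and complex-precision variants (HHAD, CHAD, ZHAD) apply the same formula with the corresponding element type and arithmetic.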
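The 'canary' bullet refers to a general overflow-detection technique: over-allocate each buffer, fill the extra tail with a known sentinel, and verify after a kernel runs that the sentinel is untouched. The sketch below illustrates the idea on plain host arrays, whereas CLBlast's tuners and tests apply it to OpenCL device buffers; the function names, the canary size and the sentinel value are all hypothetical choices for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameters: number of trailing guard elements and the
 * sentinel value written into them. */
#define CANARY_SIZE 16
#define CANARY_VALUE -12345.0f

/* Allocates n usable elements (zero-initialised) followed by a canary
 * region of CANARY_SIZE sentinel elements. Returns NULL on failure. */
float *alloc_with_canary(size_t n) {
  float *buf = malloc((n + CANARY_SIZE) * sizeof(float));
  if (buf == NULL) return NULL;
  for (size_t i = 0; i < n; ++i) buf[i] = 0.0f;
  for (size_t i = n; i < n + CANARY_SIZE; ++i) buf[i] = CANARY_VALUE;
  return buf;
}

/* Returns true if no element of the canary region after the first n
 * elements has been modified, i.e. no write went past the end. */
bool canary_intact(const float *buf, size_t n) {
  for (size_t i = n; i < n + CANARY_SIZE; ++i) {
    if (buf[i] != CANARY_VALUE) return false;
  }
  return true;
}
```

A test harness would call `canary_intact` after every kernel launch; a false result pinpoints an out-of-bounds write that might otherwise go unnoticed or only surface later as a crash.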