Releases: eth-cscs/COSTA

v2.2.2: Merge pull request #20 from eth-cscs/cmake_fix

10 May 11:52
bb84528
  • remove the -mtune=native option from the CXXFLAGS

COSTA-v2.2.1

19 Apr 11:01
5e26703
  • fix a CMake bug when linking to Cray LibSci.

v2.2

22 Feb 14:09
4b4b977
  • Update of the CMake build system
  • All optional sub-modules are now treated as dependencies

COSTA-v2.1

05 Jul 16:16

This version brings the following improvements/features:

  • scalapack-wrappers: COSTA now implements the ScaLAPACK routines pxgemr2d (redistribute), pxtran (transpose), pxtranu (complex transpose) and pxtranc (complex conjugate transpose), i.e.
    • pdgemr2d, psgemr2d, pcgemr2d, pzgemr2d
    • pstran, pdtran, pctranu, pztranu, pctranc, pztranc
    and their prefixed versions:
    • costa_pdgemr2d, costa_psgemr2d, costa_pcgemr2d, costa_pzgemr2d
    • costa_pstran, costa_pdtran, costa_pctranu, costa_pztranu, costa_pctranc, costa_pztranc
  • code refactoring
  • performance improvements
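
As a rough illustration of what the transpose routines listed above compute, the sketch below applies the operation documented for the PBLAS routines of the same names, C := beta*C + alpha*op(A), to a small non-distributed matrix. The function name `tran` and the list-of-rows representation are assumptions made for this example; they are not part of COSTA or ScaLAPACK, and the sketch ignores data distribution entirely.

```python
def tran(alpha, a, beta, c, conjugate=False):
    """Return alpha * op(a) + beta * c for matrices given as lists of rows.

    op(a) is the transpose of a, or its conjugate transpose if
    conjugate=True. a has shape m x n, so c must have shape n x m.
    (Hypothetical helper for illustration only.)
    """
    m, n = len(a), len(a[0])
    if len(c) != n or any(len(row) != m for row in c):
        raise ValueError("c must have the shape of the transpose of a")
    result = []
    for i in range(n):
        row = []
        for j in range(m):
            x = a[j][i]  # element (i, j) of the transpose
            if conjugate:
                x = complex(x).conjugate()
            row.append(alpha * x + beta * c[i][j])
        result.append(row)
    return result


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # 2 x 3
c = [[1.0, 1.0],
     [1.0, 1.0],
     [1.0, 1.0]]               # 3 x 2
print(tran(2.0, a, 0.5, c))
```

With `conjugate=False` this corresponds to the pxtran/pxtranu behaviour, and with `conjugate=True` to pxtranc; the real routines operate on block-cyclically distributed submatrices rather than on local lists.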

COSTA-v2.0

27 May 12:53
28d0f44

This release brings the following improvements:

  • [row-major blocks]: COSTA now supports both row-major and col-major ordering of blocks. This is more general than ScaLAPACK, which supports only col-major ordering.
  • [performance-improvements]: the multithreaded backends are improved to avoid core oversubscription, resulting in a performance boost.
  • [bugfixes]: thorough testing was performed during the integration of COSTA into COSMA and CP2K.
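
The difference between the two orderings can be sketched as follows. This example assumes that "ordering of blocks" refers to how the elements of each local block are laid out in memory, which the release note does not spell out; the function `element_offset` is hypothetical and not part of the COSTA API.

```python
def element_offset(i, j, n_rows, n_cols, ordering):
    """Offset of element (i, j) in a flat buffer holding an n_rows x n_cols block.

    Hypothetical helper: 'row-major' stores each row contiguously,
    'col-major' (the ScaLAPACK convention) stores each column contiguously.
    """
    if not (0 <= i < n_rows and 0 <= j < n_cols):
        raise IndexError("element index outside the block")
    if ordering == "row-major":
        return i * n_cols + j
    if ordering == "col-major":
        return j * n_rows + i
    raise ValueError("ordering must be 'row-major' or 'col-major'")


# The same element of a 3 x 4 block lands at different offsets.
print(element_offset(1, 2, 3, 4, "row-major"))  # 1 * 4 + 2
print(element_offset(1, 2, 3, 4, "col-major"))  # 2 * 3 + 1
```

Supporting both orderings lets callers that already hold row-major data avoid an extra local transpose before redistribution.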

COSTA v1.0

28 Oct 20:33
34d2f39

This is the very first release of COSTA, bringing the following features:

  • scalapack wrappers: for redistribute (pxgemr2d) and transpose (pxtran(u)).
  • different layouts support: added representation for block-cyclic and arbitrary matrix layouts.
  • multiple layouts: can transform multiple layouts at once, i.e. in the same communication round.
  • comm-optimal: can minimize the communication volume.
  • scaling & transpose: in addition to redistributing the matrix, can also scale and transpose the initial and final layouts.
  • highly optimized: optimized for distributed and multithreaded settings.
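
For readers unfamiliar with the block-cyclic layout mentioned above, the following minimal sketch shows the standard ScaLAPACK-style mapping along one dimension, using 0-based indices. The helper names `owner` and `local_index` are chosen for this example and are not COSTA functions; COSTA's own layout descriptions are not shown here.

```python
def owner(g, block, nprocs, src=0):
    """Process coordinate that owns global index g (0-based).

    Blocks of size `block` are dealt out round-robin to `nprocs`
    processes, starting at process `src`.
    """
    return (g // block + src) % nprocs


def local_index(g, block, nprocs):
    """Position of global index g inside its owner's local storage."""
    return (g // (block * nprocs)) * block + g % block


# 8 rows, block size 2, 2 processes along this dimension.
rows = range(8)
print([owner(g, 2, 2) for g in rows])
print([local_index(g, 2, 2) for g in rows])
```

Applying the same mapping independently to rows and columns gives the two-dimensional block-cyclic distribution; an arbitrary layout instead records an explicit owner for each block.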