Conference call notes 20231011
Kenneth Hoste edited this page Oct 11, 2023
·
7 revisions
Notes on the 231st EasyBuild conference call, Wednesday 11 Oct 2023 (08:00 UTC)
List of attendees (9):
- Massimiliano Culpo (Spack core developer)
- Jasper Grimm (University of York, UK)
- Kenneth Hoste (HPC-UGent, Belgium)
- Adam Huffman (Big Data Institute, Oxford, UK)
- Kurt Lust (UAntwerpen, Belgium + LUMI User Support Team)
- Sebastien Moretti (SIB, Switzerland)
- Mikael Öhman (Chalmers University of Technology, Sweden)
- Åke Sandgren (Umeå University, Sweden)
- Jörg Saßmannshausen (Imperial College London, UK)
- overview of recent developments
- Q&A
- latest EasyBuild release: 4.8.1 (11 Sept 2023)
- ETA for next EasyBuild release: end of Oct'23
- ETA for EasyBuild 5.0 release: by the end of 2023 (?)
- started doing short sprint meetings, each Monday at 10:00 CEST to set next 5 goals to tackle that week
- easyconfigs merge sprint
- planned for Mon 23 Oct'23
- recent changes
- docs (merged PRs)
- ...
- framework (merged PRs)
- bug fixes
- ...
- enhancements
- ...
- changes
- reduce number of CI jobs by testing for Lua and Tcl module syntax in a single CI job (PR #4192)
- EasyBuild 5.0 (to 5.0.x branch)
- ...
- easyblocks (merged PRs)
- easyconfigs (merged PRs)
- over 100 easyconfig PRs were merged since last conf call
- bug fixes
- add missing required PyPy dependency for Clair3, also copy preprocess and shared subdirectories, and enhance sanity check for provided libclair3 Python package (PR #18847)
- fix source URL for segemehl 0.3.4 (PR #18878)
- avoid use of hardcoded paths for Pillow by using --disable-platform-guessing option (PR #18881)
- was motivated by problems with installing Pillow in EESSI, see also EESSI support issue #9
- more problems remain because Pillow's setup.py wants to locate zlib.h and libz.* in creative ways...
- add patch to disable flaky DDRGES3 LAPACK test in OpenBLAS 0.3.23 + 0.3.24 (PR #18887)
- add alternate checksum for NCCL v2.18.3 (PR #18906)
- add missing dependencies for MONAI to support extras required by MONAI-Label (PR #18921)
- enhancements
- (noteworthy) new software
- ...
- noteworthy software updates
- ...
- changes
- EasyBuild 5.0 (to 5.0.x branch)
- work-in-progress
- docs (open PRs + issues)
- project board to perform yearly review cycle of all documentation pages: https://github.com/orgs/easybuilders/projects/17/views/1
- enhance documentation of checksums easyconfig parameter (PR #104)
- document policy on supported toolchain generations (PR #200)
- framework (open PRs + issues)
- reported bugs / bug fixes
- add optimal optimization flags for Intel compilers on AMD CPUs (issue #3793)
- for AMD Genoa, we don't want to use -mavx2 since then we won't get AVX-512 instructions
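The Genoa concern can be illustrated with a minimal sketch that picks a SIMD level from the CPU feature flags instead of assuming AVX2 for every AMD CPU. The function names and the returned labels are hypothetical; `avx2` and `avx512f` are the flag names Linux reports in `/proc/cpuinfo` on x86_64. How the result should map onto actual Intel compiler options is deliberately left open, since that is the subject of issue #3793.

```python
def best_simd_level(cpu_flags):
    """Return the widest SIMD level found in a collection of CPU feature flags."""
    flags = set(cpu_flags)
    if "avx512f" in flags:
        # AVX-512 capable CPU: settling for AVX2 would leave AVX-512 unused
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "baseline"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU in a Linux /proc/cpuinfo-style file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    # no 'flags' line found (e.g. non-x86_64 system)
    return []
```

A caller could use `best_simd_level(read_cpu_flags())` to decide which family of optimization flags to consider.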
- enhancements
- ...
- changes
- ...
- EasyBuild 5.0 (to 5.0.x branch)
- TODO:
- improve error reporting for failing shell commands (and EasyBuild crashes) (PR #4351)
- should the shell option for the run_shell_cmd function be renamed to use_bash?
- see also EasyBuild 5.0 sync meeting notes
- easyblocks (open PRs + issues)
- bug reports/fixes
- fix extension filter for Perl packages (PR #2699)
- see also easyconfigs PR #18789
- fix --sanity-check-only and --module-only for UCX plugins (PR #3007)
- nice example of how to make easyblocks compatible with --sanity-check-only and --module-only
- enhance TensorFlow easyblock to avoid use of -mcpu=native for XNNPACK component when building on aarch64 (PR #3011)
- enhancements
- don't blindly overwrite -Dccflags + honour preconfigopts in Perl easyblock (PR #3010)
- updates
- update WRF easyblock to correctly determine wrf_subdir for version >= 4.5.1 (PR #2997)
- see also easyconfigs PR #18741
- new easyblocks
- changes
- Install only SuiteSparse libraries with make install (PR #3004)
- EasyBuild 5.0 (to 5.0.x branch)
- stop importing from easybuild.tools.py2vs3 (+ minor cleanup in init easyblocks test) (PR #3015)
- easyconfigs (open PRs + issues)
- bug fixes/reports
- failing build of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1) (issue #18899)
- MPI hanging if MPI_Init_thread is used with foss/2023a (issue #18925)
- due to bug in libfabric, various workarounds possible (like setting $PSM3_DEVICES to self,shm)
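As a rough sketch of the $PSM3_DEVICES workaround mentioned for issue #18925, the variable can be set in the environment passed to the MPI launcher. The helper name below is hypothetical, and whether this workaround is appropriate for a given system is an assumption to verify against the issue.

```python
import os


def env_with_psm3_workaround(base_env=None):
    """Return a copy of the environment with PSM3_DEVICES set to 'self,shm'."""
    env = dict(os.environ if base_env is None else base_env)
    # workaround value taken from the notes on issue #18925
    env["PSM3_DEVICES"] = "self,shm"
    return env
```

The resulting dictionary can be handed to `subprocess.run(..., env=env_with_psm3_workaround())` when launching the MPI program.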
- make sure Python dependency included for ESPResSo is actually used by specifying -DPYTHON_EXECUTABLE (PR #18963)
- YIL that -DPython3_EXECUTABLE, -DPython_EXECUTABLE, and -DPYTHON_EXECUTABLE are three very different options :man-facepalming:
- CMakeMake should be setting -DPython_EXECUTABLE & co (in EasyBuild 5.0)?
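The three CMake options differ because they belong to different find modules: Python3_EXECUTABLE is read by FindPython3, Python_EXECUTABLE by FindPython, and PYTHON_EXECUTABLE by the legacy FindPythonInterp module. A minimal sketch of what "setting -DPython_EXECUTABLE & co" could look like is shown here; the helper name is hypothetical and this is not the actual CMakeMake easyblock code.

```python
def python_cmake_opts(python_exe):
    """Build a CMake option string pointing all three Python hints at the same interpreter."""
    names = (
        "Python3_EXECUTABLE",  # FindPython3
        "Python_EXECUTABLE",   # FindPython
        "PYTHON_EXECUTABLE",   # legacy FindPythonInterp
    )
    return " ".join("-D%s=%s" % (name, python_exe) for name in names)
```

Setting all three avoids having to know which find module a given project uses, at the cost of CMake possibly warning about unused variables.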
- enhancements
- new software
- ...
- noteworthy software updates
- changes
- ...
- EasyBuild 5.0 (to 5.0.x branch)
- 2023b toolchains should be included in EasyBuild 4.9.0 release
- probably not yet next release (4.8.2, ETA end Oct'23)
- candidate toolchains are merged, ready for more extensive testing of "big" apps
- most significant change is jump to GCC 13.x
- foss/2023.09 (PR #18886) - candidate for foss/2023b
- GCC 13.2.0 + binutils 2.40
- OpenMPI 4.1.6 (+ UCX 1.15.0, PMIx 4.2.6, libfabric 1.19.0)
- FlexiBLAS 3.3.1 (+ OpenBLAS 0.3.24)
- FFTW 3.3.10
- ScaLAPACK 2.1.0
- intel/2023.07 (PR #18439) - candidate for intel/2023b
- GCC 13.2.0 + binutils 2.40
- intel-compilers 2023.2.1
- impi 2021.10.0
- imkl 2023.2.1
- testing
- OSU-Micro-Benchmarks (already done?)
- SciPy-bundle (numpy, scipy)
- GROMACS (C++)
- OpenFOAM (C++)
- requires ParaView, Qt5, etc.
- should we keep building on top of ParaView (only needed for paraFoam utility)?
- installing paraFoam stand-alone is a PITA
- paraFoam isn't actually used when running OpenFOAM simulations
- CP2K (Fortran)
- check if Qt6 can be used
- discussion about problem with VTK that Jörg was hitting
- is there any testing being done on IPv6-only clusters?
- internal network in HPC cluster at Imperial College London is IPv6
- not currently, seems like a pretty exotic setup?
- (Jörg) anyone working on PyTorch 2.x?
- see Simon's closed PR #18269
- maybe Flamefire is looking into it?
- there's a PyTorch 2.1 release now
- we're held back a bit by the relatively aggressive testing here...
- what would be a good minimal requirement?
- RHEL8, A100, Intel+AMD CPUs
- we should put a policy in place for this
- (Jörg) fluidity which requires Zoltan (provided by Trilinos)
- why is Fortran90 interface not enabled in Trilinos?
- -DZoltan_ENABLE_F90INTERFACE=ON -DZoltan_ENABLE_ParMETIS=ON -DZoltan_ENABLE_Scotch=ON
- can adjust "minimal" Trilinos that was added in PR #17448
- (Jörg) question on Bazel
- download problem when building Bazel, may be a red herring
- there may be an actual other problem higher up
- (Åke) how are people dealing with changes in Slurm 22.05.x w.r.t. srun and $SLURM_CPUS_PER_TASK?
- srun is only listening to $SRUN_CPUS_PER_TASK (or use srun -c)
- setting $SRUN_CPUS_PER_TASK causes trouble with mpirun (because it uses srun and affinity is then wrong)
- using patch for mpirun that unsets $SRUN_CPUS_PER_TASK
- similar patching was done at LUMI for a while, but it causes problems
- CSCS may have more info on how they dealt with this (cfr. their bug report to Slurm: https://bugs.schedmd.com/show_bug.cgi?id=15632#c43)
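The two sides of the Slurm discussion can be sketched as a pair of environment helpers: one that makes srun honour the CPUs-per-task value of the job, and one that mimics the idea of the mpirun patch by dropping the variable again. The function names are hypothetical, and the notes indicate that similar patching caused problems at LUMI, so this is an illustration of the idea rather than a recommended fix.

```python
import os


def srun_env(base_env=None):
    """Copy SLURM_CPUS_PER_TASK to SRUN_CPUS_PER_TASK for direct srun calls."""
    env = dict(os.environ if base_env is None else base_env)
    cpus = env.get("SLURM_CPUS_PER_TASK")
    if cpus is not None:
        env["SRUN_CPUS_PER_TASK"] = cpus
    return env


def mpirun_env(base_env=None):
    """Drop SRUN_CPUS_PER_TASK so an srun started by mpirun does not inherit it."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("SRUN_CPUS_PER_TASK", None)
    return env
```

Either dictionary can be passed as the `env` argument of `subprocess.run` when starting srun or mpirun from a wrapper script.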