I'm having issues building MPICH 4.2.3 and 4.1.3 with external UCX 1.17.0 (+ fix from openucx/ucx#9973) on ppc64le under emulation using podman. Builds on aarch64 and x86_64 are fine.
I'm configuring using `--with-device=ch4:ofi,ucx`.
I run the basic MPI helloworld example with `MPICH_CH4_NETMOD=ucx` set.
I'm getting the following failure:
Abort(135914895): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(49162).....: MPI_Init(argc=0x100000800470, argv=0x100000800478) failed
MPII_Init_thread(242)....:
MPID_Init(552)...........:
MPIDI_UCX_init_local(227): ucx function returned with failed status(ucx_init.c 227 MPIDI_UCX_init_local Invalid parameter)
IIRC, our attempts to build MPICH with UCX on conda-forge also ran into runtime issues on ppc64le.
Any tips on how to further debug this issue?
Is there anything useful in the output if you set `UCX_LOG_LEVEL=info`? Unfortunately, I'm unable to launch a ppc64le container on my M1 MacBook to debug interactively.
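In case it helps with collecting that output, here is a minimal sketch of a rerun helper. It sets the two environment variables mentioned in this thread and captures whatever the program prints. The binary name `./helloworld`, the `mpiexec` launcher name, and the helper functions themselves are assumptions for illustration, not part of the original report.

```python
import os
import shutil
import subprocess


def debug_env(base=None):
    """Return a copy of the environment with the UCX netmod selected
    and info-level UCX logging enabled."""
    env = dict(os.environ if base is None else base)
    env["MPICH_CH4_NETMOD"] = "ucx"  # select the UCX netmod, as in the report
    env["UCX_LOG_LEVEL"] = "info"    # ask UCX for more verbose logging
    return env


def run_hello(binary="./helloworld", launcher="mpiexec", nprocs=1):
    """Run the test program under the launcher and return its combined
    stdout/stderr, or None if the launcher is not on PATH.

    The binary path and launcher name are hypothetical defaults.
    """
    if shutil.which(launcher) is None:
        return None
    result = subprocess.run(
        [launcher, "-n", str(nprocs), binary],
        env=debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep MPICH and UCX messages in one stream
        text=True,
        check=False,               # the run is expected to abort; do not raise
    )
    return result.stdout


if __name__ == "__main__":
    output = run_hello()
    print(output if output is not None else "mpiexec not found on PATH")
```

Merging stderr into stdout keeps the UCX log lines next to the MPICH error stack, which should make it easier to see what UCX reported just before `MPIDI_UCX_init_local` failed.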
One of the build logs is here: https://github.com/mpi4py/mpi-publish/actions/runs/11803264738/job/32880846533.
I can also reproduce the problem locally.