
UCX fails to initialize on ppc64le #7213

Open
dalcinl opened this issue Nov 13, 2024 · 3 comments


@dalcinl
Contributor

dalcinl commented Nov 13, 2024

I'm having issues building MPICH 4.2.3 and 4.1.3 with external UCX 1.17.0 (+ fix from openucx/ucx#9973) on ppc64le under emulation using podman. Builds on aarch64 and x86_64 are fine.

One of the build logs is here: https://github.com/mpi4py/mpi-publish/actions/runs/11803264738/job/32880846533.
I can also reproduce the problem locally.

I'm configuring using --with-device=ch4:ofi,ucx.
I run the basic MPI helloworld example setting MPICH_CH4_NETMOD=ucx.
I'm getting the following failure:

Abort(135914895): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(49162).....: MPI_Init(argc=0x100000800470, argv=0x100000800478) failed
MPII_Init_thread(242)....: 
MPID_Init(552)...........: 
MPIDI_UCX_init_local(227):  ucx function returned with failed status(ucx_init.c 227 MPIDI_UCX_init_local Invalid parameter)

IIRC, our attempts to build MPICH with UCX on conda-forge also ran into runtime issues on ppc64le.
Any tips on how to debug this issue further?

@raffenet
Contributor

Is there anything useful in the output if you set UCX_LOG_LEVEL=info? Unfortunately, I'm unable to launch a ppc64le container on my M1 MacBook to debug interactively.
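The run described in this thread can be scripted so it is easy to repeat inside the emulated container. In that run the hello-world example uses MPICH_CH4_NETMOD=ucx, and the run can optionally add UCX_LOG_LEVEL=info as suggested above. The following is a minimal sketch in Python. It assumes that `mpiexec` from the MPICH install is on PATH and that the hello-world example was compiled to a binary named `helloworld`, which is a hypothetical name. The helper names `ucx_debug_env` and `run_hello` are also illustrative only.

```python
import os
import subprocess
import sys


def ucx_debug_env(log_level="info", base=None):
    """Return a copy of the environment that selects MPICH's UCX netmod
    and sets UCX's log verbosity. The input mapping is not modified."""
    env = dict(os.environ if base is None else base)
    env["MPICH_CH4_NETMOD"] = "ucx"   # pick UCX in a --with-device=ch4:ofi,ucx build
    env["UCX_LOG_LEVEL"] = log_level  # UCX log verbosity, e.g. "info"
    return env


def run_hello(binary="./helloworld", nprocs=1, log_level="info"):
    # "./helloworld" is a hypothetical path to the compiled hello-world example
    cmd = ["mpiexec", "-n", str(nprocs), binary]
    result = subprocess.run(
        cmd, env=ucx_debug_env(log_level), capture_output=True, text=True
    )
    # Return the exit code plus everything printed, so UCX log lines are not lost
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    code, output = run_hello(*sys.argv[1:2])
    print(output, end="")
    sys.exit(code)
```

Usage: `python run_hello.py ./helloworld`. The script captures stdout and stderr together, so any UCX log lines emitted before the MPI_Init abort stay alongside the MPICH error stack.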

@dalcinl
Contributor Author

dalcinl commented Nov 19, 2024

No, UCX_LOG_LEVEL=info produced no additional output. I'm building UCX with the configure-release script; I'll try again with a debug build.

@dalcinl
Contributor Author

dalcinl commented Nov 19, 2024

Once again, a debug build did not produce any additional output 😞.
