MPICH with NVIDIA Compilers #7178
Which version of MPICH is this? Could you try the latest release?
It's the latest 4.2.3 version.
Could you add
Here is the log file. The main error is
Could you try ?
Hui, here is the log.
Any update on this ticket?
Sorry for the neglect. Could you try the newest MPICH 4.3.0rc1 release (https://www.mpich.org/downloads/), and if it still fails, upload the run log?
Hi @hzhou, we have seen a similar issue on Vista when working on MVAPICH.
It looks to me like the NVIDIA compiler performs some kind of unwanted optimization that leads to this issue when NDEBUG is defined. Any thoughts on where to look? Edit:
Hi MPICH team,
I have built MPICH with the NVIDIA compilers (nvc, nvc++, nvfortran) on the TACC Vista machine. srun works, but the mpiexec job launcher produces the following errors. Any suggestions?
i615-001gg$ mpiexec -np 16 -ppn 2 ./namd3_mpi_smp_fftw3 +ppn 71 +pemap 1-71,73-143 +commap 0,72 stmv.namd
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_QmhOmh
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_kYI4Ja
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_7fPRik
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_bjz7BQ
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_LGXVSr
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_4GtuuA
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_ud3CVC
[proxy:[email protected]] created hwloc xml file /tmp/hydra_hwloc_xmlfile_uKHjRx
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
[proxy:[email protected]] cache_put_flush (proxy/pmip_pmi.c:183): assert (s) failed
Abort(878831119) on node 2: Fatal error in internal_Init_thread: Other MPI error, error stack:
internal_Init_thread(49255)...: MPI_Init_thread(argc=0xfffff342b99c, argv=0xfffff342b990, required=1, provided=0xfffff342b988) failed
MPII_Init_thread(265).........:
MPIR_init_comm_world(34)......:
MPIR_Comm_commit(800).........:
MPIR_Comm_commit_internal(585):
MPID_Comm_commit_pre_hook(151):
MPIDI_world_pre_init(640).....:
MPIDI_UCX_init_world(263).....:
initial_address_exchange(79)..:
MPIDU_bc_table_create(153)....:
MPIR_pmi_allgather_shm(690)...:
get_ex_segs(431)..............:
(unknown)(): Other MPI error
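For reference, a build along the lines described above might look like the following sketch (the compiler names come from the report; the install prefix is a placeholder, and the ch4:ucx device is inferred from the MPIDI_UCX frames in the error stack, so treat both as assumptions):

```shell
# Hypothetical MPICH build with the NVIDIA HPC compilers.
# --prefix is a placeholder; ch4:ucx matches the UCX frames in the log above.
./configure CC=nvc CXX=nvc++ FC=nvfortran \
            --with-device=ch4:ucx \
            --prefix=$HOME/mpich-nvhpc
make -j 8
make install
```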