PMI1 shim hangs with dstore gds component, passes with hash #3
Comments
Per discussion on the teleconf, this is likely because the dstore does not have an exhaustive search path for the PMI1 case, where the process identifier is not provided in the … I wanted to file this so that it is searchable for folks who might hit this type of issue when using PMI1.
This should (and will) be moved to the new PMI-1/2 repo: https://github.com/openpmix/pmi-shim
@rhc54: any objection to using GitHub's "Transfer issue" feature to move this issue to the new repo? That will preserve the issue discussion and history so far.
Didn't know about it 😄 Sure, I can do that.
Thanks!
Problem: on some platforms PMIx is the preferred mechanism for bootstrapping Flux.

Add support to the broker's pmiutil.c to use PMIx if the PMIx server environment variables are set. For now, keep the PMIx integration as simple as possible and use the PMIx_*() functions directly. We can consider other options, such as indirection through dlopen(), later if we run into problems.

This implementation was guided by the PMI-1 compatibility code here: https://github.com/openpmix/pmi-shim

Since Flux does not require all of PMI-1, our code is much simpler. In addition, some differences between PMIx and PMI-1 with respect to key scope could be dealt with directly, compared to the shim:
- add a 'from_rank' parameter to broker_pmi_kvs_get() so that the proc.rank passed to PMIx_Get() can be set to this value instead of PMIX_RANK_UNDEF. This avoids a hang with the dstore gds component, as described in openpmix/pmi-shim#3
- if 'from_rank' is set to -1, then set proc.rank to PMIX_RANK_UNDEF and set the PMIX_OPTIONAL attribute to 1, so PMIx_Get() fails immediately if the key is not set. This is used when the broker tries to fetch the 'flux.instance-level' key, which the flux shell places in the KVS and which is not expected to exist when Flux is launched by a foreign resource manager.

Note to the future implementor of the flux shell PMIx plugin (flux-framework#3536): this assumes that 'flux.instance-level' would be set using PMIx_server_register_nspace() or equivalent, which would push the key to the client at initialization.

Add some well-known PMIx environment variables to the blocklist in runat.c, so they do not propagate to the initial program when Flux is launched by a PMIx process manager.

Co-authored-by: Jim Garlick <[email protected]>
Background information
What version of the PMIx Reference Library are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)
Describe how PMIx was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
PMIx built with:
Open MPI built with:
Please describe the system on which you are running
Details of the problem
I was testing the PMI1 shim with the pmi_client.c test in the v3.1.4 release. If I force the hash GDS component, then it works.