-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit for IAWG review of Process Management chapter #500
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope the comments help clarify things.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -239,7 +217,7 @@ \subsection{\code{PMIx_Spawn_nb}} | |||
\reqattrend | |||
|
|||
\optattrstart | |||
The following attributes are optional for host environments that support this operation: | |||
The following attributes are optional for \ac{PMIx} implementations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...those operations are not performed within PMIx. The attribute is simply communicated to the host for execution. I suspect you are trying to say that a PMIx implementation doesn't have to define them, but that doesn't make much sense as we don't provide that option elsewhere. You have to define it - the host just may not implement support for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
davidsolt@davids-mbp-2 pmix-standard % grep -h "The following attributes are optional" *.tex | sort -u
The following attributes are optional for \ac{PMIx} implementations:
The following attributes are optional for \ac{PMIx} libraries that support \ac{IO} forwarding:
The following attributes are optional for host environments or \ac{PMIx} libraries that support this operation:
The following attributes are optional for host environments that support this operation:
The following attributes are optional for host environments:
The following attributes are optional for implementers of \ac{PMIx} libraries:
I think we should just look at this issue from a higher level and ask what we want to say across the board. I also don't think PMIx should specify whether PMIx implements something internally or whether it passes it along to a host environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This chapter even has sections where some attributes are specifically called out as optional for PMIx implementations and other attributes are marked as optional for host environments for the same API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - it is optional to implement support for those attributes. We require support for others, and we are very clear which ones are optional for the implementation vs the host vs both.
It is critical that PMIx remain in the "messenger" role and not shift into being the "doer" of things. We do internally support PMIx functions, but we want to be very careful and clear where the line lies. And no, that isn't optional across PMIx implementations - or else the host now has to deal with uncertainty over what the implementation may or may not do, which will be very unpopular (should there ever be another implementation).
Chap_API_Proc_Mgmt.tex
Outdated
@@ -356,15 +372,15 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_HOSTFILE}{"pmix.hostfile"}{char*}{ | |||
Hostfile to use for spawned processes. | |||
Hostfile to use for spawned processes. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to do with the PMIx implementation - we just pass the name of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes down to whether we want to force the division of labor between PMIx or the host environment. We didn't want to preclude the possibility that someone say, took OpenPMix and decided to implement spawn directly inside of it. If we think that is never going to happen, we can go with:
The format of this file is determined by the host environment, therefore a file may not be portable across systems with different host environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do support local fork/exec from within the PMIx library, but that is nowhere equivalent to spawn as we provide no support for launching things across nodes. We really should stick with the divisions we have had since day one and not muddy the water.
Chap_API_Proc_Mgmt.tex
Outdated
} | ||
% | ||
\declareAttribute{PMIX_ADD_HOST}{"pmix.addhost"}{char*}{ | ||
Comma-delimited list of hosts to add to the allocation. | ||
} | ||
% | ||
\declareAttribute{PMIX_ADD_HOSTFILE}{"pmix.addhostfile"}{char*}{ | ||
Hostfile containing hosts to add to existing allocation. | ||
Hostfile containing hosts to add to existing allocation. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here - we only pass the name. We do not process the file.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -432,7 +448,7 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_INDEX_ARGV}{"pmix.indxargv"}{bool}{ | |||
Mark the \code{argv} with the rank of the process. | |||
If set to true, will use the given name of the executable (\code{argv[0]}) as a base name and each rank will be invoked using the base name with the string "-<\emph{rank}>" appended to it, where \emph{rank} is the \ac{PMIx} rank of the process being invoked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably make clear that PMIx isn't doing this - we are passing an attribute to the host environment requesting that it do this. Whether or not it happens is completely up to the host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true for SO many attributes throughout the standard. I don't know if we want to get into the job of specifying for every attribute how it is used. Someone reading the standard just wants to know what the attribute does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that the change adds language like "...will use...", but leaves hanging the question of "who" will use the name. If you want to clarify, it might be something like "host is requested to use...". Or you can split this list to indicate which attributes are passed to the host for execution. Not entirely sure of the best way. Just trying to help you in your quest to clarify exactly what we mean here, and not replace one confusion with another.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -448,7 +464,7 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_REPORT_BINDINGS}{"pmix.repbind"}{bool}{ | |||
Report bindings of the individual processes. | |||
Report bindings of the individual processes. How and where this information is reported is implementation dependent as well as dependent on whether the processes are created through a launching tool or by a direct call to \refapi{PMIx_Spawn}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, PMIx isn't doing anything here - we just pass the attribute to the host requesting that it generate this report.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -468,7 +484,7 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_SPAWN_TOOL}{"pmix.spwn.tool"}{bool}{ | |||
Indicate that the job being spawned is a tool. | |||
Indicate that the job being spawned is a tool. The repercussions of setting this attribute varies based on implementation. For example, some implementations may not perform cpu-binding on process marked as a tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PMIx implementation isn't doing anything here - we just pass the attribute informing the host that the spawn request is launching a tool. What that means is totally up to the host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. We will re-visit these and consider updating to make it clearer that PMIx is just the messenger in all of these use cases. We just wanted to make it clear that the format or input/output was not designated by PMIx, but agree that the best way to do that is to make it cleat that PMIx is only passing these through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question here again is that you stated the behavior "...varies based on implementation", but not on which implementation - the host or the PMIx library? I'm not aware of any PMIx repercussions, nor was that the intent of the attribute - it was provided so that the host environment would know this is a tool being spawned, and if they wanted to do something different for tools, then they could do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to help clarify - I think much of this has to do with clarifying which implementation (host or PMIx) your changes are talking about.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -356,15 +372,15 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_HOSTFILE}{"pmix.hostfile"}{char*}{ | |||
Hostfile to use for spawned processes. | |||
Hostfile to use for spawned processes. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do support local fork/exec from within the PMIx library, but that is nowhere equivalent to spawn as we provide no support for launching things across nodes. We really should stick with the divisions we have had since day one and not muddy the water.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -432,7 +448,7 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_INDEX_ARGV}{"pmix.indxargv"}{bool}{ | |||
Mark the \code{argv} with the rank of the process. | |||
If set to true, will use the given name of the executable (\code{argv[0]}) as a base name and each rank will be invoked using the base name with the string "-<\emph{rank}>" appended to it, where \emph{rank} is the \ac{PMIx} rank of the process being invoked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that the change adds language like "...will use...", but leaves hanging the question of "who" will use the name. If you want to clarify, it might be something like "host is requested to use...". Or you can split this list to indicate which attributes are passed to the host for execution. Not entirely sure of the best way. Just trying to help you in your quest to clarify exactly what we mean here, and not replace one confusion with another.
Chap_API_Proc_Mgmt.tex
Outdated
@@ -468,7 +484,7 @@ \subsection{Spawn attributes} | |||
} | |||
% | |||
\declareAttribute{PMIX_SPAWN_TOOL}{"pmix.spwn.tool"}{bool}{ | |||
Indicate that the job being spawned is a tool. | |||
Indicate that the job being spawned is a tool. The repercussions of setting this attribute varies based on implementation. For example, some implementations may not perform cpu-binding on process marked as a tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question here again is that you stated the behavior "...varies based on implementation", but not on which implementation - the host or the PMIx library? I'm not aware of any PMIx repercussions, nor was that the intent of the attribute - it was provided so that the host environment would know this is a tool being spawned, and if they wanted to do something different for tools, then they could do so.
Signed-off-by: David Solt <[email protected]>
Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.
Here are the meanings for the emojis:
|
@@ -1505,7 +1560,7 @@ \subsection{Device type} | |||
The \refstruct{pmix_device_type_t} is a \code{uint64_t} bitmask for identifying the type(s) whose distances are being requested, or the type of a specific device being referenced (e.g., in a \refstruct{pmix_device_distance_t} object). | |||
|
|||
\copySignature{pmix_device_type_t}{1.0}{ | |||
typedef uint16_t pmix_device_type_t; | |||
typedef uint64_t pmix_device_type_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this an ABI change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an errata, the ABI document should already have it listed as a uint64_t
\item \refconst{PMIX_ERR_JOB_INSUFFICIENT_RESOURCES} Insufficient resources to spawn job | ||
\item \refconst{PMIX_ERR_JOB_SYS_OP_FAILED} System library operation failed | ||
\item \refconst{PMIX_ERR_JOB_WDIR_NOT_FOUND} Specified working directory not found | ||
\item a non-zero \ac{PMIx} error constant indicating a reason for the request's failure. | ||
\end{itemize} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a pastrefconst type of macro that we could use here instead of duplicating descriptions (which can get out of sync). If not, can we add such a macro across the standard (not just this chapter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not appear to have one. Looks like one should be pretty easy to create, but should be applied across all chapters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracking issue #512
Added '.' to end of many refconst descriptions Changed provide back to display for PMIX_DISPLAY_MAP Changed adviceimpl to advicerm for how process info is shared between parent/child Fixed incorrect cut/paste of PMIX_SUCCESS description Fixed case where argin was accidentally duplicated Signed-off-by: David Solt <[email protected]>
Signed-off-by: David Solt <[email protected]>
PR500 passed the 2nd vote at ASC 2024-Q3 meeting. |
No description provided.