Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for IAWG review of Process Management chapter #500

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

dsolt
Copy link
Contributor

@dsolt dsolt commented Oct 2, 2023

No description provided.

@dsolt dsolt added the WorkInProgress Work In Progress label Oct 2, 2023
Copy link
Member

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope the comments help clarify things.

@@ -239,7 +217,7 @@ \subsection{\code{PMIx_Spawn_nb}}
\reqattrend

\optattrstart
The following attributes are optional for host environments that support this operation:
The following attributes are optional for \ac{PMIx} implementations:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...those operations are not performed within PMIx. The attribute is simply communicated to the host for execution. I suspect you are trying to say that a PMIx implementation doesn't have to define them, but that doesn't make much sense as we don't provide that option elsewhere. You have to define it - the host just may not implement support for it.

Copy link
Contributor Author

@dsolt dsolt Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

davidsolt@davids-mbp-2 pmix-standard % grep -h "The following attributes are optional" *.tex | sort -u
The following attributes are optional for \ac{PMIx} implementations:
The following attributes are optional for \ac{PMIx} libraries that support \ac{IO} forwarding:
The following attributes are optional for host environments or \ac{PMIx} libraries that support this operation:
The following attributes are optional for host environments that support this operation:
The following attributes are optional for host environments:
The following attributes are optional for implementers of \ac{PMIx} libraries:

I think we should just look at this issue from a higher level and ask what we want to say across the board. I also don't think PMIx should specify whether PMIx implements something internally or whether it passes it along to a host environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chapter even has sections where some attributes are specifically called out as optional for PMIx implementations and other attributes are marked as optional for host environments for the same API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - it is optional to implement support for those attributes. We require support for others, and we are very clear which ones are optional for the implementation vs the host vs both.

It is critical that PMIx remain in the "messenger" role and not shift into being the "doer" of things. We do internally support PMIx functions, but we want to be very careful and clear where the line lies. And no, that isn't optional across PMIx implementations - or else the host now has to deal with uncertainty over what the implementation may or may not do, which will be very unpopular (should there ever be another implementation).

@@ -356,15 +372,15 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_HOSTFILE}{"pmix.hostfile"}{char*}{
Hostfile to use for spawned processes.
Hostfile to use for spawned processes. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to do with the PMIx implementation - we just pass the name of the file.

Copy link
Contributor Author

@dsolt dsolt Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes down to whether we want to force the division of labor between PMIx or the host environment. We didn't want to preclude the possibility that someone say, took OpenPMix and decided to implement spawn directly inside of it. If we think that is never going to happen, we can go with:

The format of this file is determined by the host environment, therefore a file may not be portable across systems with different host environments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do support local fork/exec from within the PMIx library, but that is nowhere equivalent to spawn as we provide no support for launching things across nodes. We really should stick with the divisions we have had since day one and not muddy the water.

}
%
\declareAttribute{PMIX_ADD_HOST}{"pmix.addhost"}{char*}{
Comma-delimited list of hosts to add to the allocation.
}
%
\declareAttribute{PMIX_ADD_HOSTFILE}{"pmix.addhostfile"}{char*}{
Hostfile containing hosts to add to existing allocation.
Hostfile containing hosts to add to existing allocation. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here - we only pass the name. We do not process the file.

Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
@@ -432,7 +448,7 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_INDEX_ARGV}{"pmix.indxargv"}{bool}{
Mark the \code{argv} with the rank of the process.
If set to true, will use the given name of the executable (\code{argv[0]}) as a base name and each rank will be invoked using the base name with the string "-<\emph{rank}>" appended to it, where \emph{rank} is the \ac{PMIx} rank of the process being invoked.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably make clear that PMIx isn't doing this - we are passing an attribute to the host environment requesting that it do this. Whether or not it happens is completely up to the host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true for SO many attributes throughout the standard. I don't know if we want to get into the job of specifying for every attribute how it is used. Someone reading the standard just wants to know what the attribute does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the change adds language like "...will use...", but leaves hanging the question of "who" will use the name. If you want to clarify, it might be something like "host is requested to use...". Or you can split this list to indicate which attributes are passed to the host for execution. Not entirely sure of the best way. Just trying to help you in your quest to clarify exactly what we mean here, and not replace one confusion with another.

@@ -448,7 +464,7 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_REPORT_BINDINGS}{"pmix.repbind"}{bool}{
Report bindings of the individual processes.
Report bindings of the individual processes. How and where this information is reported is implementation dependent as well as dependent on whether the processes are created through a launching tool or by a direct call to \refapi{PMIx_Spawn}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, PMIx isn't doing anything here - we just pass the attribute to the host requesting that it generate this report.

@@ -468,7 +484,7 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_SPAWN_TOOL}{"pmix.spwn.tool"}{bool}{
Indicate that the job being spawned is a tool.
Indicate that the job being spawned is a tool. The repercussions of setting this attribute varies based on implementation. For example, some implementations may not perform cpu-binding on process marked as a tool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PMIx implementation isn't doing anything here - we just pass the attribute informing the host that the spawn request is launching a tool. What that means is totally up to the host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. We will re-visit these and consider updating to make it clearer that PMIx is just the messenger in all of these use cases. We just wanted to make it clear that the format or input/output was not designated by PMIx, but agree that the best way to do that is to make it cleat that PMIx is only passing these through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question here again is that you stated the behavior "...varies based on implementation", but not on which implementation - the host or the PMIx library? I'm not aware of any PMIx repercussions, nor was that the intent of the attribute - it was provided so that the host environment would know this is a tool being spawned, and if they wanted to do something different for tools, then they could do so.

Copy link
Member

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to help clarify - I think much of this has to do with clarifying which implementation (host or PMIx) your changes are talking about.

@@ -356,15 +372,15 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_HOSTFILE}{"pmix.hostfile"}{char*}{
Hostfile to use for spawned processes.
Hostfile to use for spawned processes. The format of this file is determined by the \ac{PMIx} implementation or host environment. The file may not be portable across implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do support local fork/exec from within the PMIx library, but that is nowhere equivalent to spawn as we provide no support for launching things across nodes. We really should stick with the divisions we have had since day one and not muddy the water.

Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
@@ -432,7 +448,7 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_INDEX_ARGV}{"pmix.indxargv"}{bool}{
Mark the \code{argv} with the rank of the process.
If set to true, will use the given name of the executable (\code{argv[0]}) as a base name and each rank will be invoked using the base name with the string "-<\emph{rank}>" appended to it, where \emph{rank} is the \ac{PMIx} rank of the process being invoked.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the change adds language like "...will use...", but leaves hanging the question of "who" will use the name. If you want to clarify, it might be something like "host is requested to use...". Or you can split this list to indicate which attributes are passed to the host for execution. Not entirely sure of the best way. Just trying to help you in your quest to clarify exactly what we mean here, and not replace one confusion with another.

@@ -468,7 +484,7 @@ \subsection{Spawn attributes}
}
%
\declareAttribute{PMIX_SPAWN_TOOL}{"pmix.spwn.tool"}{bool}{
Indicate that the job being spawned is a tool.
Indicate that the job being spawned is a tool. The repercussions of setting this attribute varies based on implementation. For example, some implementations may not perform cpu-binding on process marked as a tool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question here again is that you stated the behavior "...varies based on implementation", but not on which implementation - the host or the PMIx library? I'm not aware of any PMIx repercussions, nor was that the intent of the attribute - it was provided so that the host environment would know this is a tool being spawned, and if they wanted to do something different for tools, then they could do so.

Signed-off-by: David Solt <[email protected]>
@dsolt dsolt removed the WorkInProgress Work In Progress label Mar 25, 2024
@dsolt dsolt added the RFC Request for Comment label Apr 11, 2024
@dsolt
Copy link
Contributor Author

dsolt commented Apr 11, 2024

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap
    This is not a binding majority-rule vote, but it will be a very
    significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead
    If you want to explain in more detail, feel free to add another
    comment, but please also vote on this comment.

@abouteiller abouteiller added the Eligible Eligible for consideration by ASC label Apr 12, 2024
@abouteiller abouteiller added this to the PMIx v5.1 Standard milestone Apr 12, 2024
Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Proc_Mgmt.tex Show resolved Hide resolved
Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
@@ -1505,7 +1560,7 @@ \subsection{Device type}
The \refstruct{pmix_device_type_t} is a \code{uint64_t} bitmask for identifying the type(s) whose distances are being requested, or the type of a specific device being referenced (e.g., in a \refstruct{pmix_device_distance_t} object).

\copySignature{pmix_device_type_t}{1.0}{
typedef uint16_t pmix_device_type_t;
typedef uint64_t pmix_device_type_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this an ABI change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an errata, the ABI document should already have it listed as a uint64_t

Chap_API_Proc_Mgmt.tex Show resolved Hide resolved
\item \refconst{PMIX_ERR_JOB_INSUFFICIENT_RESOURCES} Insufficient resources to spawn job
\item \refconst{PMIX_ERR_JOB_SYS_OP_FAILED} System library operation failed
\item \refconst{PMIX_ERR_JOB_WDIR_NOT_FOUND} Specified working directory not found
\item a non-zero \ac{PMIx} error constant indicating a reason for the request's failure.
\end{itemize}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a pastrefconst type of macro that we could use here instead of duplicating descriptions (which can get out of sync). If not, can we add such a macro across the standard (not just this chapter)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not appear to have one. Looks like one should be pretty easy to create, but should be applied across all chapters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracking issue #512

Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Proc_Mgmt.tex Outdated Show resolved Hide resolved
Chap_API_Proc_Mgmt.tex Show resolved Hide resolved
Chap_API_Proc_Mgmt.tex Show resolved Hide resolved
dsolt added 2 commits May 9, 2024 08:02
Added '.' to end of many refconst descriptions
Changed provide back to display for PMIX_DISPLAY_MAP
Changed adviceimpl to advicerm for how process info is shared between parent/child
Fixed incorrect cut/paste of PMIX_SUCCESS description
Fixed case where argin was accidentally duplicated

Signed-off-by: David Solt <[email protected]>
@naughtont3 naughtont3 added the Accepted as Stable ASC second vote passed. Accepted as Stable! label Jul 18, 2024
@naughtont3
Copy link
Contributor

PR500 passed the 2nd vote at ASC 2024-Q3 meeting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted as Stable ASC second vote passed. Accepted as Stable! Eligible Eligible for consideration by ASC First Vote Passed ASC first vote passed Major Text Change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants