idea: add a method to determine the context within Flux in which a process is currently running #3817
Comments
In discussing the repercussions of our inability to determine if a process is running in the "scope" of a Flux instance or a job within a Flux instance with @ofaaland, we had the idea to use a simple environment variable set by the job shell, but cleared by the This would be trivial to implement and would assist @ofaaland's use case immediately. For now, the |
It sounds like this could be helpful. Were you thinking the prototype would be something like this? const char *flux_get_process_scope (void) Maybe |
Or perhaps this?
enum flux_process_scope { init, job };
enum flux_process_scope flux_get_process_scope (void);
Then the consumer can use the returned value directly in a conditional expression, and avoid bugs like strcmp(scope, "iniital").
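To make the enum suggestion concrete, here is a minimal sketch. The enumerator names `FLUX_SCOPE_INIT` and `FLUX_SCOPE_JOB` and the helpers `scope_name()` and `command_allowed()` are hypothetical, chosen only for illustration; none of them are part of flux-core.

```c
/* Hypothetical scope values, following the two names used in this thread. */
enum flux_process_scope {
    FLUX_SCOPE_INIT,    /* process belongs to an initial program */
    FLUX_SCOPE_JOB,     /* process was launched as a job */
};

/* Map a scope to a printable name, e.g. for a `flux scope` style command. */
const char *scope_name (enum flux_process_scope scope)
{
    switch (scope) {
        case FLUX_SCOPE_INIT: return "init";
        case FLUX_SCOPE_JOB:  return "job";
    }
    return "unknown";
}

/* Example consumer: a command that should refuse to run inside a job.
 * A misspelled enumerator fails at compile time, unlike a misspelled
 * string passed to strcmp(). */
int command_allowed (enum flux_process_scope scope)
{
    return scope != FLUX_SCOPE_JOB;
}
```

With this shape the string form is still available for scripts through `scope_name()`, while C callers compare enum values directly.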
On Tuesday, December 7, 2021 9:32 PM, Jim Garlick wrote:
It sounds like this could be helpful. Were you thinking the prototype would be something like this?
const char *flux_get_process_scope (void)
Maybe init would be OK as an abbreviation for initial program? A short, one word scope would be a little nicer popping out of a flux scope command.
|
Look in the environment for FLUX_JOB_ID. Parse it to obtain the 64-bit unsigned integer representation and store it. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that.
Look in the environment for FLUX_JOB_ID. Parse it to obtain the 64-bit unsigned integer representation and store it. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that. Support get_rank() by looking in the environment for FLUX_TASK_RANK.
Add configure check X_AC_FLUX based on X_AC_LSF. When the user does not specify a location, use pkg-config to determine whether flux-core is installed and where. Otherwise look for flux-core.h and attempt to link to flux_open(). At runtime, look in the environment for FLUX_JOB_ID. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that. Support get_rank() by looking in the environment for FLUX_TASK_RANK.
Add configure check X_AC_FLUX based on X_AC_LSF. When the user does not specify a location, use pkg-config to determine whether flux-core is installed and where. Otherwise look for flux/core.h and attempt to link against flux-core.so to use flux_open(). At runtime, look in the environment for FLUX_JOB_ID. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that. Support get_rank() by looking in the environment for FLUX_TASK_RANK.
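The environment-based workaround described in these commit messages could look roughly like the sketch below. It rests on two assumptions that the commit messages do not spell out: that `FLUX_JOB_ID` holds a plain decimal integer (other encodings would need Flux's own parser), and that `FLUX_KVS_NAMESPACE` being set means the process was started as a job, so the job ID belongs to the current instance rather than the parent. The names `query_target`, `parse_jobid()`, `choose_target()` and `choose_target_from_env()` are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Which instance to ask about the allocation's expiration time. */
enum query_target { QUERY_NONE, QUERY_CURRENT, QUERY_PARENT };

/* Parse a decimal job ID into a 64-bit unsigned integer.
 * Returns 0 on success, -1 on missing or malformed input. */
int parse_jobid (const char *s, uint64_t *id)
{
    char *end;
    unsigned long long v;

    /* Reject NULL, empty strings, signs and leading whitespace. */
    if (!s || !isdigit ((unsigned char)*s))
        return -1;
    errno = 0;
    v = strtoull (s, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    *id = (uint64_t)v;
    return 0;
}

/* Decide which instance to query from the two environment values.
 * Assumption: a set FLUX_KVS_NAMESPACE indicates a job task. */
enum query_target choose_target (const char *jobid, const char *kvs_ns)
{
    uint64_t id;

    if (parse_jobid (jobid, &id) < 0)
        return QUERY_NONE;          /* not running under Flux, or unparsable */
    return kvs_ns ? QUERY_CURRENT : QUERY_PARENT;
}

/* Thin wrapper that reads the real environment. */
enum query_target choose_target_from_env (void)
{
    return choose_target (getenv ("FLUX_JOB_ID"),
                          getenv ("FLUX_KVS_NAMESPACE"));
}
```

Keeping the decision in `choose_target()` separate from the `getenv()` calls makes the heuristic easy to swap out once a more explicit mechanism exists.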
The ability to distinguish between 1, 2, and 3/4 would be very useful for some workflow systems I either know about or work on directly. |
The initial program (4) is not running as a job in its instance. It's just spawned directly by the broker. If there's a FLUX_JOB_ID set in its environment, it's the job ID of the flux instance in its enclosing instance. The job (5) on the other hand is spawned by the flux shell and has a job ID in the flux instance. Edit: confusing hence the need for tools :-) |
@garlick ahh, so basically "flux start foo.sh" vs "flux start flux mini run foo.sh" |
in real world terms (4) is a batch script and associated processes (including the |
FWIW, if the documentation includes one example command or situation for each state, it might go a long way towards helping users understand what the states are and how to use flux correctly.
On Monday, October 17, 2022 11:33 AM, Al Chu wrote:
@garlick ahh, so basically "flux start foo.sh" vs "flux start flux mini run foo.sh"
|
Slowly beginning to work on this, and among the contexts listed above, it was hard to distinguish between a few of them in my head. As I thought about it, I think there are two different things we are trying to differentiate:
Would two separate functions for these two things be better? It seems like we're mixing two things together into one. Aside, I guess for me, when I started "permutating" things I couldn't understand why the potential scopes weren't 1 - none I suppose 2A is only conceptually possible??? although practically stupid Edit: Oh wait, system instance is started via systemd, so I think impossible? |
There are already simple ways to determine if the enclosing instance is a system instance vs single-user instance, or if you are in an initial program or a job (actually we have hit a problem here since I think the purpose of this issue is to add a single function that makes it easy for a caller to determine their rough "context", so that callers can make simple decisions with a single call to the Flux API.
I think the initial scopes listed above were the conclusion of the particular use cases we had in mind. i.e. these were the 3 or 4 cases that were important to differentiate. There is a balance between adding every permutation and keeping the call useful, i.e. we don't want every caller to have to have a long conditional to match every case where the current process is part of an initial program (i.e. batch script). It is better IMO to keep the interface simple and cater to the common use case. Edit: But I meant to say if we find a need to differentiate a couple other cases then that is fine too, but we should err on the side of simplicity. (e.g. I can't think of a reason a process would need to know whether it was in the "initial program" of a job that was running in a system instance, vs a single user Flux instance, vs a foreign RM, vs |
I think if we create a Also, if we need a broker connection to obtain attributes to make the determination, it seems like we should allow that to be passed in to the API call so that a user doesn't have to connect to the broker twice (assuming they want to do more fluxish stuff), but then how do we know that the broker connection is the correct one? IMHO it might be wise at this stage to provide a Sorry @chu11 to make this discouraging comment after a PR is already posted. I find this problem confusing to think about and the PR actually helped me make more sense of it than when we were discussing it here in the abstract. |
As noted in #3744, it would be useful to have some way for a process to determine the context in which it is running as it relates to Flux jobs, initial program, etc. Off the top of my head, I can think of a few different contexts we might want to delineate:
1. (`flux_open ()` fails with `ENOENT`)
2. (`instance-level` attribute is `0`, `security.owner` != current UID, `jobid` attribute not set)
3. `flux start --test-size` session, and process is running as part of initial program (same as above, but `security.owner` == current uid, `instance-level` is 0)
4. (`jobid` is set, `instance-level > 0`)

AFAICT, there is not a good way to easily determine the difference between 4 and 5 above. Perhaps less importantly, there is not a clean way to tell the difference between 2 and 3 either (in the case a process is running with the UID of the `flux` user for example).

It might be nice if we could add a function that would return "something" to allow a process to differentiate between these different contexts. Since "context" is actually a bit of an overloaded term, we might need something different, but the only idea I've come up with so far is to have a set of named process "scopes".
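The attribute checks quoted in the list of contexts can be written down as a small decision function, which also shows where they run out. This is only a sketch: `struct flux_probe` and `classify_context()` are hypothetical names, the inputs are assumed to have been gathered already (from the result of `flux_open ()` and the `instance-level`, `security.owner` and `jobid` attributes), and the returned numbers follow the order in which the conditions appear above.

```c
#include <stdbool.h>

/* Facts a caller is assumed to have collected beforehand. */
struct flux_probe {
    bool open_failed_enoent;    /* flux_open () failed with ENOENT */
    int  instance_level;        /* value of the instance-level attribute */
    bool owner_is_current_uid;  /* security.owner == current UID */
    bool jobid_set;             /* jobid attribute is set */
};

/* Return the context number (1-4) matching the probe, or 0 if none of
 * the listed conditions applies. */
int classify_context (const struct flux_probe *p)
{
    if (p->open_failed_enoent)
        return 1;
    if (p->jobid_set && p->instance_level > 0)
        return 4;   /* these checks alone cannot separate context 5 from 4 */
    if (p->instance_level == 0 && !p->jobid_set)
        /* Ambiguous when the caller runs with the UID of the flux user. */
        return p->owner_is_current_uid ? 3 : 2;
    return 0;
}
```

The two comments mark exactly the gaps called out in the issue text: 4 versus 5, and 2 versus 3 for a process running as the `flux` user.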
This should be just considered an early idea at this point and we can iterate as much as people desire, or even throw out this idea as unnecessary if it will cause too much confusion.
Here's a first cut at names for the "scopes" outlined above:
We could add an API call `flux_get_process_scope(3)` which would return one of these strings, and would allow programs to alter behavior based on their current context. For the example of `flux bcast` it could abort with a warning if run in `job` scope since it likely doesn't make sense to run that command as a job.

A `flux scope` command could simply print the result of `flux_get_process_scope(3)` for use in scripts, etc.