idea: add a method to determine the context within Flux in which a process is currently running #3817

grondo · 2021-08-07T21:54:46Z

As noted in #3744, it would be useful to have some way for a process to determine the context in which it is running as it relates to Flux jobs, initial program, etc. Off the top of my head, I can think of a few different contexts we might want to delineate:

1. Not within any Flux instance (flux_open () fails with ENOENT)
2. Enclosing instance is the multi-user system instance (instance-level attribute is 0, security.owner != current UID, jobid attribute not set)
3. Enclosing instance is a job in a foreign RM or flux start --test-size session, and process is running as part of initial program (same as above, but security.owner == current UID, instance-level is 0)
4. Enclosing instance is a Flux job and process is running as part of initial program (jobid is set, instance-level > 0)
5. Enclosing instance is a Flux job and process is part of a job within that instance

AFAICT, there is not a good way to easily determine the difference between 4 and 5 above. Perhaps less importantly, there is not a clean way to tell the difference between 2 and 3 either (for example, when a process is running with the UID of the flux user).

It might be nice if we could add a function that would return "something" to allow a process to differentiate between these different contexts. Since "context" is actually a bit of an overloaded term, we might need something different, but the only idea I've come up with so far is to have a set of named process "scopes".
This should be just considered an early idea at this point and we can iterate as much as people desire, or even throw out this idea as unnecessary if it will cause too much confusion.

Here's a first cut at names for the "scopes" outlined above:

1: none
2: system
3,4: initial program (I suppose instance-level could be used to differentiate these two)
5: job
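
To make the mapping from the numbered contexts to scope names concrete, here is a minimal sketch of the decision logic. Everything in it is hypothetical: the `scope_inputs` struct, its field names, and `classify_scope ()` are illustrative only and not part of flux-core. It also assumes the caller already knows whether the process was launched as a job task, which is exactly the piece that is hard to determine today.

```c
#include <stdbool.h>

/* Hypothetical inputs gathered by the caller; not a flux-core type. */
struct scope_inputs {
    bool connected;       /* false if flux_open () failed with ENOENT */
    int instance_level;   /* value of the instance-level attribute */
    bool owner_is_me;     /* security.owner == current UID */
    bool jobid_attr_set;  /* jobid attribute is set in the instance */
    bool spawned_as_job;  /* assumed known: process is a job task */
};

/* Map the inputs to one of the proposed scope names. */
const char *classify_scope (const struct scope_inputs *in)
{
    if (!in->connected)
        return "none";              /* context 1 */
    if (in->spawned_as_job)
        return "job";               /* context 5 */
    if (in->instance_level == 0 && !in->owner_is_me && !in->jobid_attr_set)
        return "system";            /* context 2 */
    return "initial program";       /* contexts 3 and 4 */
}
```

A caller wanting to tell 3 from 4 could additionally look at `instance_level`, as noted in the list above.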

We could add an API call flux_get_process_scope(3) which would return one of these strings, and would allow programs to alter behavior based on their current context. For the example of flux bcast it could abort with a warning if run in job scope since it likely doesn't make sense to run that command as a job.

A flux scope command could simply print the result of flux_get_process_scope(3) for use in scripts, etc.

grondo · 2021-12-08T01:50:50Z

In discussing the repercussions of our inability to determine if a process is running in the "scope" of a Flux instance or a job within a Flux instance with @ofaaland, we had the idea to use a simple environment variable set by the job shell, but cleared by the flux-broker. Keying off this environment variable would allow flux_get_process_scope() or similar to determine whether the scope is job or initial program (perhaps instance is a better name for that one, I don't know)

This would be trivial to implement and would assist @ofaaland's use case immediately.

For now, the FLUX_KVS_NAMESPACE environment variable could be used as a stand-in for any future environment variable, since it is set only for jobs and cleared for the initial program.
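
A rough sketch of that stand-in check, for a process that already knows it is running somewhere inside a Flux instance. The helper names `in_job_scope ()` and `guess_scope ()` are made up for illustration, and treating an empty value the same as unset is an assumption of this sketch, not documented Flux behavior.

```c
#include <stdbool.h>
#include <stdlib.h>

/* True if FLUX_KVS_NAMESPACE is set, i.e. we were probably started by
 * the job shell rather than as part of the initial program.
 * Assumption: an empty value is treated like an unset variable. */
bool in_job_scope (void)
{
    const char *ns = getenv ("FLUX_KVS_NAMESPACE");
    return ns != NULL && ns[0] != '\0';
}

/* Only meaningful when the caller is already inside a Flux instance. */
const char *guess_scope (void)
{
    return in_job_scope () ? "job" : "initial program";
}
```

If a dedicated variable is added later, only the name passed to `getenv ()` would need to change.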

garlick · 2021-12-08T05:32:41Z

It sounds like this could be helpful. Were you thinking the prototype would be something like this?

const char *flux_get_process_scope (void)

Maybe init would be OK as an abbreviation for initial program? A short, one word scope would be a little nicer popping out of a flux scope command.

ofaaland · 2021-12-08T07:23:45Z

Or perhaps this?

enum flux_process_scope { init, job };
enum flux_process_scope flux_get_process_scope (void);

Then the consumer can use the returned value directly in a conditional expression, and avoid bugs like strcmp(scope, "iniital").
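
One way to get both the enum's compile-time checking and a short string for a flux scope command is a small name lookup. This is only a sketch: `enum process_scope`, the `SCOPE_*` constants, and `process_scope_name ()` are hypothetical names chosen for the example, not a proposed flux-core API.

```c
#include <stddef.h>

/* Hypothetical scope values; a misspelled constant fails to compile,
 * unlike a misspelled string passed to strcmp (). */
enum process_scope { SCOPE_INIT, SCOPE_JOB };

/* Return the printable name for a scope, or NULL if out of range. */
const char *process_scope_name (enum process_scope scope)
{
    switch (scope) {
        case SCOPE_INIT: return "init";
        case SCOPE_JOB:  return "job";
    }
    return NULL;
}
```

Callers would branch on the enum value and use the name only for output.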


Look in the environment for FLUX_JOB_ID. Parse it to obtain the 64-bit unsigned integer representation and store it. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that. Support get_rank() by looking in the environment for FLUX_TASK_RANK.

Add configure check X_AC_FLUX based on X_AC_LSF. When the user does not specify a location, use pkg-config to determine whether flux-core is installed and where. Otherwise look for flux/core.h and attempt to link against flux-core.so to use flux_open(). At runtime, look in the environment for FLUX_JOB_ID. Determine whether to query the current flux instance or the parent for the expiration time of the allocation. Note that this currently works by checking the environment for FLUX_KVS_NAMESPACE, but flux will provide a more explicit mechanism in the future. See flux-framework/flux-core#3817 for details and status. Fetch the expiration time and calculate remaining time based on that. Support get_rank() by looking in the environment for FLUX_TASK_RANK.
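
Two of the runtime steps described in these commit messages can be sketched without a broker connection: reading the task rank from FLUX_TASK_RANK and turning an expiration time into remaining time. The function names are hypothetical, and representing the expiration as seconds since the epoch in a double is an assumption of this sketch. Fetching the expiration from the current or parent instance is not shown.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse FLUX_TASK_RANK from the environment.
 * Returns the rank, or -1 if unset or not a non-negative integer. */
int task_rank_from_env (void)
{
    const char *s = getenv ("FLUX_TASK_RANK");
    char *end;
    long rank;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    rank = strtol (s, &end, 10);
    if (errno != 0 || *end != '\0' || rank < 0 || rank > INT_MAX)
        return -1;
    return (int) rank;
}

/* Remaining time given an expiration and the current time, both
 * assumed to be seconds since the epoch. Never returns a negative. */
double remaining_seconds (double expiration, double now)
{
    double left = expiration - now;
    return left > 0.0 ? left : 0.0;
}
```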

jameshcorbett · 2022-08-12T21:38:33Z

The ability to distinguish between 1, 2, and 3/4 would be very useful for some workflow systems I either know about or work on directly.

chu11 · 2022-10-17T18:21:10Z

apologies, what is the difference between #4 and #5 above? There's a subtlety I'm missing.

garlick · 2022-10-17T18:29:00Z

The initial program (4) is not running as a job in its instance. It's just spawned directly by the broker. If there's a FLUX_JOB_ID set in its environment, it's the job ID of the flux instance in its enclosing instance.

The job (5) on the other hand is spawned by the flux shell and has a job ID in the flux instance.

Edit: confusing hence the need for tools :-)

chu11 · 2022-10-17T18:33:14Z

@garlick ahh, so basically "flux start foo.sh" vs "flux start flux mini run foo.sh"

grondo · 2022-10-17T18:35:29Z

in real world terms (4) is a batch script and associated processes (including the flux mini run in your example) while (5) is actual parallel job tasks.

ofaaland · 2022-10-17T18:36:29Z

FWIW, if the documentation includes one example command or situation for each state, it might go a long way towards helping users understand what the states are and how to use flux correctly.


chu11 · 2022-10-18T22:49:39Z

slowly beginning to work on this and amongst the contexts listed above, it was hard to distinguish between a few of them in my head. As I thought about it, I think there's two different things trying to be differentiated:

- what "flux instance" am I running under, i.e. system instance, user instance (i.e. flux start --test-size), job instance (i.e. flux mini submit flux start)
- am i the initial program or a job

would two separate functions for these two things be better? It seems like we're mixing two things together into one.

Aside, I guess for me, when I started "permutating" things i couldn't understand why the potential scopes weren't

1 - none
~~2A - enclosing instance system instance, process is initial program~~
2B - enclosing instance system instance, process is job
3A - enclosing instance job in foreign RM / flux start --test-size, process is initial program
3B - enclosing instance job in foreign RM / flux start --test-size, process is a job
4 - enclosing instance is flux job, process is initial program
5 - enclosing instance is flux job, process is a job

I suppose 2A is only conceptually possible??? although practically stupid

Edit: Oh wait, system instance is started via systemd, so I think impossible?

grondo · 2022-10-19T01:29:04Z

> would two separate functions for these two things be better? It seems like we're mixing two things together into one.

There are already simple ways to determine if the enclosing instance is a system instance vs single-user instance, or if you are in an initial program or a job (actually we have hit a problem here since FLUX_JOB_ID is set for the initial program, but that can be fixed).

I think the purpose of this issue is to add a single function that makes it easy for a caller to determine their rough "context", so that callers can make simple decisions with a single call to the Flux API.

> Aside, I guess for me, when I started "permutating" things i couldn't understand why the potential scopes weren't

I think the initial scopes listed above were the conclusion of the particular use cases we had in mind, i.e. these were the 3 or 4 cases that were important to differentiate. There is a balance between adding every permutation and keeping the call useful, i.e. we don't want every caller to have to have a long conditional to match every case where the current process is part of an initial program (i.e. batch script). It is better IMO to keep the interface simple and cater to the common use case.

Edit: But I meant to say if we find a need to differentiate a couple other cases then that is fine too, but we should err on the side of simplicity. (e.g. I can't think of a reason a process would need to know whether it was in the "initial program" of a job that was running in a system instance, vs a single user Flux instance, vs a foreign RM, vs flux start --test-size. The whole point of Flux is that it shouldn't matter, and if it does (i.e. you need to talk to the parent), then you can further refine by checking attributes...)

garlick · 2022-10-25T16:06:28Z

I think if we create a flux_get_process_scope() API call, we should be sure it returns something sensible no matter where it is used. Looking over the current PR it would seem to fall short when called from a flux-proxy environment, or from anything running as instance owner in the system instance (cron jobs, perilog scripts, rc scripts).

Also, if we need a broker connection to obtain attributes to make the determination, it seems like we should allow that to be passed in to the API call so that a user doesn't have to connect to the broker twice (assuming they want to do more fluxish stuff), but then how do we know that the broker connection is the correct one?

IMHO it might be wise at this stage to provide a flux_get_remaining_time() call or similar, to constrain any heuristics to this one use case.

Sorry @chu11 to make this discouraging comment after a PR is already posted. I find this problem confusing to think about and the PR actually helped me make more sense of it than when we were discussing it here in the abstract.

grondo added enhancement design don't expect this to ever be closed... labels Aug 7, 2021

chu11 self-assigned this Oct 17, 2022

chu11 mentioned this issue Oct 19, 2022

libflux: support flux_get_process_scope() #4699

Closed

garlick mentioned this issue Oct 25, 2022

FLUX_JOB_ID is set in batch scripts #4716

Closed

grondo mentioned this issue Dec 5, 2024

idea: add FLUX environment variable that holds "closest enclosing jobid" #6474

Closed
