This repository has been archived by the owner on May 14, 2024. It is now read-only.

Usage over 100% not necessarily locking all accounts #361

Open
Comeani opened this issue Jan 26, 2024 · 0 comments

Comments

@Comeani
Collaborator

Comeani commented Jan 26, 2024

It was noticed earlier this month that the jwang SLURM account had usage on the GPU cluster exceeding 100% of the SUs awarded in its current proposal, but the account was not locked by the daily update status mechanism.

Looking through the bank log, locking appeared to happen on Jan 13 and then again the next day, with no indication of a manual unlock in between.
I haven't confirmed this yet, but my guess is that the condition for including a SLURM account in update status is too broad (being "unlocked" on any cluster), while the locking is not broad enough (if I remember correctly, the only clusters considered for locking are those with SUs in the active proposal).
In a configuration where a new cluster (Teach, for example) is unlocked but we don't explicitly add an allocation for it when awarding SUs, that cluster will never get locked, so the account will always trip the conditional that qualifies it for the locking check.
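To make the suspected mismatch concrete, here is a minimal Python model of it. This is a sketch under stated assumptions, not the bank's code: the `Account` class, the `needs_status_check` and `daily_update` functions, and the cluster names `gpu` and `teach` are hypothetical, and the two rules simply encode the unconfirmed guess above (check the account if it is unlocked anywhere, but lock only clusters with SUs in the proposal).

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical stand-in for a bank account record (not the real schema)."""
    name: str
    unlocked_clusters: set[str] = field(default_factory=set)  # clusters the SLURM account can run on
    proposal_clusters: set[str] = field(default_factory=set)  # clusters with SUs in the active proposal


def needs_status_check(account: Account) -> bool:
    # Suspected broad condition: the account is unlocked on ANY cluster.
    return bool(account.unlocked_clusters)


def daily_update(account: Account, over_limit: bool, log: list[str]) -> None:
    """One simplified pass of the daily status update (hypothetical)."""
    if not (needs_status_check(account) and over_limit):
        return
    # Suspected narrow scope: only clusters with SUs in the active proposal get locked.
    for cluster in sorted(account.proposal_clusters):
        account.unlocked_clusters.discard(cluster)
        log.append(f"locking {account.name} on {cluster}")


acct = Account("jwang", unlocked_clusters={"gpu", "teach"}, proposal_clusters={"gpu"})
log: list[str] = []
for _day in range(2):
    daily_update(acct, over_limit=True, log=log)

print(log)                     # → ['locking jwang on gpu', 'locking jwang on gpu']
print(acct.unlocked_clusters)  # → {'teach'}
```

Because `teach` is never in the proposal, it is never locked, so the account keeps qualifying for the check and the `gpu` lock is logged again each day. Under these assumptions, that would be consistent with the repeated Jan 13 and Jan 14 entries in the bank log.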

As for how usage got above 100% in the first place, I think this may be related to how, in the previous instance of the bank, people could use SUs from one cluster to cover their usage on a different cluster. The previous bank could have been showing 100% usage on a cluster when SLURM's accounting, which we now rely on for usage, would indicate a much higher value.
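A small worked example of how pooled accounting could hide per-cluster overuse, assuming the previous bank effectively compared total usage against total SUs across clusters. The function names and the award and usage figures are hypothetical, chosen only for illustration.

```python
def pooled_percent(awarded: dict[str, int], used: dict[str, int]) -> float:
    """Usage as a share of ALL awarded SUs, as if any cluster's SUs can cover any other."""
    return 100 * sum(used.values()) / sum(awarded.values())


def per_cluster_percent(awarded: dict[str, int], used: dict[str, int]) -> dict[str, float]:
    """Usage as a share of each cluster's own award, like per-cluster SLURM accounting."""
    return {cluster: 100 * used.get(cluster, 0) / awarded[cluster] for cluster in awarded}


# Hypothetical award and usage figures, in SUs.
awarded = {"cpu": 4_000, "gpu": 1_000}
used = {"cpu": 2_000, "gpu": 3_000}

print(pooled_percent(awarded, used))       # → 100.0
print(per_cluster_percent(awarded, used))  # → {'cpu': 50.0, 'gpu': 300.0}
```

With these numbers the pooled view sits at exactly 100%, while the GPU cluster alone is at 300% of its award.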

For now, we may just want to update the logging to provide more detail about why the account was considered for locking, what the SU values were when locking occurred, etc., especially if this starts to affect more accounts before we are able to switch to Keystone.
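One possible shape for a more detailed log line, using Python's standard `logging` module. The logger name, the `log_lock_decision` helper, and its arguments are hypothetical; the idea is only to record why the account was considered, the SU values at decision time, and which unlocked clusters were skipped.

```python
import logging

logger = logging.getLogger("bank.update_status")  # hypothetical logger name


def log_lock_decision(account: str, unlocked: set[str], proposal: set[str],
                      awarded: dict[str, int], used: dict[str, int]) -> str:
    """Build and emit one log line explaining a lock decision (illustrative only)."""
    reason = f"unlocked on {sorted(unlocked)}" if unlocked else "not unlocked anywhere"
    usage = ", ".join(f"{c}: {used.get(c, 0)}/{awarded[c]} SUs" for c in sorted(awarded))
    # Unlocked clusters with no SUs in the proposal are the ones that never get locked.
    skipped = sorted(unlocked - proposal)
    message = (
        f"{account}: considered because {reason}; usage {usage}; "
        f"locking {sorted(proposal)}; not locking {skipped} (no SUs in proposal)"
    )
    logger.info(message)
    return message


logging.basicConfig(level=logging.INFO)
log_lock_decision("jwang", {"gpu", "teach"}, {"gpu"}, {"gpu": 1_000}, {"gpu": 3_000})
```

Returning the message as well as logging it keeps the helper easy to check in tests.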
