It was noticed earlier this month that the jwang SLURM account had usage on the GPU cluster exceeding 100% of the awarded amount in its current proposal, but was not locked by the daily update status mechanism.
Looking through the bank log, locking appeared to happen on Jan 13, and then again the next day (with no indication of a manual unlock in between).
I haven't confirmed this yet, but my guess is that the condition for including a SLURM account in update status is too broad (being "unlocked" on any cluster), while the locking itself is not broad enough (if I remember correctly, the only clusters considered for locking are those with SUs in the active proposal).
In a configuration where a new cluster (Teach, for example) is unlocked but we don't explicitly add an allocation for it when awarding SUs, that cluster will never get locked and will always trip the conditional that qualifies the SLURM account for the locking check.
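I haven't dug into the code yet, but the mismatch I have in mind looks roughly like this (a minimal sketch; the cluster names, `Account` fields, and helper functions are my own, not the actual bank implementation):

```python
# Illustrative sketch only -- names and data layout are assumptions.
ALL_CLUSTERS = ["smp", "gpu", "teach"]

class Account:
    def __init__(self, locked, usage, awarded):
        self.locked = locked    # per-cluster lock state, e.g. {"gpu": True}
        self.usage = usage      # per-cluster SU usage from SLURM accounting
        self.awarded = awarded  # per-cluster SUs in the active proposal

# Inclusion check (too broad): unlocked on *any* cluster qualifies the
# account, even a cluster with no allocation such as teach.
def needs_status_update(acct):
    return any(not acct.locked.get(c, False) for c in ALL_CLUSTERS)

# Locking check (not broad enough): only clusters with SUs in the active
# proposal are ever locked, so teach stays unlocked forever and keeps
# re-qualifying the account on every run.
def clusters_to_lock(acct):
    return [c for c in acct.awarded if acct.usage.get(c, 0) >= acct.awarded[c]]

acct = Account(
    locked={"gpu": True},                # gpu already locked on a prior run
    usage={"gpu": 14_000, "teach": 0},
    awarded={"gpu": 10_000},             # no explicit teach allocation
)
print(needs_status_update(acct))  # True  -- teach is unlocked
print(clusters_to_lock(acct))     # ['gpu'] -- teach can never be locked
```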
As for getting above 100% usage in the first place, I think this may be related to how, in the previous instance of the bank, people could use SUs from other clusters to cover usage on a different cluster. The previous bank could have been reporting 100% usage on a cluster when SLURM's accounting, which we now rely on for usage, would indicate a much higher value.
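For illustration, the same made-up numbers viewed both ways (the values and the capping behaviour are assumptions about the old bank, not something I've verified):

```python
# Made-up numbers; awarded/used values are purely illustrative.
awarded = {"gpu": 10_000, "smp": 50_000}   # SUs in the proposal, per cluster
used    = {"gpu": 14_000, "smp": 20_000}   # raw usage from SLURM accounting

# Old bank (as I understand it): unused SUs on other clusters could cover
# the overage, so the account still displayed as 100% on gpu.
overall_ok = sum(used.values()) <= sum(awarded.values())   # True here
old_gpu_view = "100%" if overall_ok else f"{used['gpu'] / awarded['gpu']:.0%}"

# Current bank: per-cluster usage comes straight from SLURM, so the same
# numbers show up as 140% on gpu.
new_gpu_view = f"{used['gpu'] / awarded['gpu']:.0%}"

print(old_gpu_view, new_gpu_view)   # 100% 140%
```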
For now, we may just want to update the logging to provide more detail about why the account was considered for locking, what the SU values were when locking occurred, etc., especially if this starts to impact more accounts before we are able to switch to Keystone.
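Something along these lines, as a starting point (the helper name and message wording are just a suggestion, not existing bank code):

```python
import logging

logger = logging.getLogger("bank")

def log_lock_decision(account, cluster, used, awarded, unlocked_clusters):
    """Suggested extra detail at the point a lock decision is made."""
    logger.info(
        "Locking %s on %s: usage %s of %s awarded SUs (%.0f%%); "
        "account qualified for the check because it is unlocked on: %s",
        account, cluster, used, awarded,
        100 * used / awarded,
        ", ".join(unlocked_clusters),
    )

# Example with made-up values:
logging.basicConfig(level=logging.INFO)
log_lock_decision("jwang", "gpu", 14_000, 10_000, ["gpu", "teach"])
```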