Inaccurate memory requested value #24

Open
prehensilecode opened this issue Nov 7, 2022 · 1 comment

prehensilecode commented Nov 7, 2022

Environment:

  • RHEL 8.1
  • Slurm 21.08.8
  • Python 3.9.14
  • seff-array (latest master branch as of 2022-11-07 17:00 EST)

Example: a job array of more than 400 tasks. Job request:

#SBATCH --cpus-per-task=20
#SBATCH --mem=8GB

seff reports the following for two of the array tasks:

Job ID: 4160740
Array Job ID: 4160739_0
Cluster: foocluster
User/Group: juser/juser
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 20
CPU Utilized: 00:11:28
CPU Efficiency: 26.46% of 00:43:20 core-walltime
Job Wall-clock time: 00:02:10
Memory Utilized: 5.58 GB
Memory Efficiency: 69.77% of 8.00 GB

Job ID: 4160954
Array Job ID: 4160739_207
Cluster: foocluster
User/Group: juser/juser
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 20
CPU Utilized: 00:25:51
CPU Efficiency: 36.93% of 01:10:00 core-walltime
Job Wall-clock time: 00:03:30
Memory Utilized: 5.46 GB
Memory Efficiency: 68.28% of 8.00 GB
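
As a sanity check on the two seff reports above, their CPU efficiency figures can be reproduced from the other fields. This is a minimal sketch, assuming core-walltime is the wall-clock time multiplied by the core count (which matches the 00:43:20 and 01:10:00 values shown). The helper names `hms_to_seconds` and `cpu_efficiency` are hypothetical and are not part of seff.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_utilized: str, wall_clock: str, cores: int) -> float:
    """CPU time used as a percentage of core-walltime (assumed cores * wall clock)."""
    core_walltime = cores * hms_to_seconds(wall_clock)
    return 100 * hms_to_seconds(cpu_utilized) / core_walltime


# Task 4160739_0: 00:11:28 CPU time, 00:02:10 wall clock, 20 cores
print(round(cpu_efficiency("00:11:28", "00:02:10", 20), 2))  # 26.46

# Task 4160739_207: 00:25:51 CPU time, 00:03:30 wall clock, 20 cores
print(round(cpu_efficiency("00:25:51", "00:03:30", 20), 2))  # 36.93
```

The memory figures follow the same pattern: 5.58 GB out of 8.00 GB is about 69.75%, close to the 69.77% that seff prints (presumably from unrounded values).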

The latest seff-array reports an inaccurate requested-memory value (409.6MB) and an inaccurate memory efficiency (1228.85% of 409.6MB):

Job ID: 4160739
Cluster: foocluster
User/Group: juser/juser
Cores: 20
Average CPU Utilized: 01:04.22
CPU Efficiency: 36.21% of 02:57.38 core-walltime
Job Wall-clock time: 02:57.38
Average Memory Utilized: 5.03GB
Memory Efficiency: 1228.85% of 409.6MB
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Job States
COMPLETED: 414
FAILED: 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
============================================================================== Max Memory Usage ==============================================================================
# NumSamples = 415; Min = 1.07MB; Max = 7.77GB
# Mean = 5.03GB; SD = 1.73GB; Median 5.64GB
# each ∎ represents a count of 2
  963.0KB -    1.07MB [   1]:
   1.07MB -  778.05MB [  32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 778.05MB -    1.56GB [   2]: ∎
   1.56GB -    2.33GB [   8]: ∎∎∎∎
   2.33GB -    3.11GB [   7]: ∎∎∎
   3.11GB -    3.89GB [  15]: ∎∎∎∎∎∎∎
   3.89GB -    4.66GB [   8]: ∎∎∎∎
   4.66GB -    5.44GB [  57]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   5.44GB -    6.22GB [ 265]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   6.22GB -    6.99GB [  13]: ∎∎∎∎∎∎
   6.99GB -    7.77GB [   7]: ∎∎∎
The requested memory was 409.6MB.
…
prehensilecode commented

Looks like I misunderstood the value shown as "requested memory". seff-array shows the requested memory per CPU.
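
The 409.6MB figure is consistent with the job request divided across its CPUs. A minimal sketch of that arithmetic, assuming 1 GB is treated as 1024 MB (an assumption, though it reproduces the reported value exactly); the variable names here are illustrative only:

```python
MB_PER_GB = 1024  # assumption: binary units

req_mem_mb = 8 * MB_PER_GB   # from "#SBATCH --mem=8GB"
req_cpus = 20                # from "#SBATCH --cpus-per-task=20"

# Per-CPU request, which is what seff-array labels "requested memory"
per_cpu_mb = req_mem_mb / req_cpus
print(per_cpu_mb)  # 409.6

# Dividing a per-task quantity by the per-CPU request instead of the
# per-task request inflates the resulting percentage by this factor.
print(round(req_mem_mb / per_cpu_mb, 6))  # 20.0
```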

However, the MaxRSS reported is for the whole task, not a per-CPU value. So shouldn't the memory efficiency be:

100 * rss_sum / req_mem / data_len

out of

mb_to_str(req_mem)

Or, if you want the per-CPU value:

100 * (rss_sum / req_cpus) / (req_mem / req_cpus) / data_len
== 100 * rss_sum / req_mem / data_len  # since the "req_cpus" divide out

out of

mb_to_str(req_mem / req_cpus)
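
A minimal sketch of the two proposed formulas, showing that they give the same percentage. The function names are hypothetical, and `mb_to_str` from seff-array is not reproduced here. No raw sacct data is available in this issue, so `rss_sum` is back-computed from the reported "1228.85% of 409.6MB" and the 415 samples; treat the result as an estimate under those assumptions.

```python
def memory_efficiency(rss_sum: float, req_mem: float, data_len: int) -> float:
    """Mean per-task MaxRSS as a percentage of the per-task memory request."""
    return 100 * rss_sum / req_mem / data_len


def memory_efficiency_per_cpu(
    rss_sum: float, req_mem: float, req_cpus: int, data_len: int
) -> float:
    """Same ratio with numerator and denominator both scaled to one CPU."""
    return 100 * (rss_sum / req_cpus) / (req_mem / req_cpus) / data_len


data_len = 415      # NumSamples in the seff-array output
req_cpus = 20       # --cpus-per-task=20
req_mem = 8192.0    # --mem=8GB in MB, assuming 1 GB = 1024 MB

# Back-computed assumption: mean MaxRSS implied by "1228.85% of 409.6MB"
mean_rss = 1228.85 / 100 * (req_mem / req_cpus)
rss_sum = mean_rss * data_len

per_task = memory_efficiency(rss_sum, req_mem, data_len)
per_cpu = memory_efficiency_per_cpu(rss_sum, req_mem, req_cpus, data_len)

print(round(per_task, 2))  # 61.44
print(round(per_cpu, 2))   # 61.44
```

Under these assumptions, the corrected figure is the reported 1228.85% divided by the 20 requested CPUs, which is in line with the 68-70% that seff reports for the individual tasks above.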
