slurm commands not found #4
Comments
Not having Updating os.environ['PATH'] = ':'.join(('/usr/local/bin', os.environ['PATH'])) Slurmmon has
Hi John,
sudo is another thing that drops the environment customizations. There is a
I'm having second thoughts about adding the So, in /etc/sudoers, you could modify the default PATH for the slurm user with something like:
(Or leave off the Do you think that's the better approach here, and will it work in your case? I want to make slurmmon easy to run, but I think the cleaner approach is to keep it simple: assume the Slurm commands are in the directories on PATH and rely on system configuration to make that happen, rather than having slurmmon go out of its way to find them or ignore the system configuration. What do you think?
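If slurmmon keeps relying on PATH, a startup check that reports which commands cannot be resolved might make this kind of misconfiguration easier to spot. The following is only a sketch, not something slurmmon does today: `missing_commands` is a hypothetical helper, the command names are the ones that appear in the log messages in this issue, and `shutil.which` assumes Python 3.3 or later.

```python
import shutil


def missing_commands(commands, path=None):
    """Return the commands that cannot be resolved on `path`.

    With path=None, shutil.which searches the current PATH environment
    variable, which is what a child process would inherit.
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    # Command names taken from the log output in this issue (assumption:
    # the real daemon may call others as well).
    required = ("sdiag", "squeue", "sbatch")
    missing = missing_commands(required)
    if missing:
        print("not found in PATH: " + ", ".join(missing))
```

Run under the same user and launch mechanism as the daemon (init script, sudo, and so on), this would show the PATH the daemon actually sees rather than the one in an interactive shell.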
I'm seeing another issue, which may just be my own fault. When I install the RPMs and then start slurmmond, the daemon doesn't find the Slurm commands, which are located in /usr/local/bin, even though the PATH environment variable includes /usr/local/bin. Here's a trace of the messages I just captured:
slurmmond[5341]: starting
slurmmond[5341]: started sdiag metrics process, pid [5342]
slurmmond[5341]: started jobcount metrics process, pid [5343]
slurmmond[5341]: started reserved cores metrics process, pid [5345]
slurmmond[5341]: started [probejob-compute,IB] metrics process, pid [5346]
slurmmond(sdiag)[5342]: metrics for [slurmmond(sdiag)] failed with message [[Errno 2] No such file or directory]
slurmmond(jobcount)[5343]: metrics for [slurmmond(jobcount)] failed with message [shell code ["squeue -h -o '%u' -t PD | wc -l"] failed with exit status [0], stderr is ['/bin/sh: squeue: command not found\n']]
slurmmond(probejob-compute,IB)[5346]: metrics for [slurmmond(probejob-compute,IB)] failed with message [job submission ["sbatch '-p' 'compute,IB' '-J' 'probejob' '-n' '1' '-t' '2' '--mem' '10' '-o' '/dev/null' '-e' '/dev/null' --wrap 'true'"] failed with non-zero returncode [127] and/or non-empty stderr ['/bin/sh: sbatch: command not found']]
slurmmond(reservations)[5345]: metrics for [slurmmond(reservations)] failed with message [[Errno 2] No such file or directory]
If I prefix all of the command calls with /usr/local/bin/, slurmmond works.
I also tried adding a sys.path.append, but that didn't work. Am I doing something wrong?
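On the `sys.path.append` attempt: `sys.path` only controls where Python looks for modules to import, while external commands are resolved through the `PATH` environment variable. A minimal sketch of the difference, assuming Python 3.3 or later (for `shutil.which`) and a POSIX system; the temporary directory and the `slurmmon_demo_cmd` script are hypothetical stand-ins for /usr/local/bin and a Slurm command:

```python
import os
import shutil
import stat
import sys
import tempfile

DEMO_NAME = "slurmmon_demo_cmd"  # hypothetical stand-in for e.g. squeue


def make_fake_command(directory, name=DEMO_NAME):
    """Create a small executable shell script in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho ok\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def prepend_to_path(directory):
    """Put `directory` first in PATH for this process and its children."""
    os.environ["PATH"] = os.pathsep.join(
        (directory, os.environ.get("PATH", ""))
    )


bindir = tempfile.mkdtemp()  # stand-in for /usr/local/bin
make_fake_command(bindir)

# sys.path affects `import` only, so the command is still not resolvable.
sys.path.append(bindir)
found_after_sys_path = shutil.which(DEMO_NAME)

# Changing os.environ['PATH'] is what command lookup actually consults.
prepend_to_path(bindir)
found_after_environ = shutil.which(DEMO_NAME)

print("after sys.path.append:", found_after_sys_path)
print("after PATH update:   ", found_after_environ)
```

The PATH change only applies to the process that makes it and to children started afterwards, so it would need to happen before slurmmond spawns its metrics processes.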