Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42530: changes for automatic Large Glideins #25

Merged
merged 11 commits into from
Feb 17, 2024
Merged

Conversation

daues
Copy link
Contributor

@daues daues commented Jan 29, 2024

No description provided.

@MichelleGower MichelleGower changed the title changes for automatic Large Glideins DM-42530: changes for automatic Large Glideins Jan 30, 2024
Copy link

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some minor things, but there are also some bigger things that I'd like to discuss/see changed for maintainability of the code.

python/lsst/ctrl/execute/allocator.py Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
]
owner = f'(Owner=="{auser}") '
jstat = "&& (JobStatus==1) "
bps1 = "&& (bps_run isnt Undefined) "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to find "isnt" in the documentation. In case someone else needs the link: https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#expression-operators

python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
python/lsst/ctrl/execute/slurmPlugin.py Outdated Show resolved Hide resolved
schedd_name = socket.getfqdn()
coll = htcondor.Collector()
schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd)
scheddref = htcondor.Schedd(schedd_ad)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding these "preparations" (lines 221 - 224) are made only to call condor_q() later in line 260. However, it looks like the code is concerned with identifying the local Schedd only. As a result, it is a bit redundant as condor_q() can do it for you if you omit the (optional) schedds argument.


verbose = self.isVerbose()
if len(condorq_data) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty Python containers are false, so the preferred way to check if condor_data contains any elements would be

if not condor_data:

print("Auto: No Large BPS Jobs.")
return

if len(condorq_data) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check is necessary (or alternatively the previous one in line 266). If condor_data contains no elements we will never reach this point because of the return statement in line 268 and containers size can't be a negative number.

print(f"\n{job_label}")
existingGlideinsIdle = 0
numberOfGlideinsReduced = 0
numberOfGlideins = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need to set existingGlideinsIdle, numberOfGlideinsReduced, and numberOfGlideins to zero here. Each of these variables seems to be initialized with a different value later (see lines 320, 334, and 339, respectively).

large = f"&& (RequestMemory>{memoryLimit} || RequestCpus>{autoCPUs})"
# The constraint determines that the jobs to be returned belong to
# the current user (Owner) and are Idle vanilla universe jobs.
full_constraint = f"{owner}{jstat}{bps1}{bps2}{juniv}{large}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making owner, jstat, etc. just simple clauses that is

owner = f'(Owner=="{auser}")'
jstat = f"(JobStatus=={htcondor.JobStatus.IDLE})"
...

and writing the full constraint as

full_constraint = f"{owner} && {jstat} && {bps1} && {bps2} && {juniv} && {large}"

would make the code easier to read.

Copy link

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better. Merge approved.

@daues daues merged commit 7928439 into main Feb 17, 2024
3 checks passed
@daues daues deleted the tickets/DM-42530 branch February 17, 2024 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants