Run Phoenix jobs using available CPU cores #266
In my latest PR, we will build and test on a compute node by issuing only a test command, with no separate build command. This also avoids the problem of ending up building multiple times.
I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.
Thanks! Also (per above) using …
Can we specify the submit partition and other …
That's interesting. Could you try prepending …
One issue with the …
True, but I don't think this has ever happened to us, so I'm not worried about it for now.
No luck (with …
@henryleberre I noticed that all the running binaries from …
Requesting advice from @henryleberre on the CPU affinity/subprocesses issue, since the slowness of the Phoenix CPU runner was the real reason for this PR. If this is going to be fixed in PR #257, I can just merge this PR. Update: Just to double-check, indeed …
I discovered that adding …
I'm attempting to improve runtime on Phoenix runners. Requesting a node without specifying the cpu-small partition can hang for a bit.
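A minimal sketch of the "test command only, on a compute node" approach as a batch script, assuming Phoenix is scheduled by Slurm and that a whole dual-socket 12-core node is requested from the `cpu-small` partition. The job name and wall time are placeholders, and site-specific options (for example an account) are omitted; adjust them to your allocation.

```shell
#!/usr/bin/env bash
# Hypothetical job name:
#SBATCH --job-name=mfc-test
# Partition mentioned in this PR; omitting it can leave the request hanging:
#SBATCH --partition=cpu-small
#SBATCH --nodes=1
# Assumes a dual-socket 12-core node (24 cores):
#SBATCH --ntasks-per-node=24
# Placeholder wall time:
#SBATCH --time=02:00:00

set -euo pipefail

# Cores this job may actually use: nproc honors the CPU affinity mask
# (and OMP_NUM_THREADS, if that variable is set in the environment).
njobs=$(nproc)

# Only the test command is issued; it builds any missing targets before
# running, so there is no separate `./mfc.sh build` step to repeat.
./mfc.sh test -j "${njobs}" -a
```

Deriving `-j` from `nproc` rather than hard-coding it keeps the script correct if the node type or the allocation size changes.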
I'm noticing that `./mfc.sh test -a` always rebuilds HDF5/Silo, even if they were already built during `./mfc.sh build`. This slows things down, especially on Debug runs. I think this might be triggered by the following output:

Notice the `GLOB mismatch` that occurs for all targets and thus causes their dependencies to be rebuilt...

I also see that the build step on Phoenix takes 30 minutes (at least for the CPU build), but only about 10 minutes if one grabs a CPU node and builds the code there. This might motivate building MFC in CI on a Phoenix compute node so we can use `-j 12`.

Update @henryleberre: This should actually be `-j 24` on the build and test, since the compute nodes are dual-socket 12-core Intel Golds.

Update 2: Using `./mfc.sh test -j 24 -b mpirun -a` dispatches 24 jobs to 1 core on a 24-core node. Core 0 is saturated at 100% utilization (per `htop`), but the other cores are idle. Is this an easy fix?
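One way to confirm the "everything on core 0" symptom is to read each process's allowed-CPU list from `/proc`, which shows whether the concurrently launched test processes inherited a single-core affinity mask. This is a Linux-only sketch; `allowed_cpus` and `show_affinity_of` are hypothetical helpers (not part of MFC), and `simulation` is only an example process name to inspect while the tests run.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): inspect CPU affinity masks via /proc/<pid>/status.

# Print the Cpus_allowed_list (e.g. "0-23" or just "0") of a PID.
# Defaults to the PID of the calling shell.
allowed_cpus() {
  awk '/^Cpus_allowed_list:/ {print $2}' "/proc/${1:-$$}/status" 2>/dev/null
}

# Print "<pid> <allowed CPUs>" for every running process named exactly $1.
show_affinity_of() {
  local target="$1" status name pid
  for status in /proc/[0-9]*/status; do
    # Processes may exit mid-scan; skip status files that can no longer be read.
    name=$(awk '/^Name:/ {print $2; exit}' "$status" 2>/dev/null) || continue
    [ "$name" = "$target" ] || continue
    pid=${status#/proc/}
    pid=${pid%/status}
    echo "${pid} $(allowed_cpus "$pid")"
  done
}

# Compare the cores usable by this shell with the cores installed on the node.
echo "this shell: CPUs $(allowed_cpus); nproc=$(nproc); nproc --all=$(nproc --all)"

# Hypothetical target: if every "simulation" process reports "0" here,
# the processes are all confined to core 0.
show_affinity_of simulation
```

If the launcher is Open MPI, one possible cause is its default binding policy: many Open MPI releases bind processes to cores when launching two or fewer ranks, so many independent small `mpirun` launches can all land on the same core. Open MPI's `mpirun --bind-to none` disables that binding; whether it can be passed through MFC's `-b mpirun` option is not established here.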