
Run Phoenix jobs using available CPU cores #266

Merged: 1 commit merged into master on Dec 20, 2023
Conversation

@sbryngelson (Member) commented Dec 17, 2023

I'm attempting to improve runtime on the Phoenix runners. Requesting a node without specifying the cpu-small partition can hang for a while.

I'm also noticing a problem: ./mfc.sh test -a always rebuilds HDF5/Silo even when they were already built during ./mfc.sh build. This slows things down (especially on Debug runs). I think the trigger can be seen in the following output:

atl1-1-03-002-15-1: p-sbryngelson3-0/MFC-2 $ time ./mfc.sh test -j 12 -b mpirun -a
mfc: OK > (venv) Entered the Python virtual environment.

      .=++*:          -+*+=.          [email protected] [Linux]
     :+   -*-        ==   =* .        -------------------------------------------------------
   :*+      ==      ++    .+-         --jobs 12
  :*##-.....:*+   .#%+++=--+=:::.     --mpi
  -=-++-======#=--**+++==+*++=::-:.   --no-gpu
 .:++=----------====+*= ==..:%.....   --no-debug
  .:-=++++===--==+=-+=   +.  :=
  +#=::::::::=%=. -+:    =+   *:      -----------------------------------------------------------
 .*=-=*=..    :=+*+:      -...--      $ ./mfc.sh [build, run, test, clean, count, packer] --help

Generating syscheck/include/case.fpp.
  INFO: Custom case.fpp file is up to date.

$ cmake --build /storage/coda1/p-sbryngelson3/0/sbryngelson3/MFC-2/build/no-debug_no-gpu_mpi/syscheck --target syscheck
-j 12 --config Release

[100%] Built target syscheck

$ cmake --install /storage/coda1/p-sbryngelson3/0/sbryngelson3/MFC-2/build/no-debug_no-gpu_mpi/syscheck

-- Install configuration: "Release"
-- Installing: /storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/build/install/no-debug_no-gpu_mpi/bin/syscheck
-- Set runtime path of "/storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/build/install/no-debug_no-gpu_mpi/bin/syscheck" to ""
Generating pre_process/include/case.fpp.
  INFO: Custom case.fpp file is up to date.

$ cmake --build /storage/coda1/p-sbryngelson3/0/sbryngelson3/MFC-2/build/no-debug_no-gpu_mpi/pre_process --target
pre_process -j 12 --config Release

-- GLOB mismatch!
-- Enabled IPO / LTO
-- Configuring done
-- Generating done
-- Build files have been written to: /storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/build/no-debug_no-gpu_mpi/pre_process
[  8%] Preprocessing (Fypp) m_variables_conversion.fpp
[  8%] Preprocessing (Fypp) m_constants.fpp
[ 11%] Preprocessing (Fypp) m_check_patches.fpp
[ 11%] Preprocessing (Fypp) m_data_output.fpp
[ 14%] Preprocessing (Fypp) m_derived_types.fpp
[ 17%] Preprocessing (Fypp) m_global_parameters.fpp
[ 26%] Preprocessing (Fypp) m_helper.fpp
[ 26%] Preprocessing (Fypp) m_initial_condition.fpp
[ 26%] Preprocessing (Fypp) m_model.fpp
[ 29%] Preprocessing (Fypp) m_mpi_common.fpp
[ 32%] Preprocessing (Fypp) m_mpi_proxy.fpp
[ 35%] Preprocessing (Fypp) m_patches.fpp
[ 38%] Preprocessing (Fypp) m_start_up.fpp
Scanning dependencies of target pre_process
[ 41%] Building Fortran object CMakeFiles/pre_process.dir/src/pre_process/autogen/m_constants.fpp.f90.o
nvfortran-Warning-CUDA_HOME has been deprecated. Please, use NVHPC_CUDA_HOME instead.
/storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/src/pre_process/autogen/m_constants.fpp.f90:
[ 44%] Building Fortran object CMakeFiles/pre_process.dir/src/pre_process/autogen/m_derived_types.fpp.f90.o
nvfortran-Warning-CUDA_HOME has been deprecated. Please, use NVHPC_CUDA_HOME instead.
/storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/src/pre_process/autogen/m_derived_types.fpp.f90:
[ 47%] Building Fortran object CMakeFiles/pre_process.dir/src/pre_process/autogen/m_global_parameters.fpp.f90.o
nvfortran-Warning-CUDA_HOME has been deprecated. Please, use NVHPC_CUDA_HOME instead.
/storage/home/hcoda1/6/sbryngelson3/p-sbryngelson3-0/MFC-2/src/pre_process/autogen/m_global_parameters.fpp.f90:
[ 52%] Building Fortran object CMakeFiles/pre_process.dir/src/pre_process/autogen/m_mpi_common.fpp.f90.o
[ 52%] Building Fortran object CMakeFiles/pre_process.dir/src/pre_process/autogen/m_helper.fpp.f90.o
nvfortran-Warning-CUDA_HOME has been deprecated. Please, use NVHPC_CUDA_HOME instead.

Notice the GLOB mismatch that occurs for every target, forcing a re-configure and a rebuild of its dependencies...
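
For reference, a quick way to confirm whether these rebuilds are driven by CMake's glob verification rather than by changed sources might be the following (a sketch; the target and directory are taken from the log above, and the CONFIGURE_DEPENDS explanation is an assumption about how MFC's CMakeLists collects sources):

```sh
# Sketch: run the same no-op build twice in a row. If the second, unchanged
# invocation still prints "-- GLOB mismatch!" and re-runs the configure step,
# the rebuild is triggered by CMake's glob verification
# (file(GLOB ... CONFIGURE_DEPENDS)) and not by modified sources.
BUILD_DIR=build/no-debug_no-gpu_mpi/pre_process   # relative to the MFC checkout
cmake --build "$BUILD_DIR" --target pre_process -j 12 --config Release
cmake --build "$BUILD_DIR" --target pre_process -j 12 --config Release 2>&1 \
  | grep -i "glob mismatch" && echo "glob verification forced a re-configure"
```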

I also see that the build step on Phoenix takes about 30 minutes (at least for the CPU build), but only about 10 minutes if one grabs a CPU node and builds the code there. This might motivate building MFC in CI on a Phoenix compute node so we can use -j 12.

Update for @henryleberre: this should actually be -j 24 for both the build and the tests; the compute nodes are dual-socket 12-core Intel Golds.

Update 2: Using ./mfc.sh test -j 24 -b mpirun -a dispatches all 24 jobs to a single core on a 24-core node. Core 0 is saturated at 100% utilization (per htop) while the others are idle. Is this an easy fix?
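
A way to quantify this without watching htop might be to list each test process with the CPU it last ran on (a sketch; the binary names are just examples of what MFC launches):

```sh
# Sketch: PSR is the processor each thread last ran on. If every MFC test
# process reports the same PSR (and a tiny %CPU), they are all stuck on one core.
ps -eLo pid,psr,pcpu,comm | grep -E 'pre_process|simulation|post_process'
```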

@henryleberre (Member) left a comment

In my latest PR, we will build and test on a compute node by not issuing a build command, only a test command. This also works around the problem where you end up building multiple times.

I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.

@sbryngelson (Member Author):

> In my latest PR, we will build and test on a compute node by not issuing a build command, only a test command. This also works around the problem where you end up building multiple times.
>
> I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.

Thanks! Also (per above), using ./mfc.sh test -j 24 -b mpirun -a dispatches all 24 jobs to a single core on a 24-core node. Core 0 is saturated at 100% utilization (per htop) while the others are idle. Is this an easy fix?

@sbryngelson (Member Author):

> In my latest PR, we will build and test on a compute node by not issuing a build command, only a test command. This also works around the problem where you end up building multiple times.
>
> I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.

Can we specify the submit partition and other sbatch options? Something like the sketch below is what I have in mind.
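
An illustrative submission header (a sketch only: the partition and node shape come from this thread, while the account and walltime are placeholders):

```sh
#!/bin/bash
#SBATCH --partition=cpu-small        # avoids the queueing hang mentioned above
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24         # dual-socket 12-core Intel Gold nodes
#SBATCH --time=02:00:00              # placeholder walltime
#SBATCH --account=<charge-account>   # placeholder

./mfc.sh test -j 24 -b mpirun -a
```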

@henryleberre (Member):

> In my latest PR, we will build and test on a compute node by not issuing a build command, only a test command. This also works around the problem where you end up building multiple times.
>
> I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.
>
> Thanks! Also (per above), using ./mfc.sh test -j 24 -b mpirun -a dispatches all 24 jobs to a single core on a 24-core node. Core 0 is saturated at 100% utilization (per htop) while the others are idle. Is this an easy fix?

That's interesting. Could you try prepending numactl --all to the mfc.sh invocation?

@henryleberre (Member):

One issue with the embers queue is that a job can get killed when another job with a higher priority is submitted. Do you know whether PACE is configured to relaunch preempted jobs, or whether there is a way around it?
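
(The standard Slurm knob for this would be requeue-on-preemption, assuming PACE honors it; a sketch:)

```sh
#SBATCH --requeue    # ask Slurm to put the job back in the queue if preempted
# or, per submission:
sbatch --requeue run-phoenix-release-cpu.sh
```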

@sbryngelson (Member Author):

> One issue with the embers queue is that a job can get killed when another job with a higher priority is submitted. Do you know whether PACE is configured to relaunch preempted jobs, or whether there is a way around it?

True, but I don't think this has ever happened to us, so I'm not worried about it for now.

@sbryngelson (Member Author) commented Dec 17, 2023

> In my latest PR, we will build and test on a compute node by not issuing a build command, only a test command. This also works around the problem where you end up building multiple times.
>
> I would have to look into this post_process issue, but I recall it only being a problem for debug builds with HDF5.
>
> Thanks! Also (per above), using ./mfc.sh test -j 24 -b mpirun -a dispatches all 24 jobs to a single core on a 24-core node. Core 0 is saturated at 100% utilization (per htop) while the others are idle. Is this an easy fix?
>
> That's interesting. Could you try prepending numactl --all to the mfc.sh invocation?

No luck with numactl --all ./mfc.sh test -j 24 -b mpirun -a: each process still sits at roughly 100/24 ≈ 4% CPU, as in the screenshots below.

[Two screenshots (2023-12-17, 19:02 and 19:03) showing per-process CPU utilization of roughly 100/24 ≈ 4%]

@sbryngelson (Member Author) commented Dec 18, 2023

@henryleberre I noticed that all the running binaries from ./mfc.sh test -j 24 have the same CPU affinity (CPU 1):

atl1-1-01-006-14-1: 6/sbryngelson3 $ taskset -cp 232235
pid 232235's current affinity list: 1
atl1-1-01-006-14-1: 6/sbryngelson3 $ taskset -cp 232224
pid 232224's current affinity list: 1
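
As a quick experiment, one could try widening the mask of a running process in place (a sketch, using the PID from above):

```sh
taskset -cp 0-23 232235   # widen PID 232235's affinity to all 24 cores
taskset -cp 232235        # re-check: should now report 0-23
```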

@sbryngelson changed the title from "Update run-phoenix-release-cpu.sh" to "Run Phoenix jobs using available CPU cores" on Dec 18, 2023
@sbryngelson (Member Author) commented Dec 18, 2023

Requesting advice from @henryleberre on the CPU affinity/subprocess issue, since the slowness of the Phoenix CPU runner was the real reason for this PR.

If this will be fixed in PR #257, then I can just merge this PR.

Update: Just to double-check, ./mfc.sh test -j 8 does run on 8 cores on my MacBook, so I suspect this has something to do with the invocation on the Slurm-allocated compute node. I tried invoking srun ./mfc.sh test -j X, but that does not do the right thing either.
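
(Perhaps relaxing the binding at the srun level would behave differently, though this is untested; --cpu-bind=none is a standard srun flag:)

```sh
srun --cpu-bind=none ./mfc.sh test -j 24 -b mpirun -a
```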

@henryleberre (Member):

I discovered that adding --bind-to none to the mpirun invocation fixes it on Phoenix. I'm adding this to #257.
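
For concreteness, the change amounts to something like the following (a sketch of the invocation only, not the exact #257 diff; the executable path and arguments are illustrative):

```sh
# Open MPI's default binding (bind-to core at small rank counts) can pin every
# concurrently launched test onto the same cores; disabling binding lets the
# OS scheduler spread the tests across the node.
mpirun --bind-to none -np 2 ./build/install/no-debug_no-gpu_mpi/bin/simulation <input>
```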

@sbryngelson merged commit 0131901 into master on Dec 20, 2023
15 checks passed
@sbryngelson deleted the sbryngelson-patch-1 branch on December 20, 2023 at 08:43
JRChreim pushed a commit to JRChreim/MFC-JRChreim that referenced this pull request Dec 21, 2023