MFC_CUDA_CC flag crashes compilation on Phoenix #687

Open
max-Hawkins opened this issue Nov 5, 2024 · 0 comments

Building GPU-enabled MFC on non-compute nodes compiles the GPU kernels for many compute capabilities, which makes the build very slow (roughly 30 minutes or more). Setting MFC_CUDA_CC is one way to restrict compilation to a subset of compute capabilities. This works as expected on Delta, but on Phoenix it crashes the build with the following error:

[ 25%] Preprocessing (Fypp) syscheck.fpp
[ 50%] Building Fortran object CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o
nvfortran-Error-A CUDA toolkit matching the current driver version (0) or a supported older version (11.8) was not installed with this HPC SDK.
gmake[3]: *** [CMakeFiles/syscheck_lib.dir/build.make:80: CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:111: CMakeFiles/syscheck_lib.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:92: CMakeFiles/syscheck.dir/rule] Error 2
gmake: *** [Makefile:170: syscheck] Error 2
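
A minimal reproduction sketch, assuming MFC_CUDA_CC is picked up from the environment (the configure step in the full log reports "Found $MFC_CUDA_CC specified. GPU code will be generated for 80."). The guard around ./mfc.sh is only there so the snippet is harmless outside an MFC checkout:

```shell
# Assumption: MFC's CMake configuration reads MFC_CUDA_CC from the environment.
# 80 is the value reported by the configure step in the log for this build.
export MFC_CUDA_CC=80

# Same build command as in the full log; run from the MFC repository root.
if [ -x ./mfc.sh ]; then
    ./mfc.sh build --gpu -j 8
else
    echo "mfc.sh not found; run this from the MFC repository root"
fi
```

Running the same command with MFC_CUDA_CC unset gives a baseline build to compare against on the same login node.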

Full build output:

[mhawkins60@login-phoenix-rh9-2 MFC]$ ./mfc.sh build --gpu -j 8
mfc: OK > (venv) Entered the Python 3.10.10 virtual environment (>= 3.9).

      .=++*:          -+*+=.        | [email protected] [Linux]
     :+   -*-        ==   =* .      | ------------------------------------------------------
   :*+      ==      ++    .+-       | 
  :*##-.....:*+   .#%+++=--+=:::.   | --jobs 8
  -=-++-======#=--**+++==+*++=::-:. | --mpi --gpu --no-debug --no-gcov --no-unified
 .:++=----------====+*= ==..:%..... | --targets pre_process, simulation, and post_process
  .:-=++++===--==+=-+=   +.  :=     | 
  +#=::::::::=%=. -+:    =+   *:    | ----------------------------------------------------------
 .*=-=*=..    :=+*+:      -...--    | $ ./mfc.sh (build, run, test, clean, count, packer) --help

 Build | syscheck, pre_process, simulation, and post_process | Generic Build

 $ cmake -DMFC_HIPFORT=ON -Wno-dev --no-warn-unused-cli -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH= -DCMAKE_FIND_ROOT_PATH= -DCMAKE_FIND_PACKAGE_REDIRECTS_DIR= -DCMAKE_INSTALL_PREFIX=/storage/home/hcoda1/7/mhawkins60/local/MFC/build/install/hipfort -S /storage/home/hcoda1/7/mhawkins60/local/MFC/toolchain/dependencies -B /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/hipfort

Not searching for unused variables given on the command line.
-- The C compiler identification is NVHPC 24.5.0
-- The CXX compiler identification is NVHPC 24.5.0
-- The Fortran compiler identification is NVHPC 24.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvfortran - skipped
CMake Warning at CMakeLists.txt:116 (message):
  The Fortran compiler vendor is not Cray so HIPFORT will not be built.


-- Configuring done (3.4s)
-- Generating done (0.0s)
-- Build files have been written to: /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/hipfort

 $ cmake --build /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/hipfort --target hipfort --parallel 8 --config Release

Built target hipfort

 $ cmake --install /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/hipfort

-- Install configuration: "Release"

 Generating case.fpp.
   Writing a (new) custom case.fpp file.
 $ cmake -DMFC_SYSCHECK=ON -Wno-dev --no-warn-unused-cli -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/storage/home/hcoda1/7/mhawkins60/local/MFC/build/install/hipfort -DCMAKE_FIND_ROOT_PATH=/storage/home/hcoda1/7/mhawkins60/local/MFC/build/install/hipfort -DCMAKE_FIND_PACKAGE_REDIRECTS_DIR=/storage/home/hcoda1/7/mhawkins60/local/MFC/build/install/hipfort -DCMAKE_INSTALL_PREFIX=/storage/home/hcoda1/7/mhawkins60/local/MFC/build/install/ab29bd3004 -DMFC_MPI=ON -DMFC_OpenACC=ON -DMFC_GCov=OFF -DMFC_Unified=OFF -S /storage/home/hcoda1/7/mhawkins60/local/MFC/ -B /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/ab29bd3004

Not searching for unused variables given on the command line.
-- The C compiler identification is NVHPC 24.5.0
-- The CXX compiler identification is NVHPC 24.5.0
-- The Fortran compiler identification is NVHPC 24.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/local/pace-apps/manual/packages/nvhpc/24.5/Linux_x86_64/24.5/compilers/bin/nvfortran - skipped
-- Found $MFC_CUDA_CC specified. GPU code will be generated for 80.
-- Performing Test SUPPORTS_MARCH_NATIVE
-- Performing Test SUPPORTS_MARCH_NATIVE - Success
-- Performing IPO using -Mextract followed by -Minline
-- Found MPI_Fortran: /storage/pace-apps/manual/packages/hpcx/2.19/nvhpc-24.5/hpcx-v2.19-gcc-mlnx_ofed-redhat9-cuda12-x86_64/hpcx-rebuild-nvc/lib/libmpi_usempi_ignore_tkr.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: Fortran 
-- Found OpenACC_C: -acc  
-- Found OpenACC_CXX: -acc  
-- Found OpenACC_Fortran: -acc  
CMake Warning at CMakeLists.txt:467 (find_package):
  By not providing "FindcuTENSOR.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "cuTENSOR",
  but CMake did not find one.

  Could not find a package configuration file provided by "cuTENSOR" with any
  of the following names:

    cuTENSORConfig.cmake
    cutensor-config.cmake

  Add the installation prefix of "cuTENSOR" to CMAKE_PREFIX_PATH or set
  "cuTENSOR_DIR" to a directory containing one of the above files.  If
  "cuTENSOR" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  CMakeLists.txt:547 (MFC_SETUP_TARGET)


CMake Warning at CMakeLists.txt:469 (message):
  Failed to locate the NVIDIA cuTENSOR library.  MFC will be built without
  support for it, disallowing the use of cu_tensor=T.  This can result in
  degraded performance.
Call Stack (most recent call first):
  CMakeLists.txt:547 (MFC_SETUP_TARGET)


-- Found CUDAToolkit: /usr/local/pace-apps/spack/packages/linux-rhel9-x86_64_v3/gcc-11.3.1/cuda-12.1.1-ebglvvqo7uhjvhvff2qlsjtjd54louaf/include (found version "12.1.105") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Warning at CMakeLists.txt:467 (find_package):
  By not providing "FindcuTENSOR.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "cuTENSOR",
  but CMake did not find one.

  Could not find a package configuration file provided by "cuTENSOR" with any
  of the following names:

    cuTENSORConfig.cmake
    cutensor-config.cmake

  Add the installation prefix of "cuTENSOR" to CMAKE_PREFIX_PATH or set
  "cuTENSOR_DIR" to a directory containing one of the above files.  If
  "cuTENSOR" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  CMakeLists.txt:547 (MFC_SETUP_TARGET)


CMake Warning at CMakeLists.txt:469 (message):
  Failed to locate the NVIDIA cuTENSOR library.  MFC will be built without
  support for it, disallowing the use of cu_tensor=T.  This can result in
  degraded performance.
Call Stack (most recent call first):
  CMakeLists.txt:547 (MFC_SETUP_TARGET)


-- Configuring done (29.7s)
-- Generating done (0.1s)
-- Build files have been written to: /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/ab29bd3004

 Generating case.fpp.
   Writing a (new) custom case.fpp file.
 $ cmake --build /storage/home/hcoda1/7/mhawkins60/local/MFC/build/staging/ab29bd3004 --target syscheck --parallel 8 --config Release

[ 25%] Preprocessing (Fypp) syscheck.fpp
[ 50%] Building Fortran object CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o
nvfortran-Error-A CUDA toolkit matching the current driver version (0) or a supported older version (11.8) was not installed with this HPC SDK.
gmake[3]: *** [CMakeFiles/syscheck_lib.dir/build.make:80: CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:111: CMakeFiles/syscheck_lib.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:92: CMakeFiles/syscheck.dir/rule] Error 2
gmake: *** [Makefile:170: syscheck] Error 2
 

Error: Failed to build the syscheck target.

Terminated

mfc: ERROR > main.py finished with a 143 exit code.
mfc: (venv) Exiting the Python virtual environment.