Many tests fail #1015

Open
yurivict opened this issue Sep 3, 2024 · 7 comments

@yurivict
Contributor

yurivict commented Sep 3, 2024

Describe the bug

In the test log there are many failures:

 Running tests/esp_uhf/esp_uhf
  
     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors
     
     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,2 +1 @@                                                    
 Effective nuclear repulsion energy (a.u.) 107.60 
-Total SCF energy = -476.73491
     
Failed
 Running tests/bsse_tce_mult/bsse_tce_mult

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors
     
     verifying output ... 0:tce_diis: LU decomposition failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,13 +1,2 @@
 Effective nuclear repulsion energy (a.u.) 20.88
 Total SCF energy = -39.76410
-CCSD total energy / hartree = -39.9580035
-Effective nuclear repulsion energy (a.u.) 0.00
-Effective nuclear repulsion energy (a.u.) 0.00
-Total SCF energy = -39.34734
-CCSD total energy / hartree = -39.5301286
-Total SCF energy = -39.34986
-CCSD total energy / hartree = -39.5342256
-Total SCF energy = -0.49928
-CCSD total energy / hartree = -0.4992784
-Total SCF energy = -0.49931
-CCSD total energy / hartree = -0.4993071
     
Failed
 Running tests/sad_ch3hf/sad_ch3hf

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,10 +1,2 @@
 Effective nuclear repulsion energy (a.u.) 33.31
 Effective nuclear repulsion energy (a.u.) 33.31
-Effective nuclear repulsion energy (a.u.) 33.34
-Effective nuclear repulsion energy (a.u.) 33.60
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.65
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66

Failed
 Running tests/pspw_blyp_h2o/pspw_blyp_h2o 
     
     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors
 
     verifying output ... failed
@@@     Comparison of Output Files
@@ -1,4 +1,4 @@
 Effective nuclear repulsion energy (a.u.) 9.08
-Total PSPW energy : -16.52427
+Total PSPW energy : -8.36881
 Total PSPW energy : -17.09442
 Total PSPW energy : -17.11908
 
Failed 
 Running tests/pspw_pbesol_h2o/pspw_pbesol_h2o

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... failed
@@@     Comparison of Output Files
@@ -1,10 +1,10 @@
 Effective nuclear repulsion energy (a.u.) 9.08
-Total PSPW energy : -16.55097
-Total PSPW energy : -17.10629
+Total PSPW energy : -8.53817
+Total PSPW energy : -8.53992
 Total PSPW energy : -17.11942
 Total PSPW energy : -17.13072
 Effective nuclear repulsion energy (a.u.) 9.08
 Total PSPW energy : -17.13072
-Total PSPW energy : -17.13106
+Total PSPW energy : -17.13105
 Effective nuclear repulsion energy (a.u.) 9.12
 Total PSPW energy : -17.13272

Failed
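
The failures in this log fall into two shapes: runs that abort inside a linear solver (`dgesv failed`, `LU decomposition failed`) and runs that finish but print different energies. A minimal sketch for grouping them, assuming the log keeps the `Running tests/...` and `verifying output ...` line layout shown above. The function name `summarize_failures` and the label `output mismatch` are made up for this illustration, and lines of any other shape are simply ignored.

```python
import re
from collections import defaultdict

# Patterns modelled on the log excerpt in this issue; other layouts are not handled.
RUN_RE = re.compile(r"^\s*Running\s+(tests/\S+)")
VERIFY_RE = re.compile(r"verifying output \.\.\.\s*(.*)$")
ABORT_RE = re.compile(r"^\d+:\s*(.+?):Received an Error in Communication")


def summarize_failures(log_text):
    """Group failing tests by the reason printed on the 'verifying output' line."""
    groups = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        run = RUN_RE.match(line)
        if run:
            current = run.group(1)
            continue
        verify = VERIFY_RE.search(line)
        if not verify or current is None:
            continue
        detail = verify.group(1).strip()
        abort = ABORT_RE.match(detail)
        if abort:
            # e.g. "ga_iter_lsolve: dgesv failed"
            groups[abort.group(1)].append(current)
        elif detail == "failed":
            # The run completed but the extracted energies differ.
            groups["output mismatch"].append(current)
    return dict(groups)


# Trimmed excerpt of the log above.
SAMPLE = """\
 Running tests/esp_uhf/esp_uhf
     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
 Running tests/bsse_tce_mult/bsse_tce_mult
     verifying output ... 0:tce_diis: LU decomposition failed:Received an Error in Communication
 Running tests/sad_ch3hf/sad_ch3hf
     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
 Running tests/pspw_blyp_h2o/pspw_blyp_h2o
     verifying output ... failed
"""

if __name__ == "__main__":
    for reason, tests in summarize_failures(SAMPLE).items():
        print(f"{reason}: {', '.join(tests)}")
```

Grouping this way makes it easier to see that several unrelated modules fail in the same linear algebra calls.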

Describe settings used
USE_LIBXC=Y USE_MPI=Y PYTHONVERSION=3.11 NWCHEM_MODULES="all python" F77="gfortran13" F90="gfortran13" FC="gfortran13" FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13" F90FLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13" FCFLAGS="-Wl,-rpath=/usr/local/lib/gcc13" PERL_USE_UNSAFE_INC=1 XDG_DATA_HOME=/usr/ports/science/nwchem/work XDG_CONFIG_HOME=/usr/ports/science/nwchem/work XDG_CACHE_HOME=/usr/ports/science/nwchem/work/.cache HOME=/usr/ports/science/nwchem/work PATH=/usr/local/libexec/ccache:/usr/ports/science/nwchem/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin PKG_CONFIG_LIBDIR=/usr/ports/science/nwchem/work/.pkgconfig:/usr/local/libdata/pkgconfig:/usr/local/share/pkgconfig:/usr/libdata/pkgconfig MK_DEBUG_FILES=no MK_KERNEL_SYMBOLS=no SHELL=/bin/sh NO_LINT=YES ADDR2LINE="/usr/local/bin/addr2line" AR="/usr/local/bin/ar" AS="/usr/local/bin/as" CPPFILT="/usr/local/bin/c++filt" GPROF="/usr/local/bin/gprof" LD="/usr/local/bin/ld" NM="/usr/local/bin/nm" OBJCOPY="/usr/local/bin/objcopy" OBJDUMP="/usr/local/bin/objdump" RANLIB="/usr/local/bin/ranlib" READELF="/usr/local/bin/readelf" SIZE="/usr/local/bin/size" STRINGS="/usr/local/bin/strings" PREFIX=/usr/local LOCALBASE=/usr/local CC="cc" CFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing " CPP="cpp" CPPFLAGS="" LDFLAGS=" -Wl,-rpath=/usr/local/lib/gcc13 -L/usr/local/lib/gcc13 -fstack-protector-strong " LIBS="" CXX="c++" CXXFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing " CCACHE_DIR="/tmp/.ccache" BSD_INSTALL_PROGRAM="install -s -m 555" BSD_INSTALL_LIB="install -s -m 0644" BSD_INSTALL_SCRIPT="install -m 555" BSD_INSTALL_DATA="install -m 0644" BSD_INSTALL_MAN="install -m 444"

Attach log files
Attach as many log files as possible.

  • stdout/stderr of the NWChem execution
  • complete makefile log
  • $NWCHEM_TOP/src/tools/build/config.log (no such file is present)
  • $NWCHEM_TOP/src/tools/build/comex/config.log (no such file is present)
  • debugging stack

To Reproduce
Run tests.

clang-18
OS: FreeBSD 14.1

@edoapra
Collaborator

edoapra commented Sep 3, 2024

Your linear algebra settings are likely to be the culprit.
Did you set BLAS_SIZE in a consistent way between GlobalArrays and NWChem?
I don't see BLAS_SIZE mentioned in this issue.
Please also post the autoconf options used to configure Global Arrays.

@edoapra
Collaborator

edoapra commented Sep 3, 2024

What's the URL associated with nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz on the https://github.com/nwchemgit/nwchem release page?

@yurivict
Contributor Author

yurivict commented Sep 3, 2024

BLAS_SIZE is equal to 4 in both cases.

ga is configured with BLAS_SIZE=4 through configure arguments:

--enable-peigs --enable-shared --disable-static --with-scalapack --with-blas4 --prefix=/usr/local ${_LATE_CONFIGURE_ARGS}

nwchem is also configured with BLAS_SIZE=4 through make arguments:

NWCHEM_TOP=/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/.. NWCHEM_MODULES=all NWCHEM_LONG_PATHS=Y NWCHEM_TARGET=LINUX64 USE_INTERNALBLAS=Y EXTERNAL_GA_PATH=/usr/local USE_64TO32=y BLAS_SIZE=4 DESTDIR=/usr/ports/science/nwchem/work/stage

The 64_to_32 target wasn't run because the GitHub release tarball already has this done.
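
The consistency check asked about earlier in the thread can be sketched as a small script. It is only an illustration: the helper names are hypothetical, the argument strings are shortened copies of the ones quoted in this comment, and `--with-blas8` is assumed to be the 8-byte counterpart of the `--with-blas4` flag shown here.

```python
import re
import shlex


def ga_blas_size(configure_args):
    """Return 4 or 8 from a --with-blas4 / --with-blas8 style flag, else None."""
    for arg in shlex.split(configure_args):
        match = re.fullmatch(r"--with-blas([48])(=.*)?", arg)
        if match:
            return int(match.group(1))
    return None


def nwchem_blas_size(make_args):
    """Return the integer value of BLAS_SIZE=<n> from the make arguments, else None."""
    for arg in shlex.split(make_args):
        name, sep, value = arg.partition("=")
        if name == "BLAS_SIZE" and sep:
            return int(value)
    return None


def sizes_agree(configure_args, make_args):
    """True only when both sides state a BLAS integer size and the sizes match."""
    ga = ga_blas_size(configure_args)
    nw = nwchem_blas_size(make_args)
    return ga is not None and ga == nw


# Shortened copies of the arguments quoted in this comment.
GA_ARGS = "--enable-peigs --enable-shared --disable-static --with-scalapack --with-blas4 --prefix=/usr/local"
NWCHEM_ARGS = "NWCHEM_TARGET=LINUX64 USE_INTERNALBLAS=Y EXTERNAL_GA_PATH=/usr/local USE_64TO32=y BLAS_SIZE=4"

if __name__ == "__main__":
    print("GA:", ga_blas_size(GA_ARGS), "NWChem:", nwchem_blas_size(NWCHEM_ARGS))
    print("consistent:", sizes_agree(GA_ARGS, NWCHEM_ARGS))
```

Matching sizes are necessary but not sufficient: the check says nothing about whether the sources were actually converted with `make 64_to_32`.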

@yurivict
Contributor Author

yurivict commented Sep 3, 2024

@edoapra
Collaborator

edoapra commented Sep 3, 2024

The tarball URL is: https://codeload.github.com/nwchemgit/nwchem/tar.gz/v7.2.3-release?dummy=/nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz

This is an automatically generated tarball that has NOT gone through the make 64_to_32 step.
This is the one you need to fetch:

https://github.com/nwchemgit/nwchem/releases/download/v7.2.3-release/nwchem-7.2.3-release.revision-d690e065-src.2024-08-27.tar.bz2
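
A port or packaging script could guard against picking up the wrong kind of tarball. The sketch below classifies a URL by its shape only. The function name `tarball_kind` and the returned labels are made up for this example, and the `/archive/` rule covers GitHub's other auto-generated source archives, which are not mentioned in this thread.

```python
from urllib.parse import urlparse


def tarball_kind(url):
    """Classify a GitHub tarball URL as 'release-asset', 'auto-generated' or 'unknown'."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    path = parts.path
    # Files uploaded by the maintainers live under /releases/download/.
    if host == "github.com" and "/releases/download/" in path:
        return "release-asset"
    # codeload.github.com and /archive/ serve snapshots generated from the git tree.
    if host == "codeload.github.com" or (host == "github.com" and "/archive/" in path):
        return "auto-generated"
    return "unknown"


# The two URLs quoted in this thread.
AUTO_URL = (
    "https://codeload.github.com/nwchemgit/nwchem/tar.gz/v7.2.3-release"
    "?dummy=/nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz"
)
RELEASE_URL = (
    "https://github.com/nwchemgit/nwchem/releases/download/v7.2.3-release/"
    "nwchem-7.2.3-release.revision-d690e065-src.2024-08-27.tar.bz2"
)

if __name__ == "__main__":
    for url in (AUTO_URL, RELEASE_URL):
        print(tarball_kind(url), url)
```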

@yurivict
Contributor Author

yurivict commented Sep 3, 2024

I see. I will change this.

Does it in general make more sense to use BLAS_SIZE=8 on amd64 systems?

@edoapra
Collaborator

edoapra commented Sep 4, 2024

> I see. I will change this.
>
> Does it in general make more sense to use BLAS_SIZE=8 on amd64 systems?

It makes sense on all 64-bit architectures, since it lets you skip the make 64_to_32 step.
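
The effect of that choice on the build can be sketched as follows. This is an illustration, not the port's actual logic: the function `build_commands` and its `already_converted` parameter are hypothetical, and the sequence assumes the usual `make nwchem_config` step followed by a plain `make`.

```python
def build_commands(blas_size, already_converted=False):
    """Return the make invocations for a given BLAS integer size.

    already_converted: True when the source tree has been through
    64_to_32 before (hypothetical flag for this sketch).
    """
    if blas_size not in (4, 8):
        raise ValueError("BLAS_SIZE is expected to be 4 or 8")
    commands = ["make nwchem_config"]
    # Only a 4-byte-integer BLAS needs the source conversion step.
    if blas_size == 4 and not already_converted:
        commands.append("make 64_to_32")
    commands.append("make")
    return commands


if __name__ == "__main__":
    for size in (4, 8):
        print(f"BLAS_SIZE={size}:", " && ".join(build_commands(size)))
```

With `BLAS_SIZE=8`, Global Arrays would need to be configured for 8-byte BLAS integers as well, so that both sides stay consistent.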
