Test segmentation fault with version 0.3.27 on RISC-V #4719
Strange, as the CPU on the VisionFive 2 does not provide RVV support (as far as I know), so the build should be using the "generic" plain C kernels. Which compiler and build options did you use? |
It's using GCC 13.3.0 with |
Not reproducible with GCC 13.2.0 and default build options (which include |
I've just started a build with EasyBuild and GCC 13.2.0 as well, let's see if that does work. |
I'm guessing it may be either the |
Using GCC 13.2.0 instead of 13.3.0 didn't make a difference, so I'll try changing those flags (the same ones did work fine with version 0.3.24). |
I also did not get a segfault with all the flags that you mentioned (including -ftree-vectorize) added to the COMMON_OPT line in Makefile.rule - the only caveat is that I am currently testing with the |
Still not reproducible with 0.3.27 - I think there must be something else peculiar to your build or hardware. (Note that SDSDOT is actually the last test run in sblat1 - is it this one, or one of the other tests in the test folder, that appears to be failing for you?) |
I've now tried it without EasyBuild, and instead just used the GCC of my OS:
And I compiled OpenBLAS using similar commands as EasyBuild would normally use:
The latter fails with:
|
I've just tried the exact same thing for version 0.3.26, and that does work fine. |
Rerunning my build in the gcc compile farm now - maybe it is the USE_OPENMP=1 that makes the difference; I did not notice that in your initial report. (There have been some OpenMP-related changes in 0.3.27, but on the other hand they should be affecting all platforms.) |
It looks like that's indeed causing the failures. I've tried the same thing now with 0.3.27 and |
Reproduced, but gdb's backtraces only lead to the implementation of the CLOSE() function in the Fortran runtime library. (I guess this could still hint at a memory management problem, but sadly that gccfarm machine does not appear to have valgrind installed). The C-only openblas_utest and openblas_utest_ext run without errors. |
Bisecting now, but this will probably take me until tomorrow due to real life. |
Bisected to
not sure yet if that is actually true (RISCV64_GENERIC does not use any of the |
Seems to be the static linking of libgfortran imposed by that PR (in Makefile.riscv64) that is causing the segfault. I do not recall a reason being given for why it must be |
Hi Martin, it was added a long time ago, before I got involved in this project. I guess it was copied from the C910V case. |
Thank you for the quick response - I guess it may have helped with cross-compilation in the early days, but now it is probably best to remove it. (I will try to check the C910V case as soon as I get an rvv-0.71 capable gcc fork to build on my hardware.) |
I've tried the fix from #4733, and it does indeed solve the segmentation fault. I'm not sure if it's in any way related to this issue, but I do see quite a lot of failing LAPACK tests:
|
Hmm, that looks seriously weird (although without knowing the magnitude of the individual errors it could still be harmless - as it is meant to be the internal testsuite of the reference implementation, it normally assumes the unoptimized reference BLAS to be used). Lots of deviations for one particular precision does look suspect, I'll see if I can reproduce this in the gcc compile farm. |
gcc12 build only has the single errors in REAL and COMPLEX, now trying gcc13 |
wait, if you are building 0.3.27 the test errors are almost certainly coming from the lapack testsuite bug that has since been fixed by PR #4647 |
Awesome, that solved it:
Thanks a lot for the quick replies and solutions! |
Hi @martin-frbg, we saw a similar issue on AL2 on x86_64 with 3.27: We tried the latest 3.28 and are still having issues. Using gcc 13.2 (manually compiled with
Thanks. |
@peterzhuamazon I am not sure your case is related in any way - what hardware is it running/crashing on? (If in doubt, you can use |
Ignore: actually that is compiled on gcc10. |
Failed at the same place with the skylakeX core x86_64 |
When trying to install OpenBLAS 0.3.27 with EasyBuild on a StarFive VisionFive 2 RISC-V development board, I'm getting:
cc @SebastianAchilles who encountered the same error on a SiFive HiFive Unmatched board.