Building OpenBLAS 0.3.24 fails on MacPorts running macOS 13.5.2 #4239
Comments
Is that Apple's clang you used, or something from MacPorts as well? I see the Fortran part seems to have been compiled with gfortran from GCC 12, but I can't seem to find the clang version in your log. (No problems are/were seen in CI with clang 14.0.6 and gfortran 12.2 (both from Homebrew) under macOS 12.) |
I am facing the same issue and was able to reproduce the issue just running
When building with
With that, the segfault is gone. However, the tests fail at a later point:
|
@michaellass out of curiosity, could you try testing with MacPorts Clang-15 and possibly Clang-16? |
Sure:
So it looks like it's not an Apple clang issue but a clang 15 issue. I guess I should also test with clang 14 but I first need to install that. |
@michaellass good. MacPorts has clang-17 now (it was added yesterday). If you have time, I suggest checking with it as well; if it fails, that seems like a strong indicator of a regression in clang/llvm, which should be reported upstream as well. After your test I'll handle blacklisting the bad clang in MacPorts to make users' lives easier :) Thanks! |
Update on more versions:
So clang versions 13, 14 and 15 show the error for me. I am a bit surprised that this issue only now popped up as I would think that XCode 14 came with a newer clang than version 12. And it also contradicts the CI that successfully builds with clang 14. With clang 17, I get errors during compilation:
Those files are present though, so I think this may be an issue with clang 17 from macports. |
You would need to edit utest/Makefile and remove all references to test_kernel_regress. The "missing symbol" error is very strange. By the way, that test is specific neither to skylakex nor to AVX: it compares DGEMM results to their non-DGEMM counterpart. The similarity to the other failing tests is that it contains calls from C to Fortran. |
No, I have an M1 processor. Probably I'm missing some compilation flags. When building OpenBLAS via macports, that test is not an issue. |
OK, so the NO_AVX=1 in your make commands is entirely spurious. Unfortunately I do not have an M1 locally, so I'm limited to what's available or reasonably installable on Cirrus. |
I wonder if there is anything that MacPorts does fundamentally differently from homebrew - there are still no errors in the CI job when I update to the "brew" builds of LLVM 16 or 17 and gfortran from GCC13.2 . |
Please note that the segfault occurs with LLVM 13, 14 and 15 and not with 16 and 17. Could you please do a run similar to e15d717 but with llvm-15 and clang-15? |
@michaellass in the MacPorts issue we discovered that this seems to be a well-known issue with the Xcode SDK, see: https://trac.macports.org/ticket/68225#comment:23 |
Thanks for this update. I have also realized that Cirrus CI still gives me the same old macOS Monterey when I request their "Ventura-xcode-latest", so my tests have been with Xcode <15 all along. |
Oh, thanks for the hint. I forgot to push the "CC me" button in macport's trac and missed that development entirely. So right now, this issue looks unrelated to OpenBLAS itself and instead it is just caused by GCC-generated object code being linked with the new XCode 15 linker. |
@martin-frbg based on macports/macports-ports#21354 (comment), I assume that this issue wasn't fixed in 0.3.25. Cc: @szhorvat |
Can you please tell me if the log for your failed build contains the |
@martin-frbg I suggest asking @szhorvat, because I haven't been able to reproduce that issue. |
@martin-frbg I see the following line in my build log multiple times (Ventura 13.6.2, Silicon M2 ARM64, XCode 15, MacPorts GCC 13.2, OpenBlas 0.3.25):
Did Apple Clang 15 deprecate/remove the |
@lepus2589 yes, it is. Soon ld-classic will be removed as well 🥲 |
But for XCode 15, they are officially suggesting it as a workaround (https://developer.apple.com/documentation/xcode-release-notes/xcode-15-release-notes#Linking). Why does my XCode Clang 15 not accept this? |
From their changelogs it appears that they assume the issue is fixed in XCode 15.1 (or one of its betas) - no idea what to make of that. I see that they also mention using |
Here are the logs from the MacPorts install: The flag selecting the classic linker is present. For some reason, tests are run during the build (is this normal with the MacPorts setup, @catap?). It is the tests that fail, not the build itself. Some tests crash, and some hang. Here are the last few lines of output (not present in the above log) from when I killed the tests after they had hung for a while.
I am not really familiar with the OpenBLAS build process. I did try to build it using cmake, directly from this repo, and choosing the My system:
|
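Checking whether the classic-linker flag actually appears in a build log, as discussed in the comments above, could be sketched in Python roughly as follows. This is only an illustration: it assumes the flag is spelled `-ld_classic` as in the Xcode 15 release notes, and the function name and the sample log lines are made up for the example rather than taken from any real log.

```python
# Assumption: the classic linker is selected with "-ld_classic"
# (typically passed through the compiler driver as "-Wl,-ld_classic").
CLASSIC_LINKER_FLAG = "-ld_classic"


def lines_with_flag(log_text, flag=CLASSIC_LINKER_FLAG):
    """Return (line_number, line) pairs for every log line mentioning the flag."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if flag in line
    ]


# Hypothetical log excerpt, for illustration only.
sample_log = """\
gfortran -O2 -c foo.f -o foo.o
gfortran -Wl,-ld_classic -o test_bar test_bar.o libexample.a
clang -O2 -c baz.c -o baz.o
"""

hits = lines_with_flag(sample_log)
for number, line in hits:
    print(f"{number}: {line}")
```

On a real log one would read the file contents and pass them in instead of `sample_log`. An empty result would suggest that the flag never reached the link commands.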
Thanks, that looks as if the build ran as intended but the problem is (still/again) there. Running some tests with the build is normal (unless cross-compiling), and the library is very likely unusable if tests fail. No idea at present, apart from trying ld64 instead of ld_classic as mentioned, or trying the other option(s) related to the handling of weak symbols that are mentioned in the release notes for Xcode 15.x. |
@martin-frbg before I'd like to reproduce an issue on this machine. Anyway, MacPorts builds everything with actual |
Here's the output from running the tests after building OpenBLAS 0.3.25 from the release tarball using CMake. See the first line for how testing was set up. Lots of tests hang. https://gist.github.com/szhorvat/a1f848b239d059d787f56ed00e91f5e1 |
Same thing (sans the segfault, which I think I know why it happens) if I build using gcc 13.2.0 instead of Apple's Clang. |
My log is here - the CirrusCI job runs in a VM that has both XCode and gfortran-13 preinstalled, I suspect "their" gfortran is |
This is the build log, but do the tests hang when you run them? |
The file is truncated in github's web view, I think you should get a "view the full file" link at the top ? All tests pass for me, only difference in the ctest setup is that mine does not have the |
BTW I was able to reproduce it on my machine as well:
|
The log line to build
no |
Do you see it applied during the build of the library ? CCOMMON_OPT should carry over to the Fortran flags eventually, but you could try adding the |
@martin-frbg
|
Thanks. Just noticed myself that the |
@martin-frbg I've started to migrate Portfile to cmake, but haven't figured out how to make |
@catap The segfault in the test gets resolved if I use |
@szhorvat on my test system it was enough to add |
Just to confirm, did you see the hang at all? |
@szhorvat yes, I did. I upgraded my test mini and it booted (I'm a few thousand kilometers away from it), and I was able to reproduce the issue after the update
|
I've tried it on my end and it works as well |
Curiously, it does not appear to end up in the gfortran command line when I try it on Cirrus (or Cirrus is borked and has not picked up that change yet), and consequently the tests still segfault. Did you remember to remove the FLDFLAGS addition again? |
Let me double check it. I've run:
and the last step was I just repeated all that steps and it works. |
Thank you - then Cirrus CI is somehow not applying the patch although the changes are shown correctly in github. I have seen (or at least suspected) this happen with them before. Annoying as they are the only semi-free provider of CI for the M1... |
@martin-frbg if you create a branch I may prepare an update for |
PR #4328 now |
Cirrus is doing what multiple other CI systems are also doing - taking a PR branch as-is, and not automatically merging the default branch into the PR branch. It's a little annoying to convince it to do the merge: https://github.com/numpy/numpy/blob/18c6157f5da5736575b6d8d492aacca3b5d69551/tools/ci/cirrus_arm.yml#L1-L34 |
@rgommers I'm not sure I understand - surely cloning the default branch and merging the PR would be the standard operation for any CI setup here, and it does work "most of the time" on Cirrus and all the time everywhere else. What Cirrus seems to be doing occasionally is merging an outdated version of the PR in question |
It's not actually, e.g. Circle CI does the same, you have to do the merge in your job if you want it: https://github.com/scipy/scipy/blob/f3cda9db9ad67cc897953fc2d1786edc651c1d4e/.circleci/config.yml#L38-L50. I agree that it is pretty annoying coming from GitHub Actions; I am unsure if that Cirrus/Circle behavior is due to a philosophy on workflows (e.g., "will still run when there are merge conflicts") or just due to a not implemented feature and then backwards compat. |
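To make the "do the merge in your job" idea concrete, here is a minimal Python sketch of the git commands such a CI step might run. It is not taken from the linked NumPy or SciPy configurations. The function name, the `develop` base branch name, and the placeholder committer identity are assumptions for illustration only.

```python
import shlex


def merge_base_commands(base_branch, remote="origin"):
    """Build the git commands for merging the base branch into the PR checkout.

    The committer identity below is a placeholder; a merge commit needs some
    identity to be configured on a fresh CI machine.
    """
    return [
        # Fetch the current tip of the base branch; git records it as FETCH_HEAD.
        ["git", "fetch", remote, base_branch],
        # Merge that tip into whatever the CI system checked out for the PR.
        [
            "git",
            "-c", "user.name=ci",
            "-c", "user.email=ci@example.invalid",
            "merge", "--no-edit", "FETCH_HEAD",
        ],
    ]


# "develop" is assumed here as the name of the default branch.
commands = merge_base_commands("develop")
for command in commands:
    print(shlex.join(command))
```

The sketch only builds and prints the commands. A real job would execute each one (for example with `subprocess.run(command, check=True)`) and fail the build if the merge reports conflicts.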
@martin-frbg it seems that this isn't enough. You can read the discussion with @szhorvat here: macports/macports-ports#21452 ; long story short: |
@martin-frbg and I've found one more issue with Xcode 15. Since version 15 it has dropped support for |
Description
Updating to OpenBLAS 0.3.24 on MacPorts segfaults due to an invalid memory reference. The issue was reported to MacPorts, and the MacPorts maintainers responded that it should be escalated upstream to OpenBLAS.
Error message below:
Environment
Apple M1 Pro, macOS 13.5.2 (22G91)
Steps to Reproduce
Upgrade OpenBLAS from 0.3.23_0 to 0.3.24_0
Expected Result
OpenBLAS 0.3.24 is built and installed.
Actual Result
Build hangs, no backtrace is emitted in build log.
openblas-upgrade.log