-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BLAS] Fix symbol conflicts between backends and reference #251
[BLAS] Fix symbol conflicts between backends and reference #251
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this pull request and the thorough investigation! I have a question about this:
To simulate the situation of conflicting symbols, the commit 634d7d2 right before PR #210 was used.
Why this particular commit? The problem of using reference blas for both baseline and test persists even after #210.
izamax_res = cblas_izamax_p(n, x, incx); | ||
} | ||
return izamax_res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is already huge, I suggest putting everything above this point in a separate file (maybe named reference_blas_wrappers.hpp
or something similar) and including it in reference_blas_templates.hpp
. I actually wish there was a way to avoid so much wrapping and indirection but probably this is the best we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem of using reference blas for both baseline and test persists even after #210.
That's weird. I didn't see the problem in the develop using either gdb or output messages. Could you share what you observed please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This open source product calls oneMKL CPU backend via DPC++ interfaces, which then call cblas symbols. I recall that those symbols were resolving to netlib even after #210. This is what I meant with this comment. But you have investigated this more closely than I did, I might be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense, the _64 wrappers were a separate change. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have several questions to the changes.
tests/unit_tests/CMakeLists.txt
Outdated
PROPERTIES TEST_PREFIX ${DOMAIN_PREFIX}/RT/ | ||
DISCOVERY_TIMEOUT 30 | ||
) | ||
endif() | ||
|
||
gtest_discover_tests(test_main_${domain}_ct | ||
PROPERTIES BUILD_RPATH ${CMAKE_BINARY_DIR}/lib | ||
PROPERTIES ENVIRONMENT LD_LIBRARY_PATH=${CMAKE_BINARY_DIR}/lib:$ENV{LD_LIBRARY_PATH} | ||
PROPERTIES ENVIRONMENT LD_LIBRARY_PATH=${CMAKE_BINARY_DIR}/lib:${CBLAS_LIB_DIR}:$ENV{LD_LIBRARY_PATH} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add domain
check here if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the commit f762e6f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I have one more question regarding Windows build, we define LD_LIBRARY_PATH here for Netlib libraires but we don't define PATH on Windows, does it make sense to update PATH instead of LD_LIBRARY_PATH in case of Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this is straightforward. I was wrong...
On Windows, if we try to set PATH, which is a semicolon-separated list, in gtest_discover_tests
, for example:
gtest_discover_tests(test_main_${domain}_ct PROPERTIES ENVIRONMENT "PATH=C:\path1;C:\path2;C:\path3")
After CMake's internal processing, it turns out that the actual setting will only be PATH=C:\path1
. This seems to be a known CMake issue.
Do you think we should open an issue to investigate a workaround? Right now, because users have to include Netlib in PATH to build on Windows, it's fine if they run tests directly after building. But they would encounter the "failed to load CBLAS library" error if they run tests in a separate session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a note to our README about Netlib required in PATH on Windows. And we also can investigate potential W/A as separated issue. I don't think we need to mix the current issue and the Windows one in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be tracked in issue #252.
Log of the commit f17ce6c: blas_lnx_log_f17ce6c.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me. Thank you!
Description
When a backend respects CBLAS symbols, instead of comparing the backend against the reference libcblas library, the current testing structure would compare the reference against itself due to the conflicting symbols. This PR resolves this issue by loading reference functions at runtime into a local namespace.
Fixes #204
Tests
Diagnostic messages were inserted in the Netlib
cblas_saxpy
and the oneMKLaxpy
unit test to check when the reference was called. More specifically:We use the _ct test as an example in the following. The results of the _rt test are the same.
Since the conflicts of symbols between the MKL backend and the reference have been fixed in PR #210, the reference function was called appropriately both with and without this PR:
To simulate the situation of conflicting symbols, the commit 634d7d2 right before PR #210 was used. Based on the commit, the diagnostic messages were:
We then applied the changes in this PR to the commit. We can see from the messages that the changes fixed the issue.
Checklist
All Submissions