Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the lifetime of the matrix handle to MKLSparse call #52

Merged
merged 6 commits into from
Sep 17, 2024

Conversation

alyst
Copy link
Contributor

@alyst alyst commented Sep 16, 2024

At the moment, the wrapper around the MKLSparse matrix handle, MKLSparseMatrix, is mutable, so the handle is destroyed during GC.
So far, MKLSparseMatrix is created within generic SPBLAS wrapper and not passed outside. The unused handles accumulate until the next GC.
What is problematic is that these MKLSparse handles refer to the SparseMatrixCSC arrays allocated by Julia.
These Julia arrays might be already freed at the time of MKLSparseMatrix GC, which may lead to undefined behavior (and it could be related to #47).
In general, having many unused matrix handles between GC calls may degrade MKLSparse performance.

So this PR converts it to a struct, and the handle must be explicitly destroyed by destroy(mtx) call.
At the moment the matrix handle is created within the "generic" wrapper and destroyed just after the library MKLSparse call.

Also, it adds S sparse matrix storage type parameter: MKLSparseMatrix{S}.
It helps with conversions and extracting data, as MKLSparse does not have API to report the storage type of the matrix handle, it just errors when the handle storage is passed to an incompatible function.

There's also a bit of housekeeping with the generic calls argument names: they are adjusted to match official MKLSparse API,
and since B matrices may also have trans descriptor (transB), the trans-related methods are renamed to avoid confusion.

@alyst alyst force-pushed the alyst/limit_handle_lifetime branch 2 times, most recently from 6c06277 to c7ecd45 Compare September 16, 2024 22:15
Copy link

codecov bot commented Sep 16, 2024

Codecov Report

Attention: Patch coverage is 80.23256% with 17 lines in your changes missing coverage. Please review.

Project coverage is 32.98%. Comparing base (80572fa) to head (c3921b4).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/deprecated.jl 0.00% 12 Missing ⚠️
src/mklsparsematrix.jl 87.17% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   24.86%   32.98%   +8.12%     
==========================================
  Files           7        7              
  Lines        1693     1258     -435     
==========================================
- Hits          421      415       -6     
+ Misses       1272      843     -429     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

that needs a manual destroy() call, so that the lifetime
of the matrix handle is limited to the wrapper call,
and not extended until the GC removes MKLSparseMatrix

add S sparse storage type parameter, so that it is
automatically known by API calls that need it
@alyst alyst force-pushed the alyst/limit_handle_lifetime branch from c7ecd45 to c3921b4 Compare September 16, 2024 22:51
@alyst
Copy link
Contributor Author

alyst commented Sep 16, 2024

Looks like it doesn't fix #47, as the failure in x64-win is again in COO multiplication.

@alyst alyst requested a review from amontoison September 16, 2024 23:23
@amontoison
Copy link
Member

Didn't I implemented a finalizer function to call the associated MKL destoyer routines when a SparseMatrix should be GC?

@alyst
Copy link
Contributor Author

alyst commented Sep 16, 2024

Didn't I implemented a finalizer function to call the associated MKL destoyer routines when a SparseMatrix should be GC?

Exactly, but the finalizer is not called until GC, whereas we don't use MKLSparseMatrix anywhere outside of the "generic" wrapper
(I had to fix the tests for MKL calls counting, because before this PR the finalizer was incrementing the calls counter, but it was never called within the @blas(...) block).
Since there could be quite some time after the wrapper execution until GC, the memory referenced by the handle may become freed/rewritten etc.
Also, I don't know how many of these handles would be waiting for GC, maybe thousands (in intensive workloads) -- that may confuse MKL.

@amontoison
Copy link
Member

amontoison commented Sep 17, 2024

But if you want to perform multiple sparse matrix - vector products like in a Krylov method, you don't want to recreate a MKLSparseMatrix because the user could have called a "preprocess" routine such that future matrix-vector products are faster.

@alyst
Copy link
Contributor Author

alyst commented Sep 17, 2024

But if you want to want to perform multiple sparse matrix - vector products like in a Krylov method, you don't to recreate a MKLSparseMatrix because the user could have called a "preprocess" routine such that future matrix-vector products are faster.

Currently, MKLSparse does not support it anyway -- MKLSparseMatrix is only created within the generic wrapper, there is no support for passing MKLSparseMatrix to linear algebra methods.
The only difference to the current code is that the PR switches to the "manual" management of the handle lifetime.

I agree that eventually MKLSparse should enable the usecases like you mentioned.
But probably it needs more advanced design. I expect it would be hard to implement AbstractSparseMatrix interface for the plain handle wrapper to make it interchangeable with the other sparse matrix implementations.
I was thinking about some wrapper structure that wraps both the SparseMatrixCSC (or whatever Julia sparse type), and the MKLSparse handle (the handle can utilize the Julia sparse storage memory).
That way we can bind the lifetimes of the matrix and the handle and potentially automate some preprocessing ("analysis" and "hint" calls).

@amontoison amontoison merged commit cb0852e into master Sep 17, 2024
10 checks passed
@amontoison amontoison deleted the alyst/limit_handle_lifetime branch September 17, 2024 00:33
@amontoison
Copy link
Member

Thank you @alyst!
For the "preprocess" routines, I only added them in oneAPI.jl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants