-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ginkgo ILU #1684
base: develop
Are you sure you want to change the base?
ginkgo ILU #1684
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and comments, but overall I am excited about this! Thanks @yhmtsai
core/factorization/ilu.cpp
Outdated
local_system_matrix->get_const_row_ptrs()))); | ||
ilu = | ||
gko::experimental::factorization::Lu<ValueType, IndexType>::build() | ||
.with_checked_lookup(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, would prefer with_safe_lookup
or something like that, personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should rather be a new type, LU and ILU(...) have very different use cases. That would also allow us to build a new, improved ILU preconditioner interface based on the Factorization
type instead of the slightly hacky Composition
-like type ILU is currently.
core/factorization/ilu.cpp
Outdated
.with_symbolic_factorization(sparsity) | ||
.on(exec) | ||
->generate(local_system_matrix) | ||
->get_combined(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do get_combined()
only to separate the factors out again later? Could we just get the L and U factors out of the Composition
and skip the L/U initialization stuff for the syncfree case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LU always generates the combined matrix first.
We can call unpack
or rely on the original code here. I do not check there's any difference between the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, LU creates combined matrix, but my question is, why call get_combined
directly (which returns a sparse matrix, right?) when we could store the Factorization
and get the L and U factors from that? I guess there's no reason to expect the performance would be different between the Factorization
method for splitting the factors from a combined matrix vs the kernels called for turning the cuSPARSE output into factors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I did it because it reuse the following part and make the condition block smaller.
I can also use early return to avoid a large condition block actually.
I am not big fan of large condition block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable this code path, I also add strategy argument to unpack function.
* syncfree is Ginkgo's implementation through the Lu factorization with given | ||
* sparsity. | ||
*/ | ||
enum class factorize_algorithm { sparselib, syncfree }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is specific to incomplete factorizations only (at least for now), would it be better to use ilu_algorithm
or something like that? Since it's only in the factorization
namespace, nothing specific about ILU there.
Also, nit: cuda
and hip
aren't capitalized here, but they are in the comment for the factory parameter. I think we should be consistent: either Cuda
/Hip
(like the Executors) or CUDA
/HIP
(correct names for the actual software).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also use it to IC, so it will be for incomplete factorize algorithm at least.
maybe incomplete_factorize_algorithm
if the original one is too general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'm mostly thinking about code readability/logicality on the user side. I do think it would be good to specify that this is only for incomplete somehow, though incomplete_factorize_algorithm
is pretty long. What if we had the same enum for both ilu and ic but with different names (ilu_algorithm
and ic_algorithm
)? But we can also go with the longer name. Maybe @upsj has thoughts on this?
@@ -97,6 +97,15 @@ class Lu | |||
* incorrect results or crash. | |||
*/ | |||
bool GKO_FACTORY_PARAMETER_SCALAR(skip_sorting, false); | |||
|
|||
/** | |||
* The symbolic factoization should contains the fill-in information. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggested minor changes:
"The symbolic factorization contains the fill-in for the matrix. If it does not have full fill-in, as in Ilu, this parameter must be set to true in order to avoid the possibility of hanging or illegal memory accesses during the factorization process. When this is true, the symbolic factorization must still contain the non-zero locations in the original matrix, at minimum."
On a related note, how hard would it be to remove the requirement that the incomplete factorization has to contain all the locations of the original matrix? It would mostly just affect the initialization in Lu, right? The safe lookup could handle the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the related note, it should work if the algorithm is okay with ignoring the value from A but missing in LU
auto out_idx = hashmap[hash]; | ||
// linear probing with invalid_index sentinel to avoid infinite loop | ||
while (out_idx >= 0 && local_cols[out_idx] != col) { | ||
hash++; | ||
if (hash >= hashmap_size) { | ||
hash = 0; | ||
} | ||
GKO_ASSERT(hashmap[hash] >= 0); | ||
out_idx = hashmap[hash]; | ||
GKO_ASSERT(hashmap[hash] < row_nnz); | ||
} | ||
const auto out_idx = hashmap[hash]; | ||
// out_idx is either correct or invalid_index, the hashmap sentinel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused here. With this change, lookup_hash_unsafe
is the same as lookup_hash
(except for one extra assertion for unsafe
-- the >=0
-- not sure why?). So...lookup_hash_unsafe
isn't really unsafe anymore? Why do we need any changes to unsafe
at all, when in the factorization, we switch to use the safe lookup if requested? (lookup[col]
rather than lookup.lookup_unsafe(col)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookup_hash_unsafe
is mostly equal to lookup_hash
up to one assert as you mentioned.
in my mind. when you call lookup_unsafe
, you will assume it always returns a valid value back without any checks.
when calling lookup
, you will always checks whether it is valid or not.
I keep them because of the extra assert. unsafe should report the index is invalid in debug mode because it may not be checked in the code.
I want to change the unsafe lookup because the infinite loop leads debug difficult and this change does not affect the performance.
BTW, I do not think there's performance difference between safe and unsafe version, but I will keep this behavior first.
Then, we can see whether to use safe-only lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mike, I'm happy to see how simple this was to extend to ILU. Some design and naming suggestions I'd like to see discussed, and maybe we should build a roadmap for the future ILU preconditioner interface.
@@ -31,10 +31,25 @@ GKO_REGISTER_OPERATION(initialize_l, factorization::initialize_l); | |||
|
|||
template <typename ValueType, typename IndexType> | |||
std::unique_ptr<Factorization<ValueType, IndexType>> | |||
Factorization<ValueType, IndexType>::unpack() const | |||
Factorization<ValueType, IndexType>::unpack( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to improve the strategy type with 2.0 (or earlier), and the strategies have no impact on triangular solvers, so I think we should avoid adding new strategy-related functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think 2.0 happens soon
if we do not want to introduce here, I can only unpack them by original code path not the unpack them.
@nbeams it should be fine, right? the unpacking codes are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean revert back to before 57c0051? It should be fine, as far as I know. I tested it before this commit, anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
- bitmap: lead wrong answer or segfault - hashmap: infinite loop
…pposite behavior). Co-authored-by: Tobias Ribizel <[email protected]> Co-authored-by: Natalie Beams <[email protected]>
This pr enables the ginkgo's implementation ILU through the current LU implementation.
To enable that, we need to allow LU can ignore the missing fill-in entries.
It is controlled by checked_lookup parameter, and user needs to pass the same sparsityCsr as the original one as the symbolic factor to get ILU.
Moreover, this pr also adds a parameter for algorithm selection. It will forward the information to LU and unpack them directly to return the same structure as the original sparselib.
Note. LU by default gives combined matrix but ILU gives the composition.
The first commit shows that the hashmap lookup leads the hang issue and bitmap lookup leads illegal memory access (incorrect result with -1 index return).
The second commit shows the results are wrong if we do not ignore missing fill-in entries update after fixing infinite loop of hashmap lookup
After this pr, we provide another way to factorize ILU without relying on the sparse library.
TODO