✨ [Update] BoxMatcher matching criteria #125
Conversation
Added an additional validity criterion in get_valid_matrix that masks out anchors for targets that are too large to predict with the given reg_max and stride values. Implemented a new function, ensure_one_anchor, which assigns a single best-suited anchor to each valid target that has no valid anchors. It is a fallback mechanism that lets targets that are too small or too large still be trained and predicted, even if not perfectly. Fixed the filter_duplicate function to use the topk-masked iou_mat for the selection; previously it sometimes matched invalid targets to anchors with duplicates. Updated docstrings across the BoxMatcher functions to match the changes.
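As a rough illustration of the ensure_one_anchor fallback described above (the function name comes from the PR, but the signature, tensor shapes, and implementation here are my own sketch, not the PR's actual code):

```python
import torch


def ensure_one_anchor(iou_mat: torch.Tensor, valid_mat: torch.Tensor) -> torch.Tensor:
    """Fallback: for each target with no valid anchor, mark its single
    best-IoU anchor as valid so the target can still be trained.

    iou_mat:   [targets, anchors] IoU between targets and anchor boxes
    valid_mat: [targets, anchors] boolean validity mask
    (Hypothetical signature; the actual function in the PR may differ.)
    """
    has_valid = valid_mat.any(dim=1)             # [targets] any valid anchor?
    best_anchor = iou_mat.argmax(dim=1)          # [targets] best anchor index
    fallback = torch.zeros_like(valid_mat)
    fallback[torch.arange(valid_mat.shape[0]), best_anchor] = True
    # keep existing matches; add the fallback only where no anchor was valid
    return valid_mat | (fallback & ~has_valid.unsqueeze(1))
```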
Hey @henrytsui000, I'm going to be testing these changes in the upcoming days. I created a pull request so you can try them out as well if you'd like :)
Hi, thank you for your contribution! I'm currently busy with a rebuttal deadline for ICLR, so my replies might be a bit slow for the time being. Thank you for your understanding! Best regards,
Hey @henrytsui000, resolved the recently emerged merge conflict. Best regards,
Thanks! I'll try this on my local setup later. By the way, I implemented EMA for single-GPU training, and the convergence looks great. However, when training on multiple GPUs, something seems to be missing: the mAP is significantly lower. Best regards,
Hey @henrytsui000, About EMA: if it's only broken with EMA enabled on multiple GPUs, I'd assume the EMA update happens before the weight updates are aggregated from all the GPUs, which would cause the problem. I'd check the Lightning module documentation to find out where and when the aggregation of gradients/updated weights happens, to ensure the EMA update only happens afterwards. About my proposed changes in this PR: I could add two small config parameters to the BoxMatcher inside train.yaml, for example:
If both were set to False, the BoxMatcher would behave exactly like the original implementation, so my proposed changes could simply be turned on/off in the train config without committing to making them permanent in the code. Best regards,
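On the EMA timing point above, here is a minimal pure-PyTorch sketch of an EMA wrapper (the ModelEMA class, decay value, and update logic are my own illustration, not the repo's implementation). Under DDP, gradients are all-reduced during the backward pass, so an update() called right after optimizer.step() sees identical weights on every rank; calling it earlier risks reading stale, un-aggregated weights.

```python
import copy

import torch


class ModelEMA:
    """Minimal EMA of model parameters (a sketch, not the repo's code)."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # call this only AFTER optimizer.step(), i.e. after DDP has
        # aggregated gradients, so every rank updates from identical weights
        for ema_t, t in zip(self.ema.state_dict().values(),
                            model.state_dict().values()):
            if ema_t.dtype.is_floating_point:
                ema_t.mul_(self.decay).add_(t.detach(), alpha=1 - self.decay)
```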
to a more efficient solution that avoids torch.nn.functional: torch.nn.functional.one_hot always returns a long tensor, which wastes memory for tensors that are only used as masks.
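The memory difference is easy to see: F.one_hot materializes int64 values (8 bytes per element), while an equivalent boolean mask built with scatter_ uses 1 byte per element (a small illustrative snippet, not the PR's actual code):

```python
import torch
import torch.nn.functional as F

idx = torch.tensor([2, 0, 1])
num_classes = 4

# F.one_hot always returns torch.int64: 8 bytes per element
one_hot = F.one_hot(idx, num_classes)

# equivalent boolean mask: 1 byte per element
mask = torch.zeros(idx.shape[0], num_classes, dtype=torch.bool)
mask.scatter_(1, idx.unsqueeze(1), True)

assert torch.equal(mask, one_hot.bool())  # same mask, 8x less memory
```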
Implements #124