
✨ [Update] BoxMatcher matching criteria #125

Merged
merged 4 commits into WongKinYiu:main on Jan 3, 2025

Conversation

Adamusen
Contributor

Added an additional validity criterion in get_valid_matrix, which masks out anchors for targets that are too large to predict with the given reg_max and stride values.

Implemented a new function, ensure_one_anchor, which adds a single best-suited anchor for valid targets that have no valid anchors. It is a fallback mechanism that allows too-small or too-large targets to still be trained and predicted, even if not perfectly.

Fixed the filter_duplicate function to use the topk-masked iou_mat for the selection; previously, anchors with duplicate matches were sometimes assigned to invalid targets.

Updated docstrings across the BoxMatcher functions to match the changes.

Implements #124

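For reference, a minimal self-contained sketch of the two ideas (the function names, argument shapes and the exact size criterion here are illustrative assumptions, not the BoxMatcher code itself):

```python
import torch

def size_valid_matrix(target_xyxy, anchor_xy, anchor_stride, reg_max):
    """Which anchors can, in principle, regress each target box.

    target_xyxy:   (batch, targets, 4) ground-truth boxes in pixels
    anchor_xy:     (anchors, 2) anchor-point centres in pixels
    anchor_stride: (anchors,) stride of the feature map each anchor lives on
    """
    lt = anchor_xy[None, None] - target_xyxy[..., None, :2]       # left / top distances
    rb = target_xyxy[..., None, 2:] - anchor_xy[None, None]       # right / bottom distances
    deltas = torch.cat([lt, rb], dim=-1)                          # (batch, targets, anchors, 4)
    inside = deltas.amin(dim=-1) > 0                              # anchor centre inside the box
    reachable = deltas.amax(dim=-1) <= reg_max * anchor_stride    # every side within reg_max * stride
    return inside & reachable

def ensure_one_anchor(valid_mask, iou_mat):
    """Fallback: give targets left without any valid anchor their single best-IoU anchor."""
    has_valid = valid_mask.any(dim=-1, keepdim=True)              # (batch, targets, 1)
    fallback = torch.zeros_like(valid_mask)
    fallback.scatter_(-1, iou_mat.argmax(dim=-1, keepdim=True), True)
    return torch.where(has_valid, valid_mask, fallback)
```

The point of the combination is that anchors which cannot reach a box side within reg_max bins of their stride are masked out up front, while a target that ends up with no valid anchor at all still receives exactly one anchor via the IoU fallback.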
@Adamusen
Contributor Author

Hey @henrytsui000,

I'm going to be testing these changes in the upcoming days. I created a pull request so you can try them out as well if you'd like :)

@henrytsui000
Collaborator

Hi,

Thank you for your contribution! I’m currently busy with a rebuttal deadline for ICLR, so my replies might be a bit slow for the time being. Thank you for your understanding!

Best regards,
Henry Tsui

@Adamusen
Contributor Author

Hey @henrytsui000,

Resolved the merge conflict that recently emerged.
My proposed changes are looking good in my own tests :) I'd like you to try them out :)

Best regards,
Adam

@henrytsui000
Collaborator

Thanks!

I’ll try this on my local setup later.

By the way, I implemented EMA for single GPU training, and the convergence looks great. However, when training on multiple GPUs, something seems to be missing—the mAP is significantly lower.

Best regards,
Henry Tsui

@Adamusen
Contributor Author

Hey @henrytsui000,

About EMA: if it's only broken with EMA enabled on multiple GPUs, I'd assume the EMA update happens before the weight updates are aggregated across the GPUs, which would cause the problem. I'd check the PyTorch Lightning documentation to find out where and when the gradient aggregation and weight update happen, and make sure the EMA update only runs afterwards.
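For illustration, a minimal sketch of that ordering as a Lightning callback (the hook choice, decay value and import path are assumptions, not the repo's actual EMA code):

```python
import copy
import torch
from lightning.pytorch.callbacks import Callback  # older versions: pytorch_lightning.callbacks

class EMACallback(Callback):
    """Hypothetical sketch: update the EMA weights only after the optimizer step,
    when DDP has already synchronised gradients and every rank holds identical weights."""

    def __init__(self, decay: float = 0.9999):
        self.decay = decay
        self.ema_model = None

    def on_fit_start(self, trainer, pl_module):
        self.ema_model = copy.deepcopy(pl_module).eval()
        for p in self.ema_model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Runs after the optimizer step for this batch (assuming no gradient
        # accumulation), so the DDP-averaged weights are final at this point.
        for ema_p, p in zip(self.ema_model.parameters(), pl_module.parameters()):
            ema_p.lerp_(p.detach(), 1.0 - self.decay)  # ema = decay * ema + (1 - decay) * p
```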

About my proposed changes in this PR: I could add 2 small config parameters to the BoxMatcher inside train.yaml, for example:

  • in_reg_max: True/False
  • ensure_one: True/False

If both were set to False, the BoxMatcher would work exactly like the original implementation, so my proposed changes could simply be turned on/off in the train config without committing to them permanently in the code.
Should I implement this? :)
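For illustration, the defaults could look like this (hypothetical keys and a plain-dict stand-in for the parsed train.yaml, only meant to show the intended semantics):

```python
# Hypothetical flags under the BoxMatcher entry of train.yaml
# (neither key exists in the current config):
matcher_cfg = {"in_reg_max": False, "ensure_one": False}

# Both False -> the matcher behaves exactly like the original implementation;
# the extra reg_max/stride criterion and the ensure_one_anchor fallback
# would only run when their flag is True.
use_size_criterion = matcher_cfg.get("in_reg_max", False)
use_anchor_fallback = matcher_cfg.get("ensure_one", False)
```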

Best regards,
Adam

Changed the mask creation to a more efficient solution, without using torch.nn.functional: torch.nn.functional.one_hot always returns a long tensor, which consumes a lot of memory for tensors that are only used as masks.
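To illustrate the memory argument, a small standalone comparison (the shapes here are illustrative, not taken from the code):

```python
import torch
import torch.nn.functional as F

idx = torch.randint(0, 8400, (16, 100))          # e.g. best-anchor index per target

# F.one_hot always returns int64 (8 bytes per element)...
one_hot_mask = F.one_hot(idx, num_classes=8400)

# ...whereas scattering into a pre-allocated bool tensor stays at 1 byte per element.
bool_mask = torch.zeros(16, 100, 8400, dtype=torch.bool)
bool_mask.scatter_(-1, idx.unsqueeze(-1), True)

assert torch.equal(one_hot_mask.bool(), bool_mask)
print(one_hot_mask.element_size(), bool_mask.element_size())   # 8 vs 1 bytes per element
```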
@henrytsui000 henrytsui000 merged commit f080104 into WongKinYiu:main Jan 3, 2025
3 checks passed