
kwta activation + self attention #646

Open
DanTaranis opened this issue Dec 18, 2022 · 1 comment

@DanTaranis

Hey,
First of all, thank you for your inspiring research.

There's a lot of work on making self-attention efficient, especially as the sequence length increases. It seems to me that under the kWTA assumption you could skip the vast majority of the calculations, given the inherent extreme sparsity. And the best part is that it would be complementary to many of the linear-complexity attention methods that are coming out.
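
To make the idea concrete, here is a rough, illustrative sketch (PyTorch; names and shapes are my own assumptions, not code from this repo): standard scaled dot-product attention where each query keeps only its top-k scores, kWTA-style, and everything else is masked out before the softmax.

```python
import torch
import torch.nn.functional as F

def kwta_attention(q, k, v, keep_ratio=0.1):
    # Scaled dot-product attention where each query row keeps only its
    # top-k scores (k ~= keep_ratio * seq_len); the rest are masked to -inf.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (..., L_q, L_k)
    topk = max(1, int(keep_ratio * scores.size(-1)))
    idx = scores.topk(topk, dim=-1).indices          # winning keys per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                      # 0 where kept, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v

# e.g. q, k, v with shape (batch, heads, seq_len, head_dim)
q = k = v = torch.randn(2, 4, 128, 64)
out = kwta_attention(q, k, v, keep_ratio=0.1)        # each query attends to ~10% of keys
```

Of course this dense version only shows the masking pattern; the actual saving would come from skipping the masked dot products with a sparse kernel, which is where it could stack with the linear-attention tricks.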

Are you experimenting with something like that?

Regards,
Dan

@DanTaranis
Author

FYI, I did a quick PoC on CIFAR-10 with a small ViT trained with and without kWTA (90% sparsity), and the kWTA actually acted a bit like regularization (slightly higher max validation accuracy, slower convergence).

So it looks like this definitely has potential. My team and I may look further into this if you want to collaborate on a paper or something.
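
For anyone curious, something along these lines is what I mean by a kWTA activation at 90% sparsity (a simplified sketch with assumed details, not the actual PoC code):

```python
import torch
import torch.nn as nn

class KWTA(nn.Module):
    """k-winners-take-all over the feature dimension: keep the largest
    (1 - sparsity) fraction of activations per token, zero out the rest."""
    def __init__(self, sparsity=0.9):
        super().__init__()
        self.sparsity = sparsity

    def forward(self, x):
        k = max(1, int(round(x.size(-1) * (1.0 - self.sparsity))))
        thresh = x.topk(k, dim=-1).values[..., -1:]   # k-th largest value per token
        return torch.where(x >= thresh, x, torch.zeros_like(x))
```

Dropping a module like this in after (or in place of) the nonlinearity in each ViT MLP block zeroes 90% of the units per token, which is the kind of setup I had in mind.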
