Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring: turn the attribute _return_attention_scores into a method argument #20802

Open
apehex opened this issue Jan 23, 2025 · 0 comments
Open
Assignees
Labels
type:feature The user is asking for a new feature.

Comments

@apehex
Copy link

apehex commented Jan 23, 2025

Hey there! 👾

Issue

The class MultiHeadAttention is using a private attribute to decide whether or not to return the attention scores.

This behavior is not apparent from the signature of _compute_attention, because there are no corresponding argument.

So, if a sub-class fails to set _return_attention_scores in its custom call method, then _compute_attention never returns the scores.

For example, the child class CachedMultiHeadAttention from KerasHub uses _compute_attention and receives None as attention scores in all cases.

Additional context

The attribute _return_attention_scores is used in a single method, _compute_attention.

Proposition

IMHO the attribute _return_attention_scores should be removed.

Instead, _compute_attention should have an extra parameter return_attention_scores.
Then the call methods could just forward this value to _compute_attention explicitely.

I'll write a PR shortly if you find this relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature The user is asking for a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants