
Cross Attention variant? #13

Open
Xynonners opened this issue May 2, 2024 · 1 comment

Comments

@Xynonners

Hi there,

Sorry if this is a naive question, but would it be possible to apply Ring Attention to cross attention? I was thinking of using RingFlashAttentionCUDAFunction directly, but it looks like the transformer block itself has modifications.
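
For context, here is a minimal single-process sketch of what I imagine ring-style cross attention would compute: the queries stay local while the key/value chunks play the role of the shards rotating around the ring, accumulated with a streaming (flash-attention-style) softmax. The loop just simulates the ring passes, and all names here are illustrative rather than from this repo:

```python
import torch

def ring_cross_attention(q, k, v, num_chunks):
    # q: (batch, q_len, dim)  - queries, kept local on each "device"
    # k, v: (batch, kv_len, dim) - the other sequence, sharded around the ring
    scale = q.shape[-1] ** -0.5
    k_chunks = k.chunk(num_chunks, dim=1)
    v_chunks = v.chunk(num_chunks, dim=1)

    out = torch.zeros_like(q)
    row_max = torch.full(q.shape[:-1], float('-inf'), device=q.device)
    row_sum = torch.zeros(q.shape[:-1], device=q.device)

    for k_c, v_c in zip(k_chunks, v_chunks):  # one ring step per k/v chunk
        sim = torch.einsum('bqd,bkd->bqk', q, k_c) * scale
        new_max = torch.maximum(row_max, sim.amax(dim=-1))
        # rescale the running accumulators to the new running max
        correction = torch.exp(row_max - new_max)
        exp_sim = torch.exp(sim - new_max.unsqueeze(-1))
        out = out * correction.unsqueeze(-1) + torch.einsum('bqk,bkd->bqd', exp_sim, v_c)
        row_sum = row_sum * correction + exp_sim.sum(dim=-1)
        row_max = new_max

    return out / row_sum.unsqueeze(-1)

# quick equivalence check against plain full cross attention:
# q, k, v = torch.randn(2, 8, 16), torch.randn(2, 32, 16), torch.randn(2, 32, 16)
# ref = ((q @ k.transpose(-1, -2)) * 16 ** -0.5).softmax(dim=-1) @ v
# assert torch.allclose(ring_cross_attention(q, k, v, num_chunks=4), ref, atol=1e-5)
```

Since there's no causal mask between the two sequences, every query attends to every k/v chunk, so I would have guessed cross attention is actually the simpler case, which is why I was surprised the block needed modifications.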

Thanks

@lucidrains
Owner

@Xynonners hey, it's a good question and I'm still thinking about how to approach it.

I think it ties together with inference with kv caching, so I'll probably get a solution out for both around the same time.

Can keep this issue open until then.
