I was looking at the rotary position embedding code path in
recurrent-memory-transformer-pytorch/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py
(line 414 at commit 35cd18d), where the comment reads:

rotary embedding - offset main positions by 10000, and keep all memories at position 0
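
To make the question concrete, here is a rough, self-contained sketch of what I understand that comment to describe (my own illustration with hypothetical memory/sequence sizes, not code taken from this repo): the read and write memory tokens all get rotary position 0, while the main sequence positions start at 10000.

```python
import torch

def rotary_angles(positions, dim_head, theta=10000.0):
    # standard RoPE frequencies: theta^(-2i/d) for i in [0, d/2)
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim_head, 2).float() / dim_head))
    return torch.einsum('n,d->nd', positions.float(), inv_freq)  # (seq, dim_head / 2)

def apply_rotary(x, angles):
    # x: (..., seq, dim_head); rotate each channel pair by its angle
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

# hypothetical sizes, just for illustration
num_read_mem, seq_len, num_write_mem = 4, 8, 4
dim_head = 64

# memories pinned at position 0, main tokens offset by 10000 (as in the comment)
positions = torch.cat((
    torch.zeros(num_read_mem, dtype=torch.long),   # read memories  -> position 0
    torch.arange(seq_len) + 10000,                 # main sequence  -> 10000..10007
    torch.zeros(num_write_mem, dtype=torch.long),  # write memories -> position 0
))

q = torch.randn(1, positions.numel(), dim_head)
q_rotated = apply_rotary(q, rotary_angles(positions, dim_head))

print(positions.tolist())     # [0, 0, 0, 0, 10000, ..., 10007, 0, 0, 0, 0]
print(q_rotated.shape)        # torch.Size([1, 16, 64])
```
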
What is the intuition for this design choice? I don't see this detail anywhere in the RMT papers; did I miss something? Do you have a reference that does this kind of offsetting?
Thanks!