What is the purpose of positional offset in the rotary positional embedding implementation? #24

ifed-ucsd · 2024-07-01T16:22:12Z

I was looking at the rotary position embedding code path (recurrent-memory-transformer-pytorch/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py, line 414 in 35cd18d: `mem_rel_dist = 10000`) and noticed this comment:

rotary embedding - offset main positions by 10000, and keep all memories at position 0
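
My reading of that comment, as a minimal sketch rather than the repo's actual code: the function name `build_positions`, the `[memories | main tokens]` layout, and the example lengths are all hypothetical; only `mem_rel_dist = 10000` and the "memories at position 0" rule come from the source.

```python
def build_positions(num_mem, seq_len, mem_rel_dist=10000):
    """Position ids for a sequence laid out as [memories | main tokens].

    Assumption for illustration: memories come first. Every memory token
    is pinned to position 0; main tokens start at mem_rel_dist.
    """
    mem_pos = [0] * num_mem
    main_pos = [mem_rel_dist + i for i in range(seq_len)]
    return mem_pos + main_pos


positions = build_positions(num_mem=2, seq_len=4)
print(positions)  # [0, 0, 10000, 10001, 10002, 10003]

# Rotary attention scores depend only on the difference between the query
# position and the key position, so these differences are what matter.
mem, main = positions[:2], positions[2:]
mem_to_mem = {abs(a - b) for a in mem for b in mem}
main_to_mem = {abs(a - b) for a in main for b in mem}
main_to_main = {abs(a - b) for a in main for b in main}

print(mem_to_mem)         # {0}
print(min(main_to_mem))   # 10000
print(max(main_to_main))  # 3
```

So under this reading, all memories sit at zero relative distance from each other, main tokens keep their usual small relative distances, and every main token is at least `mem_rel_dist` away from every memory.
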

What is the intuition for this design choice? I don't see this detail anywhere in the RMT papers; did I miss something? Do you have a reference that does this kind of offsetting?

Thanks!

ifed-ucsd mentioned this issue Jul 2, 2024

Positional encoding of memory tokens for llama-type models with rope OswaldHe/HMT-pytorch#11

Closed
