Releases: dingo-actual/infini-transformer
v0.2.7 - Beta
YaRN has now been implemented.
Additionally, position embedders are no longer implicitly instantiated through keyword arguments to `CompressiveMemory`, `InfiniTransformer`, or `MoDInfiniTransformer`. Now, the classes `RoPEEmbeddings` and `YaRNEmbeddings` are exposed and can be passed to `CompressiveMemory`, `InfiniTransformer`, and `MoDInfiniTransformer` via the `position_embedder` argument.
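As a rough illustration of the new interface, the sketch below constructs a position embedder and passes it in through `position_embedder`. The import path and the constructor arguments shown for `RoPEEmbeddings` and `InfiniTransformer` (dimensions, sequence and segment lengths, and so on) are illustrative assumptions and may not match the library's actual signatures; check the package's documentation before copying.

```python
import torch
from infini_transformer import InfiniTransformer, RoPEEmbeddings

# NOTE: all constructor arguments below are illustrative assumptions;
# the actual parameter names and defaults may differ.
position_embedder = RoPEEmbeddings(
    dim=64,        # per-head key dimension the rotary embeddings act on
    seq_len=2048,  # maximum sequence length expected at training time
)

layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=256,
    position_embedder=position_embedder,  # new in v0.2.7: pass the embedder object explicitly
)

# Forward pass on a dummy batch; output conventions may differ from this sketch.
x = torch.randn(2, 2048, 512)
out = layer(x)
```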
v0.2.6 - Positional Embeddings
Implemented RoPE embeddings from "RoFormer: Enhanced Transformer with Rotary Position Embedding" by Su et al. (https://arxiv.org/abs/2104.09864). This is the first step toward the implementation of best practices for positional embeddings: a combination of YaRN (https://arxiv.org/abs/2309.00071) with PoSE (https://arxiv.org/abs/2309.10400).
Note that positional embeddings only affect the SDP attention portion of `CompressiveMemory`. The calculations for the recurrent memory-based attention component are carried out along the key/value dimension and therefore cannot utilize positional information. As such, the utility of adding positional embeddings for a given transformer block will depend on its learned mixing parameters.
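For context, the mixing referred to here follows the Infini-attention formulation, where a learned gate blends the memory-based and dot-product (SDP) attention outputs. The sketch below is a simplified paraphrase of that combination, not the library's exact code; the function and argument names are hypothetical.

```python
import torch

def combine_attention(att_mem: torch.Tensor,
                      att_sdp: torch.Tensor,
                      beta: torch.Tensor) -> torch.Tensor:
    """Blend memory-based and SDP attention with a learned gate.

    Simplified paraphrase of the Infini-attention mixing step. Only the
    SDP term (att_sdp) sees positional information, so a gate that learns
    to favor att_mem reduces the practical effect of positional embeddings
    for that block.
    """
    gate = torch.sigmoid(beta)  # learned mixing parameter
    return gate * att_mem + (1.0 - gate) * att_sdp
```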
Using RoPE with either `InfiniTransformer` or `MoDInfiniTransformer` is as simple as adding `positional_embeddings="rope"` when creating either module.
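A minimal sketch of that v0.2.6-style usage is below. The other constructor arguments are illustrative assumptions and may not match the actual signature; note also that from v0.2.7 onward this keyword is replaced by the `position_embedder` argument described in the notes above.

```python
from infini_transformer import InfiniTransformer

# v0.2.6-style construction; arguments other than positional_embeddings
# are illustrative assumptions, not the library's documented defaults.
layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=256,
    positional_embeddings="rope",  # enables RoPE on the SDP attention portion
)
```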