A comprehensive implementation of the Transformer architecture from the ground up, inspired by the seminal "Attention Is All You Need" paper by Vaswani et al. This project meticulously constructs each component of the Transformer model, from multi-head self-attention to positional encodings, providing a clear, step-by-step exploration of how these elements combine into one of the most influential architectures in natural language processing.
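To give a flavor of the core building blocks, here is a minimal PyTorch sketch of scaled dot-product attention and the sinusoidal positional encoding as defined in the paper. The function and tensor names are illustrative and do not necessarily match this repository's code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Block disallowed positions (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # (seq_len, d_model), added to token embeddings
```

Multi-head attention then splits `d_model` into `h` heads, applies this attention function to each head in parallel, and concatenates the results before a final linear projection.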
In addition to the model architecture, this repository also includes fully implemented training and validation loops, allowing you to train the Transformer model on real-world datasets. As a demonstration of its capabilities, the model is applied to the OPUS Books dataset for language translation, showcasing the potential of Transformers in machine translation tasks.
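As a rough illustration of what such a training loop looks like, here is a hedged sketch of one teacher-forced epoch for a sequence-to-sequence Transformer. The model interface, batch keys, and dataloader are assumptions for the sketch, not this repository's exact API; the label smoothing of 0.1 follows the original paper:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, dataloader, optimizer, device, pad_idx):
    # Assumes `model(src, tgt_input)` returns logits over the target
    # vocabulary with shape (batch, tgt_len - 1, vocab_size).
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx, label_smoothing=0.1)
    model.train()
    total_loss = 0.0
    for batch in dataloader:
        src = batch["src"].to(device)  # (batch, src_len) source token ids
        tgt = batch["tgt"].to(device)  # (batch, tgt_len) target token ids
        # Shift the target: the decoder sees tokens [0..n-1] and
        # predicts tokens [1..n] (teacher forcing).
        tgt_input, tgt_labels = tgt[:, :-1], tgt[:, 1:]
        logits = model(src, tgt_input)
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt_labels.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(dataloader)
```

The validation loop follows the same pattern with `model.eval()`, `torch.no_grad()`, and no optimizer step; translation quality is typically inspected via greedy or beam-search decoding on held-out OPUS Books pairs.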
This project is an ideal resource for anyone looking to gain a deeper understanding of Transformers by building and experimenting with the model from scratch.
References: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.