Transformer-From-Scratch

A comprehensive implementation of the Transformer architecture from the ground up, inspired by the seminal "Attention Is All You Need" paper by Vaswani et al. This project constructs each component of the Transformer model, from the multi-head self-attention mechanism to the positional encodings, providing a clear, step-by-step exploration of how these elements come together to form one of the most powerful architectures in natural language processing.
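To make those building blocks concrete, here is a minimal sketch of scaled dot-product attention and sinusoidal positional encoding in PyTorch, following the formulas in Vaswani et al. (2017). This is an illustrative sketch, not the repository's exact code; the tensor shapes and the mask convention are assumptions for the example.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017).

    q, k, v: (batch, heads, seq_len, d_k) -- shapes assumed for this sketch.
    """
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.matmul(scores.softmax(dim=-1), v)

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(...)."""
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```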

In addition to the model architecture, this repository includes fully implemented training and validation loops, so you can train the Transformer on real-world data. As a demonstration, the model is trained on the OPUS Books dataset, showcasing the potential of Transformers for machine translation.
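As a rough illustration of that setup, the sketch below loads OPUS Books via the Hugging Face `datasets` library and runs one teacher-forced training step. The `en-it` config, the `model` interface, and the batch field names are assumptions for the example, not necessarily what this repository uses.

```python
import torch
from datasets import load_dataset  # Hugging Face datasets; an assumption, not necessarily this repo's loader

# OPUS Books provides parallel corpora for many language pairs; "en-it" is one config.
raw = load_dataset("opus_books", "en-it", split="train")
print(raw[0]["translation"])  # e.g. {"en": "...", "it": "..."}

def train_step(model, batch, optimizer, pad_id):
    """One teacher-forced training step; `model` and the batch fields are placeholders."""
    logits = model(batch["src"], batch["tgt_in"])  # (batch, seq_len, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1),        # (batch * seq_len, vocab)
        batch["tgt_out"].flatten(),  # (batch * seq_len,)
        ignore_index=pad_id,         # do not penalize padding positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```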

This project is an ideal resource for anyone looking to gain a deeper understanding of Transformers by building and experimenting with the model from scratch.

References: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
