# Infinite Shakespear

This is a decoder-only transformer, similar in structure to GPT, that generates Shakespeare-like text.

Theoretically, the model follows the decoder structure explained in the famous *Attention Is All You Need* paper.

In addition to the basic multi-head self-attention and feed-forward structure, the model also implements LayerNorm, residual connections from *Deep Residual Learning for Image Recognition*, and dropout from *Dropout: A Simple Way to Prevent Neural Networks from Overfitting*.
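
As a concrete illustration, here is a minimal sketch of one such decoder block in PyTorch. This is an assumption about the architecture rather than the exact code in `Final.py`: the lecture builds attention from scratch, while this sketch uses `nn.MultiheadAttention` for brevity, and the hyperparameter names (`n_embd`, `n_head`, `dropout`) are illustrative. It follows the pre-norm arrangement (LayerNorm before each sub-layer) used in the lecture.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One decoder block: causal multi-head self-attention + feed-forward,
    each wrapped in a residual connection, with LayerNorm and dropout."""

    def __init__(self, n_embd: int, n_head: int, dropout: float = 0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(           # position-wise feed-forward
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # causal mask: each position may only attend to earlier positions
        mask = torch.triu(torch.ones(T, T, device=x.device, dtype=torch.bool),
                          diagonal=1)
        a = self.ln1(x)
        attn_out, _ = self.attn(a, a, a, attn_mask=mask, need_weights=False)
        x = x + attn_out                 # residual connection around attention
        x = x + self.ffwd(self.ln2(x))   # residual connection around feed-forward
        return x
```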

`lecture.ipynb` follows Andrej Karpathy's lecture *Let's Build GPT*.

`Final.py` is the final decoder transformer model with 10 million parameters, trained on `input.txt`, which contains the complete Shakespeare works used as input.
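
Models of this kind operate at the character level. Below is a minimal sketch of how `input.txt` is typically turned into integer tokens, assuming the encoding scheme from Karpathy's lecture (the variable names are illustrative, not necessarily those in `Final.py`):

```python
# Build a character-level vocabulary from the training text.
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                     # vocabulary: every distinct character
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(decode(encode("To be, or not to be")))  # round-trips the input string
```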