An implementation of the transformer NLP model architecture
This is an attempt to implement the transformer as described in the paper 'Attention Is All You Need' - https://arxiv.org/pdf/1706.03762.pdf
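The core operation of the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. As a minimal sketch (not this repository's actual code), it can be written in NumPy as:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V from 'Attention Is All You Need'."""
    d_k = Q.shape[-1]
    # similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # weighted sum of the value vectors
    return weights @ V, weights

# toy example: 3 positions, d_k = d_v = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1, and `out` has the same shape as `V`. In the full model this runs once per attention head, with Q, K, and V produced by learned linear projections of the input.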
This project is currently on hold: my PC struggles to run this model (likely due to RAM limitations), which makes it difficult to train or test.