A proof-of-concept implementation of evolutionary sparsity (the SET procedure) in the Transformer model architecture.
Sparse variant of the architecture, trained on the original data (29,000 samples in the training set, 1,024 samples in the test set):
python3 en2de_main.py sparse origdata
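For context, the evolutionary sparsity used here follows the SET prune-and-regrow scheme from the Mocanu et al. paper listed under the references below: after every epoch, the fraction of connections whose weights are closest to zero is removed, and the same number of new connections is created at random positions. The following NumPy sketch illustrates that step only; the mask/weight layout, the initialisation scale, and the `zeta` fraction are assumptions for illustration, not the code of this repository.

```python
import numpy as np

def set_evolve(weights, mask, zeta=0.3, rng=None):
    """One SET prune-and-regrow step on a dense weight matrix with a 0/1 mask.

    Removes the fraction `zeta` of active connections whose weights are closest
    to zero, then regrows the same number of connections at random inactive
    positions. Illustrative sketch only; shapes and constants are assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()

    active = np.flatnonzero(mask)              # flat indices of existing connections
    n_prune = int(zeta * active.size)
    if n_prune == 0:
        return weights, mask

    # Prune: deactivate the active weights with the smallest magnitude.
    order = np.argsort(np.abs(weights.flat[active]))
    pruned = active[order[:n_prune]]
    mask.flat[pruned] = 0
    weights.flat[pruned] = 0.0

    # Regrow: activate the same number of currently unused positions.
    inactive = np.flatnonzero(mask == 0)
    grown = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grown] = 1
    weights.flat[grown] = rng.normal(0.0, 0.01, size=n_prune)
    return weights, mask
```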
Original architecture with a rewritten training loop and a custom transfer function, used to validate the obtained results:
python3 en2de_main.py originalWithTransfer origdata
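In Keras, swapping in a custom transfer (activation) function only requires passing a callable to the layer, so the rest of the architecture stays unchanged. The sketch below shows that pattern; the scaled-tanh function is a made-up example, not the transfer function actually used in this project.

```python
import tensorflow as tf
from tensorflow import keras

def custom_transfer(x):
    """Hypothetical transfer function used only as an example: a scaled tanh.
    The actual function used in this project may differ."""
    return 1.7159 * tf.tanh(2.0 / 3.0 * x)

# Any Keras layer accepts a callable as its activation, so replacing the
# transfer function does not require changing the rest of the model.
layer = keras.layers.Dense(512, activation=custom_transfer)
```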
Loads the model saved in previous training epochs and continues training it.
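Resuming typically means rebuilding the model, restoring the saved weights, and calling the training loop again with a shifted starting epoch. A minimal, self-contained Keras sketch of that pattern; the toy model and checkpoint name are placeholders, not the actual Transformer or files used by en2de_main.py.

```python
import numpy as np
from tensorflow import keras

# Toy stand-in model; in the real project this would be the Transformer
# built by en2de_main.py (the exact builder is not shown here).
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x, y = np.random.rand(32, 4), np.random.rand(32, 1)
model.fit(x, y, epochs=2, verbose=0)
model.save_weights("checkpoint.weights.h5")   # placeholder checkpoint name

# Later: rebuild the same architecture, restore the weights, keep training.
# initial_epoch makes the logs and schedules line up with the earlier run.
model.load_weights("checkpoint.weights.h5")
model.fit(x, y, epochs=4, initial_epoch=2, verbose=0)
```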
Sets the dataset to be used for the training task (a sketch of this switch follows the list below):
- 'origdata': Use the WMT 2016 German-to-English dataset for training
- 'testdata': Use a very small subset of the original training data
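A hedged sketch of how this dataset switch could be handled via sys.argv; the argument positions and data file paths below are assumptions for illustration, not read from en2de_main.py.

```python
import sys

# en2de_main.py is invoked as: python3 en2de_main.py <mode> <dataset>
mode, dataset = sys.argv[1], sys.argv[2]

if dataset == "origdata":
    # Full WMT 2016 German-to-English data (paths are placeholders).
    train_file, test_file = "data/wmt16_train.txt", "data/wmt16_test.txt"
elif dataset == "testdata":
    # Very small subset of the same data, useful for quick runs.
    train_file, test_file = "data/wmt16_train_small.txt", "data/wmt16_test_small.txt"
else:
    sys.exit(f"Unknown dataset argument: {dataset!r} (expected 'origdata' or 'testdata')")
```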
- The original Transformer paper:
  "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)
- The original SET-procedure paper:
  "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science" (Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu & Antonio Liotta)
- Transformer implementation in Keras by LSdefine:
  "The Transformer model in Attention is all you need: a Keras implementation."
- Transformer implementation in PyTorch:
  "Attention is all you need - A Pytorch implementation" (Jadore801120/attention-is-all-you-need-pytorch)
- The sparsity (SET procedure) is based on the proof-of-concept code of:
  Dr. D.C. Mocanu - TU/e
- The test sys argument gives the error: UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 6: ordinal not in range(128).
  Solution: run in the terminal: export LC_CTYPE=C.UTF-8
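If changing the shell locale is not possible, a general alternative (not part of this repository) is to force UTF-8 on Python's standard streams from inside the script (requires Python 3.7+):

```python
import sys

# Force UTF-8 on stdout/stderr so printing characters such as 'ä' (\xe4)
# no longer fails under an ASCII locale.
sys.stdout.reconfigure(encoding="utf-8")
sys.stderr.reconfigure(encoding="utf-8")
```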