A proof-of-concept implementation of evolutionary sparsity (the SET procedure) in the Transformer model architecture.
Sparse variant of the architecture, trained on the original data (29,000 samples in the training set, 1,024 samples in the test set):
python3 en2de_main.py sparse origdata
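For context, the evolutionary sparsity used here follows the SET prune-and-regrow scheme from the Mocanu et al. paper listed under the references below: after every epoch, the fraction of connections whose weights are closest to zero is removed, and the same number of new connections is created at random positions. The following NumPy sketch illustrates that step only; the mask/weight layout, the initialisation scale, and the `zeta` fraction are assumptions for illustration, not the code of this repository.

```python
import numpy as np

def set_evolve(weights, mask, zeta=0.3, rng=None):
    """One SET prune-and-regrow step on a dense weight matrix with a 0/1 mask.

    Removes the fraction `zeta` of active connections whose weights are closest
    to zero, then regrows the same number of connections at random inactive
    positions. Illustrative sketch only; shapes and constants are assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()

    active = np.flatnonzero(mask)              # flat indices of existing connections
    n_prune = int(zeta * active.size)
    if n_prune == 0:
        return weights, mask

    # Prune: deactivate the active weights with the smallest magnitude.
    order = np.argsort(np.abs(weights.flat[active]))
    pruned = active[order[:n_prune]]
    mask.flat[pruned] = 0
    weights.flat[pruned] = 0.0

    # Regrow: activate the same number of currently unused positions.
    inactive = np.flatnonzero(mask == 0)
    grown = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grown] = 1
    weights.flat[grown] = rng.normal(0.0, 0.01, size=n_prune)
    return weights, mask
```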
Original architecture with a rewritten training loop and a custom transfer function, used to validate the obtained results:
python3 en2de_main.py originalWithTransfer origdata
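In Keras, swapping in a custom transfer (activation) function only requires passing a callable to the layer, so the rest of the architecture stays unchanged. The sketch below shows that pattern; the scaled-tanh function is a made-up example, not the transfer function actually used in this project.

```python
import tensorflow as tf
from tensorflow import keras

def custom_transfer(x):
    """Hypothetical transfer function used only as an example: a scaled tanh.
    The actual function used in this project may differ."""
    return 1.7159 * tf.tanh(2.0 / 3.0 * x)

# Any Keras layer accepts a callable as its activation, so replacing the
# transfer function does not require changing the rest of the model.
layer = keras.layers.Dense(512, activation=custom_transfer)
```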
Loads the model saved in previous training epochs and continues training it.
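Resuming typically means rebuilding the model, restoring the saved weights, and calling the training loop again with a shifted starting epoch. A minimal, self-contained Keras sketch of that pattern; the toy model and checkpoint name are placeholders, not the actual Transformer or files used by en2de_main.py.

```python
import numpy as np
from tensorflow import keras

# Toy stand-in model; in the real project this would be the Transformer
# built by en2de_main.py (the exact builder is not shown here).
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x, y = np.random.rand(32, 4), np.random.rand(32, 1)
model.fit(x, y, epochs=2, verbose=0)
model.save_weights("checkpoint.weights.h5")   # placeholder checkpoint name

# Later: rebuild the same architecture, restore the weights, keep training.
# initial_epoch makes the logs and schedules line up with the earlier run.
model.load_weights("checkpoint.weights.h5")
model.fit(x, y, epochs=4, initial_epoch=2, verbose=0)
```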
Sets the dataset to be used for the training task (a sketch of this switch follows the list below):
- 'origdata': Use the WMT 2016 German-to-English dataset for training
- 'testdata': Use a very small subset of the original training data
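A hedged sketch of how this dataset switch could be handled via sys.argv; the argument positions and data file paths below are assumptions for illustration, not read from en2de_main.py.

```python
import sys

# en2de_main.py is invoked as: python3 en2de_main.py <mode> <dataset>
mode, dataset = sys.argv[1], sys.argv[2]

if dataset == "origdata":
    # Full WMT 2016 German-to-English data (paths are placeholders).
    train_file, test_file = "data/wmt16_train.txt", "data/wmt16_test.txt"
elif dataset == "testdata":
    # Very small subset of the same data, useful for quick runs.
    train_file, test_file = "data/wmt16_train_small.txt", "data/wmt16_test_small.txt"
else:
    sys.exit(f"Unknown dataset argument: {dataset!r} (expected 'origdata' or 'testdata')")
```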
- The original Transformer paper:
  "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)
- The original SET-procedure paper:
  "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science" (Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu & Antonio Liotta)
- Transformer implementation in Keras by LSdefine:
  "The Transformer model in Attention is all you need: a Keras implementation."
- Transformer implementation in PyTorch:
  "Attention is all you need - A Pytorch implementation" (Jadore801120/attention-is-all-you-need-pytorch)
- The sparsity (SET procedure) is based on the proof-of-concept code of:
  Dr. D.C. Mocanu - TU/e
- The test sys argument gives the error: UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 6: ordinal not in range(128).
  Solution: run in the terminal: export LC_CTYPE=C.UTF-8
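If changing the shell locale is not possible, a general alternative (not part of this repository) is to force UTF-8 on Python's standard streams from inside the script (requires Python 3.7+):

```python
import sys

# Force UTF-8 on stdout/stderr so printing characters such as 'ä' (\xe4)
# no longer fails under an ASCII locale.
sys.stdout.reconfigure(encoding="utf-8")
sys.stderr.reconfigure(encoding="utf-8")
```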