Dynamic N-Grams task #12

Open
tristandeleu opened this issue Oct 29, 2015 · 1 comment

@tristandeleu (Collaborator) commented:
Dynamic N-Grams task

I will gather all the progress on the Dynamic N-Grams task in this issue. I will likely update it regularly (hopefully), so you may want to unsubscribe if you don't want to get all the spam.

@tristandeleu (Collaborator, Author) commented:
Context length mismatch

I trained the NTM on the full Dynamic N-Grams task. Training takes a lot longer than for the previous tasks (maybe because the input sequences are longer than usual). As in the original paper, I trained the NTM on length-200 binary inputs sampled from a 6-gram look-up table; this look-up table is itself sampled from a Beta(1/2, 1/2) distribution.
[Figure: ngrams-06-fail]
Left: write weights. Middle: read weights. Right, from top to bottom: the input sequence, the Bayesian optimum as computed in the original paper, and the prediction from the NTM.
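For reference, here is a minimal sketch of how such inputs can be generated (the function name and the initial-context convention are my own, not necessarily the exact sampling code in this repo): one Beta(1/2, 1/2) probability is drawn per length-5 context, and each new bit is then sampled conditioned on the previous 5 bits.

```python
import numpy as np

def sample_ngram_sequence(length=200, n=6, rng=np.random):
    """Sample a binary sequence from a random n-gram look-up table.

    Each of the 2**(n-1) contexts (the previous n-1 bits) gets its own
    probability of emitting a 1, drawn from a Beta(1/2, 1/2) prior.
    """
    num_contexts = 2 ** (n - 1)
    table = rng.beta(0.5, 0.5, size=num_contexts)  # P(next bit = 1 | context)

    sequence = np.zeros(length, dtype=np.int64)
    context = 0  # start from the all-zero context (an arbitrary convention)
    for t in range(length):
        bit = int(rng.random_sample() < table[context])
        sequence[t] = bit
        # Shift the new bit into the context, keeping only the last n-1 bits.
        context = ((context << 1) | bit) & (num_contexts - 1)
    return sequence, table
```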

The results are not very good, but they show an interesting behavior of the NTM. The model actually managed to keep track of some context which, upon closer inspection, corresponds not to a 6-gram model but rather to a 4-gram model (with contexts of length 3 instead of 5). This may be due to an initial look-up table that is "degenerate", where for a fixed length-3 context only 1 out of the 4 length-5 contexts extending it has a significant probability. This seems to be confirmed on shorter inputs.
[Figure: ngrams-03-fail]
Here the Bayesian optimum is computed on contexts of length 3 instead of 5, and the predictions appear much more similar.
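For completeness, the Bayesian optimum used in these plots is the online posterior predictive under the Beta(1/2, 1/2) prior, (N1 + 1/2) / (N0 + N1 + 1), where N0 and N1 count how often the current context was previously followed by a 0 or a 1. A sketch (the `context_len` argument is what changes between the two figures, 5 vs. 3):

```python
import numpy as np

def bayesian_optimum(sequence, context_len=5):
    """Online Bayes-optimal prediction of P(next bit = 1 | context) under a
    Beta(1/2, 1/2) prior: (N1 + 1/2) / (N0 + N1 + 1)."""
    counts = {}  # context (tuple of bits) -> (N0, N1)
    predictions = np.full(len(sequence), 0.5)
    for t in range(context_len, len(sequence)):
        context = tuple(sequence[t - context_len:t])
        n0, n1 = counts.get(context, (0, 0))
        predictions[t] = (n1 + 0.5) / (n0 + n1 + 1.0)
        # Update the counts with the bit actually observed at time t.
        if sequence[t] == 1:
            counts[context] = (n0, n1 + 1)
        else:
            counts[context] = (n0 + 1, n1)
    return predictions
```

Comparing `bayesian_optimum(seq, context_len=5)` against `bayesian_optimum(seq, context_len=3)` is the kind of check made above to see which order of model the NTM is actually tracking.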

In this previous example, an analysis of the read weights suggests that certain locations in the memory correspond to given length-3 contexts (see the sketch after the table):

| Location in memory | Context |
| --- | --- |
| ~75 | 011 |
| ~22 | 101 |
| ~109 & 110 | 111 |
| ~104 + ~34 | 110 |
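The mapping in the table can be extracted along these lines (a sketch; the variable names and a `read_weights` array of shape `(time, memory_size)` are assumptions): group the time steps by their length-3 context, average the read weights within each group, and look at the most-attended locations.

```python
import numpy as np
from collections import defaultdict

def contexts_to_locations(read_weights, sequence, context_len=3, top_k=2):
    """For each length-`context_len` context, average the read weights over
    the time steps preceded by that context and return the `top_k`
    most-attended memory locations."""
    grouped = defaultdict(list)
    for t in range(context_len, len(sequence)):
        context = ''.join(str(int(b)) for b in sequence[t - context_len:t])
        grouped[context].append(read_weights[t])

    summary = {}
    for context, weights in grouped.items():
        mean_weights = np.mean(weights, axis=0)
        summary[context] = np.argsort(mean_weights)[::-1][:top_k].tolist()
    return summary
```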

An issue with the current model is that the NTM does not write anything to memory and only relies on what it reads from the memory. The challenge here is that both heads have to be "in sync": the model has to read from and write to the same locations in memory, which means that both heads have to independently figure out where the contextual information is stored. Maybe we can improve that by tying the parameters of both heads (see the sketch below).
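As an illustration of the idea (plain NumPy, not the actual ntm-lasagne layers): tying the heads would mean that the read head and the write head compute their keys from the same shared parameters, so they address the same locations by construction.

```python
import numpy as np

rng = np.random.RandomState(0)
hidden_size, memory_width = 100, 20

# One shared set of key parameters used by both heads.
W_key = 0.1 * rng.randn(hidden_size, memory_width)
b_key = np.zeros(memory_width)

def shared_key(hidden_state):
    # Same projection (and rectify activation) for the read and write head,
    # so gradients from both heads accumulate on W_key / b_key.
    return np.maximum(0., hidden_state.dot(W_key) + b_key)

hidden_state = rng.randn(hidden_size)  # controller output at some time step
read_key = shared_key(hidden_state)
write_key = shared_key(hidden_state)
```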

Parameters of the experiment

I used a setup similar to the ones used in the previous tasks. I only added sign parameters (with a linear activation clipped to [-1, 1]) for key (on both heads) and add (on the write head), to allow positive and negative values for the elements in memory while maintaining the sparsity provided by the rectify activation function. In other words, I replaced `key -> sign * key` and `add -> sign * add`.
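Schematically (illustrative NumPy, not the actual layer code), the modification looks like this:

```python
import numpy as np

def rectify(x):
    return np.maximum(0., x)

def signed_key(hidden_state, W_key, b_key, W_sign, b_sign):
    """Key with an extra sign parameter: a linear activation clipped to
    [-1, 1] multiplies the rectified key, so the memory can hold negative
    values while the rectifier keeps the key sparse."""
    sign = np.clip(hidden_state.dot(W_sign) + b_sign, -1., 1.)
    key = rectify(hidden_state.dot(W_key) + b_key)
    return sign * key  # key -> sign * key

# The same construction applies to the add vector of the write head
# (add -> sign * add).
```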
