Missing EOS? #12
Hi @Zadagu, I understand the confusion - the order of targets is not their sequential order. Because "idle,sources" is a full token (marked with the comma), no EOS is needed after it. Does that make sense?
Hi @urialon,
[...] or is it just: [...]?
Hi @Zadagu, Let me separate my answer into two parts:

Training: If [...]. If [...].

Beam search: For simplicity, let's assume that we're doing beam search with a beam size of 1, but the same idea holds for a beam of any size. During beam search, the model computes the score for each of "idle", "sources", and "idle,sources". Then, we just take the argmax (or top-k if the beam is greater than 1). If the argmax is "idle,sources", beam search can consider this token as "done", without needing to predict EOS. Does that make sense?
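To illustrate the beam-search part, here is a minimal Python sketch of that argmax step. The scores are made-up numbers, and the comma test for "is this a full token" is an assumption based on the description above, not the repository's actual code:

```python
import math

# Hypothetical decoder scores for one expansion step (log-probabilities).
# These numbers are invented for illustration only.
step_scores = {
    "idle": math.log(0.20),
    "sources": math.log(0.15),
    "idle,sources": math.log(0.55),  # a full token, marked by the comma
    "EOS": math.log(0.10),
}

def expand_beam_size_1(scores):
    """Beam search with beam size 1: take the argmax of the step scores.

    If the argmax is a full token (here detected by the comma), the
    hypothesis is complete and no separate EOS prediction is needed.
    """
    best = max(scores, key=scores.get)
    done = ("," in best) or (best == "EOS")
    return best, done

best, done = expand_beam_size_1(step_scores)
print(best, done)  # -> idle,sources True: decoding stops without an EOS
```

With a beam greater than 1, the same completion check would simply be applied to each of the top-k candidates instead of the single argmax.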
Thank you, Uri. This makes sense.
Hi @urialon, I'm wondering what the dimensions of [...] are.
Hi @Zadagu, There is a minor inaccuracy in the paper, in the function [...]: if [...]. I hope it helps :-)
Hi @urialon, thank you for the clarification. Now I wonder what I might have forgotten.
For most of the weight matrices I'm already using an FC layer with bias. Are you using a positional embedding like in "Attention Is All You Need" for the child ids, or is this just another randomly initialized matrix? Kind regards, Zadagu
Hi @Zadagu, I'm not sure why I did not mention it in the paper, but I think that the source of the discrepancy is as follows: after the LSTM layers, I projected the states using an FC layer (+bias) to a dimension of 512. I think that I did that to help the model distinguish between the "standard paths" (blue paths in Figure 2) and the "root path" (the orange path in Figure 2). So this adds some weights and increases the size of other tensors. For example, Ci must be [...]. Sorry for this inconsistency; let me know how many parameters you are getting now.
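For anyone reproducing the model, a minimal PyTorch sketch of that projection might look as follows. The layer names, the LSTM sizes, and the use of two separate projections (one per path type) are assumptions for illustration; only the FC-plus-bias projection to 512 after the LSTM comes from the description above:

```python
import torch
import torch.nn as nn

class PathEncoder(nn.Module):
    """Encodes one AST path with an LSTM, then projects the final
    state to d = 512 with a fully connected layer (+ bias)."""

    def __init__(self, input_dim=128, lstm_dim=256, d_model=512):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, lstm_dim, batch_first=True)
        # Two projections, one per path type, so the model can tell the
        # "standard" paths apart from the "root" path (an assumption;
        # the comment above only states that a projection to 512 exists).
        self.proj_standard = nn.Linear(lstm_dim, d_model)  # blue paths
        self.proj_root = nn.Linear(lstm_dim, d_model)      # orange path

    def forward(self, path_embeddings, is_root_path=False):
        # path_embeddings: (batch, path_length, input_dim)
        _, (h_n, _) = self.lstm(path_embeddings)
        h = h_n[-1]                                  # (batch, lstm_dim)
        proj = self.proj_root if is_root_path else self.proj_standard
        return proj(h)                               # (batch, d_model)
```

The extra Linear layers are exactly the "some weights" mentioned above, and everything downstream of the encoder now operates on 512-dimensional states instead of the LSTM's hidden size.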
Hi @urialon, I'm very grateful for your reply. Thank you very much for looking into this. After adding the projection layer, I got a total of 21.5M parameters. Are you using the projected path embeddings everywhere (e.g. in the copy mechanism) or just in the transformer? That could be an explanation for the difference, because I use the bigger size everywhere. Kind regards, Zadagu
Hi Zadagu,
Thank you for reporting this.
Yes, I am using the projected paths everywhere, including the copy mechanism, not only in the transformer.
But I found another possible difference: due to GPU memory constraints, in the final model I reduced the size of the feedforward layer of the transformer. Usually, the feedforward layer of the transformer has a hidden size of d × 4, so in your implementation it is probably 512 × 4 = 2048. But since it kept exploding the GPU memory, I had to reduce this to d × 2, so the hidden size of the feedforward layer is only 1024.
Best,
Uri
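To make the saving concrete, here is a quick back-of-the-envelope count of the feedforward block's parameters (two linear layers with biases; layer norm ignored), a sketch assuming d = 512 as above:

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of a transformer feedforward block:
    Linear(d_model -> d_ff) + Linear(d_ff -> d_model), with biases."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

d = 512
print(ffn_params(d, 4 * d))  # usual 4d hidden size:   2,099,712 per layer
print(ffn_params(d, 2 * d))  # reduced 2d hidden size: 1,050,112 per layer
```

So the reduction saves roughly one million parameters per transformer layer.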
Hi @urialon, We are getting closer. Now I get 17.2M parameters. 🤓 Kind regards, Zadagu
Hi Zadagu,
In my implementation, the first dimension is [...].
Hi,
I'm looking through your training data (the JSON representation) and found an instance where the tokens are not followed by an EOS node.
Could you please elaborate on why there is no EOS after "idle,sources" in this case?