
Missing information from paper #4

Open
ozppupbg opened this issue Jan 16, 2025 · 3 comments

@ozppupbg

Hello,

I don't know if you have noticed this already, but looking through the paper, there are a lot of implementation details that are missing.
Here is my current list:

  • what are the model dimensions?
  • How many layers (of LMM, MAC, etc.) did they use?
  • If the memory layer is updated and retrieved at the same step, is the output the retrieval before or after the update?
  • How are the architecture components connected in detail?
  • How is the neural memory initialized?

Did you identify more gaps?
And how did you fill them in for your implementation?

@lucidrains
Owner

For your third bullet point, I have questions about this too. In one diagram it appears they retrieve, attend, then store + gate.

But I feel like it could also make sense to store first, then retrieve. I'm not sure, so I can offer both.

@ozppupbg
Author

It could depend on what the retrieved result is used for.
But my intuition for the general use case of a "memory" is that I would mainly be interested in what I already know about the thing I currently have, not in what I have just stored.

@lucidrains
Owner

lucidrains commented Jan 16, 2025

Your intuition makes sense. I'll just make sure it can work both ways and allow grad student descent to occur
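To make the two orderings concrete, here is a toy sketch (hypothetical, not the paper's neural memory or this repo's API; the memory is just a dict rather than a learned module). It only illustrates the difference between retrieving before vs. after the store at the same step:

```python
class ToyMemory:
    """Minimal associative memory mapping a key to a stored value.

    A hypothetical sketch to illustrate the retrieve-vs-store ordering
    discussed above, not the actual Titans memory update rule.
    """

    def __init__(self):
        self.slots = {}

    def step(self, key, value, store_first=False):
        # store_first=True: write the new value, then read it back
        # store_first=False (the diagram's order): read what was already
        # stored for this key, then write the new value
        if store_first:
            self.slots[key] = value
        retrieved = self.slots.get(key)  # None if nothing stored yet
        if not store_first:
            self.slots[key] = value
        return retrieved


# Retrieve-before-store: the first step sees nothing, the second
# step sees what the first step wrote.
mem = ToyMemory()
print(mem.step("k", "v1"))  # -> None
print(mem.step("k", "v2"))  # -> 'v1'

# Store-before-retrieve: the step immediately sees its own write.
mem2 = ToyMemory()
print(mem2.step("k", "v1", store_first=True))  # -> 'v1'
```

Exposing the ordering as a flag, as sketched here, matches the "offer both" plan: the retrieve-first variant returns the state of memory before the current token's write, which is the intuition described above.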
