
Missing information from paper #4

Open
ozppupbg opened this issue Jan 16, 2025 · 3 comments

@ozppupbg

Hello,

I don't know if you have noticed this already, but looking through the paper, there are a lot of implementation details that are missing.
Here is my current list:

  • what are the model dimensions?
  • How many layers (of LMM, MAC, etc.) did they use?
  • If the memory layer is updated and retrieved at the same step, is the output the retrieval before or after the update?
  • How are the architecture components connected in detail?
  • How is the neural memory initialized?

Did you identify more gaps?
And how did you fill them in for your implementation?

@lucidrains
Owner

For your third bullet point, I have questions about this too. In one diagram it appears they retrieve, attend, then store + gate.

But I feel like it could also make sense to store first, then retrieve. I'm not sure, so I can offer both.

@ozppupbg
Author

It could depend on what the retrieved result is used for.
But my intuition for the general use case of a "memory" is that I would mainly be interested in what I already know about the thing I currently have, not in what I have just stored.

@lucidrains
Owner

lucidrains commented Jan 16, 2025

Your intuition makes sense. I'll just make sure it can work both ways and allow grad student descent to occur
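To make the two orderings concrete, here is a toy sketch (hypothetical, not the paper's neural memory or this repo's API; the memory is just a dict rather than a learned module). It only illustrates the difference between retrieving before vs. after the store at the same step:

```python
class ToyMemory:
    """Minimal associative memory mapping a key to a stored value.

    A hypothetical sketch to illustrate the retrieve-vs-store ordering
    discussed above, not the actual Titans memory update rule.
    """

    def __init__(self):
        self.slots = {}

    def step(self, key, value, store_first=False):
        # store_first=True: write the new value, then read it back
        # store_first=False (the diagram's order): read what was already
        # stored for this key, then write the new value
        if store_first:
            self.slots[key] = value
        retrieved = self.slots.get(key)  # None if nothing stored yet
        if not store_first:
            self.slots[key] = value
        return retrieved


# Retrieve-before-store: the first step sees nothing, the second
# step sees what the first step wrote.
mem = ToyMemory()
print(mem.step("k", "v1"))  # -> None
print(mem.step("k", "v2"))  # -> 'v1'

# Store-before-retrieve: the step immediately sees its own write.
mem2 = ToyMemory()
print(mem2.step("k", "v1", store_first=True))  # -> 'v1'
```

Exposing the ordering as a flag, as sketched here, matches the "offer both" plan: the retrieve-first variant returns the state of memory before the current token's write, which is the intuition described above.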
