According to the Transformer paper, it seems that we can change
x = x + self.sa(self.ln1(x))
x = x + self.ffwd(self.ln2(x))
to
x = self.ln1(x + self.sa(x))
x = self.ln2(x + self.ffwd(x))
The results are similar either way, though.
Yes. In his video, he does go over why he's doing this. You can see his explanation here: https://youtu.be/kCc8FmEb1nY?si=VFtUYR-MjtrjR-Lw&t=5722
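As he puts it, there has been a "reshuffling" of the structure since the original paper: the layer norm is now commonly applied before each sublayer (the "pre-norm" formulation) rather than after the residual addition.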
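To make the difference between the two orderings concrete, here is a rough toy sketch in plain Python (no PyTorch). It is not the code from the repo: `sublayer` is a hypothetical stand-in for `self.sa` / `self.ffwd`, and `layer_norm` leaves out the learned scale and shift. It only shows where the normalization lands in each variant.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for self.sa or self.ffwd: any vector-to-vector function."""
    return [2.0 * v + 1.0 for v in x]

def pre_norm_step(x):
    # x = x + f(ln(x)): normalize only the sublayer input; the residual path is untouched
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_step(x):
    # x = ln(x + f(x)): normalize after the residual addition, as in the original paper
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

x = [1.0, 2.0, 3.0, 10.0]
pre = pre_norm_step(x)    # residual stream keeps its own mean and scale
post = post_norm_step(x)  # output is re-normalized at every step

print("pre-norm mean: ", sum(pre) / len(pre))
print("post-norm mean:", sum(post) / len(post))
```

In the post-norm version the output of every step is normalized, so its mean sits at roughly zero. In the pre-norm version the input `x` passes through the addition unchanged, so the residual stream is never normalized by the block itself.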