This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Error in model: scaling only the q matrix, not the qK.T dot product (qK.T/sqrt(dim_per_head)) #357
Open
BenoitDalFerro wants to merge 1 commit into facebookresearch:main from BenoitDalFerro:patch-1
+1 -1
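The title contrasts two places to apply the 1/sqrt(dim_per_head) factor: on q before the dot product, or on the qK.T scores afterwards. The sketch below is plain Python and is not the repository's code; the function names and toy values are hypothetical and chosen only for illustration. Since the dot product is linear in q, the two placements should agree up to floating-point rounding, which reviewers may want to confirm before treating the change as a behavioral fix.

```python
import math


def scores_scale_product(q, k, dim_per_head):
    """Compute q @ k^T first, then divide each score by sqrt(dim_per_head)."""
    scale = math.sqrt(dim_per_head)
    return [
        [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        for q_row in q
    ]


def scores_scale_query(q, k, dim_per_head):
    """Divide q by sqrt(dim_per_head) first, then compute q_scaled @ k^T."""
    scale = math.sqrt(dim_per_head)
    q_scaled = [[a / scale for a in q_row] for q_row in q]
    return [
        [sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        for q_row in q_scaled
    ]


# Hypothetical toy inputs: 2 query vectors, 3 key vectors, dim_per_head = 4.
q = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
k = [[1.0, 0.0, -1.0, 2.0], [2.0, 1.0, 0.0, 1.0], [0.0, 3.0, 1.0, -1.0]]
dim_per_head = 4

after = scores_scale_product(q, k, dim_per_head)
before = scores_scale_query(q, k, dim_per_head)

# Largest absolute difference between the two score matrices.
max_gap = max(
    abs(x - y)
    for row_a, row_b in zip(after, before)
    for x, y in zip(row_a, row_b)
)
print("max difference:", max_gap)
```

Any difference that does show up in a real model would come from rounding order (for example in half precision), not from the formula itself.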