This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Error in model, scaling only q matrix not qK.T dot product (qk.T/sqrt(dim_per_head)) #357

Open
BenoitDalFerro wants to merge 1 commit into facebookresearch:main from BenoitDalFerro:patch-1

Commits on Feb 14, 2023