Potential Problem in the Naive GEMM #13

Open
MyNewAcc1234 opened this issue Jun 19, 2024 · 0 comments


Hello,

I think there should be a __syncthreads() before "storeAccum(SC, Accum);". Shared memory is reused between the A/B tiles and C, so without this barrier a warp that is still reading A/B may read a location that another warp has already overwritten with its accumulator values.
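To make the hazard concrete, here is a minimal sketch of the pattern. It is not the repository's kernel: reuseSmemKernel, reuseDemoMatches, kThreads and kTile are hypothetical names, and a simple sum stands in for the MMA loop. The second __syncthreads() plays the role of the barrier I am proposing before storeAccum.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;  // 4 warps per block (assumed for illustration)
constexpr int kTile = 128;     // elements in the shared tile
static_assert(kTile == kThreads, "one element per thread in this sketch");

// Hypothetical stand-in for the GEMM kernel: `smem` first holds an input tile
// (like the A/B tiles) and is then reused to stage the result (like SC).
__global__ void reuseSmemKernel(const float* in, float* out) {
    __shared__ float smem[kTile];
    const int tid = threadIdx.x;

    // Phase 1: cooperatively load the input tile.
    smem[tid] = in[tid];
    __syncthreads();  // the tile is fully written before anyone reads it

    // Phase 2: every thread reads the whole tile (stands in for the MMA loop).
    float accum = 0.0f;
    for (int k = 0; k < kTile; ++k) {
        accum += smem[k];
    }

    // Proposed barrier: all warps finish reading before any warp overwrites.
    // Without it, a fast warp can clobber inputs a slow warp still needs.
    __syncthreads();

    // Phase 3: reuse the same shared memory for the result (like storeAccum).
    smem[tid] = accum;
    __syncthreads();  // the result is fully staged before the copy-out

    // Read a value staged by a different thread, as a C write-back would.
    out[tid] = smem[(tid + 1) % kTile];
}

// Host-side check: with an all-ones input, every output must equal kTile.
bool reuseDemoMatches() {
    std::vector<float> hIn(kTile, 1.0f), hOut(kThreads, 0.0f);
    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, kTile * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, kThreads * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, hIn.data(), kTile * sizeof(float), cudaMemcpyHostToDevice);
    reuseSmemKernel<<<1, kThreads>>>(dIn, dOut);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, kThreads * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;
    for (float v : hOut) {
        if (v != static_cast<float>(kTile)) return false;
    }
    return true;
}
```

Removing the middle barrier makes the result depend on warp scheduling, so it may still look correct on a given run. Running under compute-sanitizer with the racecheck tool is one way to confirm a shared memory hazard of this kind.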

The current tile size may not produce a wrong result, but I get inf and nan when I increase the tile size of the K dimension from 32 to 64. With a larger K tile, the warps can drift further out of step with each other, so one warp is more likely to overwrite data that another warp has not read yet.

(By the way, I had modified both the K tile size and the loadSmemA and loadSmemB functions to use 128-bit loads, and that is when inf and nan appeared in my result. I checked my code many times, then tried adding this __syncthreads(), and the result became correct. So I am not actually sure that the inf and nan come solely from the missing __syncthreads().)
