I think there should be a __syncthreads() before "storeAccum(SC, Accum);". Otherwise, because the shared memory used for A/B is reused for C, one warp may read a position that has already been overwritten by another warp.
The current tile size may not produce a wrong result, but I get inf and nan when I increase the tile size of the K dimension from 32 to 64. When K is large, the warps can drift further out of sync with each other, which makes the overwrite more likely.
(By the way, I modified both the K tile size and the functions loadSmemA and loadSmemB to use 128-bit loads, and then I got inf and nan in my result. I checked my code many times and then tried adding this __syncthreads(), after which I got the right result. So I'm actually not sure the inf and nan come exactly from the missing __syncthreads().)
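A minimal sketch of the hazard described above. This is not the repository's kernel: `tileReuseKernel`, `runTileReuseDemo`, `TILE`, and `K_STEPS` are hypothetical names, and a single float buffer stands in for the A/B tiles and the C tile. It assumes a loop that puts its barriers before and after each load, so nothing separates the last read from the reuse of the buffer except the barrier in question.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 256;    // shared buffer size == threads per block (8 warps), assumed
constexpr int K_STEPS = 64;  // hypothetical number of K tiles

// One shared buffer holds the input tile during the loop and is then
// reused for the results, mimicking the A/B vs. C reuse in the issue.
__global__ void tileReuseKernel(const float* in, float* out) {
    __shared__ float smem[TILE];
    const int tid = threadIdx.x;
    const int other = (tid + 32) % TILE;  // a slot written by a different warp
    float accum = 0.0f;

    for (int k = 0; k < K_STEPS; ++k) {
        __syncthreads();                  // previous tile fully consumed
        smem[tid] = in[k * TILE + tid];   // load the next input tile
        __syncthreads();                  // tile visible to every warp
        accum += smem[other];             // read data loaded by another warp
    }

    // The barrier under discussion: without it, a fast warp could write its
    // result below while a slower warp is still reading the last input tile.
    __syncthreads();
    smem[tid] = accum;                    // reuse the buffer for the output
    __syncthreads();                      // results visible before reading
    out[tid] = smem[other];
}

// Host helper: returns the number of mismatching outputs, or -1 on a CUDA error.
int runTileReuseDemo() {
    std::vector<float> hIn(K_STEPS * TILE), hOut(TILE, 0.0f);
    for (int k = 0; k < K_STEPS; ++k)
        for (int i = 0; i < TILE; ++i)
            hIn[k * TILE + i] = static_cast<float>(i);

    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, hIn.size() * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, hOut.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(float), cudaMemcpyHostToDevice);

    tileReuseKernel<<<1, TILE>>>(dIn, dOut);

    // The blocking copy also reports errors from the kernel launch.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < TILE; ++i) {
        // Thread t accumulates K_STEPS * ((t + 32) % TILE); out[i] reads the
        // accumulator of thread (i + 32) % TILE.
        const float expected = static_cast<float>(K_STEPS * ((i + 64) % TILE));
        if (hOut[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Removing the barrier before `smem[tid] = accum;` turns this into a data race. Whether it actually shows up as wrong values depends on warp scheduling, which may be why the smaller tile size appears to work.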