sgemm

The implementation method refer to the maxas.

performance

The test environment： ubuntu18.04, cuda10, 1080ti
The code only supports limited input matrix, not universal adaptation, only for learning. Here is the GFLOP for testing different size matrices

N	cublas	sgemm	sgemm/cublas
512	4451.6069	3587.3280	80%
1024	7856.5241	6640.6945	84%
2048	9409.4447	8769.9500	93%
4096	10180.4288	9708.4873	95%

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
sgemm.cu		sgemm.cu