forked from NVIDIA/cutlass
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement full feature of copy/gemm for PVC backend (#174)
* Implement full feature of copy/gemm for PVC backend

Implement Feature:

1. Implement full features of copy/MMA for PVC backend

Before this commit, copy/gemm were not fully implemented because the cutlass cute copy/MMA API is not fully compatible with the PVC backend. The register layout loaded by the PVC subgroup intrinsics doesn't satisfy the cute::gemm requirements, which leads to problems including but not limited to:

(1) GEMM only supports specific combinations of tile sizes and copy traits. GEMM results will be wrong if you change the tile size configuration or the copy traits. For example, "examples/sycl/pvc/pvc_gemm.cpp" fails if you change sg_tile_k from 32 to 64. So we must retile the register data layout before cute::gemm.

(2) The register layout has to be hardcoded to satisfy the requirements of the cutlass cute APIs. For example, the data from "partition_fragment_B" needs to be hardcoded.

2. Support different GEMM layouts and data types

(1) Support different combinations of RowMajor and ColumnMajor for matrices A and B. Refer to test/unit/cute/intel_xe/gemm_layout.cpp.

(2) Add GEMM test cases for int8/uint8/fp16/bf16. Refer to test/unit/cute/intel_xe/gemm_data_type.cpp.

This PR implements the above features without dropping performance.

Refine Code:

1. Refine the layout convention for gemm. For GEMM C = A x B, let A be (m, k, l), B be (n, k, l), and C be (m, n, l). Backend-related differences are hidden inside the implementation of the PVC copy traits (copy_traits_xe.hpp), so that upper-level users can write code for Intel Xe GPUs according to normal cutlass usage habits, without hardcoding for Intel Xe GPUs.

2. Refine the API "get_pvc_tensor". Before this PR, K-slicing and the coordinate tensor were mixed together, which made the interface parameters unclear and difficult to understand. Actually, "K-slicing" is for MMA use, while the "coordinate tensor" is only for copy. They are two separate things and must be kept functionally independent, so we supply a helper function "append_pvc_tensor".

* misc refine
* Update copy_traits_xe.hpp
* use make_coord
* rename variable to make semantics clear
* enable tf32 gemm and some refactoring
* refine code
* fix some comments
* fix comments
* fix comments
* fix build error
* refine gemm, add retile_MMA API for xe
* fix build error
* fix flash atten build issue
* update
* fix flash attention validation issue
* update
* update
* update benchmark configurations
* fix xe visit bug
* fix prefetch issue
* fix validation issue of uint test
* Update test/unit/cute/intel_xe/utils.hpp

Co-authored-by: Joe Todd <[email protected]>

---------

Co-authored-by: Alejandro Acosta <[email protected]>
Co-authored-by: Joe Todd <[email protected]>
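The layout convention in the commit message (A is (m, k, l), B is (n, k, l), C is (m, n, l)) can be illustrated with a plain C++ reference GEMM. This is only a host-side sketch and does not use the CuTe or PVC copy-trait APIs. The names `Contiguous`, `Stride2D`, `make_stride_2d` and `reference_gemm` are hypothetical helpers introduced here, and the `Contiguous` tag is defined relative to the (rows, cols) view shown here. It is not CUTLASS's RowMajor/ColumnMajor tag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag (not a CUTLASS type): which mode of a (rows, cols)
// operand has stride 1 in memory.
enum class Contiguous { Mode0, Mode1 };

// Element strides for a rank-2 operand of logical shape (rows, cols).
struct Stride2D {
  std::size_t row;
  std::size_t col;
};

inline Stride2D make_stride_2d(std::size_t rows, std::size_t cols, Contiguous c) {
  // Mode1 contiguous: walking along cols is stride 1.
  // Mode0 contiguous: walking along rows is stride 1.
  return c == Contiguous::Mode1 ? Stride2D{cols, 1} : Stride2D{1, rows};
}

// Reference batched GEMM under the convention A (m, k, l), B (n, k, l),
// C (m, n, l): C(i, j, b) = sum over p of A(i, p, b) * B(j, p, b).
// Assumption for this sketch: batches are stored back to back, and C is
// Mode1 contiguous within each batch.
inline std::vector<float> reference_gemm(const std::vector<float>& A, Contiguous ca,
                                         const std::vector<float>& B, Contiguous cb,
                                         std::size_t m, std::size_t n,
                                         std::size_t k, std::size_t l) {
  const Stride2D sa = make_stride_2d(m, k, ca);
  const Stride2D sb = make_stride_2d(n, k, cb);
  std::vector<float> C(m * n * l, 0.0f);
  for (std::size_t b = 0; b < l; ++b) {
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          // Both operands are indexed with k as their second mode.
          acc += A[b * m * k + i * sa.row + p * sa.col] *
                 B[b * n * k + j * sb.row + p * sb.col];
        }
        C[b * m * n + i * n + j] = acc;
      }
    }
  }
  return C;
}
```

Because both A and B carry k as their second mode, changing the memory order of either operand only changes its strides. The loop nest stays the same, which is the kind of backend detail the commit message says is hidden inside the copy traits.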
1 parent d4558aa, commit 0692435
Showing 32 changed files with 1,469 additions and 1,540 deletions.