Showing 120 changed files with 12,995 additions and 1,356 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
Start with the 6 asm_full logic files | ||
|
||
- vega20_Cijk_Ailk_Bjlk_HB.yaml | ||
- vega20_Cijk_Ailk_Bljk_HB.yaml | ||
- vega20_Cijk_Alik_Bljk_HB.yaml | ||
- vega20_Cijk_Ailk_Bjlk_SB.yaml | ||
- vega20_Cijk_Ailk_Bljk_SB.yaml | ||
- vega20_Cijk_Alik_Bljk_SB.yaml | ||
|
||
from | ||
|
||
- rocBLAS commit a85df88648587a0d2880a74c6c57964366ab02a1 for HGEMM | ||
- rocBLAS commit 0ceb1ad64c8bda5473a1e1c3a74ab9ff204acbf8 for SGEMM | ||
|
||
we merge the 6 Resnet50-specific logic files archived in the "logic" directory | ||
into the corresponding asm_full logic files of the same name, resulting in the | ||
6 combined asm_full logic files in | ||
|
||
- rocBLAS commit ea27b3aba339b4fd48795153995d24dd96cd6457 for HGEMM+SGEMM | ||
|
||
The 6 YAML configuration files used to generate the Resnet50-specific logic | ||
files are archived in the "config" directory under the corresponding names | ||
|
||
- hgemm_resnet50_nt.yaml | ||
- hgemm_resnet50_nn.yaml | ||
- hgemm_resnet50_tn.yaml | ||
- sgemm_resnet50_nt.yaml | ||
- sgemm_resnet50_nn.yaml | ||
- sgemm_resnet50_tn.yaml | ||
|
||
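The correspondence between the config names and the logic file names can be sketched as a small lookup. This is only an illustration: the helper `logic_file_for` is hypothetical, and the mapping of `nt`/`nn`/`tn` to the index strings assumes the two lists in the README are given in matching order.

```python
# Hypothetical helper, not part of Tensile or rocBLAS.
# Assumption: the README lists logic files and config files in the same order,
# so nt -> Ailk_Bjlk, nn -> Ailk_Bljk, tn -> Alik_Bljk.
TRANSPOSE_TO_INDICES = {
    "nt": "Ailk_Bjlk",
    "nn": "Ailk_Bljk",
    "tn": "Alik_Bljk",
}

# Assumption: HB marks the half-precision (HGEMM) files, SB the single-precision ones.
PRECISION_TO_SUFFIX = {"hgemm": "HB", "sgemm": "SB"}


def logic_file_for(config_name: str, schedule: str = "vega20") -> str:
    """Return the logic file name assumed to pair with a config file name."""
    stem = config_name.rsplit(".", 1)[0]          # drop ".yaml"
    precision, _model, transpose = stem.split("_")
    indices = TRANSPOSE_TO_INDICES[transpose]
    suffix = PRECISION_TO_SUFFIX[precision]
    return f"{schedule}_Cijk_{indices}_{suffix}.yaml"


if __name__ == "__main__":
    for name in ("hgemm_resnet50_nt.yaml", "sgemm_resnet50_tn.yaml"):
        print(name, "->", logic_file_for(name))
```

Keeping the mapping in two small dictionaries makes it easy to check by eye against the file lists above.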
Note that we explicitly purged the 6 sizes with either n=49 or k=49 from | ||
the Resnet50-specific logic files for HGEMM because they won't be using | ||
the assembly kernels. |
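The purge described in the note could look roughly like the filter below. It is a minimal sketch under stated assumptions: the record layout with named `m`, `n`, `k` fields and the function `purge_sizes` are hypothetical, and the sample sizes are illustrative values, not the six entries actually removed from the logic files.

```python
from typing import Dict, List


def purge_sizes(sizes: List[Dict[str, int]], dim: int = 49) -> List[Dict[str, int]]:
    """Drop every problem size whose n or k equals `dim`."""
    return [s for s in sizes if s["n"] != dim and s["k"] != dim]


if __name__ == "__main__":
    # Illustrative sizes only; the real logic files use a different layout.
    sizes = [
        {"m": 512, "n": 49, "k": 2048},
        {"m": 2048, "n": 49, "k": 512},
        {"m": 256, "n": 196, "k": 1024},
        {"m": 512, "n": 2048, "k": 49},
    ]
    for size in purge_sizes(sizes):
        print(size)
```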
115 changes: 115 additions & 0 deletions
Tensile/Configs/miopen/archives/resnet50/config/hgemm_resnet50_nn.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
GlobalParameters: | ||
MinimumRequiredVersion: 4.2.0 | ||
PrintLevel: 1 | ||
ForceRedoBenchmarkProblems: True | ||
ForceRedoLibraryLogic: True | ||
ForceRedoLibraryClient: True | ||
CMakeBuildType: Release | ||
EnqueuesPerSync: 1 | ||
SyncsPerBenchmark: 1 | ||
LibraryPrintDebug: False | ||
NumElementsToValidate: 0 | ||
ValidationMaxToPrint: 4 | ||
ValidationPrintValids: False | ||
ShortNames: False | ||
MergeFiles: True | ||
Platform: 0 | ||
Device: 0 | ||
KernelTime: True | ||
PinClocks: True | ||
SleepPercent: 200 | ||
DataInitTypeBeta : 0 | ||
CodeFromFiles: 1 | ||
SolutionSelectionAlg: 1 | ||
PrintWinnersOnly: 1 | ||
|
||
BenchmarkProblems: | ||
######################################## | ||
# NN - standard | ||
######################################## | ||
- | ||
- # ProblemType | ||
OperationType: GEMM | ||
DataType: h | ||
TransposeA: False | ||
TransposeB: False | ||
UseBeta: True | ||
Batched: True | ||
######################################## | ||
# Explore large number of ~10K half solutions | ||
######################################## | ||
- # Benchmark Group | ||
InitialSolutionParameters: | ||
BenchmarkCommonParameters: | ||
- EdgeType: ["ShiftPtr"] | ||
- LoopTail: [True] | ||
- KernelLanguage: ["Assembly"] | ||
ForkParameters: | ||
- FractionalLoad: [1] | ||
- PrefetchGlobalRead: [ False, True ] | ||
- PrefetchLocalRead: [ False, True] | ||
- ThreadTile: | ||
- [ 4, 4 ] | ||
- [ 8, 4 ] | ||
- [ 8, 8 ] | ||
- [ 16, 8 ] | ||
- [ 8, 16 ] | ||
- [ 16, 16 ] | ||
- WorkGroup: | ||
- [ 16, 8, 2 ] | ||
- [ 16, 4, 4 ] | ||
- [ 16, 8, 1 ] | ||
- [ 8, 32, 1 ] | ||
- [ 16, 16, 1 ] | ||
- [ 32, 8, 1 ] | ||
- GlobalSplitU: [1,3,5] | ||
- WorkGroupMapping: [1,8,64] | ||
- DepthU: [ 8,16,24,32 ] | ||
- VectorWidth: [4,8] | ||
- GlobalReadVectorWidth: [1,8] | ||
- LdsPadB: [0, -1 ] | ||
- AssertSummationElementMultiple: [2] | ||
- AssertFree0ElementMultiple: [2] | ||
BenchmarkForkParameters: | ||
JoinParameters: | ||
BenchmarkJoinParameters: | ||
BenchmarkFinalParameters: | ||
- ProblemSizes: | ||
# Resnet50 NN | ||
- Exact: [ 784 , 128 , 64, 512 ] # beta= 0 | ||
- Exact: [ 784 , 512 , 64, 128 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 64, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 64, 256 ] # beta= 0 | ||
- Exact: [ 3136 , 256 , 64, 64 ] # beta= 0 | ||
- Exact: [ 784 , 128 , 128, 512 ] # beta= 0 | ||
- Exact: [ 784 , 512 , 128, 128 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 128, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 128, 256 ] # beta= 0 | ||
- Exact: [ 3136 , 256 , 128, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 512 , 1, 2048 ] # beta= 0 | ||
- Exact: [ 3136 , 2048 , 1, 512 ] # beta= 0 | ||
- Exact: [ 12544 , 256 , 1, 1024 ] # beta= 0 | ||
- Exact: [ 12544 , 1024 , 1, 256 ] # beta= 0 | ||
|
||
LibraryLogic: | ||
ScheduleName: "vega20" | ||
DeviceNames: ["Device 66a0", "Device 66a7"] | ||
ArchitectureName: "gfx906" | ||
|
||
# ScheduleName: "vega10" | ||
# DeviceNames: ["Device 6863", "Device 6862", "Device 687f", "Device 6860", "Device 6861", "Vega 10 XTX [Radeon Vega Frontier Edition]"] | ||
# ArchitectureName: "gfx900" | ||
|
||
# ScheduleName: "mi25" | ||
# DeviceNames: ["Device 6860"] | ||
# ArchitectureName: "gfx900" | ||
|
||
# ScheduleName: "r9nano" | ||
# DeviceNames: ["Device 7300"] | ||
# ArchitectureName: "gfx803" | ||
|
||
# ScheduleName: "hip" | ||
# DeviceNames: ["Device 0000"] | ||
# ArchitectureName: "fallback" | ||
|
||
LibraryClient: |
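To get a feel for the size of the search space in the `ForkParameters` block above, the value lists can be multiplied out. The sketch below only counts the raw Cartesian product of the listed values. The config comment mentions roughly 10K solutions, so the benchmark presumably discards many combinations as invalid; that filtering is an assumption here and is not modeled.

```python
import math

# Values copied from the ForkParameters block of hgemm_resnet50_nn.yaml.
fork_parameters = {
    "FractionalLoad": [1],
    "PrefetchGlobalRead": [False, True],
    "PrefetchLocalRead": [False, True],
    "ThreadTile": [[4, 4], [8, 4], [8, 8], [16, 8], [8, 16], [16, 16]],
    "WorkGroup": [[16, 8, 2], [16, 4, 4], [16, 8, 1],
                  [8, 32, 1], [16, 16, 1], [32, 8, 1]],
    "GlobalSplitU": [1, 3, 5],
    "WorkGroupMapping": [1, 8, 64],
    "DepthU": [8, 16, 24, 32],
    "VectorWidth": [4, 8],
    "GlobalReadVectorWidth": [1, 8],
    "LdsPadB": [0, -1],
    "AssertSummationElementMultiple": [2],
    "AssertFree0ElementMultiple": [2],
}

# Raw number of parameter combinations before any validity filtering.
raw_combinations = math.prod(len(values) for values in fork_parameters.values())

if __name__ == "__main__":
    print(raw_combinations)  # 41472
```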
119 changes: 119 additions & 0 deletions
Tensile/Configs/miopen/archives/resnet50/config/hgemm_resnet50_nt.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
GlobalParameters: | ||
MinimumRequiredVersion: 4.2.0 | ||
PrintLevel: 1 | ||
ForceRedoBenchmarkProblems: True | ||
ForceRedoLibraryLogic: True | ||
ForceRedoLibraryClient: True | ||
CMakeBuildType: Release | ||
EnqueuesPerSync: 1 | ||
SyncsPerBenchmark: 1 | ||
LibraryPrintDebug: False | ||
NumElementsToValidate: 0 | ||
ValidationMaxToPrint: 4 | ||
ValidationPrintValids: False | ||
ShortNames: False | ||
MergeFiles: True | ||
Platform: 0 | ||
Device: 0 | ||
KernelTime: True | ||
PinClocks: True | ||
SleepPercent: 200 | ||
DataInitTypeBeta : 0 | ||
CodeFromFiles: 1 | ||
SolutionSelectionAlg: 1 | ||
PrintWinnersOnly: 1 | ||
|
||
BenchmarkProblems: | ||
######################################## | ||
# NT - standard | ||
######################################## | ||
- | ||
- # ProblemType | ||
OperationType: GEMM | ||
DataType: h | ||
TransposeA: False | ||
TransposeB: True | ||
UseBeta: True | ||
Batched: True | ||
######################################## | ||
# Explore large number of ~10K half solutions | ||
######################################## | ||
- # Benchmark Group | ||
InitialSolutionParameters: | ||
BenchmarkCommonParameters: | ||
- EdgeType: ["ShiftPtr"] | ||
- LoopTail: [True] | ||
- KernelLanguage: ["Assembly"] | ||
ForkParameters: | ||
- FractionalLoad: [1] | ||
- PrefetchGlobalRead: [ False, True ] | ||
- PrefetchLocalRead: [ False, True] | ||
- ThreadTile: | ||
- [ 4, 4 ] | ||
- [ 8, 4 ] | ||
- [ 8, 8 ] | ||
- [ 16, 8 ] | ||
- [ 8, 16 ] | ||
- [ 16, 16 ] | ||
- WorkGroup: | ||
- [ 16, 8, 2 ] | ||
- [ 16, 4, 4 ] | ||
- [ 16, 8, 1 ] | ||
- [ 8, 32, 1 ] | ||
- [ 16, 16, 1 ] | ||
- [ 32, 8, 1 ] | ||
- GlobalSplitU: [1,3,5] | ||
- WorkGroupMapping: [1,8,64] | ||
- DepthU: [ 8,16,24,32 ] | ||
- VectorWidth: [4,8] | ||
- GlobalReadVectorWidth: [1,8] | ||
- LdsPadB: [0, -1 ] | ||
- AssertSummationElementMultiple: [2] | ||
- AssertFree0ElementMultiple: [2] | ||
BenchmarkForkParameters: | ||
JoinParameters: | ||
BenchmarkJoinParameters: | ||
BenchmarkFinalParameters: | ||
- ProblemSizes: | ||
# Resnet50 NT | ||
- Exact: [ 49 , 512 , 64, 2048 ] # beta= 0 | ||
- Exact: [ 49 , 2048 , 64, 512 ] # beta= 0 | ||
- Exact: [ 196 , 256 , 64, 1024 ] # beta= 0 | ||
- Exact: [ 196 , 1024 , 64, 256 ] # beta= 0 | ||
- Exact: [ 784 , 128 , 64, 512 ] # beta= 0 | ||
- Exact: [ 784 , 512 , 64, 128 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 64, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 256 , 64, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 64, 256 ] # beta= 0 | ||
- Exact: [ 49 , 512 , 128, 2048 ] # beta= 0 | ||
- Exact: [ 49 , 2048 , 128, 512 ] # beta= 0 | ||
- Exact: [ 196 , 256 , 128, 1024 ] # beta= 0 | ||
- Exact: [ 196 , 1024 , 128, 256 ] # beta= 0 | ||
- Exact: [ 784 , 128 , 128, 512 ] # beta= 0 | ||
- Exact: [ 784 , 512 , 128, 128 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 128, 64 ] # beta= 0 | ||
- Exact: [ 3136 , 64 , 128, 256 ] # beta= 0 | ||
- Exact: [ 3136 , 256 , 128, 64 ] # beta= 0 | ||
|
||
LibraryLogic: | ||
ScheduleName: "vega20" | ||
DeviceNames: ["Device 66a0", "Device 66a7"] | ||
ArchitectureName: "gfx906" | ||
|
||
# ScheduleName: "vega10" | ||
# DeviceNames: ["Device 6863", "Device 6862", "Device 687f", "Device 6860", "Device 6861", "Vega 10 XTX [Radeon Vega Frontier Edition]"] | ||
# ArchitectureName: "gfx900" | ||
|
||
# ScheduleName: "mi25" | ||
# DeviceNames: ["Device 6860"] | ||
# ArchitectureName: "gfx900" | ||
|
||
# ScheduleName: "r9nano" | ||
# DeviceNames: ["Device 7300"] | ||
# ArchitectureName: "gfx803" | ||
|
||
# ScheduleName: "hip" | ||
# DeviceNames: ["Device 0000"] | ||
# ArchitectureName: "fallback" | ||
|
||
LibraryClient: |
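The config above combines `AssertFree0ElementMultiple: [2]` and `AssertSummationElementMultiple: [2]` with several problem sizes whose first entry is 49. The sketch below checks which `Exact` sizes would satisfy both assertions. It assumes each `Exact` entry is ordered as [free0, free1, batch, summation]; that ordering and the helper `satisfies_asserts` are assumptions, and the source does not say whether this check is the reason some sizes were purged from the logic files.

```python
# Exact sizes copied from hgemm_resnet50_nt.yaml.
# Assumed order per entry: [free0, free1, batch, summation].
nt_sizes = [
    [49, 512, 64, 2048], [49, 2048, 64, 512],
    [196, 256, 64, 1024], [196, 1024, 64, 256],
    [784, 128, 64, 512], [784, 512, 64, 128],
    [3136, 64, 64, 64], [3136, 256, 64, 64], [3136, 64, 64, 256],
    [49, 512, 128, 2048], [49, 2048, 128, 512],
    [196, 256, 128, 1024], [196, 1024, 128, 256],
    [784, 128, 128, 512], [784, 512, 128, 128],
    [3136, 64, 128, 64], [3136, 64, 128, 256], [3136, 256, 128, 64],
]


def satisfies_asserts(size, free0_multiple=2, summation_multiple=2):
    """True if free0 and summation are multiples of the asserted values."""
    free0, _free1, _batch, summation = size
    return free0 % free0_multiple == 0 and summation % summation_multiple == 0


# Sizes that a kernel built with both assertions could not serve.
rejected = [s for s in nt_sizes if not satisfies_asserts(s)]

if __name__ == "__main__":
    for size in rejected:
        print(size)
```

Under the assumed ordering, only the entries with an odd first dimension fail the check.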