
AMD: parse the architecture as supplied by gcnArchName #11244

Open · Haus1 wants to merge 1 commit into master from amd-rework-version

Conversation

@Haus1 commented Jan 14, 2025

The value provided by minor is truncated for AMD, so parse the value returned by gcnArchName for an accurate ID.

We can also use the common value for GCN4, gfx800, to avoid missing compatible devices.

This is a follow-up to #11209 and will change the behavior of CDNA3, CDNA, VEGA and GCN4 as they should now be recognized as expected. Of those I only have access to a GCN4 device for testing.
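
For reference, the string in question comes from the HIP device properties. A minimal sketch (not the PR's actual code) of how the two values compare on an AMD device:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    hipDeviceProp_t prop;
    if (hipGetDeviceProperties(&prop, 0) != hipSuccess) {
        return 1;
    }
    // major/minor lose the stepping (e.g. gfx906 shows up as 9.0), while
    // gcnArchName carries the full target ID, e.g. "gfx906:sramecc+:xnack-".
    printf("major.minor = %d.%d\n", prop.major, prop.minor);
    printf("gcnArchName = %s\n", prop.gcnArchName);
    return 0;
}
```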

@github-actions bot added the Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library) labels on Jan 14, 2025
@JohannesGaessler (Collaborator) commented

I don't know at all whether this is the correct way to do it. @IMbackK, your input would be appreciated.

@IMbackK (Collaborator) commented Jan 16, 2025

Yes, this is more correct; the current code misses the arch stepping part.
But I have never seen a device report gfx800: rocblas only supports and checks for gfx803, which is reported by Fiji and all Polaris variants, and the only other variant I am aware of is gfx802, which is not supported by rocblas (or any ROCm component).

Thus NAK on the change to the gfx8 define.
I will try this PR out on my devices (I have access to gfx803, gfx900, gfx906, gfx908 and gfx1030).

@IMbackK (Collaborator) commented Jan 16, 2025

There's also a snag in this PR regarding gfx90a: it reports 9.1 as major.minor, but its gcnArchName is gfx90a, which this PR won't parse correctly. The same goes for others like gfx90c.

So the current code is not correct, but this PR has too many issues to serve as an improvement as is.
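
The snag comes from the stepping field being hexadecimal: gfx90a is major 9, minor 0, stepping 0xa. A hedged sketch of a parser that accounts for this (illustrative only; parse_gfx_version is a hypothetical helper, not the PR's code):

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>
#include <string>

// Parse "gfx" names where the last two characters are hex digits for minor
// and stepping, e.g. "gfx90a" -> 9/0/0xa, "gfx1030" -> 10/3/0.
static bool parse_gfx_version(const char * name, int & major, int & minor, int & step) {
    if (strncmp(name, "gfx", 3) != 0) {
        return false;
    }
    std::string body(name + 3);
    body = body.substr(0, body.find(':')); // drop target-ID features like ":xnack-"
    if (body.size() < 3) {
        return false;
    }
    auto hex = [](char c) -> int {
        if (isdigit((unsigned char) c)) return c - '0';
        if (c >= 'a' && c <= 'f')       return c - 'a' + 10;
        return -1; // also rejects generic targets such as "gfx10-3-generic"
    };
    minor = hex(body[body.size() - 2]);
    step  = hex(body[body.size() - 1]);
    major = atoi(body.substr(0, body.size() - 2).c_str());
    return major > 0 && minor >= 0 && step >= 0;
}
```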

@Haus1 (Author) commented Jan 16, 2025

It appears this returns the full target ID as defined in https://github.com/ROCm/clr/blob/amd-staging/rocclr/device/device.cpp around line 125. This will need to be expanded upon in order to parse out the xnack status and to handle the addition of generic targets.

If it were possible to retrieve the version stepping directly, that would be preferable to parsing it out of a string. Would the xnack status be of any use here, or can it just be ignored?
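
For illustration, the target ID referenced above is the processor name followed by colon-separated feature flags, e.g. "gfx90a:sramecc+:xnack-". A small sketch that separates the two (the names here are hypothetical):

```cpp
#include <sstream>
#include <string>
#include <vector>

struct target_id {
    std::string              processor; // e.g. "gfx90a"
    std::vector<std::string> features;  // e.g. {"sramecc+", "xnack-"}
};

static target_id split_target_id(const std::string & id) {
    target_id out;
    std::stringstream ss(id);
    std::getline(ss, out.processor, ':');
    for (std::string f; std::getline(ss, f, ':');) {
        out.features.push_back(f); // xnack among them; see the reply below
    }
    return out;
}
```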

@IMbackK (Collaborator) commented Jan 16, 2025

xnack can be ignored since we don't use hipMallocManaged-allocated memory. Short of the user recompiling the whole ROCm stack with non-default flags, only gfx942 and gfx90a can end up in xnack+ mode.
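
To make the distinction concrete: xnack governs demand paging for migratable allocations, which the HIP backend here does not use. An illustrative contrast, not ggml's actual allocator:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>

void * alloc_device(size_t nbytes) {
    void * ptr = nullptr;
    // ggml-style path: ordinary device memory, unaffected by xnack mode.
    if (hipMalloc(&ptr, nbytes) != hipSuccess) {
        return nullptr;
    }
    // By contrast, hipMallocManaged(&ptr, nbytes) returns memory that can
    // migrate between host and device on page faults, and that migration
    // is what xnack+ enables on supported GPUs.
    return ptr;
}
```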

@Haus1 (Author) commented Jan 16, 2025

Yeah, they certainly don't make enabling xnack easy. On Linux the kernel module also needs to be patched to prevent it from rejecting the device.

@Haus1 force-pushed the amd-rework-version branch from 7bd1195 to 468296f on January 17, 2025
@github-actions bot added the script (script related), testing (everything test related), and python (python script changes) labels on Jan 17, 2025
@Haus1 force-pushed the amd-rework-version branch from 468296f to 9620bce on January 17, 2025
@Haus1 (Author) commented Jan 17, 2025

This will now work with all the IDs AMD has in staging and will gracefully fall back to the old way if it fails. Please let me know if I've missed anything.

Would it be better to submit backend changes like this to ggml first?

@Haus1 force-pushed the amd-rework-version branch from 9620bce to f77ea24 on January 18, 2025
@IMbackK (Collaborator) commented Jan 22, 2025

This is more correct, but still NAK on the gfx800 change. No gfx800 ever existed; gfx803 has instructions that gfx802 does not have, so we can't simply treat all gfx8 devices as gfx800.

LLVM suggests that at some point gfx802 was considered gfx800, but gfx803 was always a distinct ISA: https://github.com/ROCm/llvm-project/blob/656552edc693e2bb4abc9258399c39d190fce2b3/amd/comgr/src/comgr-metadata.cpp#L469

It's also inconsistent: various sub-variants exist that are ISA-compatible. All gfx103x versions share the same ISA and can execute each other's code, and the same goes for gfx900 and gfx90c; either you also fold all of these, or you unfold gfx8. At the moment nothing checks for gfx8 anyhow; any code path taken for gfx8 is taken because it's < gfx900, so this folding doesn't do anything for you.

I am also rethinking whether this approach is the right one to take; it's overly complicated. On ROCm the target architecture is always known at compile time to client code anyhow, so parsing the ISA at runtime only makes sense for the ROCm runtime itself.

I thus suggest that instead of this complicated solution we simply set the cc to the lowest one of each group of GPUs in ggml/src/ggml-cuda/vendors/hip.h (i.e. GCN/CDNA/RDNA1/2/3), and when we find the need to add an optimization that requires this to be subdivided, simply do so.

Edit: I was forgetting that we also use cc in host code here, so this PR is indeed the best solution, aside from the gfx8 inconsistency.

@JohannesGaessler (Collaborator) commented

I assume the Python script and the C file with the duplicated code were for testing and are intended to be removed prior to merging. If this is done, I am willing to approve this PR (after checking on my own machine that it works).

@IMbackK (Collaborator) commented Jan 22, 2025

The Python script needs to go somewhere, since we will have to regenerate the table fairly often as AMD releases new devices.

I tested this PR on gfx908 and gfx1030 and it worked fine there.

@JohannesGaessler (Collaborator) commented

If these files are needed long term, please add short blurbs that explain why they are needed and how to use them.

@Haus1 (Author) commented Jan 26, 2025

Thanks for the feedback. Are the datasheets still publicly available for these architectures? I double-checked against the compiler's feature and product lists while trying to understand the concerns about gfx8, but that only showed up as an issue for gfx9, so the exercise seemed fruitless. I'll remove it nevertheless, as it can always be revisited later.

The Python script isn't necessary and was just a convenience I thought others might find helpful. AMD violates the guidelines included where these values are defined, so the only way to verify this works is to test all 100+ variations.

@Haus1 force-pushed the amd-rework-version branch 2 times, most recently from c3f41aa to bc59003 on January 26, 2025
Commit message: The value provided by minor doesn't include stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
@IMbackK (Collaborator) left a comment

@Haus1 Thank you, this looks good now and works well on my machines (gfx908, gfx1030).

The gfx8 folding will always be unnecessary, as we will simply check for < 900, just like we have to check >= 1030 && < 1100 to hit all the gfx103x variants, which are all ISA-identical.
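
As a sketch, the range checks in question (over raw gfx version numbers, ignoring any vendor offset ggml may apply to cc internally):

```cpp
// Illustrative predicates; the exact constants in ggml may differ.
static bool is_gfx8_or_older(int cc) { return cc < 900; }
static bool is_gfx103x(int cc)       { return cc >= 1030 && cc < 1100; } // all ISA-identical
```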

It would be good to have the test case that tests against the generated table, and the means to regenerate the table when AMD changes it, but this doesn't have to land right now.

Therefore I ACK this as is.
