-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMD: parse the architecture as supplied by gcnArchName #11244
base: master
Are you sure you want to change the base?
Conversation
I don't know at all whether this is the correct way to do it. @IMbackK your input would be appreciated. |
Yes this is more correct, the current code misses the arch step part. Thus nak on the change to the gfx8 define. |
Theres also a snag in this pr regarding gfx90a, gfx90a reports 9.1 as major minor but its gcnArchName is gfx90a which this pr wont parse correctly, same goes for others like gfx90c. So the current code is not correct, but this pr has too many issues to serve as an improvement as is. |
It appears this returns the full target ID as defined in https://github.com/ROCm/clr/blob/amd-staging/rocclr/device/device.cpp around line 125. This'll need to be expanded upon in order to parse out xnack status and to handle the addition of generics. If it were possible to retrieve the version stepping directly that would be preferable to parsing it out of a string. Would the xnack status be of any use here or can that just be ignored? |
xnack can be ignored since we dont use hipMallocManaged allocated memory. Outside of the user recompileing the whole rocm stack with non default flags only gfx942 and gfx90a can end up in xnak+ mode. |
Yeah, they certainly don't make enabling xnack easy. On linux the kernel module also needs patched to prevent it from rejecting the device |
7bd1195
to
468296f
Compare
468296f
to
9620bce
Compare
This will now work with all the IDs AMD has in staging and will gracefully fall back to the old way if it fails. Please let me know if I've missed anything. Would it be better to submit backend changes like this to ggml first? |
9620bce
to
f77ea24
Compare
This is more correct, but still NAK on the gfx800 change. No gfx800 ever existed, gfx803 has instructions gfx802 dose not have thus we cant simply have all gfx8 devices be gfx800. LLVM suggests that at some point gfx802 was considered gfx800 but gfx803 was always a distinct isa https://github.com/ROCm/llvm-project/blob/656552edc693e2bb4abc9258399c39d190fce2b3/amd/comgr/src/comgr-metadata.cpp#L469 Its also inconstant, various sub variants do exist that are isa comptable. all gfx103x versions are the same isa and can execute eatch others code. the same goes for gfx900 and 90c, either you also fold all of these, or you unfold gfx8. Atm the is nothing that checks for gfx8 anyhow, any code path took for gfx8 is taken because its < gfx900 so this folding dosent do anything for you anyhow.
edit: i was forgetting we also use cc in host code here, this pr is indeed the best solution, besides the gfx8 inconsistency. |
I assume the Python script and the C file with the duplicated code were for testing and are intended to be removed prior to merging. If this is done I am willing to approve this PR (after checking on my own machine that it works). |
The python script needs to go somewhere, since we will have to regenerate the table fairly often when amd releases new devices. I tested this pr on gfx908 and gfx1030 and it worked fine there |
If these files are needed long term, please add short blurbs that explain why they are needed and how to use them. |
Thanks for the feeedback. Are the datasheets still publicly available for these architectures? I double checked against the compiler's feature and product lists while trying to understand the concerns about gfx8, but it only showed that as an issue for gfx9 and seemed fruitless. I'll remove it nevertheless as it can always be revisited later. The python script isn't necessary and was just a convenience I thought other's may find helpful. AMD violates their own guidelines they include when these values are defined so the only way to verify this functions is to test all +100 variations. |
c3f41aa
to
bc59003
Compare
The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Haus1 Thank you, this looks good now and works well on my machine (gfx908, gfx1030)
The gfx8 folding will always be unnecessary as we will simply check for <900 just like we have to check < 1100 && >= 1030 to hit all the gfx103x variants, that are all isa identical.
I would be good to have the test case that tests against the generated table and the means to regenerate the table when amd changes it, but this do sent have to land right now.
Therefore i ACK this as is.
The value provided by minor is truncated for AMD so parse the value returned by gcnArchName for an accurate ID.
We can also use the common value for GCN4, gfx800, to avoid missing compatible devices.
This is a follow-up to #11209 and will change the behavior of CDNA3, CDNA, VEGA and GCN4 as they should now be recognized as expected. Of those I only have access to a GCN4 device for testing.