Fixed lookup buffer size calculation for Vulkan implementation. #99

AndrewAR2 · 2023-04-20T11:45:18Z

Fixed lookup buffer size calculation for Vulkan implementation.

pigmej · 2023-04-21T15:05:16Z

For me, the behaviour is still the same, at least on Linux.

[nj@ryzen]~/workspace/post% hexdump -n 129 cpu/postdata_0.bin
0000000 18e6 e3e9 31f3 a10a 3f86 bd9a cddb 040c
0000010 6ba3 d9a5 c796 35c8 2b21 49d2 b856 226d
0000020 ac55 b76f 8c6c d7a2 92a2 0f6b 6302 c515
0000030 96f1 eb6d 39e1 d79a ac01 6242 eb9d 4d36
0000040 e7f6 796d a033 a2ad 08a8 e1aa b7d2 88f0
0000050 0409 94f8 2dd4 1062 2c08 e9d3 649e 38d9
0000060 1b9c 7537 b50a a75d 0085 395a e5d3 a553
0000070 3987 95e1 4801 b2db a060 505d 029b 93e8
0000080 0020
0000081

[nj@ryzen]~/workspace/post% hexdump -n 129 gpu2/postdata_0.bin
0000000 1c21 4efb b9d8 2501 c7df 1e6a b02f a3e9
0000010 2b5c 5832 49fb 07a4 782c f2df f105 8a0b
0000020 2587 a9d4 2175 89d0 a282 b3fa b200 b491
0000030 1976 6202 3cc4 a9ee e385 b1ee e29b ff91
0000040 a882 df0f 2f2d a8ac 06e6 b2fa f931 5a15
0000050 deff f566 aba7 6356 3cf0 2b39 ddc5 2d79
0000060 ac2e 76d0 6613 f340 c54d f3f7 157c 5146
0000070 07fd 7caa 32da 2546 d11c a6e2 cd92 a671
0000080 00ac
0000081

[nj@ryzen]~/workspace/post% hexdump -n 129 gpu2/postdata_0.bin
0000000 624e 32c4 a1d1 04ee cf43 d009 0309 e305
0000010 c62d d0b1 527f 76a0 2a68 35ba 3582 651e
0000020 a1bc 3453 2b76 9003 bd0d f19d 8463 b58b
0000030 2f69 27c6 6654 062e 76b4 4c07 d38b d44c
0000040 af49 23a9 5d19 cb61 cb73 2c62 8a87 c51a
0000050 deff f566 aba7 6356 3cf0 2b39 ddc5 2d79
0000060 e1dd 4f4c b375 2d78 1984 c949 8c9e 5295
0000070 03df f4c4 59ee a0f2 3276 a44d 9b81 5a45
0000080 00df

However, we recently found ONE working Vulkan-enabled GPU (it works both with and without this fix): AMD Radeon Graphics (RADV GFX1036), which is the 8GB variant of the 6600 XT.
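The hexdump outputs above are compared by eye. A minimal sketch of automating that check follows; `read_prefix` and `first_mismatch` are hypothetical helper names introduced here for illustration and are not part of gpu-post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read at most `limit` bytes from the start of a file.
// Returns an empty vector if the file cannot be opened.
inline std::vector<std::uint8_t> read_prefix(const std::string& path, std::size_t limit) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> buf(limit);
    in.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(limit));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    assert(buf.size() <= limit);  // never more than requested
    return buf;
}

// Index of the first differing byte within the first `limit` bytes,
// or -1 if both buffers agree over that window.
inline std::ptrdiff_t first_mismatch(const std::vector<std::uint8_t>& a,
                                     const std::vector<std::uint8_t>& b,
                                     std::size_t limit) {
    const std::size_t n = std::min({a.size(), b.size(), limit});
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    // A length difference inside the compared window also counts as a mismatch.
    if (n < limit && a.size() != b.size()) {
        return static_cast<std::ptrdiff_t>(n);
    }
    return -1;
}
```

For example, `first_mismatch(read_prefix("cpu/postdata_0.bin", 129), read_prefix("gpu2/postdata_0.bin", 129), 129)` would report offset 0 for the dumps above, since the very first bytes already differ.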

poszu

I ran the tests with build/test/gpu-setup-test --test --logs -N 8192 on llvmpipe and it fails:

Test LEAFS: Label size: 128, count 131072, buffer 2.0M
[2023-04-24 11:10:43] Vulkan library LOADED.
[2023-04-24 11:10:43] initCpu() finished.
CPU: 131072 hashes, 609 h/s, status: 0
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
[2023-04-24 11:14:19] 92% Max Allocation: 1971322880
[2023-04-24 11:14:19] GPU 0: selecting buffer_size of 1880
[2023-04-24 11:14:19] GPU 0: setting thread_concurrency to 7520 based on buffer size 1880 and lookup gap 4
[2023-04-24 11:14:19] 64:128 34362 -> 126892
[2023-04-24 11:14:19] SPIR-V program 64:128 126892 bytes
[2023-04-24 11:14:19] initVulkan() finished. Found llvmpipe (LLVM 12.0.0, 256 bits)
llvmpipe (LLVM 12.0.0, 256 bits): 131072 hashes, 14883 h/s
ZEROS result
WRONG result for label size 128 from provider 0 [llvmpipe (LLVM 12.0.0, 256 bits)]

poszu · 2023-04-24T09:33:54Z

src/vulkan/driver-vulkan.cpp

 	cgpu->lookup_gap = 4;

-	unsigned int bsize = 1024;
-	size_t ipt = (bsize / cgpu->lookup_gap + (bsize % cgpu->lookup_gap > 0));
+	size_t ipt = scrypt_mem / cgpu->lookup_gap;


Could you please document, in comments, the logic behind the buffer/memory size calculations in this function and what the variables mean (for example, what "ipt" stands for)?

AndrewAR2 · 2023-04-24

Scrypt memory size in bytes: scrypt_mem = 128 * r * N.
The lookup gap reduces memory usage but increases the computation required. With a lookup gap, the memory used per thread is
ipt = scrypt_mem / lookup_gap
and the maximum number of concurrent threads is
max concurrent threads = allocated_buffer_size / ipt
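The three formulas above can be sketched as a few small functions. This is an illustration only, not the code in driver-vulkan.cpp: the function names are hypothetical, and r = 1 and a buffer size expressed in MiB are assumptions chosen because they reproduce the numbers in the test logs.

```cpp
#include <cassert>
#include <cstdint>

// Full scrypt scratchpad size in bytes for one hash: 128 * r * N.
inline std::uint64_t scrypt_mem_bytes(std::uint64_t r, std::uint64_t n) {
    return 128 * r * n;
}

// Bytes kept per thread when only every `lookup_gap`-th scratchpad entry
// is stored (the skipped entries are recomputed on demand).
inline std::uint64_t bytes_per_thread(std::uint64_t scrypt_mem, std::uint64_t lookup_gap) {
    assert(lookup_gap > 0);
    return scrypt_mem / lookup_gap;
}

// Number of threads whose reduced scratchpads fit into the allocated buffer.
inline std::uint64_t max_threads(std::uint64_t buffer_bytes, std::uint64_t ipt) {
    assert(ipt > 0);
    return buffer_bytes / ipt;
}
```

With -N 8192 and the assumed r = 1, `scrypt_mem_bytes(1, 8192)` is 1 MiB, `bytes_per_thread(..., 4)` is 256 KiB, and a 1880 MiB buffer gives 7520 threads, which matches the "setting thread_concurrency to 7520 based on buffer size 1880 and lookup gap 4" log line.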

poszu · 2023-04-24T09:39:22Z

src/vulkan/driver-vulkan.cpp


 	if (!cgpu->buffer_size) {
-		unsigned int base_alloc = (int)(cgpu->gpu_max_alloc * 88 / 100 / 1024 / 1024 / 8) * 8 * 1024 * 1024;
-		cgpu->thread_concurrency = (uint32_t)(base_alloc / scrypt_mem / ipt);
+		unsigned int base_alloc = (int)(cgpu->gpu_max_alloc * 92 / 100 / 1024 / 1024 / 8) * 8 * 1024 * 1024;


Why use 92% of max memory?

AndrewAR2 · 2023-04-24

A value of 88% is generally accepted, but it is a legacy of cards with 4 GB of memory or less. 92% is a performance improvement and is acceptable as long as there are no memory allocation errors. In fact, for cards with more than 4 GB of memory, a value of 100% is acceptable.
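The base_alloc expression in the diff takes a percentage of the maximum allocation and rounds it down to a multiple of 8 MiB. A minimal sketch of that arithmetic, using 64-bit integers throughout; `base_alloc_bytes` is a hypothetical name, and the 2 GiB limit in the note below is an assumption, since the logs do not print the raw limit.

```cpp
#include <cassert>
#include <cstdint>

// Take `percent` of the device's maximum allocation and round the result
// down to a whole number of 8 MiB chunks, using the same order of integer
// divisions as the expression in the diff.
inline std::uint64_t base_alloc_bytes(std::uint64_t gpu_max_alloc, std::uint64_t percent) {
    assert(percent <= 100);
    const std::uint64_t chunk = 8ULL * 1024 * 1024;  // 8 MiB
    const std::uint64_t chunks = gpu_max_alloc * percent / 100 / 1024 / 1024 / 8;
    return chunks * chunk;
}
```

Assuming a 2 GiB maximum allocation, `base_alloc_bytes(2147483648, 92)` yields 1971322880 bytes (1880 MiB), the same value as the "92% Max Allocation" line in the llvmpipe log, while 88 yields 1887436800 bytes (1800 MiB).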

pigmej · 2023-04-24T09:45:36Z

[nj@ryzen]~/workspace/gpu-post% build/test/gpu-setup-test --test --logs -N 8192

Test LEAFS: Label size: 8, count 131072, buffer 0.1M
[2023-04-24 11:38:35] Vulkan library LOADED.
[2023-04-24 11:38:35] initCpu() finished.
CPU: 131072 hashes, 835 h/s, status: 0
[2023-04-24 11:41:12] 92% Max Allocation: 1769996288
[2023-04-24 11:41:12] GPU 0: selecting buffer_size of 1688
[2023-04-24 11:41:12] GPU 0: setting thread_concurrency to 6752 based on buffer size 1688 and lookup gap 4
[2023-04-24 11:41:12] 64:008 28178 -> 102560
[2023-04-24 11:41:12] SPIR-V program 64:8 102560 bytes
[2023-04-24 11:41:12] initVulkan() finished. Found AMD Radeon Graphics (RADV GFX1036)
AMD Radeon Graphics (RADV GFX1036): 131072 hashes, 5871 h/s
ZEROS result
WRONG result for label size 8 from provider 0 [AMD Radeon Graphics (RADV GFX1036)]
[2023-04-24 11:41:34] 92% Max Allocation: 2826960896
[2023-04-24 11:41:34] GPU 1: selecting buffer_size of 2696
[2023-04-24 11:41:34] GPU 1: setting thread_concurrency to 10784 based on buffer size 2696 and lookup gap 4
[2023-04-24 11:41:34] Failure in vkAllocateMemory at 225 /home/nj/workspace/gpu-post/src/vulkan/vulkan-helpers.c  ErrCode=-2

[2023-04-24 11:41:34] 64:008 28178 -> 102560
[2023-04-24 11:41:34] SPIR-V program 64:8 102560 bytes
[2023-04-24 11:41:34] initVulkan() finished. Found Intel(R) Arc(tm) A770 Graphics (DG2)
Intel(R) Arc(tm) A770 Graphics (DG2): 131072 hashes, 33541 h/s
ZEROS result
WRONG result for label size 8 from provider 1 [Intel(R) Arc(tm) A770 Graphics (DG2)]
[2023-04-24 11:41:38] 92% Max Allocation: 1971322880
[2023-04-24 11:41:38] GPU 2: selecting buffer_size of 1880
[2023-04-24 11:41:38] GPU 2: setting thread_concurrency to 7520 based on buffer size 1880 and lookup gap 4
[2023-04-24 11:41:38] 64:008 28178 -> 102560
[2023-04-24 11:41:38] SPIR-V program 64:8 102560 bytes
[2023-04-24 11:41:38] initVulkan() finished. Found llvmpipe (LLVM 16.0.0, 256 bits)
llvmpipe (LLVM 16.0.0, 256 bits): 131072 hashes, 23314 h/s
ZEROS result
WRONG result for label size 8 from provider 2 [llvmpipe (LLVM 16.0.0, 256 bits)]

That's from

[nj@ryzen]~/workspace/gpu-post% build/test/gpu-setup-test --list
Available POST compute providers:
  1: [VULKAN] AMD Radeon Graphics (RADV GFX1036)
  2: [VULKAN] Intel(R) Arc(tm) A770 Graphics (DG2)
  3: [VULKAN] llvmpipe (LLVM 16.0.0, 256 bits)
  4: [CPU] CPU

AndrewAR2 added 2 commits April 20, 2023 14:42

Added -N option to the test app to set the Scrypt N parameter.

f38e322

Fixed lookup buffer size calculation for Vulkan implementation.

d447367

Fix shaders-gen flags for Debug build.

0c1e113

poszu suggested changes Apr 24, 2023


AndrewAR2 added 7 commits May 1, 2023 16:40

Fix wrong thread concurrency value

9e5fdd2

Add more logs

07b449f

Fix error in logs

5e7ac5e

Fix max allocation size

e396d17

Fix CUDA batch size

4ee20a7

Use gpu_memory if gpu_max_alloc == 0

1ad1db1

Fix optimal blocks for CUDA

b268d5d
