Optimize the rotm kernel with RVV intrinsic. #5038

tingboliao · 2024-12-31T02:51:42Z

Starting from the scalar implementation of rotm, we optimized the kernel using RVV 1.0 intrinsics.
We then added test cases to verify functionality and performance on the K230 and K1.
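
For readers unfamiliar with rotm, the following is a minimal scalar sketch of the operation being optimized, written against the reference BLAS convention (`param = {flag, h11, h21, h12, h22}`). It is not the code from this PR: the function name `rotm_ref` is hypothetical, only double precision is shown, and the sketch assumes positive strides. A vectorized kernel would presumably process the same loop body in vector-length chunks.

```c
#include <stddef.h>

/* Hypothetical reference routine (not OpenBLAS code): applies the modified
 * Givens rotation H to the vector pair (x, y), following the BLAS rotm
 * convention param = {flag, h11, h21, h12, h22}.
 * Assumption: incx and incy are positive. */
void rotm_ref(size_t n, double *x, size_t incx,
              double *y, size_t incy, const double *param)
{
    double flag = param[0];
    double h11, h12, h21, h22;

    /* flag == -2 means H is the identity, so there is nothing to do. */
    if (n == 0 || flag == -2.0)
        return;

    if (flag < 0.0) {            /* flag == -1: all four entries are given */
        h11 = param[1]; h21 = param[2];
        h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0) {    /* unit diagonal, off-diagonal given */
        h11 = 1.0;      h21 = param[2];
        h12 = param[3]; h22 = 1.0;
    } else {                     /* flag == 1: diagonal given, off-diagonal fixed */
        h11 = param[1]; h21 = -1.0;
        h12 = 1.0;      h22 = param[4];
    }

    for (size_t i = 0; i < n; i++) {
        double w = x[i * incx];
        double z = y[i * incy];
        x[i * incx] = h11 * w + h12 * z;  /* first row of H  */
        y[i * incy] = h21 * w + h22 * z;  /* second row of H */
    }
}
```

Resolving the flag once outside the loop leaves a branch-free loop body, which is the part that lends itself to vectorization.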

The performance data are shown below:

Parameter setting: OPENBLAS_LOOPS = 10000.

K230 [C908, vlen = 128]@1.6GHz:

| Cases | Scalar / MFlops | Optimized RVV / MFlops |
| --- | --- | --- |
| srotm.goto | 875.57 | 1536.78 |
| drotm.goto | 799.77 | 1408.70 |

K1 [C908, vlen = 256]@1.6GHz:

| Cases | Scalar / MFlops | Optimized RVV / MFlops |
| --- | --- | --- |
| srotm.goto | 880.02 | 1490.44 |
| drotm.goto | 811.13 | 1541.92 |

In the data above, higher values indicate better performance.

Signed-off-by: tingbo.liao <[email protected]>

martin-frbg · 2024-12-31T22:40:28Z

Thanks - the numbers are very compelling, but I'm not entirely sure having that much architecture-specific code at the interface level is a good idea. At least I don't think we've done this before, and if every architecture ifdef'd their specific intrinsics implementation into it, the file would get unwieldy rather quickly. (Need some time to think about alternatives though - not sure if it's easy to add a kernel mapping for just riscv64 either...)

tingboliao · 2025-01-02T00:57:29Z

Thanks. We will look into alternatives and submit a new pull request (PR) later if possible.

tingbo.liao added 2 commits December 31, 2024 10:32

Optimize the rotm kernel with RVV intrinsic.

2afd741

Signed-off-by: tingbo.liao <[email protected]>

Correct the usage conditions of the macro RISCV_SIMD.

c2271f2

Signed-off-by: tingbo.liao <[email protected]>

Merge branch 'OpenMathLib:develop' into dev_rotm_1231

da8af30

tingboliao closed this Jan 2, 2025

tingboliao reopened this Jan 7, 2025

tingboliao closed this Jan 7, 2025

tingboliao mentioned this pull request Jan 7, 2025

Rearranged the rotm kernel to adapt to the architecture. #5053

Closed
