[TRANSFORMATIONS] SDPAToPagedAttention transformation: support decompression case in the Qwen-7b-Chat pattern #28514

CuriousPanCake · 2025-01-17T11:16:12Z

Qwen-7b-Chat contains a decompression subgraph when the model is executed in lower
precision, which adds extra Convert operations to the model (for example, FP16
to FP32).

This PR handles that optional Convert in PositionIDsReplacerQwen.

A unit test for this case has been added.
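The change boils down to this: one pattern must accept the subgraph both with and without a decompression Convert. Below is a toy C++ sketch of that idea. It rests on a few assumptions. The Node struct and the helpers make_node, skip_optional_convert, input_matches and matches_both_precisions are hypothetical. They are not the OpenVINO pattern-matching API. The op types "Slice" and "Multiply" are placeholders, not the actual PositionIDsReplacerQwen pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy graph node: an op type name plus its input producers.
// This is NOT OpenVINO's ov::Node; it only illustrates the matching idea.
struct Node {
    std::string type;
    std::vector<std::shared_ptr<Node>> inputs;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make_node(const std::string& type, std::vector<NodePtr> inputs = {}) {
    return std::make_shared<Node>(Node{type, std::move(inputs)});
}

// Look through a single optional Convert (e.g. an FP16 -> FP32 decompression
// Convert) and return the node that actually produces the value.
NodePtr skip_optional_convert(const NodePtr& node) {
    if (node && node->type == "Convert") {
        assert(node->inputs.size() == 1);  // a Convert has exactly one input
        return node->inputs.front();
    }
    return node;
}

// True if the consumer's input `port` is produced by a node of type
// `expected`, either directly or through one Convert.
bool input_matches(const NodePtr& consumer, std::size_t port,
                   const std::string& expected) {
    if (!consumer || port >= consumer->inputs.size()) {
        return false;
    }
    const NodePtr producer = skip_optional_convert(consumer->inputs[port]);
    return producer && producer->type == expected;
}

// Builds the same toy subgraph in both precisions and checks that a single
// pattern covers both. Op type names are placeholders.
bool matches_both_precisions() {
    NodePtr slice = make_node("Slice");
    NodePtr fp32_consumer = make_node("Multiply", {slice});
    NodePtr fp16_consumer = make_node("Multiply", {make_node("Convert", {slice})});
    return input_matches(fp32_consumer, 0, "Slice") &&
           input_matches(fp16_consumer, 0, "Slice");
}
```

The sketch looks through at most one Convert. That keeps the match strict: any other op between producer and consumer still breaks it, so the rewrite does not fire on unrelated subgraphs.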

Tickets:

CVS-157308

Signed-off-by: Andrii Staikov [email protected]
Signed-off-by: Ivan Tikhonov [email protected]

…ression case in the Qwen-7b-Chat pattern (openvinotoolkit#28493)

Co-authored-by: Ivan Tikhonov <[email protected]>

github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Jan 17, 2025

itikhono marked this pull request as ready for review January 17, 2025 11:16

itikhono requested a review from a team as a code owner January 17, 2025 11:16

itikhono requested review from itikhono and removed request for a team January 17, 2025 11:16

itikhono approved these changes Jan 17, 2025


Merge branch 'releases/2025/0' into qwen_decompression_support (f58159d)

itikhono added this to the 2025.0 milestone Jan 17, 2025
