Fix hybrid quantization reg issue #27687

Conversation

hyunback
Contributor

Details:

Tickets:

  • ticket-id

vladimir-paramuzov and others added 8 commits November 5, 2024 19:47
### Details:
- Set LPT callbacks to handle compression and avoid constant folding for it (taken from openvinotoolkit#20973)
- Allow u8/i8 output data type for compressed oneDNN FC
- Disable Dequantize propagation through Transpose when the Transpose is a dependency of SDPA, to keep the Transpose+SDPA fusion
- Many daily int8 models showed a performance regression (incorrect convolution data type and bias)
- Fix a kernel selection issue

Signed-off-by: hyunback <[email protected]>
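The Dequantize-propagation restriction described above can be illustrated with a minimal sketch. This is not the actual OpenVINO LPT/GPU-plugin code (which is C++); the `Node`, `feeds_sdpa`, and `can_propagate_dequantize` names are hypothetical, introduced only to show the idea: propagation of a Dequantize op through a Transpose is blocked when that Transpose feeds an SDPA node, so the Transpose+SDPA fusion opportunity survives.

```python
# Hypothetical sketch of the propagation guard; not the real plugin API.
from dataclasses import dataclass, field

@dataclass
class Node:
    op_type: str                                # e.g. "Dequantize", "Transpose", "SDPA"
    users: list = field(default_factory=list)   # downstream consumers

def feeds_sdpa(transpose: Node) -> bool:
    """Return True if any consumer of this Transpose is an SDPA node."""
    return any(u.op_type == "SDPA" for u in transpose.users)

def can_propagate_dequantize(dequantize: Node, through: Node) -> bool:
    """Allow moving Dequantize past an op, except a Transpose feeding SDPA."""
    if through.op_type == "Transpose" and feeds_sdpa(through):
        # Keep Dequantize above the Transpose so SDPA fusion is preserved.
        return False
    return True

# Example graph: Dequantize -> Transpose -> SDPA
sdpa = Node("SDPA")
transpose = Node("Transpose", users=[sdpa])
dq = Node("Dequantize", users=[transpose])
print(can_propagate_dequantize(dq, transpose))  # False: propagation blocked
```

A Transpose that does not feed SDPA would still allow propagation, so the guard only narrows the existing transformation rather than disabling it globally.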
Convolution is expected to use the int8 data type in an int8 model, but when mixed weight compression occurs, it runs in fp16.

Signed-off-by: hyunback <[email protected]>
Currently, dynamically quantized int8 oneDNN convolution has a problem and falls back to the reference convolution kernel, so run in fp16 mode instead.

Signed-off-by: hyunback <[email protected]>
@hyunback hyunback added the category: GPU OpenVINO GPU plugin label Nov 22, 2024
@hyunback hyunback requested review from a team as code owners November 22, 2024 00:05
@isanghao isanghao added this pull request to the merge queue Nov 29, 2024
Merged via the queue into openvinotoolkit:releases/2024/5 with commit 868a568 Nov 29, 2024
285 checks passed
4 participants