[GPU] set output from variable's memory if kv-cache #27658

sungeunk · 2024-11-21T07:29:35Z

Tickets:

157514

isanghao · 2024-11-22T06:24:46Z

src/plugins/intel_gpu/src/graph/primitive_inst.cpp

@@ -618,6 +618,7 @@ void primitive_inst::realloc_if_needed() {
                    _max_output_layout_count[j] = 0;
                }
            } else {
+                _outputs[0] = variable.get_memory();


I have some questions about this PR:

This sets _outputs[0] inside realloc_if_needed. If the realloc does not happen, are we using the proper buffer for _outputs[0]? I'm afraid this should be set before this function.
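To make the concern concrete, here is a minimal toy sketch (not the real intel_gpu code). `ToyVariable`, `ToyKvCacheInst`, and both method names are hypothetical stand-ins. It assumes one instance serves two requests in turn and that some path can return before the assignment is reached; whether such a path exists in realloc_if_needed is exactly the open question.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; not the real intel_gpu classes.
using Buffer = std::vector<float>;
using BufferPtr = std::shared_ptr<Buffer>;

struct ToyVariable {
    BufferPtr memory;
    BufferPtr get_memory() const { return memory; }
};

struct ToyKvCacheInst {
    std::vector<BufferPtr> outputs = std::vector<BufferPtr>(1);

    // Variant 1: the output is rebound only if control reaches the
    // assignment; an early return keeps the previous binding.
    void realloc_then_bind(const ToyVariable& variable, bool early_return) {
        if (early_return)
            return;
        outputs[0] = variable.get_memory();
    }

    // Variant 2: the output is rebound before any early return.
    void bind_then_realloc(const ToyVariable& variable, bool early_return) {
        outputs[0] = variable.get_memory();
        if (early_return)
            return;
        // Reallocation logic would go here.
    }
};

// Runs request A and then request B (which returns early) on one instance.
// Returns true if the output ends up pointing at B's variable memory.
bool output_follows_request(bool bind_first) {
    ToyVariable var_a{std::make_shared<Buffer>(4, 1.0f)};
    ToyVariable var_b{std::make_shared<Buffer>(4, 2.0f)};
    ToyKvCacheInst inst;
    if (bind_first) {
        inst.bind_then_realloc(var_a, false);
        inst.bind_then_realloc(var_b, true);
    } else {
        inst.realloc_then_bind(var_a, false);
        inst.realloc_then_bind(var_b, true);
    }
    return inst.outputs[0] == var_b.get_memory();
}
```

In this toy model, `output_follows_request(true)` holds and `output_follows_request(false)` does not: with the late binding, the output still points at request A's buffer.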

Does shape_predictor work correctly in a multi-ireq scenario? From the code, it seems that the shape_predictor lives inside the network, so it may have an issue in a multi-ireq scenario because it does not receive ireq information. We may need to pass ireq information as an argument. Could you comment on it? @sshlyapn

@isanghao I think there shouldn't be any problems with shape_predictor, since each ireq has a unique instance of shape_predictor (which is passed to the network after the network is assigned to the ireq):

openvino/src/plugins/intel_gpu/include/intel_gpu/plugin/sync_infer_request.hpp

Line 89 in 0f149e3

std::shared_ptr<cldnn::ShapePredictor> m_shape_predictor = nullptr;
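A rough sketch of that ownership pattern, using toy types only. `ToyShapePredictor`, `ToyNetwork`, `ToyInferRequest`, and `set_shape_predictor` are hypothetical names for illustration, and the "remember the largest size" policy is an assumption, not how cldnn::ShapePredictor actually works.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical toy predictor: remembers the largest size it has seen.
struct ToyShapePredictor {
    std::size_t max_seen = 0;
    std::size_t predict(std::size_t requested) {
        if (requested > max_seen)
            max_seen = requested;
        return max_seen;
    }
};

// Hypothetical toy network: uses whichever predictor it was handed.
struct ToyNetwork {
    std::shared_ptr<ToyShapePredictor> predictor;
    void set_shape_predictor(std::shared_ptr<ToyShapePredictor> p) {
        predictor = std::move(p);
    }
};

// Each request creates its own predictor and passes it to its network.
struct ToyInferRequest {
    std::shared_ptr<ToyShapePredictor> m_shape_predictor =
        std::make_shared<ToyShapePredictor>();
    ToyNetwork network;

    ToyInferRequest() { network.set_shape_predictor(m_shape_predictor); }

    std::size_t infer(std::size_t tokens) {
        return network.predictor->predict(tokens);
    }
};
```

With two `ToyInferRequest` objects, a large request on the first does not influence the prediction of the second, because the predictor history is per request.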

oh you already did the work ;) Thanks for confirmation.

@sungeunk Could you please describe the issue in more detail? Does this mean that the KV-cache from one inference request is being used in another request in some cases? If so, then we probably need to adjust the scales and zero points memory buffers as well.

@sshlyapn The variable-state memory is shared between two infer requests in this sample code: minicpm_multi_inferreq.cpp
The results of query_state() for llmIR change when llmIR2.infer() is called.
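The symptom can be expressed as a small isolation check. This is a toy model rather than the sample itself: `ToyRequest` and `state_is_isolated` are hypothetical, and `query_state()` here simply returns a snapshot copy of the toy state instead of the OpenVINO variable states.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using State = std::vector<int>;

// Hypothetical toy request: infer() appends a token id to its KV state.
struct ToyRequest {
    std::shared_ptr<State> state;
    explicit ToyRequest(std::shared_ptr<State> s) : state(std::move(s)) {}
    void infer(int token) { state->push_back(token); }
    State query_state() const { return *state; }  // snapshot copy
};

// Returns true if running `other` leaves the state of `first` unchanged.
bool state_is_isolated(ToyRequest& first, ToyRequest& other) {
    first.infer(1);
    const State before = first.query_state();
    other.infer(2);
    return first.query_state() == before;
}
```

Two requests with separate state buffers pass the check; two requests built on the same buffer fail it, which mirrors the behavior reported for llmIR and llmIR2.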

@sungeunk, thank you for the details. In that case, it seems we have to update the scales/zero points buffers as well.
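As a sketch of what rebinding the extra buffers could look like, assuming a compressed KV-cache exposes its scales and zero points as additional outputs. `ToyCompressedState`, `bind_outputs`, and the output order (0: kv, 1: scales, 2: zero points) are assumptions of this example, not taken from the plugin code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Buffer = std::vector<float>;
using BufferPtr = std::shared_ptr<Buffer>;

// Hypothetical per-request state for a compressed KV-cache.
struct ToyCompressedState {
    BufferPtr kv;
    BufferPtr scales;
    BufferPtr zero_points;  // may be null if not stored separately
};

// Rebinds every output that has a backing state buffer.
// The output order is an assumption of this sketch.
void bind_outputs(std::vector<BufferPtr>& outputs, const ToyCompressedState& s) {
    const BufferPtr sources[] = {s.kv, s.scales, s.zero_points};
    for (std::size_t i = 0; i < outputs.size() && i < 3; ++i) {
        if (sources[i])
            outputs[i] = sources[i];
    }
}
```

Calling `bind_outputs` with the state of the request that is about to run leaves no output pointing at another request's buffers, for the scales and zero points as well as the KV data.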

src/plugins/intel_gpu/src/graph/primitive_inst.cpp

src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/kv_cache.cpp

set output from variable's memory if kv-cache

d4c1aa5

sungeunk requested review from a team as code owners November 21, 2024 07:29

github-actions bot added the category: GPU OpenVINO GPU plugin label Nov 21, 2024

sungeunk requested review from isanghao, vladimir-paramuzov and yeonbok November 21, 2024 07:35

isanghao reviewed Nov 22, 2024

src/plugins/intel_gpu/src/graph/primitive_inst.cpp

add a test-case

3efd529

sungeunk force-pushed the 157514_conflicts_inf_reqs branch from dda463c to 3efd529 Compare November 25, 2024 09:39

sungeunk requested a review from isanghao November 25, 2024 09:40

isanghao reviewed Nov 27, 2024

src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/kv_cache.cpp Outdated

src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/kv_cache.cpp Outdated

sungeunk added 3 commits November 28, 2024 14:18

set outputs for scale/zp

eebe767

update naming

1ebcb33

fixed cpplint issues

9a046c2

sungeunk requested review from isanghao and sshlyapn December 2, 2024 01:24

isanghao approved these changes Dec 2, 2024

isanghao added this pull request to the merge queue Dec 2, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 2, 2024

isanghao added this pull request to the merge queue Dec 2, 2024

github-merge-queue bot pushed a commit that referenced this pull request Dec 2, 2024

[GPU] set output from variable's memory if kv-cache (#27658)

e58a9a7

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 2, 2024

isanghao added this pull request to the merge queue Dec 2, 2024

Merged via the queue into openvinotoolkit:master with commit 4a4bfed Dec 2, 2024
155 checks passed


sungeunk commented Nov 21, 2024

isanghao Nov 22, 2024

sshlyapn Nov 22, 2024

isanghao Nov 22, 2024

sshlyapn Nov 25, 2024

sungeunk Nov 26, 2024

sshlyapn Nov 27, 2024
