[VLM] Set stop token ids from default generation config in VLM pipeline #1612
base: master
Conversation
m_inputs_embedder->set_stop_token_ids(generation_config.stop_token_ids);

auto start_get_inputs_embeds = std::chrono::steady_clock::now();
ov::Tensor inputs_embeds = m_inputs_embedder->get_inputs_embeds(prompt, rgbs, perf_metrics);
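For reference, a minimal sketch of what the setter used above could look like (an assumed implementation, not quoted from the PR): the pipeline forwards the stop token ids it resolved from the generation config so that the embedder's history handling uses the same set.

// Assumed sketch; the real signature in the PR may differ.
void InputsEmbedder::set_stop_token_ids(const std::set<int64_t>& stop_token_ids) {
    // Store the ids resolved by the VLM pipeline so that the history
    // trimming logic uses the same stop tokens as generation.
    m_stop_token_ids = stop_token_ids;
}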
Another approach is to pass generation_config.stop_token_ids to m_inputs_embedder->get_inputs_embeds() and then to InputsEmbedder::get_encoded_input_ids(), but this will require updating the get_inputs_embeds() methods of all inherited InputsEmbedder classes for the existing VLM models.
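A hedged sketch of what that alternative would imply (hypothetical signature; parameter types are guessed from the call site above), which every derived InputsEmbedder class would have to mirror:

// Hypothetical: stop_token_ids threaded through the call chain instead of a setter.
virtual ov::Tensor get_inputs_embeds(const std::string& prompt,
                                     const std::vector<ov::Tensor>& rgbs,
                                     ov::genai::VLMPerfMetrics& perf_metrics,
                                     const std::set<int64_t>& stop_token_ids) = 0;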
@@ -194,8 +199,8 @@ class InputsEmbedder::IInputsEmbedder {
         // so let's check it out, find the trusted part and use it on the next step
         size_t trusted_history_length = 0;
         if (!m_tokenized_history.empty()) {
-            std::set<int64_t> stop_tokens = {m_tokenizer.get_eos_token_id()};
-            trusted_history_length = ov::genai::utils::get_first_history_difference(prev_chat_tokens, m_tokenized_history, stop_tokens);
+            OPENVINO_ASSERT(!m_stop_token_ids.empty(), "Stop tokens are not set for InputsEmbedder");
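For context, the call that presumably replaces the removed line and follows the new assert (reconstructed from the hunk and the discussion below, not quoted verbatim):

// m_stop_token_ids now comes from the pipeline's generation config instead
// of the single tokenizer EOS token:
trusted_history_length = ov::genai::utils::get_first_history_difference(
    prev_chat_tokens, m_tokenized_history, m_stop_token_ids);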
Why should an error be raised when stop_token_ids is not set? Generation could run and finish without it.
As far as I can see, we do not have a generation config in InputsEmbedder, but we do in the VLM pipeline (as well as in the other pipelines).
Previously we were relying on a single stop token retrieved from the tokenizer:
std::set<int64_t> stop_tokens = {m_tokenizer.get_eos_token_id()};
Now that several stop tokens can be present in the generation config, we need to ensure that the same stop tokens are used in InputsEmbedder. For this I am using the m_stop_token_ids field. This assert checks that we did not forget to call set_stop_token_ids() for InputsEmbedder in the VLM pipeline.
Does it make sense, or is it still not needed here? @sbalandi @ilya-lavrenov
If I understand correctly, it is possible to set a generation config in which stop_token_ids is empty; in that case the assert will fail incorrectly.
Agree with @sbalandi. According to our tests (openvino.genai/tests/python_tests/test_generation_config.py, lines 25 to 26 at 38ab055):

dict(max_new_tokens=12),
dict(max_length=12),

the absence of any stop token is a valid scenario.
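To put the same point in C++ terms, a minimal sketch (assuming the public ov::genai::GenerationConfig fields) of a config that relies on the token budget alone and would trip the assert above:

ov::genai::GenerationConfig config;
config.max_new_tokens = 12;  // generation finishes on the length limit
config.stop_token_ids = {};  // deliberately empty: a valid configuration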
In the VLM pipeline, the generate() method has a generation_config parameter. If eos_token_id is not set explicitly in this generation config, it is taken from the default m_generation_config and, as an effect, added to stop_token_ids.

However, if a model (e.g. Qwen2-VL or InternVL2) has multiple eos_token_id values in its generation_config.json, the stop_token_ids set is not updated accordingly, as it includes only one eos_token_id.

This issue results in excessive output, e.g. when a model (Qwen2-VL w/o instruct) adds system messages after the real answer until the max new tokens limit is reached.
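A rough sketch of the resolution described above (assumed logic, not the exact pipeline code), showing where the extra eos ids get lost:

// If the user did not set eos_token_id, inherit it from the default config.
if (generation_config.eos_token_id == -1)
    generation_config.eos_token_id = m_generation_config.eos_token_id;
// The single eos id is folded into stop_token_ids...
generation_config.stop_token_ids.insert(generation_config.eos_token_id);
// ...but additional eos ids listed in generation_config.json never reach
// this set, which is why this PR copies the default stop_token_ids instead.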