do_sample=False for NPU in chat_sample, add NPU to README (#1637)
- make chat_sample work out of the box on NPU by forcing `do_sample=False`
when the device is NPU (see the sketch after this list)
- add NPU information to the text_generation samples READMEs

and a small unrelated change:

- change the `pip install` command for models that are already exported and
available on huggingface-hub. There is no need to install all of PyTorch and
transformers if you only need to download a model.
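
A minimal sketch of the chat_sample change, assuming the `openvino_genai` Python API (the real chat_sample takes the model directory and device as command-line arguments; the names below are placeholders):

```python
import openvino_genai

model_dir, device = "<model_dir>", "NPU"  # placeholders
pipe = openvino_genai.LLMPipeline(model_dir, device)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100
# Sampling is not supported on NPU, so force greedy decoding on that device.
if device == "NPU":
    config.do_sample = False

pipe.start_chat()
prompt = input("question:\n")
print(pipe.generate(prompt, config))
pipe.finish_chat()
```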
helena-intel authored Jan 28, 2025
1 parent 3b016df commit 4521bb6
Showing 2 changed files with 24 additions and 2 deletions.
13 changes: 12 additions & 1 deletion samples/cpp/text_generation/README.md
@@ -19,7 +19,7 @@ optimum-cli export openvino --model <model> <output_folder>
```
If a converted model in OpenVINO IR format is already available in the collection of [OpenVINO optimized LLMs](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd) on Hugging Face, it can be downloaded directly via huggingface-cli.
```sh
-pip install --upgrade-strategy eager -r ../../export-requirements.txt
+pip install huggingface-hub
huggingface-cli download <model> --local-dir <output_folder>
```
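
For scripted downloads, a similar result can be had from Python with the `huggingface_hub` API; a small sketch, with the model ID and target folder as placeholders:

```python
from huggingface_hub import snapshot_download

# Download an already-converted OpenVINO IR model from the Hugging Face Hub.
# Replace the placeholders with a model from the collection and a local folder.
snapshot_download(repo_id="<model>", local_dir="<output_folder>")
```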

@@ -54,6 +54,17 @@ The following template can be used as a default, but it may not work properly wi
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```
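
One way to apply this default is to write it into the model's `tokenizer_config.json`; a sketch using only the Python standard library, with the model folder as a placeholder:

```python
import json
from pathlib import Path

# Placeholder: path to the exported model folder.
config_path = Path("<output_folder>") / "tokenizer_config.json"
config = json.loads(config_path.read_text(encoding="utf-8"))

# The default ChatML-style template from above, as one Python string.
config["chat_template"] = (
    "{% for message in messages %}{% if (message['role'] == 'user') %}"
    "{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}"
    "{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}"
    "{% endif %}{% endfor %}"
)
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```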

#### NPU support

The NPU device is supported with some limitations; see the [NPU inference of LLMs](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html) documentation. In particular:

- Models must be exported with symmetric INT4 quantization (`optimum-cli export openvino --weight-format int4 --sym --model <model> <output_folder>`).
  For models with more than 4B parameters, channel-wise quantization should be used (`--group-size -1`).
- Beam search and parallel sampling are not supported.
- Use OpenVINO 2025.0 or later (installed via `deployment-requirements.txt`; see the "Common information" section) and the latest NPU driver.


### 2. Greedy Causal LM (`greedy_causal_lm`)
- **Description:**
Basic text generation using a causal language model.
Expand Down
13 changes: 12 additions & 1 deletion samples/python/text_generation/README.md
@@ -19,7 +19,7 @@ optimum-cli export openvino --model <model> <output_folder>
```
If a converted model in OpenVINO IR format is already available in the collection of [OpenVINO optimized LLMs](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd) on Hugging Face, it can be downloaded directly via huggingface-cli.
```sh
-pip install --upgrade-strategy eager -r ../../export-requirements.txt
+pip install huggingface-hub
huggingface-cli download <model> --local-dir <output_folder>
```

@@ -54,6 +54,17 @@ The following template can be used as a default, but it may not work properly wi
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```

#### NPU support

The NPU device is supported with some limitations; see the [NPU inference of LLMs](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html) documentation (a usage sketch follows the list below). In particular:

- Models must be exported with symmetric INT4 quantization (`optimum-cli export openvino --weight-format int4 --sym --model <model> <output_folder>`).
  For models with more than 4B parameters, channel-wise quantization should be used (`--group-size -1`).
- Beam search and parallel sampling are not supported.
- Use OpenVINO 2025.0 or later (installed via `deployment-requirements.txt`; see the "Common information" section) and the latest NPU driver.
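
A minimal usage sketch under these constraints, assuming the `openvino_genai` Python API and a symmetric-INT4 export in `<output_folder>`:

```python
import openvino_genai

# Placeholder: a model exported with `--weight-format int4 --sym`.
pipe = openvino_genai.LLMPipeline("<output_folder>", "NPU")
# Beam search and parallel sampling are unsupported on NPU, so keep the
# default greedy decoding.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```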


### 2. Greedy Causal LM (`greedy_causal_lm`)
- **Description:**
Basic text generation using a causal language model.
Expand Down
