fix bug for GaudiMixtralAttentionLongSequence forward #1650

Open
wants to merge 1 commit into main
Conversation

kaixuanliu
Contributor
In PT 2.5, passing a multi-dimensional tensor to row_o.fill_ throws 'RuntimeError: fill_ only supports 0-dimension value tensor but got tensor with 4 dimensions.'
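
A minimal, self-contained repro of the failure (illustrative shapes, not the actual optimum-habana tensors; copy_ is shown as one workaround, the PR's actual diff is not reproduced here):

import torch

row_o = torch.zeros(1, 8, 4, 64)  # stand-in for the preallocated output block
value = torch.ones(1, 8, 4, 64)   # stand-in for the FusedSDPA result

# Tensor.fill_ accepts only a Python scalar or a 0-dim tensor, so this raises:
#   RuntimeError: fill_ only supports 0-dimension value tensor
#   but got tensor with 4 dimensions.
try:
    row_o.fill_(value)
except RuntimeError as e:
    print(e)

# An element-wise write with Tensor.copy_ avoids the error.
row_o.copy_(value)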

@kaixuanliu kaixuanliu requested a review from regisss as a code owner December 20, 2024 10:02
@kaixuanliu
Contributor Author

@regisss @libinta @mandy-li, please help review.

Signed-off-by: kaixuanliu <[email protected]>
@regisss
Collaborator

regisss commented Dec 20, 2024

@kaixuanliu Can you provide a command to reproduce this error?
I tried

GAUDI2_CI=1 pytest tests/test_text_generation_example.py -v -s -k "test_text_generation_bf16_1x[token0-mistralai/Mixtral-8x7B-v0.1-1-False-23.7931]"

but it passed without any issue with 1.19.

@kaixuanliu
Contributor Author

@regisss Hi, you can use the following command line to reproduce it under examples/text-generation:
python run_generation.py --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 --use_kv_cache --use_flash_attention --flash_attention_recompute --max_new_tokens 512 --max_input_tokens 8000 --ignore_eos --bf16

@kaixuanliu kaixuanliu changed the title from "fix bug" to "fix bug for GaudiMixtralAttentionLongSequence forward" on Dec 24, 2024
@yuanwu2017
Contributor

yuanwu2017 commented Dec 25, 2024

The patch fixes the following error.
python run_generation.py --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 --use_kv_cache --use_flash_attention --flash_attention_recompute --max_new_tokens 512 --max_input_tokens 8000 --ignore_eos --bf16
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1847, in _call_impl
    return inner()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1793, in inner
    result = forward_call(*args, **kwargs)
  File "/workspace/optimum-habana/optimum/habana/transformers/models/mixtral/modeling_mixtral.py", line 317, in forward
    attn_output = GaudiMixtralAttentionLongSequence.forward(
  File "/workspace/optimum-habana/optimum/habana/transformers/models/mixtral/modeling_mixtral.py", line 170, in forward
    row_o.fill_(FusedSDPA.apply(row_q, k, v, row_mask, 0.0, causal, None))
RuntimeError: fill_ only supports 0-dimension value tensor but got tensor with 4 dimensions.
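
For context, here is a self-contained sketch of the block-wise attention pattern that this forward implements, with stock torch.nn.functional.scaled_dot_product_attention standing in for FusedSDPA.apply and copy_ in place of the failing fill_. The function name, shapes, and block size are illustrative assumptions, not the actual optimum-habana code:

import math
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, q_block_size=1024):
    # Process the query in row blocks so peak memory stays bounded for
    # long prompts, writing each block's output into a preallocated buffer.
    q_len = q.size(-2)
    q_tiles = math.ceil(q_len / q_block_size)
    attn_output = torch.zeros_like(q)
    for i in range(q_tiles):
        s = i * q_block_size
        e = min(s + q_block_size, q_len)
        row_q = q[:, :, s:e, :]
        row_o = attn_output[:, :, s:e, :]  # view into the output buffer
        # copy_ writes the block result element-wise into the slice;
        # fill_ would raise here on PT 2.5 because the value is a 4-d tensor.
        row_o.copy_(F.scaled_dot_product_attention(row_q, k, v))
    return attn_output

q = k = v = torch.randn(1, 8, 8000, 64)  # long-prompt-sized example input
out = chunked_attention(q, k, v)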
