
a confusing issue #8

Closed · lilhongxy opened this issue Mar 29, 2024 · 8 comments
@lilhongxy

cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
ValueError: too many values to unpack (expected 2)

I followed the instructions in the Full inference code, but then I encountered this issue.
How can I fix this?

@ChenxinAn-fdu (Contributor) commented Mar 29, 2024

This error is usually caused by calling replace_with_chunkllama() after model.from_pretrained(). Make sure replace_with_chunkllama() is called before initializing the model. If that does not solve the error, please provide more details.
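A minimal sketch of the ordering being described here (the model path is a placeholder):

import torch
from transformers import LlamaForCausalLM
from chunkllama_attn_replace import replace_with_chunkllama

# Patch Llama's attention BEFORE the model is instantiated; calling this
# after from_pretrained() is what typically triggers the error above.
replace_with_chunkllama(pretraining_length=4096)
model = LlamaForCausalLM.from_pretrained("path_to_Llama-2-7b-hf", torch_dtype=torch.bfloat16)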

@lilhongxy (Author)

from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_attn_replace import replace_with_chunkllama
import torch

replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True, torch_dtype=torch.bfloat16)
inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt")

output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))

I just precisely followed the inference instructions, but the issue remained...

@Mooler0410

> I just precisely followed the inference instructions, but the issue remained...

Could you please check your transformers version? The RoPE API for Llama changed again after 4.38. (Actually, it always changes... from 4.35 to 4.36, to 4.37, to 4.38... almost every recent transformers release has a new RoPE implementation for Llama. 😓)
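For reference, a quick way to check the installed versions against the ones discussed in this thread (4.37.2 is the transformers version reported working below):

import torch
import transformers

# Print the installed versions to compare with a known-good environment.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)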

@ChenxinAn-fdu (Contributor)

Hey guys, the code works in my environment. My transformers version is 4.37.2.

from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_github import replace_with_chunkllama
import torch

model_path = "path/to/llama2"

replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).to("cuda:0")
inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt").to("cuda:0")

output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))

@ChenxinAn-fdu (Contributor)

Please use Flash Attention for processing longer input:
model = LlamaForCausalLM.from_pretrained(model_path, attn_implementation="flash_attention_2", trust_remote_code=True, torch_dtype=torch.bfloat16).to(device)
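(Note: attn_implementation="flash_attention_2" requires the flash-attn package to be installed; if it is missing, pip install flash-attn --no-build-isolation usually works, assuming a CUDA toolchain is available.)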

@lilhongxy (Author)

Thank you all, guys!!! 😄
It finally works.
It was my torch version that caused the issue; the previous version was 2.2.1+cu118.

Successful environment:
torch: 2.0.1+cu118
transformers: 4.37.2
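(Note: to reproduce this environment, pinning the reported versions should work, e.g. pip install transformers==4.37.2 and pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118 for the cu118 build; the exact torch install command depends on your CUDA setup.)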

@ChenxinAn-fdu (Contributor)

If there are no further questions or follow-up discussions, I will close this issue shortly. Thank you all for your contributions and participation.

@MarsMeng1994

Inference is correct, but when fine-tuning, the error comes up again.
