community: Fixed the procedure of initializing pad_token_id
Check pad_token_id and eos_token_id in the model config when initializing the tokenizer's pad token.
This appears to be the same bug as the one reported for HuggingFace TGI.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
requires a similar change.
tishizaki committed Jan 27, 2025
1 parent dbb6b7b commit f32e776
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion libs/community/langchain_community/llms/huggingface_pipeline.py
@@ -169,7 +169,16 @@ def from_model_id(
             ) from e

         if tokenizer.pad_token is None:
-            tokenizer.pad_token_id = model.config.eos_token_id
+            if model.config.pad_token_id is not None:
+                tokenizer.pad_token_id = model.config.pad_token_id
+            elif model.config.eos_token_id is not None and isinstance(
+                model.config.eos_token_id, int
+            ):
+                tokenizer.pad_token_id = model.config.eos_token_id
+            elif tokenizer.eos_token_id is not None:
+                tokenizer.pad_token_id = tokenizer.eos_token_id
+            else:
+                tokenizer.add_special_tokens({"pad_token": "[PAD]"})

         if (
             (
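
The fallback order introduced by this change can be sketched in isolation. The helper name resolve_pad_token_id and the SimpleNamespace stand-ins below are hypothetical and not part of the commit; they only mirror the attribute names used in the diff. The isinstance check matters because some model configs store eos_token_id as a list of ids rather than a single int, and a list cannot be assigned as a pad token id.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_pad_token_id(config: Any, tokenizer: Any) -> Optional[int]:
    """Hypothetical helper mirroring the fallback order in the diff.

    Returns the id to use as pad_token_id, or None when no candidate
    exists and a new "[PAD]" special token would have to be added.
    """
    # 1. Prefer an explicit pad_token_id from the model config.
    pad_id = getattr(config, "pad_token_id", None)
    if pad_id is not None:
        return pad_id
    # 2. Fall back to the config's eos_token_id, but only if it is a
    #    single int (a list of eos ids is skipped).
    eos_id = getattr(config, "eos_token_id", None)
    if isinstance(eos_id, int):
        return eos_id
    # 3. Fall back to the tokenizer's own eos_token_id.
    tok_eos = getattr(tokenizer, "eos_token_id", None)
    if tok_eos is not None:
        return tok_eos
    # 4. Nothing usable: the caller would add a "[PAD]" token instead.
    return None


# Stand-in objects (assumptions for illustration, not real model configs).
list_eos_config = SimpleNamespace(pad_token_id=None, eos_token_id=[1, 2])
tokenizer = SimpleNamespace(eos_token_id=2)
chosen = resolve_pad_token_id(list_eos_config, tokenizer)
```

With the stand-ins above, the list-valued eos_token_id in the config is skipped and the tokenizer's eos_token_id is chosen. Before this commit, the list itself would have been assigned to tokenizer.pad_token_id.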