
Trust_remote_code not passed to router, TGI launcher gets stuck if model tokenizer has custom code #2649

Open · tanyinyan opened this issue Oct 15, 2024 · 0 comments · May be fixed by #2664
tanyinyan commented Oct 15, 2024

System Info

TGI version: v2.3.1
Model: Baichuan2-7B
The full command line (with --trust-remote-code set):

export USE_PREFIX_CACHING=0
export ATTENTION=paged
export ASCEND_RT_VISIBLE_DEVICES=4
model_id=/home/data/models/baichuan2-7b

text-generation-launcher \
--model-id $model_id \
--port 12359 \
--max-input-length 2048 \
--max-total-tokens 2560 \
--max-batch-prefill-tokens 4096 \
--max-waiting-tokens 20 \
--max-concurrent-requests 200 \
--waiting-served-ratio 1.2 \
--trust-remote-code

From the log, we can see that trust_remote_code is set:

2024-10-15T07:47:47.879952Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/data/models/baichuan2-7b` do not contain malicious code.
2024-10-15T07:47:47.880230Z  INFO download: text_generation_launcher: Starting check and download process for /home/data/models/baichuan2-7b
2024-10-15T07:47:52.691488Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-10-15T07:47:53.696712Z  INFO download: text_generation_launcher: Successfully downloaded weights for /home/data/models/baichuan2-7b

But after warmup it gets stuck and suggests trust_remote_code=True: the router reloads the tokenizer through Python, and when the repository ships custom code with trust_remote_code unset, transformers falls back to an interactive stdin prompt that can never be answered inside the router process:

2024-10-15T07:48:19.445156Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 25344
2024-10-15T07:48:19.445326Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:127: Using backend V3
The repository for /home/data/models/baichuan2-7b contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//home/data/models/baichuan2-7b.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Add , ("trust_remote_code","True".to_string()) as AutoTokenizer.from_pretrained kwargs
text-generation-inference/router/src/server.rs
Lines L1624-L1625:

    let tokenizer: Option<Tokenizer> = tokenizer_filename.and_then(|filename| {
        use pyo3::prelude::*;
        let convert = pyo3::Python::with_gil(|py| -> PyResult<()> {
            let transformers = py.import_bound("transformers")?;
            let auto = transformers.getattr("AutoTokenizer")?;
            let from_pretrained = auto.getattr("from_pretrained")?;
            let args = (tokenizer_name.to_string(),);
            let kwargs = [
                (
                    "revision",
                    revision.clone().unwrap_or_else(|| "main".to_string()),
                ),
                // added by this workaround: forward trust_remote_code
                ("trust_remote_code", "True".to_string()),
            ]
            .into_py_dict_bound(py);
            let tokenizer = from_pretrained.call(args, Some(&kwargs))?;
            let save = tokenizer.getattr("save_pretrained")?;
            let args = ("out".to_string(),);
            save.call1(args)?;
            Ok(())
        });
        // ... rest of the original closure unchanged
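Note that "True" in the patch is a Python str, which from_pretrained merely treats as truthy. A cleaner variant, sketched below under the assumption of the same pyo3 bound API already used in this file, builds the kwargs as a PyDict so a genuine Python bool can be passed:

    // Hedged alternative sketch, not the actual TGI patch: build the kwargs as a
    // PyDict so trust_remote_code is a real bool rather than the string "True".
    use pyo3::types::PyDict;

    let kwargs = PyDict::new_bound(py);
    kwargs.set_item(
        "revision",
        revision.clone().unwrap_or_else(|| "main".to_string()),
    )?;
    kwargs.set_item("trust_remote_code", true)?;
    let tokenizer = from_pretrained.call(args, Some(&kwargs))?;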

With this change, it runs OK:

2024-10-15T08:44:56.377562Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 25344
2024-10-15T08:44:56.377648Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:127: Using backend V3
2024-10-15T08:45:01.690286Z  INFO text_generation_router::server: router/src/server.rs:1671: Using config Some(Baichuan)
2024-10-15T08:45:01.690369Z  WARN text_generation_router::server: router/src/server.rs:1716: no pipeline tag found for model /home/data/models/baichuan2-7b
2024-10-15T08:45:01.690379Z  WARN text_generation_router::server: router/src/server.rs:1818: Invalid hostname, defaulting to 0.0.0.0
2024-10-15T08:45:01.801597Z  INFO text_generation_router::server: router/src/server.rs:2211: Connected
2024-10-15T08:45:10.554155Z  INFO generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(20), return_full_text: None, stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="576.72759ms" validation_time="570.99µs" queue_time="91.36µs" inference_time="576.06533ms" time_per_token="28.803266ms" seed="None"}: text_generation_router::server: router/src/server.rs:402: Success

Expected behavior

The text-generation-launcher argument --trust-remote-code should be passed through to text-generation-router for constructing a fast tokenizer!
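
One possible shape for the fix, sketched only as an illustration (the flag wiring below is hypothetical; #2664 is the actual candidate fix): the launcher already parses --trust-remote-code, so the router could accept the same flag and thread it into the from_pretrained kwargs above.

    // Hypothetical sketch, not necessarily what #2664 does.
    // Router side: accept the flag (the router already parses its arguments with clap).
    #[derive(clap::Parser)]
    struct RouterArgs {
        /// Trust custom code shipped inside the model repository.
        #[clap(long, env)]
        trust_remote_code: bool,
        // ... existing router arguments elided
    }

    // Launcher side: append the flag when spawning the router process.
    if args.trust_remote_code {
        router_args.push("--trust-remote-code".to_string());
    }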
