huggingface / text-generation-inference Public

Fork 1.1k
Star 8.9k


Issues: huggingface/text-generation-inference


109 open, 1,229 closed


Issues list

PREFIX_CACHING=0 does not disable prefix caching in v2.3.1

#2676 opened Oct 21, 2024 by sam-ulrich1

2 of 4 tasks

(Prefill) KV Cache Indexing error if started multiple TGI servers concurrently

#2675 opened Oct 21, 2024 by nathan-az

3 of 4 tasks

Distributed Inference failing for Llama-3.1-70b-Instruct

#2671 opened Oct 20, 2024 by SMAntony

2 of 4 tasks

Getting 2 different responses from the same HTTP call with seed set depending on what machine calls

#2670 opened Oct 18, 2024 by sam-ulrich1

2 of 4 tasks

TGI does not support FP8 quantized models on ROCm

#2654 opened Oct 16, 2024 by Bihan

1 of 4 tasks

Unable to load GPTQ LoRA Adapter

#2653 opened Oct 15, 2024 by SMAntony

2 of 4 tasks

Trust_remote_code not pass to router, TGI launcher get stuck if model tokenizer has custom code

#2649 opened Oct 15, 2024 by tanyinyan

4 tasks

OpenAI Client format + chat template for a single call

#2644 opened Oct 14, 2024 by vitalyshalumov

1 of 4 tasks

How do you download a subfile?

#2643 opened Oct 14, 2024 by PeterTucker

1 of 4 tasks

Add AMD gfx110* support

#2641 opened Oct 13, 2024 by cazlo

input tokens exceeded max_input_tokens

#2638 opened Oct 12, 2024 by LanSnowZ

2 of 4 tasks

[New Model Request] NVLM

#2636 opened Oct 11, 2024 by nbroad1881

2 tasks done

TGI drops requests when 150 requests are sent continuously at the rate of 5 Request Per Second in AMD 8 X MI300x with Llama 3.1 405B

#2635 opened Oct 11, 2024 by Bihan

2 of 4 tasks

No module named moe_kernel in Flash Attention Installation while compiling TGI2.3.1

#2621 opened Oct 8, 2024 by abhasin14

Excessive use of VRAM for Llama 3.1 8B

#2615 opened Oct 7, 2024 by ukito-pl

1 of 4 tasks

huggingface_hub.errors.GenerationError: Request failed during generation: Server error:

#2608 opened Oct 4, 2024 by ivanhe123

2 of 4 tasks

Server error: transport error

#2593 opened Oct 1, 2024 by ismael-dm

2 of 4 tasks

Remove max_stop_sequences by default

#2584 opened Sep 29, 2024 by sestinj

3 of 4 tasks

How to turn on the KV cache when serve a model?

#2583 opened Sep 28, 2024 by hahmad2008

4 tasks

OutOfMemory error running Meta-Llama-3.1-405B-Instruct-fp8 on 8xH100

#2572 opened Sep 26, 2024 by ad01bl

1 of 4 tasks

Deploy error for Llama-3.2-vision-11B: "Sharded is not supported for AutoModel"

#2571 opened Sep 26, 2024 by xuan1905

1 of 4 tasks

Question: What is preferred way to cite TGI/repo? Didnt see a citation file.

#2569 opened Sep 26, 2024 by elegantmoose

Passing an image_url to a text-only model should fail explicitly

#2565 opened Sep 25, 2024 by Wauplin

4 tasks

Image for arm64 (Macbook Pro)

#2560 opened Sep 24, 2024 by arsentievalex

1 of 4 tasks

Inconsistent Behavior with Multi-LoRA Deployment

#2559 opened Sep 24, 2024 by charlatan-101

2 of 4 tasks
