How do I use it in vllm deployment #3

Open
jchang98 opened this issue Mar 5, 2024 · 6 comments

Comments

jchang98 commented Mar 5, 2024

How can I use this approach in a vLLM deployment without training? Can you give me a specific example? Thanks.

@ChenxinAn-fdu (Contributor)

Thank you for bringing this to our attention. Unfortunately, the current version of vLLM does not support returning attention scores. However, this functionality is planned for the next release.

In the meantime, we are working on implementing paged attention (the key feature of vLLM) as well as Flash Decoding. These enhancements aim to accelerate the generation process and decrease the GPU memory footprint of the KV cache.

We appreciate your patience while we work on these developments. Stay tuned for updates.
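
For context on why returning attention statistics matters here: Flash-Decoding-style chunked attention computes a partial output over each block of keys and then merges the blocks using each query's log-sum-exp of the attention logits (the softmax_lse), so a kernel that only returns the output tensor cannot be composed this way. Below is a minimal PyTorch sketch of that merge step; the tensor names and shapes are illustrative assumptions, not code from this repo or from vLLM.

```python
import torch

def merge_attention_chunks(o1, lse1, o2, lse2):
    """Merge partial attention outputs computed over two disjoint key chunks.

    o1, o2     : partial outputs, shape (batch, heads, q_len, head_dim)
    lse1, lse2 : per-query log-sum-exp of the attention logits for each chunk,
                 shape (batch, heads, q_len)
    Returns the output that attention over the union of both chunks would give,
    plus the combined log-sum-exp (so the merge can be chained over many chunks).
    """
    lse = torch.logaddexp(lse1, lse2)           # combined normalizer, in log space
    w1 = torch.exp(lse1 - lse).unsqueeze(-1)    # renormalized weight of chunk 1
    w2 = torch.exp(lse2 - lse).unsqueeze(-1)    # renormalized weight of chunk 2
    return w1 * o1 + w2 * o2, lse
```

Given per-chunk (output, lse) pairs from any attention kernel that exposes them, folding this merge over the chunks reproduces full attention over all keys.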

jchang98 (Author) commented Mar 6, 2024

@ChenxinAn-fdu OK, thanks for your response

@ChenxinAn-fdu (Contributor)

I have pushed the code for Flash Decoding, and it significantly decreases the memory consumption of decoding with the KV cache. It may be helpful for you.

@skyshine102

Looking forward to the support in vLLM!

@Shuai-Xie

@ChenxinAn-fdu Does vLLM support DCA now? We'd like to use this feature in deployment.

@ChenxinAn-fdu (Contributor)

@Shuai-Xie Hi, I left an issue in their official repo, but it seems that the current version of vLLM only supports returning the output tensor without the softmax_lse. We plan to implement it ourselves.

If you do not need continuous batching, the current repo has implemented flash_decoding. You can use it for some preliminary experiments.
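
For those preliminary experiments, a single decoding step against a pre-allocated KV cache looks roughly like the sketch below. It uses the flash_attn package's flash_attn_with_kvcache kernel (flash-attn >= 2.2) purely to illustrate Flash-Decoding-style inference; it is not this repo's flash_decoding API, and the shapes, dtypes, and cache size are assumptions.

```python
# Illustration only: a standalone Flash-Decoding-style step via flash-attn,
# not the flash_decoding code shipped in this repo.
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, head_dim, max_len = 1, 32, 128, 4096          # assumed sizes
device, dtype = "cuda", torch.float16

# Pre-allocated KV cache for one layer; a real model keeps one per layer.
k_cache = torch.zeros(batch, max_len, heads, head_dim, device=device, dtype=dtype)
v_cache = torch.zeros(batch, max_len, heads, head_dim, device=device, dtype=dtype)
cache_seqlens = torch.zeros(batch, dtype=torch.int32, device=device)  # tokens cached so far

# One decoding step: a single new query token plus its new key/value.
q     = torch.randn(batch, 1, heads, head_dim, device=device, dtype=dtype)
k_new = torch.randn(batch, 1, heads, head_dim, device=device, dtype=dtype)
v_new = torch.randn(batch, 1, heads, head_dim, device=device, dtype=dtype)

# The kernel appends k_new/v_new into the cache at position cache_seqlens and
# attends over the cached keys, splitting them across SMs (Flash Decoding).
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, causal=True,
)
cache_seqlens += 1  # advance the cache length for the next step
```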
