[Feature] Request for 8-bit Quantization of Attention with SageAttention #1763

Snowdar · 2024-10-23T09:30:36Z

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

As https://github.com/thu-ml/SageAttention reports, quantizing attention to 8 bits can speed up inference by about 2x or more while keeping the same accuracy. Should we give it a try, or at least run some verification?
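The core idea can be illustrated with a small NumPy sketch. It assumes per-tensor symmetric INT8 quantization of Q and K, integer accumulation for QK^T, and floating-point softmax and P·V. It also applies the "smooth K" step (subtracting K's mean over tokens) that the SageAttention paper describes. SageAttention's real kernels use finer-grained per-block scales and fused GPU code, so this is only an accuracy illustration, and the function names (`quantize_int8`, `attention_int8_qk`, `attention_ref`) are hypothetical, not part of SageAttention or SGLang.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization, so that x ~= q * scale."""
    scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def softmax(s):
    # Numerically stable softmax over the key axis.
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def attention_ref(q, k, v):
    """Full-precision scaled dot-product attention, used as the reference."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def attention_int8_qk(q, k, v):
    """Hypothetical sketch: INT8 Q.K^T, floating-point softmax and P.V."""
    # Smooth K: subtracting the mean over tokens shifts each query's row of
    # scores by a constant (softmax cancels it), and can shrink K's range
    # when all tokens share a large per-channel offset.
    k = k - k.mean(axis=0, keepdims=True)
    q_i8, s_q = quantize_int8(q)
    k_i8, s_k = quantize_int8(k)
    # Integer matmul with int32 accumulation, then dequantize with both scales.
    scores = (q_i8.astype(np.int32) @ k_i8.astype(np.int32).T) * (s_q * s_k)
    scores = scores / np.sqrt(q.shape[-1])
    # P and V stay in floating point in this sketch.
    return softmax(scores) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)).astype(np.float32) for _ in range(3))
err = np.abs(attention_ref(q, k, v) - attention_int8_qk(q, k, v)).max()
print(f"max abs error vs. full precision: {err:.4f}")
```

NumPy will not show any speedup here; the gain reported by SageAttention comes from running the INT8 matmul on GPU tensor cores inside fused kernels. The sketch only shows why the accuracy loss from quantizing Q and K can stay small.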

Related resources

GitHub: https://github.com/thu-ml/SageAttention

merrymercy · 2024-10-24T01:18:42Z

Contributions are welcome.

merrymercy added the "good first issue" label (Good for newcomers) on Oct 24, 2024.
