Dear SGLang Team,
We are a security research group. We are impressed by SGLang's design, especially the shared-prefix KV cache. As we studied it further, however, some security concerns arose. When a new prompt arrives, if the TokenKVPool already holds its prefix tokens, the prefill phase is accelerated, which is reflected in the time to first token (TTFT). We found that the TTFT differences introduced by additional shared tokens are significant enough to be recognized.
Description
Assume the victim has sent a valuable prompt to SGLang, or a valuable system prompt has been set up in SGLang beforehand. Under certain conditions (e.g., the attacker shares the same serving backend with the victim), the attacker can attempt to guess the content of the victim's prompt and check each guess's validity from the TTFT.
Unlike vLLM, which shares tokens in chunks, SGLang uses a token-by-token sharing mechanism (RadixAttention) and combines it with a trie structure to store KV-cache entries. On the other hand, the timing decrease from one additional shared token is often negligible, which makes it harder for an attacker to guess prompts token by token, so here we simply demonstrate the leakage with multiple additional shared tokens.
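To illustrate the token-by-token sharing described above, here is a minimal sketch of prefix matching in a trie of cached token sequences. This is our own simplified model for exposition, not SGLang's actual RadixCache implementation:

```python
# Minimal trie over token IDs: each cached prompt inserts its tokens
# node by node, and a new prompt's shared-prefix length is the depth
# to which it matches an existing path. That depth determines how much
# prefill work is skipped, which is what leaks through the TTFT.
class TrieNode:
    def __init__(self):
        self.children = {}  # token id -> TrieNode

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def shared_prefix_len(self, tokens):
        # Number of leading tokens whose KV entries would be reused.
        node, n = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            n += 1
        return n

cache = PrefixCache()
cache.insert([5, 9, 2, 7])                    # victim prompt's token ids
print(cache.shared_prefix_len([5, 9, 2, 3]))  # -> 3 (three tokens reused)
print(cache.shared_prefix_len([8, 1]))        # -> 0 (no reuse)
```

Because sharing is per token rather than per chunk, every one-token extension of a correct guess moves the match one node deeper, which is precisely what makes a token-by-token oracle conceivable.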
Environment
GPU: NVIDIA A100 (40G)
CUDA: 11.8
pytorch: 2.3.1
OS: ubuntu 18.04
Sglang: v0.2.6
We launch the SGLang server with the default settings and set max_tokens=1 on each request to measure the TTFT.
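A minimal sketch of how such a TTFT measurement can be taken. The network call is stubbed out here so the snippet is self-contained; in a real measurement, `send_request` would be an HTTP call to the serving endpoint with `max_tokens=1`, so the response time is dominated by prefill:

```python
import time

def measure_ttft(send_request, prompt):
    """Time a single max_tokens=1 request; with generation capped at
    one token, the elapsed time is a proxy for the TTFT."""
    start = time.perf_counter()
    send_request(prompt, max_tokens=1)
    return time.perf_counter() - start

# Stand-in for a real client call to the server; it sleeps proportionally
# to the prompt length so the sketch runs without a GPU or a server.
def fake_send(prompt, max_tokens):
    time.sleep(0.001 * len(prompt.split()))

ttft = measure_ttft(fake_send, "guess of the victim prompt")
print(f"TTFT: {ttft * 1000:.2f} ms")
```

In practice each guess would be timed many times and the samples compared statistically, since a single TTFT reading is noisy.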
Leakage
We tested LLaMA2-13B and LLaMA2-70B-GPTQ (on a single device) and plotted ROC curves to fingerprint the timing difference when prompts share a prefix of 1, 2, 4, and 8 tokens respectively.
The results suggest that larger models have wider leakage windows. Even with only 2 additional shared tokens, the ROC is still good enough for us to check the validity of a guess.
Attack
We have tried several methods to amplify the phenomenon and found that the AUC for one additional shared token can be increased from 0.529 to 0.58. By using the flush_cache function provided by SGLang, we can increase our TPR over more trials without interfering with ourselves (since if the same guess were repeated without flushing, the later request would be accelerated by our own earlier one).
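For reference, the AUC of the shared-vs-non-shared timing classifier can be computed directly from raw TTFT samples as the Mann-Whitney probability that a random non-shared timing exceeds a random shared one. A sketch with synthetic timings (illustrative numbers, not our measured data):

```python
def auc(slow_samples, fast_samples):
    """Probability that a random 'no shared prefix' TTFT exceeds a
    random 'shared prefix' TTFT, counting ties as 1/2 (this pairwise
    win rate is exactly the ROC AUC of a threshold classifier)."""
    wins = sum((s > f) + 0.5 * (s == f)
               for s in slow_samples for f in fast_samples)
    return wins / (len(slow_samples) * len(fast_samples))

# Synthetic TTFTs (seconds): non-shared prompts are slightly slower.
no_share = [0.051, 0.052, 0.050, 0.053]
shared   = [0.049, 0.050, 0.048, 0.051]
print(auc(no_share, shared))  # -> 0.875
```

An AUC of 0.5 means the timings are indistinguishable; values approaching 1.0 mean a single comparison already reveals whether the prefix was cached.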
We have also designed a theoretical token-by-token algorithm to recover victim prompts. Detailed information will be provided soon in our paper.
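While the full algorithm is deferred to the paper, the core idea can be simulated with an idealized oracle that stands in for the noisy timing test. This is our own illustrative sketch, not the algorithm from the paper; the real attack must infer the oracle's bit from TTFT statistics over many trials:

```python
# Idealized token-by-token prefix recovery. The oracle reports whether
# extending the guess by one token still hits the cached (victim)
# prefix; in the real attack this bit comes from TTFT measurements.
VICTIM = [17, 4, 22, 9]   # hypothetical victim prompt token ids
VOCAB = range(32)         # toy vocabulary for the simulation

def oracle(guess):
    # True iff guess is a prefix of the cached victim prompt.
    return guess == VICTIM[:len(guess)]

def recover(max_len):
    guess = []
    for _ in range(max_len):
        for tok in VOCAB:
            if oracle(guess + [tok]):
                guess.append(tok)
                break
        else:
            break  # no token extends the prefix: recovery is complete
    return guess

print(recover(10))  # -> [17, 4, 22, 9]
```

Each recovered token costs at most one pass over the vocabulary, so recovery is linear in prompt length when the per-token oracle is reliable; the whole difficulty of the real attack lies in making that oracle reliable despite the small one-token timing difference.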
Possible mitigations
Below are some possible mitigations against our attacks.
The SGLang runtime (SRT) could detect whether a user is repeatedly submitting the same prompt, i.e., guessing over many trials. This could also be inferred from other behavior; for example, an attacker will tend to always set max_tokens=1 to measure the TTFT.
Increase the granularity of the minimum shared-token unit. Although the timing differences (shown in the ROC graphs above) would be amplified, the attacker's search space scales exponentially with the granularity; once the granularity reaches 8 tokens or more, exhaustive guessing could take the attacker practically forever.
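A back-of-the-envelope calculation of this scaling, assuming a LLaMA-2-sized vocabulary of 32,000 and a blind (prior-free) attacker; in practice language-model priors would shrink the effective space, so this is a worst-case count:

```python
# With sharing granularity g, a hit/miss signal is only observable per
# g-token block, so a blind guess of one block must cover vocab**g
# candidates. Even g = 2 is already around a billion candidates.
VOCAB_SIZE = 32000
for g in (1, 2, 4, 8):
    print(f"granularity {g}: {VOCAB_SIZE ** g:.3e} candidates per block")
```

This is the trade-off the mitigation exploits: coarser sharing leaks a stronger per-block signal but makes each leaked bit exponentially more expensive to extract.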
We hope to receive your early reply and look forward to discussing with you!
Unik-lif changed the title from "[Discussions] Possible timing side-channels of KV-Cache?" to "Possible timing side-channels of KV-Cache?" on Sep 24, 2024, and then to "Possible timing side-channels caused by shared prefix" on Sep 29, 2024.
@Unik-lif This is very interesting. Is your paper publicly available now?
We would like to invite you to join our bi-weekly online development meeting to discuss this vulnerability. Are you available on Oct. 19? If so, could you sign up for a 20-min slot in this doc?
Thank you for your warm reply @merrymercy!
We are honored by the invitation; however, we are currently busy with other commitments and may not be available on Oct. 19. I am sorry for that. 😢
Is your paper publicly available now?
Yes! We recently posted our manuscript on arXiv. However, the content presented in this manuscript is not yet complete, and we hope to refine it further in the future.