
Possible timing side-channels caused by shared prefix #1504

Open
Unik-lif opened this issue Sep 24, 2024 · 2 comments
Unik-lif commented Sep 24, 2024

Dear Sglang Team,
we are a security research group. We are impressed by SGLang's design, especially the shared-prefix KV cache. As we studied it further, however, we developed concerns about its security. When a new prompt arrives, if the TokenKVPool already holds its prefix tokens, the prefill phase is accelerated, and this is reflected in the time to first token (TTFT). We found that the TTFT differences introduced by additional shared tokens are significant enough to be distinguished.

Description

Assume the victim has sent a valuable prompt to SGLang, or a valuable system prompt was sent beforehand. Under certain conditions (e.g., the attacker shares the same serving backend as the victim), the attacker can guess the content of the victim's prompt and check each guess's validity from the TTFT.

Unlike vLLM, which shares tokens in fixed-size chunks, SGLang uses a token-by-token sharing mechanism (RadixAttention) backed by a radix-tree structure to index the KV cache. On the other hand, the timing decrease from one additional shared token is often negligible, which makes it harder for the attacker to guess prompts token by token, so we simply demonstrate the leakage with multiple additional shared tokens.
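As a simplified illustration of what the radix-tree lookup computes: the reuse opportunity for a new prompt is its token-level longest common prefix with cached requests. This is a hypothetical sketch; the actual RadixAttention implementation matches against tree nodes, not flat token lists:

```python
def shared_prefix_len(cached_tokens, new_tokens):
    """Length of the token-by-token longest common prefix.

    Every matched token is KV cache the prefill can skip, which is
    what shortens the TTFT for prompts with a cached prefix.
    """
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n
```

Because sharing happens at single-token granularity, each additional matched token shaves a small, fixed amount of prefill work, which is the signal our attack measures.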

Environment

  • GPU: NVIDIA A100 (40G)
  • CUDA: 11.8
  • pytorch: 2.3.1
  • OS: ubuntu 18.04
  • Sglang: v0.2.6

We launch the SGLang server with the default settings and set max_tokens=1 on each request, so the response latency directly measures the TTFT.
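The measurement loop can be sketched as follows. Here `send_request` is an assumed placeholder for any callable that submits the prompt with max_tokens=1 and returns once the single output token arrives (e.g., an HTTP POST to the server's generate endpoint; the endpoint details are omitted):

```python
import time

def measure_ttft(send_request, prompt, trials=5):
    """Estimate the time-to-first-token for `prompt`.

    With max_tokens=1 the full response latency is the TTFT, so we
    simply time the round trip. Taking the minimum over several
    trials suppresses scheduling and network noise.
    """
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        send_request(prompt)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Using the minimum rather than the mean is a deliberate choice: cache-hit speedups lower the floor of the latency distribution, while noise only raises individual samples.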

Leakage

We've tested LLaMA2-13B and LLaMA2-70B-GPTQ (on a single device) and plotted ROC curves to fingerprint the timing difference when prompts share prefixes of 1, 2, 4, and 8 tokens respectively.

[Figure: ROC curves for 1-, 2-, 4-, and 8-token shared prefixes]

The results suggest that larger models have larger leakage windows. Even with only 2 additional shared tokens, the ROC is distinguishable enough for us to check the validity of a guess.

Attack

We've tried to design methods to amplify the phenomenon and found that the AUC for one additional shared token can be increased from 0.529 to 0.58. By using the flush_cache function provided by SGLang, we can increase our TPR over more trials without interfering with ourselves (otherwise, once we submit a guess, our own cached copy accelerates later identical prompts).
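The repeated-trial procedure can be outlined as below. This is a hypothetical sketch, not the algorithm from our paper: `measure` returns a TTFT estimate for a candidate prefix, and `flush_cache` stands in for SGLang's cache-flush call, which evicts our own previously cached guesses between trials. The sketch assumes the victim's prompt is re-cached after a flush (e.g., a system prompt resent on every victim request):

```python
def rank_guesses(guesses, measure, flush_cache, trials=3):
    """Score each candidate prefix by its best observed TTFT.

    Flushing the KV cache between trials prevents our own earlier
    request for the same guess from warming the cache and biasing
    later measurements. The fastest guess most likely shares the
    longest prefix with the victim's cached prompt.
    """
    best = {}
    for guess in guesses:
        samples = []
        for _ in range(trials):
            flush_cache()  # evict our own cached copies of this guess
            samples.append(measure(guess))
        best[guess] = min(samples)
    return sorted(best, key=best.get)  # fastest (most likely) first
```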

We've designed a theoretical token-by-token algorithm to recover victim prompts. Detailed information will be provided soon in our paper.

Possible mitigations

Below are some possible mitigations to our attacks.

  • The SGLang runtime (SRT) could detect whether a user is repeatedly submitting the same or near-identical prompts, i.e., guessing over many trials. This can also be inferred from other behavior; for example, the attacker will typically set max_tokens = 1 to isolate the TTFT.
  • Increase the granularity of prefix sharing (the minimum number of shared tokens). Although the timing differences (shown in the ROC graph above) would be amplified, the attacker's search space scales exponentially with the chunk size; at a granularity of 8 tokens or more, brute-forcing even a single chunk becomes computationally infeasible.
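The second mitigation trades a larger timing signal for an exponentially larger search space. A back-of-the-envelope calculation, using LLaMA-2's roughly 32,000-token vocabulary as an assumed figure:

```python
def guesses_per_step(vocab_size, chunk_tokens):
    """Worst-case number of candidates an attacker must try to extend
    a known prefix by one observable step, when cache hits are only
    granted at a granularity of `chunk_tokens` tokens."""
    return vocab_size ** chunk_tokens

# Token-by-token sharing: one step costs at most vocab_size guesses.
# With 8-token chunks the per-step cost is vocab_size ** 8, which is
# astronomically large for any realistic vocabulary.
```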

We look forward to your reply and to discussing this with you!

@Unik-lif Unik-lif changed the title [Disscusions] Possible timing side-channels of KV-Cache? Possible timing side-channels of KV-Cache? Sep 24, 2024
@Unik-lif Unik-lif changed the title Possible timing side-channels of KV-Cache? Possible timing side-channels caused by shared prefix Sep 29, 2024
merrymercy (Contributor) commented Oct 6, 2024

@Unik-lif This is very interesting. Is your paper publicly available now?
We would like to invite you to join our bi-weekly online development meeting to discuss this vulnerability. Are you available on Oct. 19? If so, could you sign up for a 20-min slot in this doc?

Unik-lif (Author) commented Oct 8, 2024

Thank you for your warm reply @merrymercy!
We are honored by the invitation; however, we are currently busy with other commitments and may not be available on Oct. 19. I am sorry for that. 😢

Is your paper publicly available now?

Yes! We've recently posted our manuscript on arXiv. However, the content presented in this manuscript is not yet complete, and we hope to further refine it in the future.
