[Router] Implement router backbone #1695

Closed
wants to merge 20 commits into from

@ByronHsu ByronHsu commented Oct 17, 2024

Motivation

This is the first step of #1732. See the design doc for full context.
This PR implements:

  1. Two-level HTTP architecture
  2. Periodic Garbage Collection (GC)
  3. Dynamic Scaling
  4. Basic request routing (round robin and random)
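A minimal sketch of the two basic routing policies, assuming each policy only has to pick one worker URL from a fixed list. The class names `RoundRobinPolicy` and `RandomPolicy` and the `select` method are hypothetical, chosen for illustration; they are not taken from this PR's code.

```python
import itertools
import random
from typing import List, Optional


class RoundRobinPolicy:
    """Hypothetical helper: hand out worker URLs in a fixed rotation."""

    def __init__(self, worker_urls: List[str]) -> None:
        if not worker_urls:
            raise ValueError("at least one worker URL is required")
        # itertools.cycle repeats the list endlessly in its original order.
        self._cycle = itertools.cycle(list(worker_urls))

    def select(self) -> str:
        return next(self._cycle)


class RandomPolicy:
    """Hypothetical helper: pick a worker URL uniformly at random."""

    def __init__(self, worker_urls: List[str], seed: Optional[int] = None) -> None:
        if not worker_urls:
            raise ValueError("at least one worker URL is required")
        self._urls = list(worker_urls)
        # A private Random instance keeps the choice reproducible under a seed.
        self._rng = random.Random(seed)

    def select(self) -> str:
        return self._rng.choice(self._urls)
```

Because `itertools.cycle` snapshots the list, this version does not follow workers that are added or removed later; a router with dynamic scaling would more likely keep an index into the live worker list.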

Modifications

  1. Create router/ folder under srt
  2. Implement router and http server
  3. Handle GC and dynamic scaling
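One way to picture GC and dynamic scaling together is a small worker registry. This sketch assumes that "GC" means periodically dropping workers that fail a health check, which the description does not spell out. `WorkerRegistry`, `gc_once`, `run_periodic_gc`, and the injected `health_check` callable are all hypothetical names.

```python
import threading
from typing import Callable, List


class WorkerRegistry:
    """Hypothetical registry: workers can be added or removed at runtime."""

    def __init__(self, health_check: Callable[[str], bool]) -> None:
        self._health_check = health_check  # assumed: returns True if alive
        self._lock = threading.Lock()
        self._workers: List[str] = []

    def add_worker(self, url: str) -> None:
        with self._lock:
            if url not in self._workers:
                self._workers.append(url)

    def remove_worker(self, url: str) -> None:
        with self._lock:
            if url in self._workers:
                self._workers.remove(url)

    def workers(self) -> List[str]:
        # Return a copy so callers never iterate over the shared list.
        with self._lock:
            return list(self._workers)

    def gc_once(self) -> List[str]:
        """Remove every worker that fails the health check; return them."""
        dead = [url for url in self.workers() if not self._health_check(url)]
        for url in dead:
            self.remove_worker(url)
        return dead


def run_periodic_gc(registry: WorkerRegistry, interval_s: float,
                    stop: threading.Event) -> None:
    """Run gc_once every interval_s seconds until stop is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop.wait(interval_s):
        registry.gc_once()
```

In this sketch `run_periodic_gc` would run on a background thread (for example `threading.Thread(target=run_periodic_gc, args=(registry, 10.0, stop), daemon=True)`), and setting `stop` ends the loop.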

Usage

  1. Launch workers
export CUDA_VISIBLE_DEVICES=0; python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 127.0.0.1 --port 9000
export CUDA_VISIBLE_DEVICES=1; python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 127.0.0.1 --port 9002
  2. Launch a router

Note: for multi-node deployments, replace the --worker-urls values with HTTP endpoints that the router can reach.

python -m sglang.srt.router.launch_router --host 127.0.0.1 --port 8080 --policy round_robin --worker-urls http://127.0.0.1:9000 http://127.0.0.1:9002
  3. Send a curl request to the router
curl -X POST http://127.0.0.1:8080/generate  -H "Content-Type: application/json" -d '{
    "text": "Once upon a time,",
    "sampling_params": {
      "max_new_tokens": 16,
      "temperature": 0
    }
  }'
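
The same request can be sent from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_generate_request` and `send_generate_request` are hypothetical, and the assumption that the router replies with a JSON body is not confirmed by this PR.

```python
import json
import urllib.request


def build_generate_request(
    router_url: str = "http://127.0.0.1:8080",
) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    payload = {
        "text": "Once upon a time,",
        "sampling_params": {"max_new_tokens": 16, "temperature": 0},
    }
    return urllib.request.Request(
        url=f"{router_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_generate_request(router_url: str = "http://127.0.0.1:8080") -> dict:
    """Send the request; assumes the response body is JSON."""
    request = build_generate_request(router_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_generate_request()` requires the router and at least one worker from the steps above to be running.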

Tests

  1. Add a CI test that runs the router with DP = 2 and checks that the GSM8K score is reasonable.
  2. Manually verify that the failure of a worker does not crash the router.
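
To illustrate the property the manual test checks, here is a sketch of one possible way a router could survive a dead worker: catch the connection error and try the next worker. This is not the PR's implementation; `forward_with_failover` and the injected `send` callable are hypothetical.

```python
from typing import Callable, Iterable, Optional


def forward_with_failover(worker_urls: Iterable[str],
                          send: Callable[[str], str]) -> str:
    """Try each worker in order and return the first successful reply.

    `send` is an assumed callable that forwards the request to one worker
    and raises OSError (for example ConnectionError) if it is unreachable.
    """
    last_error: Optional[OSError] = None
    for url in worker_urls:
        try:
            return send(url)
        except OSError as exc:
            # A dead worker is skipped instead of crashing the router.
            last_error = exc
    raise RuntimeError("all workers failed") from last_error
```

A test can pass a fake `send` that raises `ConnectionError` for one URL to simulate a killed worker without starting any real servers.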

TODO

  1. Implement cache-aware routing (in collaboration with @Ying1123 and @yichuan520030910320)
  2. Profile and improve performance
  3. Add fault-tolerance tests (such as killing a worker mid-run), making sure they are not flaky on CI
  4. Add complete support for the SGLang-compatible APIs (such as /health_generate)
  5. (Stretch goal) Implement the router in another language, e.g. Rust, to reduce overhead

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy merrymercy mentioned this pull request Oct 19, 2024
30 tasks
@ByronHsu ByronHsu marked this pull request as ready for review October 20, 2024 18:15
Collaborator Author

Will revert once review is done; this is only for fast CI iteration.

Contributor

this has been disabled


data_path = "test.jsonl"

if is_local() is True:
Collaborator Author

ditto

python3 test_moe_eval_accuracy_large.py

- name: Evaluate MLA Accuracy (TP=2)
- name: Evaluate DP Router Accuracy (DP = 2)
Collaborator Author

ditto

@ByronHsu ByronHsu changed the title from "[wip] [router] Implement router backbone" to "[Router] Implement router backbone" Oct 20, 2024
@ByronHsu ByronHsu mentioned this pull request Oct 21, 2024
3 tasks
Contributor

@merrymercy merrymercy left a comment

Per our offline discussion, we would like to go with Rust. Should we still merge this?

@ByronHsu
Collaborator Author

Due to the low performance of the two-level HTTP architecture with a Python server, I will try to re-implement the router in Rust. Closing this PR for now!

@ByronHsu ByronHsu closed this Oct 22, 2024
@ByronHsu ByronHsu mentioned this pull request Oct 25, 2024
3 tasks
2 participants