Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Router] Implement router backbone #1695

Closed
wants to merge 20 commits into from
51 changes: 0 additions & 51 deletions .github/workflows/pr-test-amd.yml
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will revert once review is done. this is for fast ci iteration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been disabled

This file was deleted.

234 changes: 2 additions & 232 deletions .github/workflows/pr-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,212 +18,6 @@ concurrency:
cancel-in-progress: true

jobs:
unit-test-frontend:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[dev]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Run test
timeout-minutes: 20
run: |
cd test/lang
python3 run_suite.py --suite minimal
unit-test-backend-part-1:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[dev]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Run test
timeout-minutes: 20
run: |
cd test/srt
python3 run_suite.py --suite minimal --range-begin 0 --range-end 5
unit-test-backend-part-2:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[dev]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Run test
timeout-minutes: 30
run: |
cd test/srt
python3 run_suite.py --suite minimal --range-begin 5 --range-end 17
unit-test-backend-part-3:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[dev]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Run test
timeout-minutes: 30
run: |
cd test/srt
python3 run_suite.py --suite minimal --range-begin 17
performance-test-1-gpu-part-1:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[all]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Benchmark Single Latency
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_latency.TestBenchLatency.test_default
- name: Benchmark Online Latency
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
- name: Benchmark Offline Throughput
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
- name: Benchmark Offline Throughput (Non-streaming, small batch size)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_non_stream_small_batch_size
performance-test-1-gpu-part-2:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[all]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Benchmark Offline Throughput (w/o RadixAttention)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
- name: Benchmark Offline Throughput (w/ Triton)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
- name: Benchmark Offline Throughput (w/ FP8)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default_fp8
performance-test-2-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 2-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[all]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
- name: Benchmark Offline Throughput (TP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_default
- name: Benchmark Offline Throughput (w/o RadixAttention) (TP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_without_radix_cache
- name: Benchmark Single Latency (TP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 -m unittest test_bench_latency.TestBenchLatency.test_moe_default
accuracy-test-1-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Install dependencies
run: |
pip install --upgrade pip
pip install -e "python[all]"
pip install transformers==4.45.2
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
git clone https://github.com/merrymercy/human-eval.git
cd human-eval
pip install -e .
- name: Evaluate Accuracy
timeout-minutes: 20
run: |
cd test/srt
python3 test_eval_accuracy_large.py
accuracy-test-2-gpu:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
runs-on: 2-gpu-runner
Expand All @@ -242,32 +36,8 @@ jobs:
cd human-eval
pip install -e .
- name: Evaluate Accuracy (TP=2)
timeout-minutes: 20
run: |
cd test/srt
python3 test_moe_eval_accuracy_large.py
- name: Evaluate MLA Accuracy (TP=2)
- name: Evaluate DP Router Accuracy (DP = 2)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

timeout-minutes: 10
run: |
cd test/srt
python3 test_mla.py
python3 test_mla_fp8.py
- name: Evaluate Data Parallelism Accuracy (DP=2)
timeout-minutes: 10
run: |
cd test/srt
python3 test_data_parallelism.py
finish:
needs: [
unit-test-frontend, unit-test-backend-part-1, unit-test-backend-part-2, unit-test-backend-part-3,
performance-test-1-gpu-part-1, performance-test-1-gpu-part-2, performance-test-2-gpu,
accuracy-test-1-gpu, accuracy-test-2-gpu
]
runs-on: ubuntu-latest
steps:
- name: Finish
run: echo "This is an empty step to ensure that all jobs are completed."
python3 test_router.py
Empty file.
Loading
Loading