[Docs] Offline batch inference guide (static assignment) #4144

Draft: wants to merge 12 commits into base: master
Conversation

Michaelvll (Collaborator):

An initial guide for offline batch inference with static assignment.

Future TODOs:

  • Batch inference with dynamic work pulling
  • Batch inference with autoscaling (possibly based on SkyServe)
  • An interface for batch inference

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

concretevitamin (Member) left a comment:

Quick look. P0 is to discuss and clarify the concepts.


.. code-block::

metadata.txt
concretevitamin (Member):

Would be nice to show the first few lines of the metadata file.
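
As a purely hypothetical illustration (the PR does not show the file's actual contents), a metadata file for this kind of pipeline might list one input chunk per line:

.. code-block:: text

   data/chunk_0000.jsonl
   data/chunk_0001.jsonl
   data/chunk_0002.jsonl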


.. code-block:: bash

NUM_PARALLEL_GROUPS=16
concretevitamin (Member):

"groups" is surprising/unfamiliar; do you mean "workers"?

.. _offline-batch-inference:

Large-Scale Offline Batch Inference
===================================
concretevitamin (Member):

Overall may need a figure to illustrate all the concepts and how they relate: chunks / work items (is one item = one JSONL file in this case?) / groups / workers, etc.

At least a bullet list at the top to explain these. I'm currently confused by the above.


llm = LLM(model='meta-llama/Meta-Llama-3.1-8B-Instruct', tensor_parallel_size=1)

def batch_inference(llm: LLM, data_path: str):
concretevitamin (Member):

Rename? It seems to work on one work item / one batch only, and there's no 'batching'?
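
For reference, a minimal sketch of what such a function might look like in full (the JSONL field names and the output path are assumptions; only the signature appears in the excerpt above):

.. code-block:: python

   import json

   from vllm import LLM, SamplingParams

   llm = LLM(model='meta-llama/Meta-Llama-3.1-8B-Instruct',
             tensor_parallel_size=1)

   def batch_inference(llm: LLM, data_path: str):
       # Read one chunk: a JSONL file with one prompt per line.
       with open(data_path) as f:
           prompts = [json.loads(line)['prompt'] for line in f]

       # vLLM batches all prompts within a single generate() call.
       outputs = llm.generate(prompts, SamplingParams(max_tokens=512))

       # Write one JSON record per prompt next to the input chunk.
       with open(data_path + '.out', 'w') as f:
           for out in outputs:
               record = {'prompt': out.prompt, 'output': out.outputs[0].text}
               f.write(json.dumps(record) + '\n')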

Scale Out to Multiple Nodes
---------------------------

To scale out the inference to multiple machines, we can group the data chunks into multiple pieces so that each machine can process one piece.
concretevitamin (Member):

Reduce concepts: yet another one here, 'piece'!
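
To make the scale-out step concrete, here is a minimal sketch of static per-node assignment using SkyPilot's SKYPILOT_NODE_RANK and SKYPILOT_NUM_NODES environment variables (the chunk layout is an assumption, and batch_inference refers to the function sketched above):

.. code-block:: python

   import glob
   import os

   # SkyPilot sets these on every node of a multi-node task.
   node_rank = int(os.environ['SKYPILOT_NODE_RANK'])
   num_nodes = int(os.environ['SKYPILOT_NUM_NODES'])

   # Static assignment: node i processes every num_nodes-th chunk.
   chunks = sorted(glob.glob('data/*.jsonl'))
   for chunk in chunks[node_rank::num_nodes]:
       batch_inference(llm, chunk)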
