Feat: add support for distributed serving type #1187
Signed-off-by: 林联辉 <[email protected]>
		b.AddArgValue(key, value)
	}
	if err := b.PreBuild(); err != nil {
		return nil, err
I suggest using `fmt.Errorf("failed to build args: %v", err)` instead of `err`.
		return nil, err
	}
	if err := b.ArgsBuilder.Build(); err != nil {
		return nil, err
Ditto.
@linnlh Please run the following commands to download the Go modules into the vendor directory:
go mod tidy
go mod vendor
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cheyang
Purpose of this PR
This PR introduces a new serving type called `distributed` to Arena's serving module. The primary motivation behind these changes is to enable the deployment of large-scale models across multiple nodes within a Kubernetes (K8s) cluster.
Proposed changes:
- Add the `distributed` serving type to Arena's serving module, which can deploy a model across multiple nodes.
- … `distributed` serving type.
Which issue(s) this PR fixes:
Fixes #1186
Change Category
Rationale
The `distributed` serving type addresses the increasing demand for multi-host inference driven by the advancement of large language models (LLMs) such as Meta's Llama-3.1-405B. Currently, Arena lacks the capability to deploy models distributed across multiple nodes, and this PR aims to fill the gap.