Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serve] Simplify application build codepath #48218

Open
wants to merge 70 commits into
base: master
Choose a base branch
from

Conversation

edoakes
Copy link
Contributor

@edoakes edoakes commented Oct 23, 2024

Why are these changes needed?

Stacked on: #48223

Reimplements the .bind() codepath to avoid using the Ray DAG codepath which adds a lot of complexity for little benefit.

The DAG traversal code is much simpler, around 100 lines of self-contained code in build_app.py. I've also added unit tests for it.

There should be no behavior change here.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Oct 23, 2024
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes force-pushed the reimplement-build branch 2 times, most recently from 182cdcd to 2bb5722 Compare October 25, 2024 15:58
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes requested review from zcin and GeneDer and removed request for zcin October 26, 2024 23:13
@edoakes edoakes marked this pull request as ready for review October 26, 2024 23:13
@edoakes edoakes requested a review from zcin October 26, 2024 23:15
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
@edoakes
Copy link
Contributor Author

edoakes commented Oct 26, 2024

NOTE: test_deploy_2.py::test_deployment_error_handling was failing with my changes because it was not actually checking the right thing previously. It was passing because a validation error was raised in the DAG building codepath, not in the controller. This meant that if bad actor options did make it to the controller, it was not handled gracefully (similar to #43888).

I've updated the logic in deployment_scheduler.py to handle the case where Ray complains during actor creation and verified that it's handled gracefully now.

Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Copy link
Contributor

@GeneDer GeneDer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

"""Dictionary that uses id() for keys instead of hash().
This is necessary because Application objects aren't hashable and we want each
instance to map to a unique key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment! Kinda wondering if there will be collisions between those ids if used across nodes and will this ever be used across nodes? Maybe a safer implementation is just generate a random uuid in Application and implement __hash__ to return that id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will never be used across nodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I agree that would probably be a safer solution overall

Signed-off-by: Edward Oakes <[email protected]>
def call_app_builder_with_args_if_necessary(
def call_user_app_builder_with_args_if_necessary(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to clarify the difference between this and build_app

@edoakes
Copy link
Contributor Author

edoakes commented Oct 28, 2024

now blocked on: #48294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants