Skypilot only wants to spawn 4 core cpu controller when sky serve up #4197

Open
mainey opened this issue Oct 28, 2024 · 1 comment

mainey commented Oct 28, 2024

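When running sky serve up on a manifest with spot resources, SkyPilot insists on launching a controller instance with at least 4 vCPUs, even though serve.controller.resources in ~/.sky/config.yaml specifies c6a.large, which has 2 vCPUs.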

~/.sky/config.yaml

serve:
  controller:
    resources:
      cloud: aws
      region: ap-southeast-1
      instance_type: c6a.large
      disk_size: 50

jobs:
  controller:
    resources:
      cloud: aws
      region: ap-southeast-1
      instance_type: c6a.large
      disk_size: 50

allowed_clouds:
  - aws

sky serve up prod.yaml

service:
  replica_policy:
    min_replicas: 1
    max_replicas: 1
    target_qps_per_replica: 2
  readiness_probe:
    path: /embeddings
    headers:
      Authorization: Bearer $AUTH_TOKEN
    post_data:
      model: $MODEL_NAME
      user: "user"
      input:
        "a"

resources:
  cloud: aws
  disk_tier: best
  use_spot: true
  disk_size: 100
  ports: 8000
  any_of:
    - cloud: aws
      region: ap-southeast-1
      accelerators: T4g

envs:
  #censored

setup: |
  #censored

run: |
  docker run --runtime nvidia --gpus all -p 8000:8000 \
    --env #CENSORED \
    censored \
    --model-id $MODEL_NAME \
    --port 8000

Error log

ValueError: c6a.large does not have enough vCPUs. c6a.large has 2.0 vCPUs, but 4+ is requested.

The above exception was the direct cause of the following exception:

ValueError: Serve controller resources is not valid, please check ~/.sky/config.yaml file and make sure serve.controller.resources is a valid resources spec. details:
  [valueerror] c6a.large does not have enough vcpus. c6a.large has 2.0 vcpus, but 4+ is requested.

Version & Commit info:

  • sky -v: skypilot, version 1.0.0.dev20241027
  • sky -c: skypilot, commit c0c17483d1f692ad639144050f5f6fa0966e47a5

cblmemo (Collaborator) commented Oct 28, 2024

Hi @mainey! Thanks for reporting this. Could you try adding cpus: 2+ under the {jobs,serve}.controller.resources field?
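
Applied to the serve controller section of the config posted above, that suggestion might look like the following. This is only a sketch: it assumes the rest of ~/.sky/config.yaml stays unchanged, and whether it resolves the error is not confirmed in this thread. The same cpus line would go under jobs.controller.resources.

```yaml
# ~/.sky/config.yaml (serve section only; other sections as in the original report)
serve:
  controller:
    resources:
      cloud: aws
      region: ap-southeast-1
      instance_type: c6a.large
      # Suggested addition: request 2+ vCPUs explicitly so the spec
      # matches c6a.large (2 vCPUs) instead of the 4+ seen in the error.
      cpus: 2+
      disk_size: 50
```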

This makes me wonder whether we should ignore the default settings (at least for CPUs) when we find a customized resource. cc @Michaelvll for a look here.
