After losing the connection to the K8s serve controller, I have no way to terminate it, even with the --purge flag.
andyl@andylizf-dev-server ~/skypilot (fix-aws-name)> sky down sky-serve-controller-e2dc6f0f (sky)
⠹ Checking for live services
Canceled autodown on the cluster 'sky-serve-controller-e2dc6f0f', since it is found to be in an abnormal state. To fix, try running: sky start -f -i 10 --down sky-serve-controller-e2dc6f0f
sky.exceptions.ClusterNotUpError: Failed to connect to serve controller, please try again later.
During handling of the above exception, another exception occurred:
sky.exceptions.NotSupportedError: Tearing down the sky serve controller while it is in INIT state is not supported (this means a sky serve up is in progress or the previous launch failed), as we cannot guarantee that all the services are terminated. Please wait until the sky serve controller is UP or fix it with sky start sky-serve-controller-e2dc6f0f.
andyl@andylizf-dev-server ~/skypilot (fix-aws-name) [1]> sky down sky-serve-controller-e2dc6f0f --purge (sky)
sky.exceptions.ClusterNotUpError: Failed to connect to serve controller, please try again later.
During handling of the above exception, another exception occurred:
sky.exceptions.NotSupportedError: Tearing down the sky serve controller while it is in INIT state is not supported (this means a sky serve up is in progress or the previous launch failed), as we cannot guarantee that all the services are terminated. Please wait until the sky serve controller is UP or fix it with sky start sky-serve-controller-e2dc6f0f.
Also, I don't think a K8s pod can be restarted. But the error message prompts us to do so, and running the suggested command gives:
andyl@andylizf-dev-server ~/skypilot (fix-aws-name) [1]> sky start sky-serve-controller-e2dc6f0f (sky)
Restarting 1 cluster: sky-serve-controller-e2dc6f0f. Proceed? [Y/n]:
Traceback (most recent call last):
File "/home/andyl/miniconda3/envs/sky/bin/sky", line 8, in <module>
sys.exit(cli())
^^^^^
File "/home/andyl/miniconda3/envs/sky/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/miniconda3/envs/sky/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 366, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/cli.py", line 838, in invoke
return super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/andyl/miniconda3/envs/sky/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/miniconda3/envs/sky/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/miniconda3/envs/sky/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 386, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/cli.py", line 2581, in start
core.start(name,
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 386, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/core.py", line 379, in start
return _start(cluster_name,
^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/core.py", line 305, in _start
handle = backend.provision(dummy_task,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 386, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 366, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/backends/backend.py", line 84, in provision
return self._provision(task, to_provision, dryrun, stream_logs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/backends/cloud_vm_ray_backend.py", line 2869, in _provision
config_dict = retry_provisioner.provision_with_retries(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/utils/common_utils.py", line 386, in _record
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/andyl/skypilot/sky/backends/cloud_vm_ray_backend.py", line 2039, in provision_with_retries
assert (clouds.CloudImplementationFeatures.STOP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: set()
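For a cloud that cannot stop and restart instances, a clear error would be easier to act on than the bare assertion above. Below is a minimal sketch of such a check, not SkyPilot's actual code: `Feature`, `NotSupportedError`, and `ensure_restartable` are hypothetical stand-ins, and I am assuming the empty set in the assertion message is a feature set that lacks STOP.

```python
import enum


class Feature(enum.Enum):
    """Hypothetical stand-in for clouds.CloudImplementationFeatures."""
    STOP = "stop"


class NotSupportedError(Exception):
    """Hypothetical stand-in for sky.exceptions.NotSupportedError."""


def ensure_restartable(cluster_name, supported_features):
    """Raise a readable error if the cluster's cloud cannot stop/restart.

    `supported_features` is assumed to be a set of Feature values.
    """
    if Feature.STOP not in supported_features:
        raise NotSupportedError(
            f"Cluster {cluster_name!r} is on a cloud that does not support "
            "stopping, so it cannot be restarted with sky start.")


# Usage: an empty feature set (as in the traceback) is rejected up front.
try:
    ensure_restartable("sky-serve-controller-e2dc6f0f", set())
except NotSupportedError as e:
    print(e)
```

With a check like this, the hint to run sky start could also be skipped for clusters where it can never work.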
All of this happens when the status is:
andyl@andylizf-dev-server ~/skypilot (fix-aws-name) [1]> sky status (sky)
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
sky-serve-controller-e2dc6f0f 10 mins ago 1x Kubernetes(2CPU--2GB, cpus=2+, disk_size=20, ports=['30001-30020']) INIT - sky serve up ./_a/task.yaml...
Managed jobs
No in-progress managed jobs. (See: sky jobs -h)
Services
Failed to connect to serve controller, please try again later.