-
Notifications
You must be signed in to change notification settings - Fork 496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[k8s] Inconsistent Display in Setting up a Local Cluster #4191
Comments
The tutorial said: It appears that issue 1 lies with the experimental
~/paper/hello-sky 🐍 skypilot 15:04:39
❯ sky status -a
Clusters
NAME LAUNCHED RESOURCES REGION ZONE STATUS AUTOSTOP HEAD_IP COMMAND
mycluster 4 hrs ago 1x Kubernetes(2CPU--2GB) kind-skypilot - INIT - - sky launch -c mycluster hello_sky.yaml
Managed jobs
No in-progress managed jobs. (See: sky jobs -h)
Services
No live services. (See: sky serve -h)
~/paper/hello-sky 🐍 skypilot 15:06:30
❯ sky status
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
mycluster 4 hrs ago 1x Kubernetes(2CPU--2GB) INIT - sky launch -c mycluster h...
Managed jobs
No in-progress managed jobs. (See: sky jobs -h)
Services
No live services. (See: sky serve -h)
~/paper/hello-sky 🐍 skypilot 15:06:34
❯ sky status --k8s
Kubernetes cluster state (context: kind-skypilot)
SkyPilot clusters
USER NAME LAUNCHED RESOURCES STATUS
huluobo mycluster 4 hrs ago 1x Kubernetes(cpus=2, mem=2) UP
huluobo sky-serve-controller-9d6cffc0 4 weeks ago 1x Kubernetes(cpus=4, mem=4) UP
Hint: SkyServe replica pods are shown in the "SkyPilot clusters" section. The discrepancy in the |
Hi @root-hbx, thanks for the report. Before I answer your questions, can you share your provision.log? |
Here it is, thanks :) |
Looks like it is stuck at pulling the image. Are you behind a firewall?
|
Thanks @romilbhardwaj . My computer's firewall has always been turned off.
❯ kubectl apply -f https://raw.githubusercontent.com/skypilot-org/skypilot/master/tests/kubernetes/cpu_test_pod.yaml
pod/skytest created
service/skytest-svc unchanged
~/Github_Content/AcademicHomepage main 🐍 skypilot ⎈ kind-skypilot 12:17:39
❯ kubectl apply -f https://raw.githubusercontent.com/skypilot-org/skypilot/master/tests/kubernetes/cpu_test_pod.yaml
pod/skytest unchanged
service/skytest-svc unchanged
~/Github_Content/AcademicHomepage main 🐍 skypilot ⎈ kind-skypilot 12:17:54
❯ kubectl get pod skytest
NAME READY STATUS RESTARTS AGE
skytest 1/1 Running 0 24s
~/Github_Content/AcademicHomepage main 🐍 skypilot ⎈ kind-skypilot 12:18:03
❯ kubectl port-forward svc/skytest-svc 8080:8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080 |
Thanks a lot @romilbhardwaj ! Following your guidance, I identified the issue — it was related to Docker Desktop on my computer. I deleted the pod and re-deployed it. Running step A1 of our troubleshooting guide showed that everything was working correctly. After trying to deploy the local Kubernetes cluster again, it worked as expected. However, I'm still a bit confused about why, in the previous stuck state, the outputs of |
Great to hear it works now!
skypilot/sky/provision/kubernetes/utils.py Lines 2105 to 2117 in c0c1748
We could improve this by actually polling the pod to get the right SkyPilot cluster status, but that might be an expensive operation requiring We can also change the cluster status in |
Got it 🫡 I’m wondering if there might be an alternative approach. For example, setting up a probe within each pod that continuously outputs its status externally. This way, we could simply track the status changes from its streaming output instead of inspecting each pod individually each time?
|
Issue Reproduction
I attempted to set up a local Kubernetes cluster on my laptop (Apple M2 Pro) following the instructions here.
After starting it up, I followed the QuickStart guide to launch a cluster named
mycluster
and run the taskhello_sky.yaml
.However, even after waiting over 6 hours, the process remained stalled in CLI.
When I use
sky status
, I find it showUP
status.But when I was trying to
sky exec mycluster hello_sky.yaml
since I supposedmycluster
already existed, I found I was blocked :(It's a little bit confusing.
# showcase the cluster (`mycluster`) initialization is finished ❯ sky status --k8s Kubernetes cluster state (context: kind-skypilot) SkyPilot clusters USER NAME LAUNCHED RESOURCES STATUS huluobo mycluster 10 hrs ago 1x Kubernetes(cpus=2, mem=2) UP huluobo sky-serve-controller-9d6cffc0 4 weeks ago 1x Kubernetes(cpus=4, mem=4) UP
BTW, just a month ago, I used the same setup steps to initialize and launch a GCP cluster, which only didn't show this problem.
Issue Visualization
Raise Question
I’m wondering 3 issues:
sky statis
simultaneously, it shows meUP
status. It's really confusing, maybe we should work on this display issue?Version & Commit info:
sky -v
: version 1.0.0.dev20241024sky -c
: commit cbf5c00The text was updated successfully, but these errors were encountered: