hello_world can not run #112
Comments
It looks like your K8s might not have any storageclasses installed. AdaptDL requires a shared filesystem which can be used to store checkpoints and other information when a job is restarted. Once you have a storageclass for a shared filesystem installed, you can pass it into the submit command with |
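One way to confirm whether the cluster has any StorageClass at all is to look at the output of `kubectl get storageclass -o json`. The following is a minimal Python sketch and not part of AdaptDL: the function names, the sample class name `nfs-shared`, and the provisioner string are made up for illustration, and it assumes `kubectl` is on the PATH with a working kubeconfig. An empty result would be consistent with the "available storageclasses []" message in the submit output.

```python
import json
import subprocess
from typing import List


def storage_class_names(kubectl_json: str) -> List[str]:
    """Return StorageClass names from `kubectl get storageclass -o json` output."""
    doc = json.loads(kubectl_json)
    # kubectl wraps the results in a List object whose "items" hold the classes.
    return [item["metadata"]["name"] for item in doc.get("items", [])]


def fetch_storage_class_names() -> List[str]:
    """Query the live cluster (assumes kubectl is installed and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "storageclass", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return storage_class_names(result.stdout)


# Hypothetical sample: a cluster with no StorageClass installed.
EMPTY_CLUSTER = '{"apiVersion": "v1", "kind": "List", "items": []}'

# Hypothetical sample: one class; the name and provisioner are invented.
ONE_CLASS = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {
            "kind": "StorageClass",
            "metadata": {"name": "nfs-shared"},
            "provisioner": "example.com/nfs",
        }
    ],
})

print(storage_class_names(EMPTY_CLUSTER))
print(storage_class_names(ONE_CLASS))
```

On a real cluster, calling `fetch_storage_class_names()` instead of using the samples gives the actual list. Whether a listed class is backed by a shared filesystem still has to be checked separately.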
Hello @aurickq,
Because I am not familiar with ValidatingWebhookConfiguration, I don't know how to solve it. |
@SHu0421 This error could be caused by a variety of reasons. You can start by checking |
Hi, have you solved the problem? |
@gudiandian it sounds like it's related to the problem you are having in #124 |
I switched from microk8s to a standard k8s instance (with three nodes), and I did not hit the problem again. By the way, I used the insecure registry rather than an external registry. |
Unfortunately, I am using standard k8s already. Thank you for your reply. |
I installed k8s (v1.18.2) on a local cluster and used helm (v2.17.0) to install adaptdl and adaptdl-sched successfully:
root@k8s-master:/home/czq/Pollux/adaptdl_v2/examples/mnist# kubectl get pod -A | grep adaptdl
adaptdl adaptdl-registry-697884b65-wf4w6 1/1 Running 0 17h
adaptdl jazzed-koala-adaptdl-sched-85d75fdb5d-9lvzq 3/3 Running 6 17h
adaptdl jazzed-koala-validator-98f8fcf7c-jj959 1/1 Running 0 17h
adaptdl peeking-ostrich-adaptdl-sched-667c78f9fb-fr2zj 3/3 Running 4 17h
I then wrote the hello_world project as described in the introduction, with the following structure:
└── hello_world
    ├── adaptdljob.yaml
    ├── Dockerfile
    └── hello_world.py
I ran "adaptdl submit hello_world" and got the following output:
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using AdaptDL insecure registry.
Sending build context to Docker daemon 4.096kB
Step 1/4 : FROM python:3.7-slim
---> d3c9ad326043
Step 2/4 : RUN python3 -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple adaptdl
---> Using cache
---> 05dae174d67e
Step 3/4 : COPY hello_world.py /root/hello_world.py
---> Using cache
---> 10d12170490d
Step 4/4 : ENV PYTHONUNBUFFERED=true
---> Using cache
---> bc04efd29920
Successfully built bc04efd29920
Successfully tagged localhost:59283/adaptdl-submit:latest
Using default tag: latest
The push refers to repository [localhost:59283/adaptdl-submit]
2cab9519a560: Layer already exists
16f13637494a: Layer already exists
25ad0307b4c1: Layer already exists
874b45955cb1: Layer already exists
85c923303735: Layer already exists
d0fa20bfdce7: Layer already exists
2edcec3590a4: Layer already exists
latest: digest: sha256:7346ece45037f13481a30a50907418bbd460035f488a1aab3cfb0f8ebdf35644 size: 1790
W0126 21:25:38.652722 75926 helpers.go:535] --dry-run is deprecated and can be replaced with --dry-run=client.
Unsupported storageclass from available storageclasses []
I then ran "adaptdl ls", but it shows no information about this job:
root@k8s-master:/home/czq/Pollux/adaptdl_v2/examples/HelloWorld# adaptdl ls
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
No adaptdljobs
Name Status Start(UTC) Runtime Rplc Rtrt
How can I resolve this problem so that the job runs correctly?