
Poka Yoke for pupernetes #88

Open
CharlyF opened this issue Jul 3, 2018 · 0 comments

@CharlyF
Contributor

CharlyF commented Jul 3, 2018

Describe what happened:

Following the Ignition setup for AWS, I ended up on a machine where pupernetes was already running.
Not seeing a state folder, I started it up again:

sudo pupernetes daemon run dca


E0703 17:06:44.423933    5833 run.go:197] Cannot apply manifests in /home/core/dca/dca/manifest-api
I0703 17:06:45.181316    5833 readiness.go:18] Calling kubectl apply -f /home/core/dca/dca/manifest-api ...
E0703 17:06:45.415828    5833 readiness.go:22] Cannot apply manifests exit status 1:
serviceaccount "coredns" unchanged
clusterrole.rbac.authorization.k8s.io "system:coredns" configured
clusterrolebinding.rbac.authorization.k8s.io "system:coredns" configured
configmap "coredns" unchanged
deployment.extensions "coredns" unchanged
service "coredns" unchanged
serviceaccount "kube-controller-manager" unchanged
daemonset.extensions "kube-proxy" unchanged
daemonset.extensions "kube-scheduler" unchanged
clusterrolebinding.rbac.authorization.k8s.io "p8s-admin" configured
The Pod "kube-controller-manager" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
{"Volumes":[{"Name":"secrets","HostPath":{"Path":"/

A: home/core/dca/dca/secrets","Type":""},"EmptyDir":null,"GCEPersistentDisk":null,"AWSElasticBlockStore":null,"GitRepo":null,"Secret":null,"NFS":null,"ISCSI":null,"Glusterfs":null,"PersistentVolumeClaim":null,"RBD":null,"Quobyte":null,"FlexVolume":null,"Cinder":null,"CephFS":null,"Flocker":null,"DownwardAPI":null,"FC":null,"AzureFile":null,"ConfigMap":null,"VsphereVolume":null,"AzureDisk":null,"PhotonPersistentDisk":null,"Projected":null,"PortworxVolume":null,"ScaleIO":null,"StorageOS":null}],"InitContainers":null,"Containers":[{"Name":"kube-controller-manager","Image":"gcr.io/google_containers/hyperkube:v1.10.3","Command":["/hyperkube","controller-manager","--master=http://127.0.0.1:8080","--leader-elect=true","--leader-elect-lease-duration=150s","--leader-elect-renew-deadline=100s","--leader-elect-retry-period=20s","--cluster-signing-cert-file=/etc/secrets/kube-controller-manager.certificate","--cluster-signing-key-file=/etc/secrets/kube-controller-manager.private_key","--root-ca-file=/etc/secrets/kube-controller-manager.bundle","--service-account-private-key-file=/etc/secrets/service-accounts.rsa","--concurrent-deployment-syncs=2","--concurrent-endpoint-syncs=2","--concurrent-gc-syncs=5","--concurrent-namespace-syncs=3","--concurrent-replicaset-syncs=2","--concurrent-resource-quota-syncs=2","--concurrent-service-syncs=1","--concurrent-serviceaccount-token-syncs=2","--horizontal-pod-autoscaler-use-rest-clients=true"],"Args":null,"WorkingDir":"","Ports":null,"EnvFrom":null,"Env":null,"Resources":{"Limits":{"cpu":"250m"},"Requests":{"cpu":"100m"}},"VolumeMounts":[{"Name":"secrets","ReadOnly":false,"MountPath":"/etc/secrets","SubPath":"","MountPropagation":null}],"VolumeDevices":null,"LivenessProbe":{"Exec":null,"HTTPGet":{"Path":"/healthz","Port":10252,"Host":"","Scheme":"HTTP","HTTPHeaders":null},"TCPSocket":null,"InitialDelaySeconds":15,"TimeoutSeconds":1,"PeriodSeconds":10,"SuccessThreshold":1,"FailureThreshold":3},"ReadinessProbe":{"Exec":null,"HTTPGet":{"Path":"/healthz","Port":10252,"Host":"","Scheme":"HTTP","HTTPHeaders":null},"TCPSocket":null,"InitialDelaySeconds":5,"TimeoutSeconds":1,"PeriodSeconds":10,"SuccessThreshold":1,"FailureThreshold":3},"Lifecycle":null,"TerminationMessagePath":"/dev/termination-log","TerminationMessagePolicy":"File","ImagePullPolicy":"IfNotPresent","SecurityContext":null,"Stdin":false,"StdinOnce":false,"TTY":false}],"RestartPolicy":"Always","TerminationGracePeriodSeconds":30,"ActiveDeadlineSeconds":null,"DNSPolicy":"ClusterFirst","NodeSelector":null,"ServiceAccountName":"kube-controller-manager","AutomountServiceAccountToken":false,"NodeName":"ip-172-31-12-20","SecurityContext":{"HostNetwork":true,"HostPID":false,"HostIPC":false,"ShareProcessNamespace":null,"SELinuxOptions":null,"RunAsUser":null,"RunAsGroup":null,"RunAsNonRoot":null,"SupplementalGroups":null,"FSGroup":null},"ImagePullSecrets":null,"Hostname":"","Subdomain":"","Affinity":null,"SchedulerName":"default-scheduler","Tolerations":null,"HostAliases":null,"PriorityClassName":"","Priority":null,"DNSConfig":null}

B: opt/sandbox/secrets","Type":""},"EmptyDir":null,"GCEPersistentDisk":null,"AWSElasticBlockStore":null,"GitRepo":null,"Secret":null,"NFS":null,"ISCSI":null,"Glusterfs":null,"PersistentVolumeClaim":null,"RBD":null,"Quobyte":null,"FlexVolume":null,"Cinder":null,"CephFS":null,"Flocker":null,"DownwardAPI":null,"FC":null,"AzureFile":null,"ConfigMap":null,"VsphereVolume":null,"AzureDisk":null,"PhotonPersistentDisk":null,"Projected":null,"PortworxVolume":null,"ScaleIO":null,"StorageOS":null}],"InitContainers":null,"Containers":[{"Name":"kube-controller-manager","Image":"gcr.io/google_containers/hyperkube:v1.10.3","Command":["/hyperkube","controller-manager","--master=http://127.0.0.1:8080","--leader-elect=true","--leader-elect-lease-duration=150s","--leader-elect-renew-deadline=100s","--leader-elect-retry-period=20s","--cluster-signing-cert-file=/etc/secrets/kube-controller-manager.certificate","--cluster-signing-key-file=/etc/secrets/kube-controller-manager.private_key","--root-ca-file=/etc/secrets/kube-controller-manager.bundle","--service-account-private-key-file=/etc/secrets/service-accounts.rsa","--concurrent-deployment-syncs=2","--concurrent-endpoint-syncs=2","--concurrent-gc-syncs=5","--concurrent-namespace-syncs=3","--concurrent-replicaset-syncs=2","--concurrent-resource-quota-syncs=2","--concurrent-service-syncs=1","--concurrent-serviceaccount-token-syncs=2","--horizontal-pod-autoscaler-use-rest-clients=true"],"Args":null,"WorkingDir":"","Ports":null,"EnvFrom":null,"Env":null,"Resources":{"Limits":{"cpu":"250m"},"Requests":{"cpu":"100m"}},"VolumeMounts":[{"Name":"secrets","ReadOnly":false,"MountPath":"/etc/secrets","SubPath":"","MountPropagation":null}],"VolumeDevices":null,"LivenessProbe":{"Exec":null,"HTTPGet":{"Path":"/healthz","Port":10252,"Host":"","Scheme":"HTTP","HTTPHeaders":null},"TCPSocket":null,"InitialDelaySeconds":15,"TimeoutSeconds":1,"PeriodSeconds":10,"SuccessThreshold":1,"FailureThreshold":3},"ReadinessProbe":{"Exec":null,"HTTPGet":{"Path":"/healthz","Port":10252,"Host":"","Scheme":"HTTP","HTTPHeaders":null},"TCPSocket":null,"InitialDelaySeconds":5,"TimeoutSeconds":1,"PeriodSeconds":10,"SuccessThreshold":1,"FailureThreshold":3},"Lifecycle":null,"TerminationMessagePath":"/dev/termination-log","TerminationMessagePolicy":"File","ImagePullPolicy":"IfNotPresent","SecurityContext":null,"Stdin":false,"StdinOnce":false,"TTY":false}],"RestartPolicy":"Always","TerminationGracePeriodSeconds":30,"ActiveDeadlineSeconds":null,"DNSPolicy":"ClusterFirst","NodeSelector":null,"ServiceAccountName":"kube-controller-manager","AutomountServiceAccountToken":false,"NodeName":"ip-172-31-12-20","SecurityContext":{"HostNetwork":true,"HostPID":false,"HostIPC":false,"ShareProcessNamespace":null,"SELinuxOptions":null,"RunAsUser":null,"RunAsGroup":null,"RunAsNonRoot":null,"SupplementalGroups":null,"FSGroup":null},"ImagePullSecrets":null,"Hostname":"","Subdomain":"","Affinity":null,"SchedulerName":"default-scheduler","Tolerations":null,"HostAliases":null,"PriorityClassName":"","Priority":null,"DNSConfig":null}

Kubernetes would not start, and once the timeout was reached, pupernetes tried to delete the resources:

E0703 17:06:46.422509    5833 run.go:197] Cannot apply manifests in /home/core/dca/dca/manifest-api
E0703 17:06:47.180707    5833 run.go:217] Cannot read dir: open /var/log/pods/: no such file or directory
I0703 17:06:47.602828    5833 state.go:116] Kubenertes apiserver not ready yet: bad status code for http://127.0.0.1:8080/healthz: 500
E0703 17:06:48.185283    5833 probe.go:50] Unexpected state of: unit "p8s-etcd.service" with load state "loaded" is "failed"
W0703 17:06:48.515873    5833 run.go:121] Signal received: "terminated", propagating ...
I0703 17:06:48.515954    5833 stop.go:84] Draining kubelet's pods ...
I0703 17:06:48.515969    5833 stop.go:41] Graceful deleting API resources ...


E0703 17:07:48.821860    5833 stop.go:29] Unexpected error during get namespaces: the server was unable to return a response in the time allotted, but may still be processing the request (get namespaces)
E0703 17:07:49.826396    5833 getter.go:22] Cannot get PodList: Get https://127.0.0.1:10250/pods: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "p8s")
E0703 17:07:49.826426    5833 stop.go:60] Stop called too early: cannot poll pods, RBAC may not deployed
W0703 17:07:49.826435    5833 stop.go:88] Failed to handle a graceful delete of API resources: cannot poll pods, RBAC may not deployed
E0703 17:07:49.830656    5833 getter.go:22] Cannot get PodList: Get https://127.0.0.1:10250/pods: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "p8s")

I think there was a certificate issue (presumably the new run generated its own certificates, so it could not verify the kubelet started by the already-running instance), but in the end it simply failed to stop cleanly.

E0703 17:08:07.835117    5833 getter.go:22] Cannot get PodList: Get https://127.0.0.1:10250/pods: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "p8s")
E0703 17:08:09.830914    5833 stop.go:157] Cannot properly delete pods: timeout 20s reached during pod draining
E0703 17:08:09.830956    5833 stop.go:224] Failed to drain the node: timeout 20s reached during pod draining
E0703 17:08:09.838194    5833 probe.go:50] Unexpected state of: unit "p8s-etcd.service" with load state "loaded" is "failed"
I0703 17:08:09.846542    5833 logging.go:155] Journal tailing of p8s-etcd.service stopped, get them again with: journalctl -o cat -u p8s-etcd.service --no-pager -S 17:06:42
I0703 17:08:09.846640    5833 systemd_action.go:123] Stopping p8s-kubelet.service ...
I0703 17:08:09.859837    5833 systemd_action.go:123] Stopping p8s-kube-apiserver.service ...
I0703 17:09:03.593790    5833 systemd_action.go:123] Stopping p8s-etcd.service ...
E0703 17:09:03.812584    5833 stop.go:256] Unexpected errors: errors during stop: timeout 20s reached during pod draining, systemd units unhealthy: p8s-etcd.service
E0703 17:09:03.812611    5833 main.go:27] Exiting on error: 2

Describe what you expected:

This is a classic 1D10T error, but I think it would be nice to print a clear message if p8s is already running.

I don't have a specific use case where one would need to run several p8s instances, and I actually think it should not be allowed, but we should check for already-running systemd units, or for Kubernetes ports already being in use.
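
As an illustration, here is a minimal pre-flight sketch in Go. The helper names are hypothetical, the ports and unit names are taken from the logs above, and the exact set pupernetes should check is an assumption:

package main

import (
	"fmt"
	"net"
	"os"
	"os/exec"
	"time"
)

// checkPortFree reports an error if something is already listening on the
// given local port (e.g. a kube-apiserver or kubelet from a previous run).
func checkPortFree(port int) error {
	conn, err := net.DialTimeout("tcp", fmt.Sprintf("127.0.0.1:%d", port), time.Second)
	if err != nil {
		return nil // nothing answered: the port looks free
	}
	conn.Close()
	return fmt.Errorf("port %d is already in use, is pupernetes already running?", port)
}

// checkUnitInactive relies on `systemctl is-active --quiet`, which exits 0
// when the unit is active.
func checkUnitInactive(unit string) error {
	if exec.Command("systemctl", "is-active", "--quiet", unit).Run() == nil {
		return fmt.Errorf("systemd unit %s is already active, is pupernetes already running?", unit)
	}
	return nil
}

func main() {
	// Ports and units seen in the logs above: 8080 (apiserver insecure
	// port), 10250 (kubelet), and the p8s-* systemd units.
	for _, p := range []int{8080, 10250} {
		if err := checkPortFree(p); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
	}
	for _, u := range []string{"p8s-kubelet.service", "p8s-kube-apiserver.service", "p8s-etcd.service"} {
		if err := checkUnitInactive(u); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
	}
	fmt.Println("pre-flight OK: no running pupernetes detected")
}

Failing fast with a one-line diagnostic like this would have replaced the confusing apply/teardown logs above.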

@CharlyF CharlyF self-assigned this Jul 3, 2018
@CharlyF CharlyF added the enhancement New feature or request label Jul 3, 2018