
ARK backup failed with efs provisioner #579

Closed
pmquang opened this issue Jun 25, 2018 · 18 comments

@pmquang

pmquang commented Jun 25, 2018

I used the EFS provisioner to create a PV for my pod, and Ark cannot back up the PV.

This is the YAML file to create nginx with an EFS PV:

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example-efs
  labels:
    app: nginx-example-efs

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example-efs
  labels:
    app: nginx-example-efs
spec:
  storageClassName: aws-efs-2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example-efs
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-example-efs
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
      - image: nginx:1.7.9
        name: nginx
        ports:
        - containerPort: 80
        volumeMounts:
          - mountPath: "/var/log/nginx"
            name: nginx-logs
            readOnly: false
      tolerations:
        - key: "type"
          effect: "NoSchedule"
          value: "MEM"

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-example-efs
  name: my-nginx
  namespace: nginx-example-efs
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx-example-efs
  type: ClusterIP

Log:

Name:         nginx-example-efs
Namespace:    heptio-ark
Labels:       <none>
Annotations:  <none>

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx-example-efs

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Phase:  Completed

Backup Format Version:  1

Expiration:  2018-07-25 14:42:49 +0700 +07

Validation errors:  <none>

Persistent Volumes: <none included>

time="2018-06-25T07:42:50Z" level=info msg="PersistentVolume is not a supported volume type for snapshots, skipping." backup=heptio-ark/nginx-example-efs group=v1 groupResource=persistentvolumeclaims logSource="pkg/backup/item_backupper.go:307" name=pvc-b33e8de0-784a-11e8-957d-12dd8b001c9e namespace=nginx-example
@ncdc
Contributor

ncdc commented Jun 25, 2018

This is expected. EFS is an NFS-based file system, and there is no snapshot API available for it. Instead, you'll want to use our new integration with Restic. Note that there is a bug we're currently working to address that prevents restores from succeeding.
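
For reference, the restic integration opts volumes in per pod: you annotate the pod that mounts the volume with the names of the volumes to back up. A minimal sketch against the nginx example above, assuming the v0.9 backup.ark.heptio.com/backup-volumes annotation (the pod name is a placeholder):

kubectl -n nginx-example-efs annotate pod YOUR_NGINX_POD \
    backup.ark.heptio.com/backup-volumes=nginx-logs

or, so it survives pod restarts, add the same annotation to the Deployment's pod template metadata:

      annotations:
        backup.ark.heptio.com/backup-volumes: nginx-logs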

@ncdc ncdc closed this as completed Jun 25, 2018
@ncdc ncdc reopened this Jun 25, 2018
@ncdc
Contributor

ncdc commented Jun 25, 2018

@rosskukulinski I don't know exactly how we'd do it, but it would be nice to find a way to inform users that their volumes aren't getting backed up, or that they need to use Restic. I know we currently have a log message that you can find after the backup has completed, such as the one in the report above:

time="2018-06-25T07:42:50Z" level=info msg="PersistentVolume is not a supported volume type for snapshots, skipping." backup=heptio-ark/nginx-example-efs group=v1 groupResource=persistentvolumeclaims logSource="pkg/backup/item_backupper.go:307" name=pvc-b33e8de0-784a-11e8-957d-12dd8b001c9e namespace=nginx-example

but I wonder if we could make this more visible somehow.

@rosskukulinski
Contributor

@ncdc Good point. I would hope that #448 for backups would help solve this. In addition, we could tailor #550 to include something like a Pod Volume Backups: Ignored (or Skipped) count.

@pmquang
Author

pmquang commented Jun 26, 2018

Hi @ncdc,

I see the restic example YAML uses cloud-credentials, but I don't want to use that in production. I'd like to use an IAM role instead. Is there any way to do that?

@ncdc
Contributor

ncdc commented Jun 26, 2018 via email

@pmquang
Author

pmquang commented Jun 26, 2018

I mean in this YAML:

apiVersion: apps/v1
kind: DaemonSet
metadata: 
  name: restic
  namespace: heptio-ark
spec:
  selector:
    matchLabels:
      name: restic
  template:
    metadata:
      labels:
        name: restic
    spec:
      serviceAccountName: ark
      securityContext:
        runAsUser: 0
      volumes:
        - name: cloud-credentials
          secret:
            secretName: cloud-credentials
        - name: host-pods
          hostPath:
            path: /var/lib/kubelet/pods
        - name: scratch
          emptyDir: {}
      containers:
        - name: ark
          image: gcr.io/heptio-images/ark:latest
          command:
            - /ark
          args:
            - restic 
            - server
          volumeMounts:
            - name: cloud-credentials
              mountPath: /credentials
            - name: host-pods
              mountPath: /host_pods
              mountPropagation: HostToContainer
            - name: scratch
              mountPath: /scratch
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HEPTIO_ARK_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: /credentials/cloud
            - name: ARK_SCRATCH_DIR
              value: /scratch

I will remove all the AWS cloud-credentials and add an annotation to use an IAM role (via kube2iam). Will it work?
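
For illustration, a rough sketch of that change, assuming kube2iam's iam.amazonaws.com/role pod annotation and a placeholder role name; the cloud-credentials volume, its volumeMount, and the AWS_SHARED_CREDENTIALS_FILE env var would be removed from the DaemonSet:

  template:
    metadata:
      labels:
        name: restic
      annotations:
        # kube2iam intercepts the pod's metadata-service calls and assumes this role
        iam.amazonaws.com/role: YOUR_ARK_IAM_ROLE   # placeholder role name
    spec:
      serviceAccountName: ark
      # cloud-credentials volume, its /credentials mount, and the
      # AWS_SHARED_CREDENTIALS_FILE env var removed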

@ncdc
Contributor

ncdc commented Jun 26, 2018 via email

@pmquang
Author

pmquang commented Jun 26, 2018

Thanks @ncdc, let me try.

@pmquang
Author

pmquang commented Jun 27, 2018

Hi @ncdc,

I'm hitting this error:

~ 123491$ kubectl logs --tail 100 restic-2jgld -n heptio-ark
Error: unknown command "restic" for "ark"
Run 'ark --help' for usage.
An error occurred: unknown command "restic" for "ark"

Could you help me check?

@ncdc
Contributor

ncdc commented Jun 27, 2018 via email

@pmquang
Author

pmquang commented Jun 27, 2018

Same error, @ncdc :)

~ 123491$ kubectl describe daemonset restic -n heptio-ark
Name:           restic
Selector:       name=restic
Node-Selector:  <none>
Labels:         name=restic
Annotations:    <none>
Desired Number of Nodes Scheduled: 29
Current Number of Nodes Scheduled: 29
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  29 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           name=restic
  Annotations:      iam.amazonaws.com/role=stg-heptio-ark-role
  Service Account:  ark
  Containers:
   ark:
    Image:  gcr.io/heptio-images/ark:v0.9.0.alpha.2
    Port:   <none>
    Command:
      /ark
    Args:
      restic
      server
    Limits:
      cpu:     300m
      memory:  300Mi
    Requests:
      cpu:     25m
      memory:  100Mi
    Environment:
      NODE_NAME:              (v1:spec.nodeName)
      HEPTIO_ARK_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /plugins from plugins (rw)
  Volumes:
   plugins:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
Events:      <none>
~ 123491$ kubectl logs --tail 100 restic-pj98z -n heptio-ark
Error: unknown command "restic" for "ark"
Run 'ark --help' for usage.
An error occurred: unknown command "restic" for "ark"

@pmquang
Author

pmquang commented Jun 27, 2018

Ah, the image tag should be v0.9.0-alpha.2.
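
i.e. the image line in the DaemonSet container spec should read:

          image: gcr.io/heptio-images/ark:v0.9.0-alpha.2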

@pmquang
Author

pmquang commented Jun 27, 2018

It still doesn't work, @ncdc.

@ncdc
Contributor

ncdc commented Jun 27, 2018 via email

@pmquang
Author

pmquang commented Jun 27, 2018

Same error, @ncdc:

Error: unknown command "restic" for "ark"
Run 'ark --help' for usage.
An error occurred: unknown command "restic" for "ark"

And I don't know why Kubernetes could still pull the image even though I gave the wrong tag :|

@ncdc
Contributor

ncdc commented Jun 27, 2018

There are 29 instances of the restic pod (1 per node, and the output above shows that you have 29 nodes). It's possible you're looking at the logs from one of the older pods, before you set the image tag correctly. Please examine one of the new pods (created most recently) and confirm that its tag is correct, and then check to see if it's running / look at the logs.
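
For example, something along these lines would show which image each restic pod is actually running and when it started (a command sketch, using the name=restic label from the DaemonSet above; the pod name is a placeholder):

kubectl -n heptio-ark get pods -l name=restic \
    -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image,STARTED:.status.startTime

kubectl -n heptio-ark logs NEW_RESTIC_POD_NAME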

@pmquang
Author

pmquang commented Jun 27, 2018

You're right :)

It's working now. Thank you.

@skriss
Contributor

skriss commented Jul 5, 2018

Looks like this is resolved so closing out. Feel free to open a new issue if needed!

@skriss skriss closed this as completed Jul 5, 2018