mayastor replication does not create a new copy on available Pool in workers #1749

Open
Rammurthy5 opened this issue Oct 7, 2024 · 1 comment
Labels
BUG Something isn't working

Comments

@Rammurthy5

Describe the bug
While using OpenEBS Replicated Storage (Mayastor) in my Kubernetes cluster, I created a Mayastor storage class with a replication factor of 2. If the worker nodes that hold the replicas go down, Mayastor does not create a new replica on an available pool and attach it.

To Reproduce
Steps to reproduce the behavior:
Install OpenEBS with Mayastor on Talos (Kubernetes OS).
Command I used:

helm install openebs --namespace openebs openebs/openebs \
  --set zfs-localpv.zfsNode.encrKeysDir="/var/openebs/keys" \
  --set mayastor.etcd.localpvScConfig.basePath="/var/openebs/local/{{ .Release.Name }}/localpv-hostpath/etcd" \
  --set mayastor.loki-stack.localpvScConfig.basePath="/var/openebs/local/{{ .Release.Name }}/localpv-hostpath/loki" \
  --set mayastor.loki-stack.loki.persistence.size=1Gi \
  --set mayastor.csi.node.initContainers.enabled=false \
  --create-namespace

Expected behavior
When a worker node goes down and another node has an available pool, replication should create a new replica on that pool.
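
The expectation above can be written down as a toy model. This is only a sketch of the behavior the reporter expects, not Mayastor's control-plane logic; the `Pool` type, the `replacement_pools` function, and the pool and node names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    node: str
    online: bool


def replacement_pools(pools, replica_pools, desired_replicas):
    """Pick pools that could host new replicas to restore the desired count.

    Hypothetical model: a candidate must be online, must not already hold a
    replica of the volume, and must sit on a node with no healthy replica.
    """
    by_name = {p.name: p for p in pools}
    healthy = [name for name in replica_pools if by_name[name].online]
    used_nodes = {by_name[name].node for name in healthy}
    missing = desired_replicas - len(healthy)
    picks = []
    for pool in pools:
        if len(picks) >= missing:  # also stops at once when nothing is missing
            break
        if pool.online and pool.name not in replica_pools and pool.node not in used_nodes:
            picks.append(pool.name)
            used_nodes.add(pool.node)
    return picks


# Hypothetical cluster: the node holding one of the two replicas is down.
pools = [
    Pool("pool-a", "node-a", online=False),
    Pool("pool-b", "node-b", online=True),
    Pool("pool-c", "node-c", online=True),
]
print(replacement_pools(pools, ["pool-a", "pool-b"], desired_replicas=2))
```

In this model the volume has one healthy replica left on `pool-b`, so the only candidate for the second replica is `pool-c`.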

OS info

  • Distro: Talos 1.7.2
  • Kernel version 1.27.0
  • MayaStor revision or container image:
    iomesh/openebs-ndm 1.8.0 1.8.0
    openebs/openebs 4.1.1 4.1.1
    openebs-jiva/jiva 3.6.0 3.6.0

Additional context
I couldn't attach the OpenEBS logs because the upload doesn't support .tar files.
I can share the logs privately or on the community Slack.

@tiagolobocastro
Contributor

Adding the tar here: cluster2.tar.gz

Here were the issues:

2024-10-06T05:03:00.175720961Z stdout F 2024-10-06T05:03:00.175316Z ERROR core::volume::operations_helper: Failed to attach replica to nexus, replica.uuid: ef78c63c-7cec-4f54-9911-1507467a01e6, replica.pool: gcp-225, replica.node: gcp-225, error: "gRPC request 'share_replica' for 'Replica' failed with 'status: Internal, message: \"failed to share lvol ef78c63c-7cec-4f54-9911-1507467a01e6: NVMe persistence through power-loss failure: File exists (os error 17)\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Sun, 06 Oct 2024 05:03:00 GMT\", \"content-length\": \"0\"} }': status: Internal, message: \"failed to share lvol ef78c63c-7cec-4f54-9911-1507467a01e6: NVMe persistence through power-loss failure: File exists (os error 17)\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Sun, 06 Oct 2024 05:03:00 GMT\", \"content-length\": \"0\"} }"

The same error occurs for the nexus.
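
For triage, a log line like the one quoted above can be reduced to the replica identifiers and the OS error code. The following is a minimal sketch, assuming lines in the format shown in this issue; `parse_attach_error` is a hypothetical helper, not part of Mayastor. It also tolerates ANSI color sequences, with or without the leading ESC byte, since colorized logs often keep them.

```python
import errno
import re

# Matches SGR color sequences such as ESC[31m. The ESC byte is optional
# because copy-pasted logs often lose it and keep only "[31m".
ANSI_SGR = re.compile(r"\x1b?\[[0-9;]*m")
FIELD = re.compile(r"replica\.(uuid|pool|node): ([^,\s]+)")
OS_ERROR = re.compile(r"\(os error (\d+)\)")


def parse_attach_error(line: str) -> dict:
    """Hypothetical helper: extract replica fields and the OS errno from one line."""
    clean = ANSI_SGR.sub("", line)
    fields = {name: value for name, value in FIELD.findall(clean)}
    match = OS_ERROR.search(clean)
    if match:
        code = int(match.group(1))
        fields["os_error"] = code
        # Map the number to its symbolic name using the local platform's table.
        fields["errno_name"] = errno.errorcode.get(code, "unknown")
    return fields


# Abridged excerpt of the log line quoted in this issue, color codes included.
sample = (
    "[31mERROR[0m [1;31mcore::volume::operations_helper[0m[31m: "
    "[31mFailed to attach replica to nexus, [1;31mreplica.uuid[0m[31m: "
    "ef78c63c-7cec-4f54-9911-1507467a01e6, [1;31mreplica.pool[0m[31m: gcp-225, "
    "[1;31mreplica.node[0m[31m: gcp-225, [1;31merror[0m[31m: "
    "\"failed to share lvol ef78c63c-7cec-4f54-9911-1507467a01e6: "
    "NVMe persistence through power-loss failure: File exists (os error 17)\""
)
print(parse_attach_error(sample))
```

OS error 17 corresponds to `EEXIST` ("File exists"), which matches the message text in the log.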

@tiagolobocastro added the BUG label on Oct 8, 2024