[Bug][Cluster][Kubernetes]: All Milvus nodes and coordinators are stuck in CrashLoopBackOff #37459

Closed
1 task done
SalBakraa opened this issue Nov 5, 2024 · 8 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@SalBakraa

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.4.13
- Deployment mode(standalone or cluster): Cluster
- MQ type(rocksmq, pulsar or kafka): pulsar/kafka
- SDK version(e.g. pymilvus v2.0.0rc2): N/A
- OS(Ubuntu or CentOS): Ubuntu 24.10
- Linux Kernel: 6.11.0-9-generic 
- HOST CPU/Memory: AMD Ryzen Threadripper PRO 5995WX 64-Cores / 995 GiB
- VM CPU: QEMU Virtual CPU version 2.5+
- GPU: 2x Nvidia A6000
- Others:

Current Behavior

The Milvus cluster fails to run on a virtual Kubernetes cluster: all Milvus nodes and coordinators become stuck in the CrashLoopBackOff state.

NAME                                             READY   STATUS             RESTARTS        AGE
my-release-etcd-0                                1/1     Running            0               21m
my-release-etcd-1                                1/1     Running            0               21m
my-release-etcd-2                                1/1     Running            0               21m
my-release-milvus-datacoord-d8c598b86-vdm8p      0/1     CrashLoopBackOff   7 (3m21s ago)   15m
my-release-milvus-datanode-6c9b95645f-rrl24      0/1     CrashLoopBackOff   7 (3m55s ago)   15m
my-release-milvus-indexcoord-694758d846-djgds    0/1     CrashLoopBackOff   7 (3m45s ago)   15m
my-release-milvus-indexnode-87c6cd97-8hjzv       0/1     CrashLoopBackOff   7 (3m40s ago)   15m
my-release-milvus-proxy-6cc57f7b7d-tlpdv         0/1     CrashLoopBackOff   7 (4m2s ago)    15m
my-release-milvus-querycoord-9b84c6974-ff4ss     0/1     CrashLoopBackOff   7 (3m30s ago)   15m
my-release-milvus-querynode-0-7485d47ffd-9v94m   0/1     CrashLoopBackOff   7 (3m51s ago)   14m
my-release-milvus-rootcoord-6746987cc4-d4xdp     0/1     CrashLoopBackOff   7 (3m44s ago)   15m
my-release-minio-0                               1/1     Running            0               21m
my-release-minio-1                               1/1     Running            0               21m
my-release-minio-2                               1/1     Running            0               21m
my-release-minio-3                               1/1     Running            0               21m
my-release-pulsar-bookie-0                       1/1     Running            0               21m
my-release-pulsar-bookie-1                       1/1     Running            0               21m
my-release-pulsar-bookie-2                       1/1     Running            0               21m
my-release-pulsar-bookie-init-qjg58              0/1     Completed          0               21m
my-release-pulsar-broker-0                       1/1     Running            0               21m
my-release-pulsar-proxy-0                        1/1     Running            0               21m
my-release-pulsar-pulsar-init-8zkqc              0/1     Completed          0               21m
my-release-pulsar-zookeeper-0                    1/1     Running            0               21m
my-release-pulsar-zookeeper-1                    1/1     Running            0               20m
my-release-pulsar-zookeeper-2                    1/1     Running            0               19m

Expected Behavior

All Milvus pods are Running and Ready.

Steps To Reproduce

1. Create the directory `milvus-cluster` and `cd` into it.
2. Create the virtual machine images using `create_vm_images.sh`:
   ```sh
   #!/usr/bin/sh

   # Create image file for workers
   for i in $(seq 6); do
   	qemu-img create -f raw "worker_image_file_$i" 40G
   done

   # Create image file for master
   qemu-img create -f raw "master_image_file" 10G
   ```
3. Download the Ubuntu Server ISO into the *milvus-cluster* directory, for example:
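   For reference, a minimal fetch of the image the scripts below expect (the mirror URL is an assumption; any Ubuntu 24.10 live-server mirror works as long as the local filename matches):
   ```sh
   # Assumed mirror URL; only the local filename matters to the scripts below.
   wget https://releases.ubuntu.com/24.10/ubuntu-24.10-live-server-amd64.iso
   ```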
4. Install and prepare the virtual machines in the cluster using `setup_vms.sh`:
   ```sh
   #!/usr/bin/sh

   LOGIN_PASSWORD="$(mkpasswd password)"

   # Create an SSH key to access the virtual machines
   ssh-keygen -f ./ed25519_vm_key -t ed25519

   # mount the image
   mkdir mnt_iso
   sudo mount -o loop ./ubuntu-24.10-live-server-amd64.iso ./mnt_iso

   # Install os on worker nodes
   for i in $(seq 6); do
   	# Setup http server
   	mkdir -p ./www_$i

   	cat > ./www_$i/user-data << YAML
   #cloud-config
   autoinstall:
     version: 1

     # No interactive sections
     interactive-sections: []

     # default behavior
     locale: "en_US.UTF-8"

     # default behavior
     keyboard:
       layout: us
       variant: "dvorak"

     source:
       search_drivers: true

     storage:
       swap:
         size: 0
       layout:
         name: lvm
         sizing-policy: all

     ssh:
       install-server: true
       authorized-keys: []
       allow-pw: true

     codecs:
       install: true

     drivers:
       install: true

     # Updates from both the security and updates pockets are installed.
     updates: all

     # shutdown instead of reboot
     shutdown: poweroff

     packages:
         - ufw
         - apt-transport-https
         - ca-certificates
         - curl
         - software-properties-common
         - netcat-traditional
         - containerd
         - runc
         - nfs-common

     late-commands:
       # Allow all incoming and outgoing traffic in UFW
       - curtin in-target --target=/target -- echo "Configuring UFW to allow all traffic"
       - curtin in-target --target=/target -- /usr/sbin/ufw allow in on any
       - curtin in-target --target=/target -- /usr/sbin/ufw allow out on any
       - curtin in-target --target=/target -- /usr/sbin/ufw reload

       # System config
       - curtin in-target --target=/target -- echo "Disabling swap"
       - curtin in-target --target=/target -- /usr/sbin/swapoff -a
       - curtin in-target --target=/target -- sed -i '/ swap / s/^/#/' /etc/fstab
       - curtin in-target --target=/target -- rm -vf /swap.img

       # K8s Steps
       - curtin in-target --target=/target -- echo "Creating keyring directory if it does not exist"
       - curtin in-target --target=/target -- mkdir -p -m 755 /etc/apt/keyrings

       - curtin in-target --target=/target -- echo "Downloading Kubernetes signing key"
       - curtin in-target --target=/target -- sh -c '/usr/bin/curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg'

       - curtin in-target --target=/target -- echo "Adding Kubernetes apt repository for v1.31"
       - curtin in-target --target=/target -- sh -c "echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list"

       - curtin in-target --target=/target -- echo "Updating package index and installing Kubernetes components"
       - curtin in-target --target=/target -- apt-get update
       - curtin in-target --target=/target -- apt-get install -y kubelet kubeadm kubectl
       - curtin in-target --target=/target -- apt-mark hold kubelet kubeadm kubectl

       - curtin in-target --target=/target -- echo "Enabling kubelet service"
       - curtin in-target --target=/target -- systemctl enable kubelet
       - curtin in-target --target=/target -- echo "Loading required kernel modules"

       - |
         curtin in-target --target=/target -- sh -c 'cat <<EOF > /etc/modules-load.d/k8s.conf
         overlay
         br_netfilter
         EOF'

       - curtin in-target --target=/target -- echo "Setting sysctl parameters for Kubernetes"
       - |
         curtin in-target --target=/target -- sh -c 'cat <<EOF | tee /etc/sysctl.d/kubernetes.conf
         net.bridge.bridge-nf-call-ip6tables = 1
         net.bridge.bridge-nf-call-iptables = 1
         net.ipv4.ip_forward = 1
         EOF'
       - curtin in-target --target=/target -- /usr/sbin/sysctl --system

       # Containerd stuff
       - curtin in-target --target=/target -- mkdir -p -m 755 /etc/containerd
       - curtin in-target --target=/target -- sh -c 'containerd config default | tee /etc/containerd/config.toml'
       - curtin in-target --target=/target -- systemctl enable containerd

       # Write netplan configuration file
       - curtin in-target --target=/target -- echo "Setting up netplan"
       - |
         curtin in-target --target=/target -- sh -c "cat <<EOF > /etc/netplan/01-netcfg.yaml
         network:
           version: 2
           ethernets:
             ens3:
               dhcp4: false
               dhcp6: false
               addresses:
                 - 192.168.2.$((2+$i))/24  # Static IP address
               routes:
                 - to: 0.0.0.0/0
                   via: 192.168.2.1  # Gateway
               nameservers:
                 addresses:
                   - 8.8.8.8  # Google DNS
                   - 8.8.4.4
         EOF"
       - curtin in-target --target=/target -- netplan apply

       # Add root user ssh keys
       - curtin in-target --target=/target -- mkdir -p -m 700 /root/.ssh
       - curtin in-target --target=/target -- sh -c 'echo $(cat ./ed25519_vm_key) > /root/.ssh/authorized_keys'

       # Allow password ssh
       - curtin in-target --target=/target -- sh -c 'echo "PermitRootLogin yes" >> /etc/ssh/sshd_config'

     user-data:
       disable_root: false
       chpasswd:
         expire: false
         list:
           - "root:password"

     ssh:
       install-server: true
       authorized-keys:
         - $(cat ./ed25519_vm_key)
       allow-pw: true

     identity:
       hostname: worker-node-$i
       username: user
       password: "$LOGIN_PASSWORD"
   YAML
   	touch ./www_$i/meta-data
   	cd ./www_$i
   		python3 -m http.server 300$((i+3)) 2>/dev/null 1>&2 &
   		{
   			qemu-system-x86_64 -no-reboot -m 64G \
   				-cdrom ../ubuntu-24.10-live-server-amd64.iso \
   				-boot order=d \
   				-drive file=../worker_image_file_$i,format=raw,cache=none,if=virtio \
   				-kernel ../mnt_iso/casper/vmlinuz \
   				-initrd ../mnt_iso/casper/initrd \
   				-append "autoinstall ds=nocloud-net;s=http://_gateway:300$(($i+3))/";
   			printf "Completed worker $i setup\n\n";
   			kill $(pgrep --full "python3 -m http.server 300$((i+3))");
   		} &
   	cd ..
   done

   # Install os on master node
   # Setup http server
   mkdir -p ./www
   cd ./www
   	python3 -m http.server 3003 2>/dev/null 1>&2 &
   cd ..
   cat > ./www/user-data << YAML
   #cloud-config
   autoinstall:
     version: 1

     # No interactive sections
     interactive-sections: []

     # default behavior
     locale: "en_US.UTF-8"

     # default behavior
     keyboard:
       layout: us
       variant: "dvorak"

     source:
       search_drivers: true

     storage:
       swap:
         size: 0
       layout:
         name: lvm
         sizing-policy: all

     ssh:
       install-server: true
       authorized-keys: []
       allow-pw: true

     codecs:
       install: true

     drivers:
       install: true

     # Updates from both the security and updates pockets are installed.
     updates: all

     # shutdown instead of reboot
     shutdown: poweroff

     packages:
         - ufw
         - apt-transport-https
         - ca-certificates
         - curl
         - software-properties-common
         - netcat-traditional
         - containerd
         - runc
         - nfs-common

     late-commands:
       # Allow all incoming and outgoing traffic in UFW
       - curtin in-target --target=/target -- echo "Configuring UFW to allow all traffic"
       - curtin in-target --target=/target -- /usr/sbin/ufw allow in on any
       - curtin in-target --target=/target -- /usr/sbin/ufw allow out on any
       - curtin in-target --target=/target -- /usr/sbin/ufw reload

       # System config
       - curtin in-target --target=/target -- echo "Disabling swap"
       - curtin in-target --target=/target -- /usr/sbin/swapoff -a
       - curtin in-target --target=/target -- sed -i '/ swap / s/^/#/' /etc/fstab
       - curtin in-target --target=/target -- rm -vf /swap.img

       # K8s Steps
       - curtin in-target --target=/target -- echo "Creating keyring directory if it does not exist"
       - curtin in-target --target=/target -- mkdir -p -m 755 /etc/apt/keyrings

       - curtin in-target --target=/target -- echo "Downloading Kubernetes signing key"
       - curtin in-target --target=/target -- sh -c '/usr/bin/curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg'

       - curtin in-target --target=/target -- echo "Adding Kubernetes apt repository for v1.31"
       - curtin in-target --target=/target -- sh -c "echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list"

       - curtin in-target --target=/target -- echo "Updating package index and installing Kubernetes components"
       - curtin in-target --target=/target -- apt-get update
       - curtin in-target --target=/target -- apt-get install -y kubelet kubeadm kubectl
       - curtin in-target --target=/target -- apt-mark hold kubelet kubeadm kubectl

       - curtin in-target --target=/target -- echo "Enabling kubelet service"
       - curtin in-target --target=/target -- systemctl enable kubelet
       - curtin in-target --target=/target -- echo "Loading required kernel modules"

       - |
         curtin in-target --target=/target -- sh -c 'cat <<EOF > /etc/modules-load.d/k8s.conf
         overlay
         br_netfilter
         EOF'

       - curtin in-target --target=/target -- echo "Setting sysctl parameters for Kubernetes"
       - |
         curtin in-target --target=/target -- sh -c 'cat <<EOF | tee /etc/sysctl.d/kubernetes.conf
         net.bridge.bridge-nf-call-ip6tables = 1
         net.bridge.bridge-nf-call-iptables = 1
         net.ipv4.ip_forward = 1
         EOF'
       - curtin in-target --target=/target -- /usr/sbin/sysctl --system

       # Containerd stuff
       - curtin in-target --target=/target -- mkdir -p -m 755 /etc/containerd
       - curtin in-target --target=/target -- sh -c 'containerd config default | tee /etc/containerd/config.toml'
       - curtin in-target --target=/target -- systemctl enable containerd

       # Write netplan configuration file
       - curtin in-target --target=/target -- echo "Setting up netplan"
       - |
         curtin in-target --target=/target -- sh -c "cat <<EOF > /etc/netplan/01-netcfg.yaml
         network:
           version: 2
           ethernets:
             ens3:
               dhcp4: false
               dhcp6: false
               addresses:
                 - 192.168.2.2/24  # Static IP address
               routes:
                 - to: 0.0.0.0/0
                   via: 192.168.2.1  # Gateway
               nameservers:
                 addresses:
                   - 8.8.8.8  # Google DNS
                   - 8.8.4.4
         EOF"
       - curtin in-target --target=/target -- netplan apply

       # Add root user ssh keys
       - curtin in-target --target=/target -- mkdir -p -m 700 /root/.ssh
       - curtin in-target --target=/target -- sh -c 'echo $(cat ./ed25519_vm_key) > /root/.ssh/authorized_keys'

       # Allow password ssh
       - curtin in-target --target=/target -- sh -c 'echo "PermitRootLogin yes" >> /etc/ssh/sshd_config'

     user-data:
       disable_root: false
       chpasswd:
         expire: false
         list:
           - "root:password"

     ssh:
       install-server: true
       authorized-keys:
         - $(cat ./ed25519_vm_key)
       allow-pw: true

     identity:
       hostname: master-node
       username: user
       password: "$LOGIN_PASSWORD"
   YAML
   touch ./www/meta-data
   cd ./www
   	{
   		qemu-system-x86_64 -no-reboot -m 64G \
   			-cdrom ../ubuntu-24.10-live-server-amd64.iso \
   			-boot order=d \
   			-drive file=../master_image_file,format=raw,cache=none,if=virtio \
   			-kernel ../mnt_iso/casper/vmlinuz \
   			-initrd ../mnt_iso/casper/initrd \
   			-append 'autoinstall ds=nocloud-net;s=http://_gateway:3003/';
   		printf "Completed master setup\n\n";
   		kill $(pgrep --full "python3 -m http.server 3003");
   	} &
   cd ..

   wait
   sudo umount ./mnt_iso
   ```
5. Prepare the NFS server that will be used for the storage class using `setup_nfs_server.sh`:
     ```sh
     #!/usr/bin/sh
    
    mkdir -pv /srv/nfs/milvus-storage
    chmod 777 /srv/nfs/milvus-storage
    chown nobody:nogroup /srv/nfs/milvus-storage
    echo '/srv/nfs/milvus-storage 192.168.2.0/24(rw,sync,no_subtree_check)' >> /etc/exports
     exportfs -a
     ```
6. Start the cluster using `run_cluster.sh`:
     ```sh
     #!/usr/bin/sh
    
    # Setup the network
    
    # Create the tap devices
    tunctl -u user -t kmtap
    tunctl -u user -t kwtap1
    tunctl -u user -t kwtap2
    tunctl -u user -t kwtap3
    tunctl -u user -t kwtap4
    tunctl -u user -t kwtap5
    tunctl -u user -t kwtap6
    
    # Bring up the tap devices
    ip link set kmtap up
    ip link set kwtap1 up
    ip link set kwtap2 up
    ip link set kwtap3 up
    ip link set kwtap4 up
    ip link set kwtap5 up
    ip link set kwtap6 up
    
    # Create the bridge to link the tap devices
    brctl addbr kbr0
    brctl addif kbr0 kmtap
    brctl addif kbr0 kwtap1
    brctl addif kbr0 kwtap2
    brctl addif kbr0 kwtap3
    brctl addif kbr0 kwtap4
    brctl addif kbr0 kwtap5
    brctl addif kbr0 kwtap6
    
    # Bring up the bridge
    ip link set kbr0 up
    
     # Set an IP address on the bridge
     ifconfig kbr0 192.168.2.1 netmask 255.255.255.0 broadcast 192.168.2.255
    
     # Set up NAT to allow the guest OSes to access the internet
    iptables -t nat -A POSTROUTING -o enp36s0f1 -j MASQUERADE
    iptables -I DOCKER-USER 1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    iptables -I DOCKER-USER 2 -i kbr0 -o enp36s0f1 -j ACCEPT
    
     # Enable IP forwarding
    sysctl -w net.ipv4.ip_forward=1
    
    for i in $(seq 6); do
     	qemu-system-x86_64 -no-reboot -m 64G -smp cpus=18 -name "worker-$i" \
    		-nographic -serial none -monitor none \
    		-netdev tap,id=network0,ifname=kwtap$i,script=no,downscript=no \
    		-device virtio-net,netdev=network0,mac=de:ad:be:ef:00:0$i  \
    		-drive file=./worker_image_file_$i,format=raw,cache=none,if=virtio &
    done
    
     qemu-system-x86_64 -no-reboot -m 64G -smp cpus=12 -name "master" \
    	-nographic -serial none -monitor none \
    	-netdev tap,id=network0,ifname=kmtap,script=no,downscript=no \
    	-device virtio-net,netdev=network0,mac=de:ad:be:ef:00:00  \
    	-drive file=./master_image_file,format=raw,cache=none,if=virtio &
    
    # wait for all the machines to turn off
    wait
    
    # Disable the network
    
     # Disable IP forwarding
    sysctl -w net.ipv4.ip_forward=0
    
    # Disable the nat
    iptables -t nat -D POSTROUTING -o enp36s0f1 -j MASQUERADE
    iptables -D DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    iptables -D DOCKER-USER -i kbr0 -o enp36s0f1 -j ACCEPT
    
    # Bring down the bridge
    ifconfig kbr0 down
    
    # Delete the bridge
    brctl delbr kbr0
    
    # Delete the tap devices
    tunctl -d kmtap
    tunctl -d kwtap1
    tunctl -d kwtap2
    tunctl -d kwtap3
    tunctl -d kwtap4
    tunctl -d kwtap5
     tunctl -d kwtap6
     ```
7. Finish setting up the Kubernetes cluster with kubeadm using `setup_cluster.sh`:
     ```sh
     #!/usr/bin/sh
     # Remove the host keys from known_hosts
    for i in $(seq 2 8); do
    	ssh-keygen -R 192.168.2.$i
     	ssh -o StrictHostKeyChecking=no -i ./ed25519_vm_key "root@192.168.2.$i" "apt update && apt install -y nfs-common"
    done
    
    # Define the pod network CIDR
    POD_NETWORK_CIDR="10.244.0.0/16"
    
    # Initialize the Kubernetes cluster with the specified pod network CIDR
     JOIN_CMD="$(ssh -o StrictHostKeyChecking=no -i ./ed25519_vm_key "root@192.168.2.2" "kubeadm init --pod-network-cidr=${POD_NETWORK_CIDR}" 2>&1 | tail -n 2)"
    echo "join command = '$JOIN_CMD'"
    
    # Wait for the master node to initialize (optional)
    sleep 30  # Adjust sleep time as needed
    
    # Join worker nodes to the cluster
    for i in $(seq 3 8); do
         ssh -o StrictHostKeyChecking=no -i ./ed25519_vm_key "root@192.168.2.$i" "$JOIN_CMD"
    done
    
    # Copy admin.conf to the local machine for kubectl access
     sshpass -p password rsync -Pavhrz root@192.168.2.2:/etc/kubernetes/admin.conf .
    
    # Apply the Flannel CNI configuration
    kubectl --kubeconfig=./admin.conf apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    
    # # Optionally, restart the Flannel DaemonSet to ensure it picks up the correct settings
    # kubectl delete pods -n kube-flannel --all
    
    # get node status
    kubectl --kubeconfig=./admin.conf describe nodes
     kubectl --kubeconfig=./admin.conf get pods -n kube-flannel
     ```
8. Set up the Milvus config `operator-milvus-config.yaml`:
     ```yaml
     # This is a sample to deploy a milvus cluster in milvus-operator's default configurations.
    apiVersion: milvus.io/v1beta1
    kind: Milvus
    metadata:
      name: my-release
      namespace: milvus-workshop
      labels:
        app: milvus
    spec:
      mode: cluster
      dependencies:
        storage:
          external: false
          type: "MinIO"
          inCluster:
            deletionPolicy: Delete
            pvcDeletion: true
            values:
              persistence:
                size: 200Gi
        etcd:
          inCluster:
            deletionPolicy: Delete
            pvcDeletion: true
            values:
              autoCompactionMode: revision
              autoCompactionRetention: "1000"
              extraEnvVars:
              - name: ETCD_QUOTA_BACKEND_BYTES
                value: "4294967296"
              - name: ETCD_HEARTBEAT_INTERVAL
                value: "500"
              - name: ETCD_ELECTION_TIMEOUT
                value: "25000"
              - name: ETCD_SNAPSHOT_COUNT
                value: "10000"
              - name: ETCD_ENABLE_PPROF
                value: "true"
              persistence:
                accessMode: ReadWriteOnce
                enabled: true
                size: 30Gi   #SSD Required
                storageClass:
              replicaCount: 3
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 2
                  memory: 4Gi
        pulsar:
          inCluster:
            deletionPolicy: Delete
            pvcDeletion: true
            values:
              components:
                autorecovery: false
              zookeeper:
                volumes:
                  data:
                    size: 20Gi
              bookkeeper:
                volumes:
                  useSingleCommonVolume: true
                  common:
                    size: 50Gi
              proxy:
                resources:
                  requests:
                    memory: 10Gi
                    cpu: 4
      components:
        resources:
          limits:
            cpu: 4
            memory: 8Gi
          requests:
            cpu: 4
            memory: 8Gi
        dataCoord:
          replicas: 1
        dataNode:
          replicas: 1
        indexCoord:
          replicas: 1
        indexNode:
          replicas: 1
        proxy:
          replicas: 1
          serviceType: LoadBalancer
        queryCoord:
          replicas: 1
        queryNode:
          replicas: 1
        rootCoord:
          replicas: 1
       config: {}
     ```
9. Finally, install and set up the Milvus cluster using `install_milvus.sh`:
     ```sh
     #!/usr/bin/sh
    
    # Setup nfs storage class
    helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
    helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=192.168.2.1 --set nfs.path=/srv/nfs/milvus-storage
    
    kubectl patch storageclass nfs-client -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class":"true"}}}'
    
    # install cert manager
    kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.yaml
    
    # install milvus operator
    helm install milvus-operator -n milvus-operator --create-namespace --wait --wait-for-jobs https://github.com/zilliztech/milvus-operator/releases/download/v1.0.1/milvus-operator-1.0.1.tgz
    
    # setup milvus cluster
    kubectl create namespace milvus-workshop
     kubectl apply -f operator-milvus-config.yaml -n milvus-workshop
     ```
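   Once applied, a quick way to follow the rollout (release name and namespace as configured above):
   ```sh
   # Watch the Milvus pods come up and check the operator's view of the release
   kubectl get pods -n milvus-workshop -w
   kubectl describe milvus my-release -n milvus-workshop
   ```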


### Milvus Log

Using `export-milvus-log.sh`, the logs are attached here:
[logs.tar.gz](https://github.com/user-attachments/files/17633965/logs.tar.gz)
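
For reference, the crash output of a single component can also be pulled directly from its previous container run (pod name and namespace taken from the listing above):

```sh
# Logs of the last crashed rootcoord container
kubectl logs my-release-milvus-rootcoord-6746987cc4-d4xdp -n milvus-workshop --previous
```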


### Anything else?

The logs show the same error as #28242.
@SalBakraa added the kind/bug (Issues or changes related to a bug) and needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one) labels on Nov 5, 2024
@xiaofan-luan
Collaborator

missing source yaml

@LoveEachDay could you help on investigating on this?

@xiaofan-luan
Collaborator

Meanwhile @SalBakraa, could you verify which document you are actually following?

Did you put the right Milvus config file in place?

@yanliang567
Contributor

/assign @LoveEachDay
/unassign

@yanliang567 added the help wanted (Extra attention is needed) label and removed the kind/bug and needs-triage labels on Nov 6, 2024
@LoveEachDay
Contributor

@SalBakraa Could you show us the configmap named like `my-release`? And also provide the output of `kubectl describe milvus my-release` here.

@SalBakraa
Author

> Meanwhile @SalBakraa, could you verify which document you are actually following?

> Did you put the right Milvus config file in place?

I followed the steps at https://milvus.io/docs/install_cluster-milvusoperator.md for installing Milvus. For the Milvus config I tried different configs: the default cluster config, and also the one generated by the Milvus sizing tool. The final config I settled on is an edited version of the one generated by the sizing tool.

> @SalBakraa Could you show us the configmap named like `my-release`? And also provide the output of `kubectl describe milvus my-release` here.

$ kubectl describe configmaps my-release -n milvus-workshop
Name:         my-release
Namespace:    milvus-workshop
Labels:       app.kubernetes.io/instance=my-release
              app.kubernetes.io/managed-by=milvus-operator
              app.kubernetes.io/name=milvus
Annotations:  <none>

Data
====
user.yaml:
----
dataCoord:
  enableActiveStandby: true
etcd:
  endpoints:
  - my-release-etcd.milvus-workshop:2379
  rootPath: my-release
indexCoord:
  enableActiveStandby: true
messageQueue: pulsar
minio:
  accessKeyID: minioadmin
  address: my-release-minio.milvus-workshop
  bucketName: my-release
  port: 9000
  secretAccessKey: minioadmin
mq:
  type: pulsar
msgChannel:
  chanNamePrefix:
    cluster: my-release
pulsar:
  address: my-release-pulsar-proxy.milvus-workshop
  port: 6650
queryCoord:
  enableActiveStandby: true
rootCoord:
  enableActiveStandby: true



BinaryData
====

Events:  <none>
$ kubectl describe milvus my-release -n milvus-workshop
Name:         my-release
Namespace:    milvus-workshop
Labels:       app=milvus
              milvus.io/operator-version=1.0.1
Annotations:  milvus.io/dependency-values-merged: true
              milvus.io/pod-service-label-added: true
              milvus.io/querynode-current-group-id: 0
API Version:  milvus.io/v1beta1
Kind:         Milvus
Metadata:
  Creation Timestamp:  2024-11-05T13:28:57Z
  Finalizers:
    milvus.milvus.io/finalizer
  Generation:        1
  Resource Version:  76131
  UID:               ac3976e4-ed4c-4eaa-80f0-040d52c68caf
Spec:
  Components:
    Data Coord:
      Paused:    false
      Replicas:  1
    Data Node:
      Paused:               false
      Replicas:             1
    Disable Metric:         false
    Enable Rolling Update:  true
    Image:                  milvusdb/milvus:v2.4.6
    Image Update Mode:      rollingUpgrade
    Index Coord:
      Paused:    false
      Replicas:  1
    Index Node:
      Paused:         false
      Replicas:       1
    Metric Interval:
    Paused:           false
    Proxy:
      Paused:        false
      Replicas:      1
      Service Type:  LoadBalancer
    Query Coord:
      Paused:    false
      Replicas:  1
    Query Node:
      Paused:    false
      Replicas:  1
    Resources:
      Limits:
        Cpu:     4
        Memory:  8Gi
      Requests:
        Cpu:       4
        Memory:    8Gi
    Rolling Mode:  2
    Root Coord:
      Paused:    false
      Replicas:  1
    Standalone:
      Paused:        false
      Replicas:      0
      Service Type:  ClusterIP
  Config:
    Data Coord:
      Enable Active Standby:  true
    Index Coord:
      Enable Active Standby:  true
    Query Coord:
      Enable Active Standby:  true
    Root Coord:
      Enable Active Standby:  true
  Dependencies:
    Custom Msg Stream:  <nil>
    Etcd:
      Endpoints:
        my-release-etcd.milvus-workshop:2379
      External:  false
      In Cluster:
        Deletion Policy:  Delete
        Pvc Deletion:     true
        Values:
          Auth:
            Rbac:
              Enabled:                false
          Auto Compaction Mode:       revision
          Auto Compaction Retention:  1000
          Enabled:                    true
          Extra Env Vars:
            Name:   ETCD_QUOTA_BACKEND_BYTES
            Value:  4294967296
            Name:   ETCD_HEARTBEAT_INTERVAL
            Value:  500
            Name:   ETCD_ELECTION_TIMEOUT
            Value:  25000
            Name:   ETCD_SNAPSHOT_COUNT
            Value:  10000
            Name:   ETCD_ENABLE_PPROF
            Value:  true
          Image:
            Pull Policy:  IfNotPresent
            Repository:   milvusdb/etcd
            Tag:          3.5.5-r4
          Liveness Probe:
            Enabled:          true
            Timeout Seconds:  10
          Name:               etcd
          Pdb:
            Create:  false
          Persistence:
            Access Mode:    ReadWriteOnce
            Enabled:        true
            Size:           30Gi
            Storage Class:  <nil>
          Readiness Probe:
            Enabled:          true
            Period Seconds:   20
            Timeout Seconds:  10
          Replica Count:      3
          Resources:
            Limits:
              Cpu:     2
              Memory:  4Gi
            Requests:
              Cpu:     2
              Memory:  4Gi
          Service:
            Peer Port:  2380
            Port:       2379
            Type:       ClusterIP
    Kafka:
      External:       false
    Msg Stream Type:  pulsar
    Natsmq:
      Persistence:
        Persistent Volume Claim:
          Spec:  <nil>
    Pulsar:
      Endpoint:  my-release-pulsar-proxy.milvus-workshop:6650
      External:  false
      In Cluster:
        Deletion Policy:  Delete
        Pvc Deletion:     true
        Values:
          Affinity:
            anti_affinity:  false
          Autorecovery:
            Resources:
              Requests:
                Cpu:     1
                Memory:  512Mi
          Bookkeeper:
            Config Data:
              PULSAR_GC:  -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails

              PULSAR_MEM:  -Xms4096m -Xmx4096m -XX:MaxDirectMemorySize=8192m

              Netty Max Frame Size Bytes:  104867840
            Pdb:
              Use Policy:   false
            Replica Count:  3
            Resources:
              Requests:
                Cpu:     1
                Memory:  2048Mi
            Volumes:
              Common:
                Size:  50Gi
              Journal:
                Name:  journal
                Size:  100Gi
              Ledgers:
                Name:                    ledgers
                Size:                    200Gi
              Use Single Common Volume:  true
          Broker:
            Component:  broker
            Config Data:
              PULSAR_GC:  -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError

              PULSAR_MEM:  -Xms4096m -Xmx4096m -XX:MaxDirectMemorySize=8192m

              Backlog Quota Default Limit GB:          8
              Backlog Quota Default Retention Policy:  producer_exception
              Default Retention Size In MB:            -1
              Default Retention Time In Minutes:       10080
              Max Message Size:                        104857600
              Subscription Expiration Time Minutes:    3
              Ttl Duration Default In Seconds:         259200
            Pdb:
              Use Policy:  false
            Pod Monitor:
              Enabled:      false
            Replica Count:  1
            Resources:
              Requests:
                Cpu:     1.5
                Memory:  4096Mi
          Components:
            Autorecovery:     false
            Bookkeeper:       true
            Broker:           true
            Functions:        false
            Proxy:            true
            pulsar_manager:   false
            Toolset:          false
            Zookeeper:        true
          Enabled:            true
          Fullname Override:
          Images:
            Autorecovery:
              Pull Policy:  IfNotPresent
              Repository:   apachepulsar/pulsar
              Tag:          2.9.5
            Bookie:
              Pull Policy:  IfNotPresent
              Repository:   apachepulsar/pulsar
              Tag:          2.9.5
            Broker:
              Pull Policy:  IfNotPresent
              Repository:   apachepulsar/pulsar
              Tag:          2.9.5
            Proxy:
              Pull Policy:  IfNotPresent
              Repository:   apachepulsar/pulsar
              Tag:          2.9.5
            pulsar_manager:
              Pull Policy:  IfNotPresent
              Repository:   apachepulsar/pulsar-manager
              Tag:          v0.1.0
            Zookeeper:
              Pull Policy:   IfNotPresent
              Repository:    apachepulsar/pulsar
              Tag:           2.9.5
          Max Message Size:  5242880
          Monitoring:
            alert_manager:  false
            Grafana:        false
            node_exporter:  false
            Prometheus:     false
          Name:             pulsar
          Persistence:      true
          Proxy:
            Config Data:
              PULSAR_GC:  -XX:MaxDirectMemorySize=2048m

              PULSAR_MEM:  -Xms2048m -Xmx2048m

              Http Num Threads:  100
            Pdb:
              Use Policy:  false
            Pod Monitor:
              Enabled:  false
            Ports:
              Http:         8080
              Pulsar:       6650
            Replica Count:  1
            Resources:
              Requests:
                Cpu:     4
                Memory:  10Gi
            Service:
              Type:  ClusterIP
          pulsar_manager:
            Service:
              Type:  ClusterIP
          pulsar_metadata:
            Component:  pulsar-init
            Image:
              Repository:  apachepulsar/pulsar
              Tag:         2.9.5
          Rbac:
            Enabled:             false
            limit_to_namespace:  true
            Psp:                 false
          Zookeeper:
            Config Data:
              PULSAR_GC:  -Dcom.sun.management.jmxremote -Djute.maxbuffer=10485760 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:+DisableExplicitGC -XX:+PerfDisableSharedMem -Dzookeeper.forceSync=no

              PULSAR_MEM:  -Xms1024m -Xmx1024m

            Pdb:
              Use Policy:  false
            Resources:
              Requests:
                Cpu:     0.3
                Memory:  1024Mi
            Volumes:
              Data:
                Size:  20Gi
    Rocksmq:
      Persistence:
        Persistent Volume Claim:
          Spec:  <nil>
    Storage:
      Endpoint:  my-release-minio.milvus-workshop:9000
      External:  false
      In Cluster:
        Deletion Policy:  Delete
        Pvc Deletion:     true
        Values:
          Access Key:       minioadmin
          Bucket Name:      milvus-bucket
          Enabled:          true
          Existing Secret:
          Iam Endpoint:
          Image:
            Pull Policy:  IfNotPresent
            Tag:          RELEASE.2023-03-20T20-16-18Z
          Liveness Probe:
            Enabled:                true
            Failure Threshold:      5
            Initial Delay Seconds:  5
            Period Seconds:         5
            Success Threshold:      1
            Timeout Seconds:        5
          Mode:                     distributed
          Name:                     minio
          Persistence:
            Access Mode:     ReadWriteOnce
            Enabled:         true
            Existing Claim:
            Size:            200Gi
            Storage Class:   <nil>
          Pod Disruption Budget:
            Enabled:  false
          Readiness Probe:
            Enabled:                true
            Failure Threshold:      5
            Initial Delay Seconds:  5
            Period Seconds:         5
            Success Threshold:      1
            Timeout Seconds:        1
          Region:
          Resources:
            Requests:
              Memory:  2Gi
          Root Path:   file
          Secret Key:  minioadmin
          Service:
            Port:  9000
            Type:  ClusterIP
          Startup Probe:
            Enabled:                true
            Failure Threshold:      60
            Initial Delay Seconds:  0
            Period Seconds:         10
            Success Threshold:      1
            Timeout Seconds:        5
          Use IAM:                  false
          Use Virtual Host:         false
      Secret Ref:                   my-release-minio
      Type:                         MinIO
  Hook Config:                      <nil>
  Mode:                             cluster
Status:
  Components Deploy Status:
    Datacoord:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-datacoord-d8c598b86" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Datanode:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-datanode-6c9b95645f" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Indexcoord:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-indexcoord-694758d846" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Indexnode:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-indexnode-87c6cd97" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Proxy:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-proxy-6cc57f7b7d" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Querycoord:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:13Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-querycoord-9b84c6974" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Querynode:
      Generation:  2
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:15Z
          Last Update Time:      2024-11-05T13:35:15Z
          Message:               ReplicaSet "my-release-milvus-querynode-0-7485d47ffd" has successfully progressed.
          Reason:                NewReplicaSetAvailable
          Status:                True
          Type:                  Progressing
          Last Transition Time:  2024-11-05T13:35:16Z
          Last Update Time:      2024-11-05T13:35:16Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
        Observed Generation:     2
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Rootcoord:
      Generation:  1
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:13Z
          Message:               Deployment does not have minimum availability.
          Reason:                MinimumReplicasUnavailable
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               ReplicaSet "my-release-milvus-rootcoord-6746987cc4" is progressing.
          Reason:                ReplicaSetUpdated
          Status:                True
          Type:                  Progressing
        Observed Generation:     1
        Replicas:                1
        Unavailable Replicas:    1
        Updated Replicas:        1
    Standalone:
      Generation:  2
      Image:       milvusdb/milvus:v2.4.6
      Status:
        Conditions:
          Last Transition Time:  2024-11-05T13:35:14Z
          Last Update Time:      2024-11-05T13:35:14Z
          Message:               Deployment has minimum availability.
          Reason:                MinimumReplicasAvailable
          Status:                True
          Type:                  Available
          Last Transition Time:  2024-11-05T13:35:13Z
          Last Update Time:      2024-11-05T13:35:15Z
          Message:               ReplicaSet "my-release-milvus-standalone-758bdbb75c" has successfully progressed.
          Reason:                NewReplicaSetAvailable
          Status:                True
          Type:                  Progressing
        Observed Generation:     2
  Conditions:
    Last Transition Time:  2024-11-05T13:30:34Z
    Message:               Etcd endpoints is healthy
    Reason:                EtcdReady
    Status:                True
    Type:                  EtcdReady
    Last Transition Time:  2024-11-05T13:30:10Z
    Reason:                StorageReady
    Status:                True
    Type:                  StorageReady
    Last Transition Time:  2024-11-05T13:35:12Z
    Message:               MsgStream is ready
    Reason:                MsgStreamReady
    Status:                True
    Type:                  MsgStreamReady
    Last Transition Time:  2024-11-05T15:40:06Z
    Message:               [rootcoord datacoord querycoord indexcoord datanode querynode indexnode proxy] not ready, detail: component[rootcoord]: pod[my-release-milvus-rootcoord-6746987cc4-d4xdp]: container[rootcoord]: restartCount[29] lastState[terminated] reason[Error]
    Reason:                MilvusComponentNotHealthy
    Status:                False
    Type:                  MilvusReady
    Last Transition Time:  2024-11-05T13:35:34Z
    Message:               Milvus components[rootcoord,datacoord,querycoord,indexcoord,datanode,indexnode,proxy] are updating
    Reason:                MilvusComponentsUpdating
    Status:                False
    Type:                  MilvusUpdated
  Ingress:
    Load Balancer:
  Observed Generation:  1
  Replicas:
  Rolling Mode Version:  2
  Status:                Pending
Events:                  <none>

@LoveEachDay
Contributor

@SalBakraa Could you log in to the worker node where the Milvus pod is running and check the CPU instructions using the following command?

lscpu | grep Flags

@SalBakraa
Author

SalBakraa commented Nov 6, 2024

$ lscpu | grep Flags

Flags:                                fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid pni cx16 hypervisor lahf_lm cmp_legacy svm 3dnowprefetch vmmcall

Here are the flags on the host machine:

Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
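
A compact way to compare the two outputs for the SIMD extensions Milvus relies on (the flag list below is an assumption based on the Milvus CPU prerequisites; note that sse4_2, avx, avx2, fma, and f16c all appear in the host flags but are missing from the guest flags above):

```sh
# Assumed list of relevant SIMD flags; adjust to the documented Milvus requirements
lscpu | grep -owE 'sse4_2|avx|avx2|fma|f16c' | sort -u
```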

@SalBakraa
Author

@LoveEachDay Thank you so much for the help. I only checked whether the host CPU supported the instruction sets required by Milvus; it didn't occur to me that the VM CPU would be any different. Adding `-enable-kvm -cpu host` to the QEMU command fixed the startup issue.
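
For reference, a sketch of the amended worker launch line from `run_cluster.sh` with those two flags added (all other parameters unchanged from the script above):

```sh
qemu-system-x86_64 -enable-kvm -cpu host -no-reboot -m 64G -smp cpus=18 -name "worker-$i" \
	-nographic -serial none -monitor none \
	-netdev tap,id=network0,ifname=kwtap$i,script=no,downscript=no \
	-device virtio-net,netdev=network0,mac=de:ad:be:ef:00:0$i \
	-drive file=./worker_image_file_$i,format=raw,cache=none,if=virtio &
```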

Now issue #34983 popped up and the querynode is crashing, but thankfully it looks like the fix is simple.

Again, thank you so much for your help in debugging this issue.
