A known issue in Longhorn, which can occur after a node in the cluster is restarted, is volume attachment errors for Pods attempting to mount a PVC backed by Longhorn. These errors will be similar to:
volume pvc-xxxxxx has GET error for volume attachment csi-xxxxx: volumeattachments.storage.k8s.io “csi-xxxxxx” not found
or
MountVolume.WaitForAttach failed for volume “pvc-xxxxx” : volume pvc-xxxxx has GET error for volume attachment csi-xxxxx: volumeattachments.storage.k8s.io “csi-xxxxx” is forbidden: User “system:node:ip-xxxxx” cannot get resource “volumeattachments” in API group “storage.k8s.io” at the cluster scope: no relationship found between node ‘ip-xxxxx’ and this object
When this happens, the workaround is to scale down the deployment or statefulset mounting the volume to 0 replicas, wait for the pod to fully terminate, and then scale the workload back up.
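The scale-down/scale-up workaround can be scripted. Below is a minimal Python sketch that shells out to kubectl, assuming kubectl is on PATH and already pointed at the affected cluster. The function names, and any workload kind, name, namespace and label selector you pass in, are hypothetical placeholders rather than anything from Longhorn or kURL. It records the current replica count, scales to 0, polls kubectl get pods with a label selector until no matching pods remain (pods still in Terminating show up in that list, so an empty result means they have fully terminated), and then restores the original count.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# A runner takes kubectl arguments (without the "kubectl" prefix) and returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_kubectl(args: Sequence[str]) -> str:
    """Run kubectl and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(["kubectl", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def scale_args(kind: str, name: str, namespace: str, replicas: int) -> List[str]:
    # Equivalent to: kubectl scale <kind>/<name> --replicas=<n> -n <namespace>
    return ["scale", f"{kind}/{name}", f"--replicas={replicas}", "-n", namespace]


def wait_for_no_pods(selector: str, namespace: str, run: Runner,
                     timeout_s: float = 300, interval_s: float = 5) -> None:
    """Poll until no pods match the label selector (Terminating pods still count)."""
    deadline = time.monotonic() + timeout_s
    while True:
        # "-o name" prints one pod/<name> per line; stdout is empty when none match.
        out = run(["get", "pods", "-l", selector, "-n", namespace, "-o", "name"])
        if not out.strip():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pods matching {selector!r} still present: {out.split()}")
        time.sleep(interval_s)


def bounce_workload(kind: str, name: str, namespace: str, selector: str,
                    run: Runner = run_kubectl) -> int:
    """Scale to 0, wait for the pods to terminate, then restore the replica count."""
    current = run(["get", f"{kind}/{name}", "-n", namespace,
                   "-o", "jsonpath={.spec.replicas}"])
    replicas = int(current.strip())
    run(scale_args(kind, name, namespace, 0))
    wait_for_no_pods(selector, namespace, run)
    run(scale_args(kind, name, namespace, replicas))
    return replicas
```

For example, bounce_workload("statefulset", "kotsadm-rqlite", "default", "app=kotsadm-rqlite") would bounce a StatefulSet named kotsadm-rqlite, assuming its pods carry an app=kotsadm-rqlite label (a hypothetical selector; check the real labels with kubectl get pods --show-labels). The run parameter is injectable so the sequence of commands can be checked without a live cluster.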
For all users of Replicated’s kURL installer, Replicated recommends moving away from Longhorn. For more details on why we’ve made this decision, see our blog post on the subject, “Why Replicated has moved away from recommending Longhorn for kURL storage”.
I ran into this problem too. After rebooting the EC2 instance, the kotsadm, kotsadm-rqlite and application pods can’t attach their volumes. Scaling the StatefulSet down and back up doesn’t help.
Events from the kotsadm pod:
Warning FailedMount 5m44s kubelet Unable to attach or mount volumes: unmounted volumes=[kotsadmdata], unattached volumes=[kotsadmdata backup host-cacerts kotsadm-web-scripts kubelet-client-cert kurl-proxy-kotsadm-tls-cert migrations kube-api-access-qld22]: timed out waiting for the condition
Warning FailedMount 3m27s kubelet Unable to attach or mount volumes: unmounted volumes=[kotsadmdata], unattached volumes=[kubelet-client-cert kurl-proxy-kotsadm-tls-cert migrations kube-api-access-qld22 kotsadmdata backup host-cacerts kotsadm-web-scripts]: timed out waiting for the condition
Warning FailedAttachVolume 90s (x11 over 7m47s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-f44c135f-b394-4c3d-b2a9-a0bd957e8a29" : rpc error: code = Aborted desc = volume pvc-f44c135f-b394-4c3d-b2a9-a0bd957e8a29 is not ready for workloads
Warning FailedMount 73s kubelet Unable to attach or mount volumes: unmounted volumes=[kotsadmdata], unattached volumes=[backup host-cacerts kotsadm-web-scripts kubelet-client-cert kurl-proxy-kotsadm-tls-cert migrations kube-api-access-qld22 kotsadmdata]: timed out waiting for the condition
How can we resolve this issue?
Host OS is Ubuntu 20.04.5 LTS
Version of Longhorn is v1.2.4
@Vitaliy We’re very sorry that your question was missed. If you’re still seeing these errors after scaling the workload down to 0 and back up, further investigation is needed; the best next step is to open a support issue with Replicated and attach a support bundle from the affected environment.
The problem may be caused by the Longhorn provisioner. We are in the process of moving to OpenEBS.