Whenever the primary node is restarted, the replicated-shared-fs-snapshotter-* pod doesn't come up and goes into Init:CrashLoopBackOff.
We have to force delete the pod manually to fix it. While the snapshotter pod is down, the other pods that depend on the shared filesystem also go into Init:CrashLoopBackOff
because of the race condition mentioned in this.
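A rough Python sketch of the manual workaround follows. It parses the text output of `kubectl get pods --no-headers`, finds snapshotter pods stuck in Init:CrashLoopBackOff, and builds the corresponding `kubectl delete pod ... --grace-period=0 --force` commands. It assumes the pod runs in the `default` namespace, which may not be the case on your cluster. The function name `force_delete_commands` is hypothetical.

```python
import shlex

# Name prefix of the pod that gets stuck after the primary node restarts.
SNAPSHOTTER_PREFIX = "replicated-shared-fs-snapshotter-"


def force_delete_commands(get_pods_output: str, namespace: str = "default") -> list[str]:
    """Build force-delete commands for snapshotter pods stuck in Init:CrashLoopBackOff.

    get_pods_output is the text of `kubectl get pods -n <namespace> --no-headers`,
    whose first columns are NAME READY STATUS. The "default" namespace is an
    assumption, not something taken from the affected cluster.
    """
    commands = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[2]
        if name.startswith(SNAPSHOTTER_PREFIX) and status == "Init:CrashLoopBackOff":
            # --grace-period=0 --force removes the pod object immediately,
            # which lets its controller create a replacement pod.
            commands.append(
                f"kubectl delete pod {shlex.quote(name)} -n {shlex.quote(namespace)} "
                "--grace-period=0 --force"
            )
    return commands
```

For example, if you feed in made-up sample output that contains one stuck snapshotter pod and one healthy pod, the function returns a single delete command for the snapshotter pod. This only automates the force delete. It does not fix the underlying problem.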
We are running on DigitalOcean machines with 4 vCPUs and 8 GB of RAM. The cluster has one primary node and two worker nodes. The issue also occurs on a cluster with only a single primary node.
We also have a single-primary-node cluster running on AWS, and it works fine there. How can we solve this? A screenshot is attached for reference.