Once a Rook-Ceph cluster is running, it keeps several files under
/var/lib/rook/exporter/ on the host. If someone deletes those files accidentally, some pods in the rook-ceph namespace will start failing.
For example, you may see
rook-ceph-mgr-b-xxx in CrashLoopBackOff, and
rook-ceph-osd-0-*** and rook-ceph-mon-a-*** not Ready.
When you use kubectl to describe those failing pods, you will find this error:
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
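To spot every affected pod at once, a small filter over the `kubectl get pod` output can help. This is a sketch; the not_ready_pods helper name is my own, not part of Rook or kubectl:

```shell
# Filter `kubectl get pod` output down to pods that are not fully ready.
# Reads the table on stdin; prints NAME, READY, and STATUS for rows where
# the READY column is x/y with x < y.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2, $3 }'
}

# Usage: kubectl get pod -n rook-ceph | not_ready_pods
```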
You can recover those pods and the missing files under
/var/lib/rook/exporter/ by following the instructions below.
First, run
kubectl get deployment -n rook-ceph to list all the deployments in the namespace.
For example, you may see
rook-ceph-osd-0 0/1 1 0 3h36m
Then you can run
kubectl scale deployment -n rook-ceph rook-ceph-osd-0 --replicas 0
If you have multiple OSD nodes, make sure all of the
rook-ceph-osd-* deployments are scaled to 0 first.
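With many OSD deployments, the scale-down step can be scripted. A minimal sketch, assuming bash; the scale_down_osds name and the KUBECTL override are my own additions so the logic can be exercised without a live cluster:

```shell
# Scale every rook-ceph-osd-* deployment in the rook-ceph namespace to 0 replicas.
# KUBECTL defaults to kubectl but can be overridden for testing.
scale_down_osds() {
  local kubectl="${KUBECTL:-kubectl}"
  local dep
  for dep in $("$kubectl" get deployment -n rook-ceph \
      -o jsonpath='{.items[*].metadata.name}'); do
    case "$dep" in
      rook-ceph-osd-*)
        "$kubectl" scale deployment -n rook-ceph "$dep" --replicas 0
        ;;
    esac
  done
}
```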
After that, run
kubectl get pod -n rook-ceph | grep rook-ceph-mon-a to get the name of the running mon pod.
Use the pod name from the previous command to restart the mon:
kubectl delete pod rook-ceph-mon-a-******** -n rook-ceph
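Alternatively, the mon pod can be restarted by label selector so you don't have to copy the generated pod name. This sketch assumes Rook labels mon pods with app=rook-ceph-mon and mon=<id>, which is worth verifying on your cluster; restart_mon is a hypothetical helper, and KUBECTL is overridable for testing:

```shell
# Delete the mon pod for a given daemon id (default "a") by label selector.
# Rook's operator will recreate the pod automatically.
restart_mon() {
  local daemon="${1:-a}"
  "${KUBECTL:-kubectl}" delete pod -n rook-ceph -l "app=rook-ceph-mon,mon=${daemon}"
}
```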
After a few minutes, the missing files under
/var/lib/rook/exporter/ will be back and the pods will return to normal.
Rook will scale your OSD deployments back up to their original replica counts automatically.
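To confirm the recovery finished on a host, you can poll the exporter directory until files reappear. A minimal sketch; verify_exporter_files is a hypothetical helper and the default retry counts are arbitrary:

```shell
# Wait until the exporter directory exists and is non-empty.
# Args: directory (default /var/lib/rook/exporter), number of checks (default 30),
# seconds between checks (default 10). Returns 0 once files are present, 1 on timeout.
verify_exporter_files() {
  local dir="${1:-/var/lib/rook/exporter}"
  local tries="${2:-30}"
  local interval="${3:-10}"
  while [ "$tries" -gt 0 ]; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}

# Usage: verify_exporter_files && echo "exporter files are back"
```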