Once a Rook-Ceph cluster is running, it keeps several files under /var/lib/rook/exporter/ on the host. If someone deletes those files accidentally, some of your pods in the rook-ceph namespace will go into CrashLoopBackOff.
For example, you will see rook-ceph-mgr-b-xxx crashing, and rook-ceph-osd-0-*** and rook-ceph-mon-a-*** not ready.
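A quick way to see which pods are affected is to list everything in the namespace (which pods show up as failing will vary with your cluster):
kubectl get pods -n rook-ceph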
When you use kubectl to describe those failing pods, you will find this error:
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
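For instance, describing one of the failing pods should surface the error above in its events. The pod name suffix here is only a placeholder; substitute one of your own failing pods:
kubectl describe pod rook-ceph-mgr-b-xxxxxxxxxx -n rook-ceph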
You can recover those pods and the missing files under /var/lib/rook/exporter/ by following the instructions below.
Recovery Steps
First, run kubectl get deployment -n rook-ceph to list all the rook-ceph-osd-* deployments.
For example, you may see:
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
rook-ceph-osd-0   0/1     1            0           3h36m
Then scale it down to zero:
kubectl scale deployment -n rook-ceph rook-ceph-osd-0 --replicas 0
If you have multiple OSD nodes, make sure all of the rook-ceph-osd-* deployments are scaled to 0 first, for example with the loop below.
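A small shell loop is one way to scale every OSD deployment down at once. This sketch assumes the deployments follow Rook's default rook-ceph-osd-<id> naming; adjust the grep pattern if yours differ.
for d in $(kubectl get deployment -n rook-ceph -o name | grep rook-ceph-osd-); do
  kubectl scale "$d" -n rook-ceph --replicas 0
done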
After that, run kubectl get pod -n rook-ceph | grep rook-ceph-mon-a to get the name of the running rook-ceph-mon-a pod.
Use the pod name from the previous command to restart the rook-ceph-mon-a pod:
kubectl delete pod rook-ceph-mon-a-******** -n rook-ceph
After a few minutes, the missing files under /var/lib/rook/exporter/ will be back and the pods will return to normal.
Your OSD deployments will be scaled back to their original replica counts by Rook-Ceph automatically.
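To confirm the recovery, you can list the pods again and check the exporter directory on the host. The last command is optional and assumes the rook-ceph-tools toolbox deployment is installed in your cluster:
kubectl get pods -n rook-ceph
ls /var/lib/rook/exporter/   (run this on the host)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status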