In kURL installations, the migration from OpenEBS (non-HA) storage to Rook-Ceph (HA) is automatically handled by the kURL EKCO operator. This migration process is triggered when:
- At least 3 nodes are available (meeting the `minimumNodeCount` requirement)
- The user accepts the migration prompt during the third node join operation
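For illustration, a kURL installer spec that enables this migration path might look like the following sketch; the name and version fields are placeholders, and `minimumNodeCount` is the setting referenced above:

```yaml
apiVersion: cluster.kurl.sh/v1beta1
kind: Installer
metadata:
  name: ha-storage          # placeholder name
spec:
  openebs:
    version: latest         # placeholder; pin a concrete version in practice
    isLocalPVEnabled: true
    localPVStorageClassName: local
  rook:
    version: latest         # placeholder; pin a concrete version in practice
    minimumNodeCount: 3     # EKCO starts the migration once the cluster reaches 3 nodes
```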
If these conditions are met but the migration doesn’t complete successfully (e.g., no monitor or OSD pods are created), follow these troubleshooting steps:
- Check for errors in the Rook operator logs:

```bash
kubectl logs deploy/rook-ceph-operator -n rook-ceph
```
- Check for errors in the EKCO operator logs:

```bash
kubectl logs deploy/ekc-operator -n kurl | grep rook
```
- Describe the `CephCluster` custom resource to check for errors:

```bash
kubectl describe cephclusters/rook-ceph -n rook-ceph
```
For example, describing the `CephCluster` CR here shows a reconcile error: the mon pods are failing to reach quorum.
```
Events:
  Type     Reason           Age                 From                          Message
  ----     ------           ----                ----                          -------
  Warning  ReconcileFailed  24h (x10 over 45h)  rook-ceph-cluster-controller  failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to start mon pods: failed to check mon quorum b: failed to wait for mon quorum: exceeded max retry count waiting for monitors to reach quorum
  Warning  ReconcileFailed  23h (x15 over 44h)  rook-ceph-cluster-controller  failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to start mon pods: failed to check mon quorum a: failed to wait for mon quorum: exceeded max retry count waiting for monitors to reach quorum
```
From these events we can identify the affected rook-ceph mon pods and troubleshoot further to find out why they are restarting.
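Assuming the standard Rook label for monitor pods, they can be listed with:

```bash
# list the Ceph monitor pods (app=rook-ceph-mon is the standard Rook mon label)
kubectl get pods -n rook-ceph -l app=rook-ceph-mon
```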
```
NAME                               READY   STATUS    RESTARTS      AGE
rook-ceph-mon-b-b588cf9d6-skmn9    2/2     Running   6 (23h ago)   45h
rook-ceph-mon-c-5b94dc7558-rhfbr   2/2     Running   6 (23h ago)   45h
```
In this case, describing the pod shows an OOMKilled event that prevents the mon pods from reaching quorum.
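For example, using the first pod name from the listing above:

```bash
kubectl describe pod rook-ceph-mon-b-b588cf9d6-skmn9 -n rook-ceph
```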
```
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
```
We will have to update the Rook cluster settings with higher resource limits for the mon pods.
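As a sketch, assuming the standard Rook `CephCluster` resources API, the mon limits can be raised with a direct patch; the memory values below are illustrative only:

```bash
# raise the mon memory request/limit; adjust the values to your environment
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"resources":{"mon":{"requests":{"memory":"1Gi"},"limits":{"memory":"2Gi"}}}}}'
```

The Rook operator should then redeploy the mon pods with the new limits on its next reconcile.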