In an HA Kubernetes cluster the REK operator will automatically purge failed nodes that have been unreachable for more than an hour. Purging a node includes the following steps:
- Delete the Deployment resource for the OSD from the `rook-ceph` namespace
- Exec into the Rook operator pod and run the command `ceph osd purge <id>`
- Delete the Node resource
- Remove the node from the `CephCluster` resource named `rook-ceph` in the `rook-ceph` namespace, unless storage is managed automatically with `useAllNodes: true`
- (Masters only) Connect to the etcd cluster and remove the peer
- (Masters only) Remove the apiEndpoint for the node from the `kubeadm-config` ConfigMap in the `kube-system` namespace
All of these steps can be performed manually if needed.
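For example, the Rook and node cleanup might look roughly like the sketch below. This is illustrative only: the OSD ID (`3`), Rook's usual `rook-ceph-osd-<id>` Deployment naming, and the `app=rook-ceph-operator` pod label are assumptions not stated in the steps above, and `node-k7d4` is simply the failed node from the etcd listing further down.

```
# Assumptions: OSD id 3 lived on failed node node-k7d4, the OSD Deployment
# follows Rook's usual rook-ceph-osd-<id> naming, and the operator pod is
# labelled app=rook-ceph-operator.

# 1. Delete the Deployment resource for the OSD
kubectl -n rook-ceph delete deployment rook-ceph-osd-3

# 2. Purge the OSD from Ceph via the operator pod
#    (the ceph CLI asks for explicit confirmation on purge)
kubectl -n rook-ceph exec -it \
  "$(kubectl -n rook-ceph get pod -l app=rook-ceph-operator -o name)" -- \
  ceph osd purge 3 --yes-i-really-mean-it

# 3. Delete the Node resource
kubectl delete node node-k7d4

# 4. Remove the node entry from the CephCluster resource
#    (skip this when useAllNodes: true manages storage automatically)
kubectl -n rook-ceph edit cephcluster rook-ceph
```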
For removing etcd peers, exec into one of the remaining etcd pods in the `kube-system` namespace. You can use the `etcdctl` CLI there with the certificates mounted in `/etc/kubernetes/pki/etcd`:
```
$ cd /etc/kubernetes/pki/etcd
$ etcdctl --endpoints=https://127.0.0.1:2379 --cert-file=healthcheck-client.crt --key-file=healthcheck-client.key --ca-file=ca.crt member list
a1316b56d7099abf: name=node-k7d4 peerURLs=https://10.128.0.124:2380 clientURLs=https://10.128.0.124:2379 isLeader=false
ab67f9f870c32907: name=node-wbf1 peerURLs=https://10.128.0.125:2380 clientURLs=https://10.128.0.125:2379 isLeader=false
d9228c5ac755a5c6: name=node-hrrm peerURLs=https://10.128.0.123:2380 clientURLs=https://10.128.0.123:2379 isLeader=true
$ etcdctl --endpoints=https://127.0.0.1:2379 --cert-file=healthcheck-client.crt --key-file=healthcheck-client.key --ca-file=ca.crt member remove a1316b56d7099abf
```
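The last masters-only step can also be done by hand. A sketch, assuming a kubeadm version that still records control-plane endpoints under a `ClusterStatus` key in the ConfigMap; the `bindPort` shown is the kubeadm default and the node name matches the etcd listing above:

```
# The failed master appears under the ClusterStatus key of the
# kubeadm-config ConfigMap, roughly like:
#
#   apiEndpoints:
#     node-k7d4:
#       advertiseAddress: 10.128.0.124
#       bindPort: 6443
#
# Edit the ConfigMap and delete that apiEndpoints entry:
kubectl -n kube-system edit configmap kubeadm-config
```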