Removing nodes from Kubernetes clusters

In an HA Kubernetes cluster, the REK operator automatically purges failed nodes that have been unreachable for more than an hour. The purge performs the following steps; steps 5 and 6 apply only to master nodes:

  1. Delete the Deployment resource for the OSD from the rook-ceph namespace
  2. Exec into the Rook operator pod and run the command ceph osd purge <id>
  3. Delete the Node resource
  4. Remove the node from the CephCluster resource named rook-ceph in the rook-ceph namespace, unless storage is managed automatically with useAllNodes: true
  5. (Masters only) Connect to the etcd cluster and remove the peer
  6. (Masters only) Remove the apiEndpoint for the node from the kubeadm-config ConfigMap in the kube-system namespace
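
The steps above can be sketched as a dry-run helper that prints the corresponding kubectl commands for review instead of running them. The function name purge_commands, the OSD Deployment name rook-ceph-osd-<id>, the operator Deployment name rook-ceph-operator, the --yes-i-really-mean-it confirmation flag, and the example OSD id 2 are assumptions that are not stated above, so check them against your cluster before running anything. Step 5 (the etcd peer) is left out here because it is handled with etcdctl.

```shell
#!/bin/sh
# Dry run: print the manual purge commands for one node; nothing is executed.
# Hypothetical helper; the resource names below are assumptions.
purge_commands() {
  node="$1"    # Kubernetes node name, e.g. node-k7d4
  osd_id="$2"  # numeric Ceph OSD id that ran on that node

  # Step 1: delete the OSD Deployment
  echo "kubectl -n rook-ceph delete deployment rook-ceph-osd-${osd_id}"
  # Step 2: purge the OSD from inside the Rook operator pod
  echo "kubectl -n rook-ceph exec deploy/rook-ceph-operator -- ceph osd purge ${osd_id} --yes-i-really-mean-it"
  # Step 3: delete the Node resource
  echo "kubectl delete node ${node}"
  # Step 4: edit the CephCluster by hand (skip when useAllNodes: true)
  echo "kubectl -n rook-ceph edit cephcluster rook-ceph"
  # Step 6 (masters only): drop the node's apiEndpoint from kubeadm-config
  echo "kubectl -n kube-system edit configmap kubeadm-config"
}

purge_commands node-k7d4 2
```

Printing the commands first makes it easier to confirm the node name and OSD id before anything destructive is run.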

All of these steps can be performed manually if needed. To remove an etcd peer, exec into one of the remaining etcd pods in the kube-system namespace. There you can use the etcdctl CLI with the certificates mounted in /etc/kubernetes/pki/etcd to list the members and then remove the failed one by its ID:

$ cd /etc/kubernetes/pki/etcd
$ etcdctl --endpoints=https://127.0.0.1:2379 --cert-file=healthcheck-client.crt --key-file=healthcheck-client.key --ca-file=ca.crt member list

a1316b56d7099abf: name=node-k7d4 peerURLs=https://10.128.0.124:2380 clientURLs=https://10.128.0.124:2379 isLeader=false
ab67f9f870c32907: name=node-wbf1 peerURLs=https://10.128.0.125:2380 clientURLs=https://10.128.0.125:2379 isLeader=false
d9228c5ac755a5c6: name=node-hrrm peerURLs=https://10.128.0.123:2380 clientURLs=https://10.128.0.123:2379 isLeader=true

$ etcdctl --endpoints=https://127.0.0.1:2379 --cert-file=healthcheck-client.crt --key-file=healthcheck-client.key --ca-file=ca.crt member remove a1316b56d7099abf
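
The member ID passed to member remove has to match the listing exactly, and a long hex string is easy to truncate when copying by hand. A minimal sketch, assuming the older name=... list format shown above, that looks up the ID by node name; member_id_for is a hypothetical helper and not part of etcdctl:

```shell
#!/bin/sh
# Hypothetical helper: print the etcd member ID for a given node name.
# Reads "etcdctl member list" output (older name=... format) on stdin.
member_id_for() {
  # Field 1 is "<id>:", field 2 is "name=<node>"; strip the trailing colon.
  awk -v want="name=$1" '$2 == want { sub(/:$/, "", $1); print $1 }'
}

# Sample listing copied from the member list output above
listing='a1316b56d7099abf: name=node-k7d4 peerURLs=https://10.128.0.124:2380 clientURLs=https://10.128.0.124:2379 isLeader=false
ab67f9f870c32907: name=node-wbf1 peerURLs=https://10.128.0.125:2380 clientURLs=https://10.128.0.125:2379 isLeader=false
d9228c5ac755a5c6: name=node-hrrm peerURLs=https://10.128.0.123:2380 clientURLs=https://10.128.0.123:2379 isLeader=true'

printf '%s\n' "$listing" | member_id_for node-k7d4   # prints a1316b56d7099abf
```

In practice the listing would come from piping the member list command itself into member_id_for. The helper prints nothing when no member has the given name.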

Newer versions of etcdctl use different command-line arguments for the certificates:

--cert=healthcheck-client.crt --key=healthcheck-client.key --cacert=ca.crt
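
To show the two flag styles side by side, the sketch below prints the full command line for either one. It assumes that the ETCDCTL_API environment variable selects the API version on your etcdctl build (2 for the older flags, 3 for the newer ones); etcdctl_cmd is a hypothetical helper that only echoes the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print an etcdctl command line with the TLS flags
# that match the requested API version. Nothing is executed.
etcdctl_cmd() {
  api="$1"; shift
  case "$api" in
    2) tls="--cert-file=healthcheck-client.crt --key-file=healthcheck-client.key --ca-file=ca.crt" ;;
    3) tls="--cert=healthcheck-client.crt --key=healthcheck-client.key --cacert=ca.crt" ;;
    *) echo "unsupported API version: $api" >&2; return 1 ;;
  esac
  # Remaining arguments are the etcdctl subcommand, e.g. "member list"
  echo "ETCDCTL_API=$api etcdctl --endpoints=https://127.0.0.1:2379 $tls $*"
}

etcdctl_cmd 3 member list
etcdctl_cmd 3 member remove a1316b56d7099abf
```

As in the commands above, the relative certificate paths assume the working directory is /etc/kubernetes/pki/etcd. The output format of member list may also differ between versions, so check the ID column before removing a member.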