What do I have to do to reboot a Kubernetes Replicated host?

Right now I’m testing Replicated on an AWS EC2 instance, and issuing a reboot causes the VM to hang indefinitely because of some Ceph/Rook volumes that cannot be unmounted (at least I think this is the problem; it’s not easy to see into the VM once the reboot command has been issued).

The node needs to be drained before the reboot. After a successful drain, the node can be rebooted as usual.

Because kubectl drain automatically marks the node as unschedulable (the same effect as kubectl cordon), the node needs to be uncordoned once it’s back online.

Drain the node:

kubectl drain <node-name> --ignore-daemonsets --delete-local-data

Uncordon the node:

kubectl uncordon <node-name>
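
Putting it together, a typical reboot sequence looks like this (a sketch; substitute your node name, and note that newer kubectl releases rename --delete-local-data to --delete-emptydir-data):

kubectl drain <node-name> --ignore-daemonsets --delete-local-data
sudo reboot
# once the node is back online and reports Ready:
kubectl uncordon <node-name>
kubectl get nodes   # verify the node is Ready and no longer SchedulingDisabled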

See ceph-common-issues.md in the rook/rook repository on GitHub for background.

Some systems may hang on reboot even after a kubectl drain. It is recommended to remove all pods that use a Rook-provisioned PVC before draining. On single-node installs, the drain step is not required once the deployments are scaled down.

# stop the app, then scale down the Replicated deployments
replicatedctl app stop -a
kubectl scale deployment replicated replicated-premkit retraced-postgres --replicas=0

If using the Rook shared filesystem, also scale down the snapshotter deployment:

kubectl scale deployment replicated-shared-fs-snapshotter --replicas=0

Wait for those pods to terminate before running the drain command.
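
One way to confirm this (a sketch; it assumes the pod names contain the deployment names above):

while kubectl get pods 2>/dev/null | grep -qE 'replicated|retraced'; do
  echo "waiting for pods to terminate..."
  sleep 5
done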

After reboot:

kubectl scale deployment replicated replicated-premkit retraced-postgres --replicas=1
kubectl scale deployment replicated-shared-fs-snapshotter --replicas=1 # if used
replicatedctl app start
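
To verify everything came back up (a sketch; my understanding is that replicatedctl app status inspect is available with Replicated’s native scheduler and prints the app state as JSON):

kubectl get deployments
kubectl get pods
replicatedctl app status inspect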

The AKA reboot service now handles removing all pods with Rook mounts during shutdown. However, due to race conditions during shutdown, this script may not complete. To prevent corruption, always run /opt/replicated/shutdown.sh manually before shutting down a node.
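
In practice, that means invoking the script yourself before the OS-level shutdown, for example:

sudo /opt/replicated/shutdown.sh
sudo shutdown -h now   # or: sudo reboot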