Issue
An embedded cluster node (Replicated Embedded Cluster, k0s-based) fails to become Ready after the host was powered off or unreachable during the period in which its certificates were due to rotate.
Symptoms
- The node shows Ready status as Unknown, and conditions such as MemoryPressure, DiskPressure, and PIDPressure are also Unknown:
  - Reason: NodeStatusUnknown
  - Message: Kubelet stopped posting node status.
- The kubelet logs show authentication errors similar to:
Unable to authenticate the request
err="x509: certificate has expired or is not yet valid: current time 2026-06-05T16:01:23Z is after 2026-05-14T18:09:59Z"
- Running systemctl restart k0scontroller does not resolve the issue.
Root cause
The kubelet’s client certificate is stored at:
/var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-current.pem
This certificate has a 1-year lifetime. Under normal operation, the kubelet rotates the certificate before it expires. However, if the host is offline past the certificate expiry date, automatic rotation cannot occur, because the renewal request itself uses the expired certificate to authenticate.
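To confirm that an expired client certificate is the cause before changing anything, the certificate's validity can be checked with openssl. The following is a minimal sketch, assuming openssl is installed on the host and the script is run as root so that the file is readable. The function name cert_status and the CERT override variable are illustrative and not part of Embedded Cluster.

```shell
#!/bin/sh
# Path taken from this article; set CERT to inspect a different file.
CERT="${CERT:-/var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-current.pem}"

# cert_status FILE
# Prints one of: unreadable, unparsable, valid, expired.
cert_status() {
    if [ ! -r "$1" ]; then
        # Missing file, or not enough privileges to read it.
        echo "unreadable"
    elif ! openssl x509 -in "$1" -noout >/dev/null 2>&1; then
        # The file exists but does not contain a PEM certificate.
        echo "unparsable"
    elif openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1; then
        # -checkend 0 exits 0 when the certificate has not yet expired.
        echo "valid"
    else
        echo "expired"
    fi
}

cert_status "$CERT"
```

If the result is expired, the notAfter date printed by openssl x509 -in "$CERT" -noout -enddate should match the second timestamp in the kubelet log message.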
Restarting k0scontroller only regenerates the server-side certificates. The kubelet authentication kubeconfig (/var/lib/embedded-cluster/k0s/kubelet.conf) and the expired kubelet client certificate are left unchanged.
Resolution
The following procedure regenerates the kubelet client certificate. All existing kubelet configuration and PKI files are backed up first, so they can be restored if needed.
1. Stop the k0s controller
sudo systemctl stop k0scontroller
2. Back up the expired kubelet configuration
sudo mv /var/lib/embedded-cluster/k0s/kubelet.conf /tmp/kubelet.conf.expired
sudo cp -a /var/lib/embedded-cluster/k0s/kubelet/pki /tmp/kubelet-pki.expired-bak
3. Remove the expired kubelet client certificates
sudo rm -f /var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-*
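Steps 2 and 3 can also be combined so that nothing is removed unless the backup succeeded. The sketch below is one possible way to do that, not the documented procedure: the function name reset_kubelet_auth and the choice of backup directory are assumptions. It copies the files first and deletes the originals only after both copies complete.

```shell
#!/bin/sh
# reset_kubelet_auth K0S_DIR BACKUP_DIR
# Backs up kubelet.conf and the kubelet pki directory, then removes
# kubelet.conf and the kubelet client certificates.
# Returns non-zero, leaving the originals in place, if a backup step fails.
reset_kubelet_auth() {
    k0s_dir=$1
    backup_dir=$2

    mkdir -p "$backup_dir" || return 1
    cp -a "$k0s_dir/kubelet.conf" "$backup_dir/kubelet.conf.expired" || return 1
    cp -a "$k0s_dir/kubelet/pki" "$backup_dir/kubelet-pki.expired-bak" || return 1

    # Only reached when both copies succeeded.
    rm -f "$k0s_dir/kubelet.conf" "$k0s_dir"/kubelet/pki/kubelet-client-*
}

# Example (run as root, with k0scontroller stopped; the backup path is illustrative):
#   reset_kubelet_auth /var/lib/embedded-cluster/k0s "/root/kubelet-backup-$(date +%Y%m%d-%H%M%S)"
```

A backup location outside /tmp may be preferable on hosts that clear /tmp at boot.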
4. Restart the k0s controller and monitor the logs
sudo systemctl start k0scontroller
sudo journalctl -u k0scontroller --no-pager -f
Expected behavior
- The node should return to Ready status within approximately 45 seconds.
- All pods should be back to 1/1 Running within 3 to 4 minutes.
- A small number of pods may restart once while their service account tokens refresh. This is expected after a long outage and resolves automatically.
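Rather than watching the logs by hand, the wait for the node to become Ready can be scripted. The sketch below defines a small polling helper. The name wait_until and the 120-second and 5-second values in the example are assumptions chosen for illustration. Polling is used because the API server may refuse connections for a short time after k0scontroller starts, which would make a single kubectl wait call fail immediately.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds.
# Returns 1 once at least TIMEOUT_SECONDS of sleeping has elapsed without success.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# Example (not run here): poll for up to two minutes until every node is Ready.
#   wait_until 120 5 sudo kubectl wait --for=condition=Ready node --all --timeout=1s
```

The elapsed counter tracks only the sleep time, so the real wall-clock limit is somewhat longer when the polled command itself is slow.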
5. Verify clock synchronization before resuming normal operations
After the cluster is healthy, ensure the host’s system clock is accurate and NTP is enabled before returning the node to production use.
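On hosts that use systemd's time services, the synchronization state can be read with timedatectl. The following sketch assumes a systemd version that supports timedatectl show. The function name clock_sync_status is illustrative. On hosts that use a different time daemon, or where timedatectl is unavailable, it reports unknown and the daemon's own tooling should be used instead.

```shell
#!/bin/sh
# clock_sync_status
# Prints one of: synchronized, not-synchronized, unknown.
clock_sync_status() {
    if ! command -v timedatectl >/dev/null 2>&1; then
        echo "unknown"
        return
    fi
    # NTPSynchronized is "yes" once the system clock has been synchronized.
    case "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)" in
        yes) echo "synchronized" ;;
        no)  echo "not-synchronized" ;;
        *)   echo "unknown" ;;
    esac
}

echo "clock: $(clock_sync_status)"
# Print the current UTC time so it can be compared against a trusted source.
date -u +"UTC now: %Y-%m-%dT%H:%M:%SZ"
```

A clock that is far in the past or future can produce the same x509 "expired or is not yet valid" error even when the certificate itself is current.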
Verification
Run the following commands to confirm the cluster is healthy:
sudo kubectl get nodes
sudo kubectl get pods -A
The node should report Ready and all pods should be Running.
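On a cluster with many pods, it can be easier to list only the pods that are not yet healthy. The sketch below filters the output of kubectl get pods -A --no-headers. The function name unhealthy_pods is an assumption, and the filter relies on the default column order (NAMESPACE, NAME, READY, STATUS). Pods in Completed status are skipped, since finished jobs do not return to Running.

```shell
#!/bin/sh
# unhealthy_pods
# Reads "kubectl get pods -A --no-headers" output on stdin and prints
# "namespace/name READY STATUS" for each pod that is neither Completed
# nor Running with all of its containers ready.
unhealthy_pods() {
    awk '{
        if ($4 == "Completed") next
        split($3, ready, "/")
        if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $3, $4
    }'
}

# Example (not run here):
#   sudo kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output means every pod is either Completed or fully ready.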
Applies to
- Replicated Embedded Cluster