Recovering an Embedded Cluster with an expired Kubelet client certificate

Xav_Paice · June 16, 2026, 8:21pm

Issue

An embedded cluster node (Replicated Embedded Cluster, k0s-based) fails to become Ready after the host was powered off or unreachable during the window in which its certificates were due to rotate.

Symptoms

The node's Ready status is Unknown, and conditions such as MemoryPressure, DiskPressure, and PIDPressure are also Unknown:
- Reason: NodeStatusUnknown
- Message: Kubelet stopped posting node status.

The kubelet logs show authentication errors similar to:

Unable to authenticate the request
err="x509: certificate has expired or is not yet valid: current time 2026-06-05T16:01:23Z is after 2026-05-14T18:09:59Z"

Running systemctl restart k0scontroller does not resolve the issue.

Root cause

The kubelet’s client certificate is stored at:

/var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-current.pem

This certificate has a 1-year lifetime. Under normal operation, the kubelet rotates the certificate before it expires. However, if the host is offline past the certificate expiry date, automatic rotation cannot occur, because the renewal request itself uses the expired certificate to authenticate.
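To confirm that this is the failure in play, the expiry date of the client certificate can be read with openssl. The following is a minimal sketch, assuming openssl is installed on the host; the function name check_kubelet_cert is hypothetical, and the default path is the one given above.

```shell
# Hypothetical helper: report the expiry of the kubelet client certificate.
# Returns 0 if still valid, 1 if expired, 2 if the file cannot be read.
check_kubelet_cert() {
  cert="${1:-/var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-current.pem}"
  if [ ! -r "$cert" ]; then
    echo "cannot read $cert (run as root?)" >&2
    return 2
  fi
  # Print the notAfter date of the first certificate in the file.
  openssl x509 -in "$cert" -noout -enddate
  # -checkend 0 exits non-zero if the certificate has already expired.
  if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "certificate is still valid"
  else
    echo "certificate has expired"
    return 1
  fi
}
```

Run from a root shell, check_kubelet_cert with no argument inspects the default path; pass a different path as the first argument to inspect another file. A notAfter date earlier than the current time matches the x509 error shown under Symptoms.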

Restarting k0scontroller only regenerates the server-side certificates. The kubelet authentication kubeconfig (/var/lib/embedded-cluster/k0s/kubelet.conf) and the expired kubelet client certificate are left unchanged.

Resolution

The following procedure regenerates the kubelet client certificate. All existing kubelet configuration and PKI files are backed up first, so they can be restored if needed.

1. Stop the k0s controller

sudo systemctl stop k0scontroller

2. Move the expired kubelet kubeconfig aside and back up the kubelet PKI directory

sudo mv /var/lib/embedded-cluster/k0s/kubelet.conf /tmp/kubelet.conf.expired
sudo cp -a /var/lib/embedded-cluster/k0s/kubelet/pki /tmp/kubelet-pki.expired-bak

3. Remove the expired kubelet client certificates

sudo rm -f /var/lib/embedded-cluster/k0s/kubelet/pki/kubelet-client-*

4. Start the k0s controller and monitor the logs

sudo systemctl start k0scontroller
sudo journalctl -u k0scontroller --no-pager -f

Expected behavior

- The node should return to Ready status within approximately 45 seconds.
- All pods should be back to Running, with every container ready (for example 1/1), within 3 to 4 minutes.
- A small number of pods may restart once while their service account tokens refresh. This is expected after a long outage and resolves automatically.

5. Verify clock synchronization before resuming normal operations

After the cluster is healthy, ensure the host’s system clock is accurate and NTP is enabled before returning the node to production use.
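One way to check the clock on a systemd-based host is sketched below. It assumes timedatectl is available (systemd 239 or later for the show subcommand); hosts that manage time with chrony or ntpd directly need their own tooling instead. The function name check_clock_sync is hypothetical.

```shell
# Hypothetical helper: returns 0 if systemd reports the clock as
# NTP-synchronized, 1 if not, 2 if timedatectl is unavailable.
check_clock_sync() {
  if ! command -v timedatectl >/dev/null 2>&1; then
    echo "timedatectl not found; check chrony or ntpd directly" >&2
    return 2
  fi
  # Show the current time in UTC for comparison with a trusted source.
  date -u +"current UTC time: %Y-%m-%dT%H:%M:%SZ"
  synced="$(timedatectl show -p NTPSynchronized --value 2>/dev/null || true)"
  if [ "$synced" = "yes" ]; then
    echo "clock is NTP-synchronized"
  else
    echo "clock is not synchronized; consider: sudo timedatectl set-ntp true"
    return 1
  fi
}
```

A clock that is far off can make a freshly issued certificate look expired or not yet valid, so it is worth running this check before concluding that the recovery has succeeded.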

Verification

Run the following commands to confirm the cluster is healthy:

sudo kubectl get nodes
sudo kubectl get pods -A

The node should report Ready and all pods should be Running (or Completed, for pods that belong to finished jobs).
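If the original files are needed again, for example to re-examine the expired certificate, the backup taken in step 2 can be copied back. The following is a sketch only: the function name restore_kubelet_backup is hypothetical, the default paths are the ones used in step 2, and it assumes k0scontroller has been stopped first and the function is run as root. Restoring these files returns the node to its expired state; it does not fix the problem.

```shell
# Hypothetical helper: copy the step 2 backup back into place.
# Arguments: k0s data directory, backup directory (defaults match step 2).
restore_kubelet_backup() {
  k0s_dir="${1:-/var/lib/embedded-cluster/k0s}"
  backup_dir="${2:-/tmp}"
  if [ ! -f "$backup_dir/kubelet.conf.expired" ] ||
     [ ! -d "$backup_dir/kubelet-pki.expired-bak" ]; then
    echo "backup not found in $backup_dir" >&2
    return 1
  fi
  # Copy rather than move, so the backup survives a second attempt.
  cp -a "$backup_dir/kubelet.conf.expired" "$k0s_dir/kubelet.conf"
  mkdir -p "$k0s_dir/kubelet/pki"
  cp -a "$backup_dir/kubelet-pki.expired-bak/." "$k0s_dir/kubelet/pki/"
}
```

For example, from a root shell: stop the service with systemctl stop k0scontroller, then run restore_kubelet_backup with no arguments to use the default paths.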

Applies to

Replicated Embedded Cluster
