Workaround for when joining a node fails with the error: x509: certificate signed by unknown authority

In rare cases, a Kubernetes node cannot join an existing cluster because the CA certificate generated on the new node differs from the CA certificate on the existing nodes. The symptom is a join that fails with the following error:

[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [azl-t-pp-pam03 localhost] and IPs [137.181.100.135 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [azl-t-pp-pam03 localhost] and IPs [137.181.100.135 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [azl-t-pp-pam03 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 137.181.100.135]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Generating "sa" key and public key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W0717 17:46:04.085492 9111 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
W0717 17:46:04.241083 9111 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
W0717 17:46:04.650818 9111 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: could not retrieve the list of etcd endpoints: Get "https://localhost:6444/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
To see the stack trace of this error execute with --v=5 or higher
Failed to join the kubernetes cluster.
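
One way to confirm that the CA certificates really differ is to compare their fingerprints on an existing node and on the new node. The following is a minimal sketch, assuming that openssl is installed and that the CA certificate is at /etc/kubernetes/pki/ca.crt (the certificateDir shown in the log above). The function name ca_fingerprint is a hypothetical helper, not part of kURL or kubeadm.

```shell
# Hypothetical helper: print the SHA-256 fingerprint of a CA certificate.
# The default path is an assumption based on the certificateDir in the log.
ca_fingerprint() {
  openssl x509 -in "${1:-/etc/kubernetes/pki/ca.crt}" -noout -fingerprint -sha256
}
```

Run ca_fingerprint (as a user that can read the certificate) on an existing node and again on the new node after the failed join. If the two fingerprints are not identical, the new node generated its own CA and the workaround below applies.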

To join a node to a cluster in this state, a cluster or system administrator needs to do the following:

  1. Copy the /etc/kubernetes/pki directory from an existing Kubernetes node to the new node
  2. On the new node, remove the following etcd certs: rm /etc/kubernetes/pki/etcd/{healthcheck-client.crt,healthcheck-client.key,peer.crt,peer.key,server.crt,server.key}
  3. Generate a new join command:

    curl -sSL https://kurl.sh/latest/tasks.sh | sudo bash -s join_token

  4. Run the join command on the node to be added to the cluster
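
The steps above can be sketched as a small script. This is an illustration under stated assumptions, not an official kURL procedure: copy_pki and remove_etcd_certs are hypothetical helper names, the copy uses scp and assumes root SSH access from the new node to an existing node, and the existing node's hostname is a placeholder you must supply. The etcd server and peer certificates are removed because, as the log shows, they are issued for a specific node's DNS names and IPs, so the new node needs its own.

```shell
# Step 1 (run on the new node): copy the PKI directory from an existing node.
# Assumption: root SSH access to the existing node; $1 is its hostname or IP.
copy_pki() {
  scp -r "root@$1:/etc/kubernetes/pki" /etc/kubernetes/
}

# Step 2 (run on the new node): remove the copied etcd certs and keys listed
# in step 2 above. The etcd CA (ca.crt, ca.key) is left in place.
# $1 is the PKI directory; it defaults to /etc/kubernetes/pki.
remove_etcd_certs() {
  etcd_dir="${1:-/etc/kubernetes/pki}/etcd"
  for name in healthcheck-client peer server; do
    # -f: do not fail if a file is already absent
    rm -f "${etcd_dir}/${name}.crt" "${etcd_dir}/${name}.key"
  done
}

# Step 3: generate a new join command (command taken from the text above):
#   curl -sSL https://kurl.sh/latest/tasks.sh | sudo bash -s join_token
# Step 4 (run on the new node): run the join command printed by step 3.
```

For example, after sourcing these functions as root on the new node, run copy_pki followed by the hostname of an existing node, then remove_etcd_certs, and continue with steps 3 and 4.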