If you see the following alert in your cluster SupportBundle, you have been hit by a known issue:
Kubeadm cluster status ConfigMap is in an inconsistent state.
This error may cause cluster upgrades to fail, and resolving it requires manual intervention.
Why am I being hit by this?
Kubeadm uses a ConfigMap called kubeadm-config in the kube-system namespace to keep track of the configured control plane API endpoints. If you are seeing this error, it means that at some point in the past you used the ekco-purge-node command line utility to remove one of your cluster nodes.
Due to a bug, ekco-purge-node rendered an invalid ClusterStatus YAML and wrote it to the kubeadm-config ConfigMap. kubeadm can no longer read this invalid YAML, so cluster upgrades may fail.
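As a rough illustration of what "invalid" means here, the sketch below scans a ClusterStatus string for the lowercased key names that appear in the broken example in this article. The function name `find_invalid_keys` and the key list are assumptions made for illustration; this is not part of kubeadm or ekco, and it only does a plain text match rather than a real YAML parse.

```python
import re

# Lowercased keys seen in the broken ClusterStatus example in this article.
# This list is an assumption for illustration, not an official schema.
INVALID_KEYS = ("typemeta", "apiversion", "apiendpoints", "advertiseaddress", "bindport")


def find_invalid_keys(cluster_status_text: str) -> list[str]:
    """Return the lowercased keys found in a ClusterStatus YAML string."""
    found = []
    for key in INVALID_KEYS:
        # Match the key at the start of a line, after optional indentation.
        # The match is case-sensitive, so "apiVersion:" is not reported.
        if re.search(rf"^\s*{key}:", cluster_status_text, flags=re.MULTILINE):
            found.append(key)
    return found
```

An empty result suggests the ClusterStatus text does not contain the lowercased keys described here; any other result lists the keys that would need to be corrected.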
How to solve it
The ConfigMap needs to be adjusted manually. You can edit it with the following command:
kubectl edit cm kubeadm-config -n kube-system
For the sake of simplicity, we focus only on the ClusterStatus property of the ConfigMap. After running the command above, you will most likely see something very similar to this (the YAML below is redacted for easier understanding):
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadm-config
  namespace: kube-system
data:
  ClusterConfiguration: |
    <redacted>
  ClusterStatus: |
    typemeta:
      apiversion: kubeadm.k8s.io/v1beta2
      kind: ClusterStatus
    apiendpoints:
      ip-172-16-10-30:
        advertiseaddress: 172.16.10.30
        bindport: 6443
      ip-172-16-10-88:
        advertiseaddress: 172.16.10.88
        bindport: 6443
The ClusterStatus property must be adjusted to something like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadm-config
  namespace: kube-system
data:
  ClusterConfiguration: |
    <redacted>
  ClusterStatus: |
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
    apiEndpoints:
      ip-172-16-10-30:
        advertiseAddress: 172.16.10.30
        bindPort: 6443
      ip-172-16-10-88:
        advertiseAddress: 172.16.10.88
        bindPort: 6443
Please note that we are only editing the ClusterStatus portion of the ConfigMap; everything else should be kept as is. To summarise the changes:
- The typemeta section needs to be removed and its properties must be moved up by one level.
- The apiversion property is renamed to apiVersion.
- The apiendpoints property is renamed to apiEndpoints.
- The advertiseaddress property is renamed to advertiseAddress.
- The bindport property is renamed to bindPort.
After that, save the ConfigMap; the alert should no longer appear in your SupportBundle.
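The changes summarised above can also be expressed as a small transformation, which may help when checking a manual edit. The sketch below is a minimal illustration and assumes the ClusterStatus content has already been parsed into a plain Python dict (for example with a YAML library of your choice). The names `RENAMES` and `fix_cluster_status` are hypothetical and not part of kubeadm or ekco.

```python
# Mapping from the lowercased keys to the names used in the corrected example.
RENAMES = {
    "apiversion": "apiVersion",
    "apiendpoints": "apiEndpoints",
    "advertiseaddress": "advertiseAddress",
    "bindport": "bindPort",
}


def fix_cluster_status(status: dict) -> dict:
    """Return a corrected copy of an already-parsed ClusterStatus mapping."""

    def rename(value):
        # Recursively rename known keys; other keys (such as node names)
        # and all non-dict values are left untouched.
        if isinstance(value, dict):
            return {RENAMES.get(k, k): rename(v) for k, v in value.items()}
        return value

    fixed = {}
    for key, value in status.items():
        if key == "typemeta" and isinstance(value, dict):
            # Remove the typemeta section and move its properties up one level.
            fixed.update(rename(value))
        else:
            fixed[RENAMES.get(key, key)] = rename(value)
    return fixed
```

Applied to the broken structure shown earlier, this yields a mapping with apiVersion and kind at the top level and the endpoint properties renamed. Running it on an already-correct mapping returns an equal mapping, so it can double as a check on a manual edit.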