On GCP, instances running Linux distributions that use NetworkManager can show problems with a running Kubernetes cluster such as kURL after a reboot: pods are unable to start and the node reports as NotReady. On further investigation you may see the following in the kubelet logs:
kubelet_node_status.go:92] "Unable to register node with API server" err="nodes \"some-node\" is forbidden: node \"some-node.some.domain.internal\" is not allowed to modify node \"some-node\"" node="some-node"
This happens on certain Linux distributions on GCP where the script /etc/dhcp/dhclient.d/google_hostname.sh is run whenever the DHCP client renews its lease. The script runs the command nmcli general hostname "${new_host_name%%.*}" to set the hostname to the short name. Before the script runs, the hostname is generally the full FQDN (some-node.some.domain.internal); afterwards it is simply some-node. Because the kubelet registered under the FQDN, its attempt to register under the short name is rejected by the API server, as the log above shows, and the node enters a NotReady state.
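The effect of the ${new_host_name%%.*} expansion can be reproduced in isolation. The following is a minimal sketch, not the contents of google_hostname.sh itself; the function name short_name is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the parameter expansion quoted above.
short_name() {
    new_host_name="$1"
    # %%.* strips the longest suffix that begins with a dot,
    # i.e. everything from the first dot onwards.
    printf '%s\n' "${new_host_name%%.*}"
}

short_name "some-node.some.domain.internal"   # prints: some-node
```

On an affected instance, comparing the output of hostname with that of hostname -f before and after a lease renewal should show the same change.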
To work around this, you can do one of the following:
- Remove the script before running the kURL install with rm -rf /etc/dhcp/dhclient.d/google_hostname.sh. Note that this approach may affect functionality such as gcloud compute instances create vmname --hostname=xxxx, so proceed with caution.
- Manually force the script to run as part of the DHCP lease renewal process before running the kURL install with dhclient -r && dhclient.
- Set a permanent hostname manually. See "Google Compute Engine: how to set hostname permanently?" on Stack Overflow.
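Because the first option deletes a file that other GCP functionality may rely on, one cautious variation is to keep a copy before removing it. This is a sketch under assumptions, not the documented procedure: the function remove_hostname_script and the backup directory are hypothetical, and the backup is deliberately placed outside /etc/dhcp/dhclient.d.

```shell
#!/bin/sh
# Hypothetical helper, not part of kURL or the GCP guest environment.
# Copies the hook script to a backup directory, then removes the original.
remove_hostname_script() {
    script="$1"       # e.g. /etc/dhcp/dhclient.d/google_hostname.sh
    backup_dir="$2"   # assumed location for the backup copy

    if [ ! -f "$script" ]; then
        echo "not found: $script"
        return 0
    fi
    mkdir -p "$backup_dir" || return 1
    cp -p "$script" "$backup_dir/" || return 1
    rm -f "$script" || return 1
    echo "removed: $script"
}
```

Run as root before the kURL install, a call might look like remove_hostname_script /etc/dhcp/dhclient.d/google_hostname.sh /root/dhclient-backup, where /root/dhclient-backup is an assumed path. Copying the file back restores the original behavior.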
Relevant Links: