Preventing kURL Cluster Outages: Excluding Critical Packages from Automatic Updates
The Problem
A common cause of kURL cluster outages is running package manager updates (such as apt upgrade, yum update, or dnf upgrade) on the nodes of a kURL-managed Kubernetes cluster. When critical Kubernetes and container runtime packages are upgraded along with everything else, they drift away from the versions that kURL ships and manages, which can lead to:
- API version incompatibilities between kubelet and the control plane
- Container runtime failures due to version mismatches
- Networking issues from CNI plugin incompatibilities
- Cluster instability and potential service outages
The Solution: Package Exclusion
kURL installs and manages specific versions of critical host packages to ensure compatibility across the entire Kubernetes stack. To prevent automatic updates from breaking your cluster, you need to exclude these packages from your system’s automatic update mechanisms.
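Before adding exclusions, it can help to see which automatic update mechanisms are active on a node. The sketch below assumes that Debian/Ubuntu nodes enable unattended-upgrades through `/etc/apt/apt.conf.d/20auto-upgrades`, and that RHEL-family nodes use the dnf-automatic timers or the yum-cron service. The function names are hypothetical, and other update tools are not covered.

```bash
#!/usr/bin/env bash
# Sketch: report which automatic update mechanisms appear to be enabled.
# Assumptions: Debian/Ubuntu enable unattended-upgrades through
# /etc/apt/apt.conf.d/20auto-upgrades; RHEL-family nodes use dnf-automatic
# timers or the yum-cron service. Function names are hypothetical.

# Returns 0 if the apt config file turns on unattended upgrades.
apt_auto_upgrade_enabled() {
  local conf="$1"
  [ -r "$conf" ] || return 1
  # Matches a line such as: APT::Periodic::Unattended-Upgrade "1";
  grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1"' "$conf"
}

# Returns 0 if the systemd unit is enabled; 1 if not, or if systemctl is missing.
unit_enabled() {
  command -v systemctl >/dev/null 2>&1 || return 1
  systemctl --quiet is-enabled "$1" 2>/dev/null
}

report_auto_updates() {
  local apt_conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}" unit found=0
  if apt_auto_upgrade_enabled "$apt_conf"; then
    echo "unattended-upgrades: enabled in $apt_conf"
    found=1
  fi
  for unit in dnf-automatic.timer dnf-automatic-install.timer yum-cron.service; do
    if unit_enabled "$unit"; then
      echo "$unit: enabled"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no automatic update mechanism detected"
  fi
}

report_auto_updates
```

Any mechanism this reports will apply upgrades without an administrator in the loop. Those nodes therefore need package exclusions most urgently.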
Critical Packages to Exclude
Kubernetes Core Components (CRITICAL)
These packages have strict version compatibility requirements and must be excluded from automatic updates:
- kubelet: The Kubernetes node agent
- kubectl: The Kubernetes command-line tool
- kubeadm: The Kubernetes cluster bootstrap tool
- kubernetes-cni: Container Network Interface plugins
If kURL installed containerd on the node (rather than you installing it yourself), also exclude:
- containerd.io: The container runtime that kubelet uses through the CRI
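One way to implement these exclusions is shown in the following sketch. It uses two standard mechanisms: `apt-mark hold` on Debian/Ubuntu, and an `exclude=` line in the `[main]` section of `/etc/dnf/dnf.conf` or `/etc/yum.conf` on RHEL-family systems. The script layout, the function names, and the `INCLUDE_CONTAINERD` and `--apply` switches are hypothetical. Without `--apply`, the script only defines the functions.

```bash
#!/usr/bin/env bash
# Sketch: exclude kURL-managed packages from package manager upgrades.
# Assumptions: Debian/Ubuntu nodes use apt-mark holds; RHEL-family nodes use an
# exclude= line in the [main] section of /etc/dnf/dnf.conf or /etc/yum.conf.
# Function names are hypothetical. Run with --apply, as root, to make changes.

KURL_PKGS=(kubelet kubeadm kubectl kubernetes-cni)
# Set INCLUDE_CONTAINERD=0 if you installed containerd yourself.
if [ "${INCLUDE_CONTAINERD:-1}" = "1" ]; then
  KURL_PKGS+=(containerd.io)
fi

# Debian/Ubuntu: hold each critical package that is installed and not yet held.
hold_apt_packages() {
  local pkg
  for pkg in "${KURL_PKGS[@]}"; do
    if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      apt-mark hold "$pkg"
    fi
  done
}

# RHEL-family: add an exclude= line under [main] in the given config file.
add_yum_excludes() {
  local conf="$1"
  if ! grep -q '^\[main\]' "$conf"; then
    echo "ERROR: no [main] section in $conf" >&2
    return 1
  fi
  if grep -Eq '^[[:space:]]*(exclude|excludepkgs)[[:space:]]*=' "$conf"; then
    # Do not rewrite an existing list automatically; merge it by hand instead.
    echo "WARN: $conf already has an exclude line; add: ${KURL_PKGS[*]}" >&2
    return 1
  fi
  # GNU sed one-line append: insert the line right after the [main] header.
  sed -i "/^\[main\]/a exclude=${KURL_PKGS[*]}" "$conf"
}

apply_kurl_exclusions() {
  if command -v apt-mark >/dev/null 2>&1; then
    hold_apt_packages
    apt-mark showhold
  elif [ -f /etc/dnf/dnf.conf ]; then
    add_yum_excludes /etc/dnf/dnf.conf
  elif [ -f /etc/yum.conf ]; then
    add_yum_excludes /etc/yum.conf
  else
    echo "ERROR: no supported package manager found" >&2
    return 1
  fi
}

# Only change the system when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
  apply_kurl_exclusions
fi
```

You can check the result with `apt-mark showhold` or by inspecting the config file. To upgrade these packages deliberately, `apt-mark unhold <package>` releases a hold, and `--disableexcludes=main` lets a single yum or dnf command bypass the exclude. Whether the kURL installer needs holds released before an upgrade depends on how your kURL version installs packages, so verify that on a non-production node first.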
Why These Packages Are Critical
Version Compatibility Requirements
- Kubernetes packages are governed by a strict version skew policy (for example, kubelet must never be newer than kube-apiserver); mismatched versions can cause API incompatibilities, networking issues, or cluster instability
- Container runtime packages must be compatible with the specific Kubernetes version in use; mismatches can cause pod startup failures
- kURL manages these packages through its own upgrade process, ensuring version compatibility across the entire stack
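The version skew policy can be checked mechanically. The following minimal sketch encodes only two rules from the upstream Kubernetes version skew policy: kubelet must not be newer than kube-apiserver, and kubectl should be within one minor version (older or newer) of kube-apiserver. It assumes all versions share major version 1, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check component versions against two version skew rules.
# Assumes all versions share major version 1. Function names are hypothetical.

# Extract the minor version from strings like "v1.27.3" or "Kubernetes v1.27.3".
minor_of() {
  local v="${1##* }"   # keep only the last whitespace-separated word
  v="${v#v}"           # strip a leading "v"
  v="${v#*.}"          # drop the major component
  echo "${v%%.*}"      # keep only the minor component
}

# Prints one line per violation; returns 1 if any rule is broken.
check_skew() {
  local kubelet_v="$1" apiserver_v="$2" kubectl_v="$3" rc=0
  local kubelet_m apiserver_m kubectl_m diff
  kubelet_m="$(minor_of "$kubelet_v")"
  apiserver_m="$(minor_of "$apiserver_v")"
  kubectl_m="$(minor_of "$kubectl_v")"
  # Rule 1: kubelet must not be newer than kube-apiserver.
  if [ "$kubelet_m" -gt "$apiserver_m" ]; then
    echo "kubelet $kubelet_v is newer than kube-apiserver $apiserver_v"
    rc=1
  fi
  # Rule 2: kubectl within one minor version of kube-apiserver.
  diff=$(( kubectl_m - apiserver_m ))
  if [ "$diff" -gt 1 ] || [ "$diff" -lt -1 ]; then
    echo "kubectl $kubectl_v is more than one minor version from kube-apiserver $apiserver_v"
    rc=1
  fi
  return "$rc"
}

check_skew "v1.27.3" "v1.27.3" "v1.28.0" && echo "skew OK"
```

To use it with live values, pass in three arguments: the output of `kubelet --version`, the `gitVersion` field returned by `kubectl get --raw /version`, and the client version from `kubectl version --client`. `minor_of` keeps only the last word of its input. It therefore handles `Kubernetes v1.27.3` and `Client Version: v1.28.0`, but not the older `version.Info{...}` output format.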
What Happens When They’re Updated
- kubelet version mismatches can cause nodes to be marked as “NotReady”
- kubectl versions more than one minor version away from the API server can cause API communication failures
- Container runtime version mismatches can prevent pods from starting
- CNI plugin updates can break pod networking
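To confirm whether a package manager run actually touched these packages, you can read apt's history log. This sketch assumes the Debian/Ubuntu `/var/log/apt/history.log` format, where each run starts with a `Start-Date:` line and upgrades are recorded as `Upgrade: pkg:arch (old, new), ...`. It reads only the current log, ignores `Install:` and `Downgrade:` lines, and uses a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: list upgrades of kURL-critical packages recorded in apt's history log.
# Assumed format:
#   Start-Date: 2024-01-10  10:00:00
#   Upgrade: kubelet:amd64 (1.26.5-00, 1.27.1-00), curl:amd64 (...)
# Rotated logs are not read. The function name is hypothetical.

critical_upgrades() {
  local log="$1"
  awk -v pkgs="kubelet kubeadm kubectl kubernetes-cni containerd.io" '
    BEGIN { n = split(pkgs, list, " "); for (i = 1; i <= n; i++) want[list[i]] = 1 }
    /^Start-Date:/ { date = $0; sub(/^Start-Date:[ \t]*/, "", date) }
    /^Upgrade:/ {
      line = $0
      sub(/^Upgrade:[ \t]*/, "", line)
      # Entries are separated by "), "
      m = split(line, entries, "\\), ")
      for (i = 1; i <= m; i++) {
        name = entries[i]; sub(/[: ].*/, "", name)   # "kubelet:amd64 (..." -> "kubelet"
        if (!(name in want)) continue
        vers = entries[i]; sub(/^[^(]*\(/, "", vers); sub(/\)$/, "", vers)
        printf "%s %s (%s)\n", date, name, vers
      }
    }
  ' "$log"
}

if [ -r /var/log/apt/history.log ]; then
  critical_upgrades /var/log/apt/history.log
fi
```

Each output line shows when the run started, the package, and its old and new versions. Rotated logs can be checked with `zcat /var/log/apt/history.log.*.gz | critical_upgrades /dev/stdin`. On RHEL-family nodes, `rpm -q --last kubelet kubeadm kubectl kubernetes-cni` lists when each package was last installed or upgraded.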
Emergency Recovery Procedure
If critical packages have already been updated and your cluster is experiencing issues, you can recover by downgrading to the kURL-managed versions:
Step 1: Locate kURL Package Assets
# Navigate to the kURL assets directory
cd /var/lib/kurl/assets
# Find the package files for the critical packages
ls -la | grep -E "(kubelet|kubectl|kubeadm|kubernetes-cni|containerd|docker)"
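A glob such as `./kubelet-*.deb` silently matches every version present, so check that each critical package has exactly one candidate file before installing. This sketch assumes the package files sit somewhere under `/var/lib/kurl` and are named `<package>-<version>...` or `<package>_<version>...`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: locate kURL-shipped .deb/.rpm files for the critical packages under a
# directory tree and warn when a package has zero or several candidate files.
# Assumes names like kubelet-1.27.3.deb or kubelet_1.27.3_amd64.deb.
# The function name is hypothetical.

find_kurl_packages() {
  local root="$1" pkg f count
  for pkg in kubelet kubeadm kubectl kubernetes-cni containerd.io; do
    count=0
    # Process substitution keeps the loop in this shell, so count survives.
    while IFS= read -r f; do
      echo "$pkg: $f"
      count=$((count + 1))
    done < <(find "$root" -type f \( -name "${pkg}[-_]*.deb" -o -name "${pkg}[-_]*.rpm" \) 2>/dev/null | sort)
    if [ "$count" -gt 1 ]; then
      echo "WARN: $count files for $pkg; pick the version kURL installed" >&2
    elif [ "$count" -eq 0 ]; then
      echo "WARN: no file found for $pkg" >&2
    fi
  done
}

if [ -d /var/lib/kurl ]; then
  find_kurl_packages /var/lib/kurl
fi
```

A missing containerd.io file is expected on nodes where you installed containerd yourself.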
Step 2: Downgrade to kURL-Managed Versions
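# For kubelet (example with .deb packages)
# If the packages are on hold, also pass --allow-change-held-packages
sudo apt install ./kubelet-*.deb --allow-downgrades -y
# For kubectl
sudo apt install ./kubectl-*.deb --allow-downgrades -y
# For kubeadm
sudo apt install ./kubeadm-*.deb --allow-downgrades -y
# For kubernetes-cni (after kubelet has been downgraded)
sudo apt install ./kubernetes-cni-*.deb --allow-downgrades -y
# For containerd (only if kURL installed it)
sudo apt install ./containerd.io-*.deb --allow-downgrades -y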
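An alternative is to install all the packages in a single apt transaction. This lets apt resolve dependencies between them (kubelet packages typically depend on kubernetes-cni) and makes it easy to re-apply holds afterwards. The sketch below also covers RHEL-family nodes with `rpm -Uvh --oldpackage`, which does not fetch missing dependencies. It assumes exactly one file per package in the given directory. The function names and the `DRY_RUN` switch are hypothetical, and by default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: reinstall the kURL-shipped versions of the critical packages in one
# transaction. Assumptions: files are named like kubelet-<version>.deb (or
# kubelet_<version>...), one file per package, and FORMAT is "deb" or "rpm".
# Function names are hypothetical. DRY_RUN=1 (the default) only prints commands.

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_kurl_packages() {
  local dir="$1" format="$2" pkg f n
  local files=() names=()
  for pkg in kubelet kubeadm kubectl kubernetes-cni containerd.io; do
    n=0
    for f in "$dir"/"$pkg"[-_]*."$format"; do
      [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
      files+=("$f")
      n=$((n + 1))
    done
    if [ "$n" -gt 1 ]; then
      echo "ERROR: $n files match $pkg in $dir; keep only the kURL version" >&2
      return 1
    fi
    if [ "$n" -eq 1 ]; then
      names+=("$pkg")
    fi
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "ERROR: no .$format files for the critical packages in $dir" >&2
    return 1
  fi
  case "$format" in
    deb)
      # --allow-change-held-packages is needed if the packages are on hold.
      run apt-get install -y --allow-downgrades --allow-change-held-packages "${files[@]}" || return 1
      # Re-apply holds so the next upgrade does not replace them again.
      run apt-mark hold "${names[@]}"
      ;;
    rpm)
      # --oldpackage allows replacing a newer installed version with an older one.
      run rpm -Uvh --oldpackage "${files[@]}"
      ;;
    *)
      echo "ERROR: unsupported format: $format" >&2
      return 1
      ;;
  esac
}

# Example: bash restore.sh /var/lib/kurl/assets   (set DRY_RUN=0 to execute)
if [ -n "${1:-}" ]; then
  if command -v apt-get >/dev/null 2>&1; then fmt=deb; else fmt=rpm; fi
  restore_kurl_packages "$1" "$fmt"
fi
```

Review the printed commands first, then rerun with `DRY_RUN=0` as root to apply them.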
Step 3: Restart Affected Services
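# Reload unit files in case the package change updated them
sudo systemctl daemon-reload
# Restart containerd first (if applicable), then kubelet
sudo systemctl restart containerd
sudo systemctl restart kubelet
# Check service status
sudo systemctl status containerd --no-pager
sudo systemctl status kubelet --no-pager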
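Services can take a few seconds to come back after a restart, so a scripted recovery benefits from polling rather than a single status check. This is a minimal sketch of a polling helper. It relies on bash's built-in `SECONDS` variable. The function name and the timeouts in the usage example are illustrative, not values kURL prescribes.

```bash
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds or a timeout expires.
# Relies on bash's built-in SECONDS counter. The function name is hypothetical.

# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  # Re-run the command until it exits 0.
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, run `wait_for 60 2 systemctl is-active --quiet containerd && wait_for 120 5 systemctl is-active --quiet kubelet`. If kubelet never becomes active, `journalctl -u kubelet` shows why.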
Step 4: Verify Cluster Health
# Check node status
kubectl get nodes
# Check pod status
kubectl get pods --all-namespaces
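# Check pods in the kurl namespace
kubectl get pods -n kurl
# Confirm the kotsadm (admin console) pods are running
kubectl get pods --all-namespaces | grep kotsadm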
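The node check can also be made scriptable. This sketch reads the output of `kubectl get nodes --no-headers` on stdin and lists the nodes whose status is not Ready. It assumes STATUS is the second column, as in kubectl's default output, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list nodes whose STATUS is not Ready, reading
# "kubectl get nodes --no-headers" output on stdin. Assumes STATUS is the
# second column, as in kubectl's default output. Function name is hypothetical.

not_ready_nodes() {
  # "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
  awk 'BEGIN { bad = 0 }
       NF >= 2 && $2 != "Ready" && $2 !~ /^Ready,/ { print $1 " " $2; bad = 1 }
       END { exit bad }'
}
```

`kubectl get nodes --no-headers | not_ready_nodes` prints the nodes that need attention and exits non-zero if there are any. As alternatives, `kubectl wait --for=condition=Ready nodes --all --timeout=300s` blocks until every node reports Ready or the timeout expires. `kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded` narrows the pod listing to pods that are neither running nor completed.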
Conclusion
Preventing automatic updates of kURL-managed packages is crucial for maintaining cluster stability. By implementing package exclusions for the critical packages outlined in this article, you can prevent the common scenario where package manager updates break your kURL cluster.
Remember: kURL manages these packages through its own upgrade mechanisms to ensure compatibility. Let kURL handle the upgrades, and exclude these packages from your system’s automatic update process.