Troubleshooting Calico networking issues
Replicated Embedded Cluster uses Calico as the default networking solution. This guide provides step-by-step instructions for troubleshooting Calico-related networking issues in Embedded Cluster.
Possible symptoms
- Pod stuck in CrashLoopBackOff state with failed health checks:
Warning Unhealthy 6h51m (x3 over 6h52m) kubelet Liveness probe failed: Get "http://<ip:port>/readyz": dial tcp <ip:port>: connect: no route to host
Warning Unhealthy 6h51m (x19 over 6h52m) kubelet Readiness probe failed: Get "http://<ip:port>/readyz": dial tcp <ip:port>: connect: no route to host
....
Unhealthy pod/registry-dc699cbcf-pkkbr Readiness probe failed: Get "https://<ip:port>/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Unhealthy pod/registry-dc699cbcf-pkkbr Liveness probe failed: Get "https://<ip:port>/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...
- Pod log contains i/o timeout errors:
server APIs: config.k8ssandra.io/v1beta1: Get \"https://***HIDDEN***:443/apis/config.k8ssandra.io/v1beta1\": dial tcp ***HIDDEN***:443: i/o timeout"}
Common checks
Verify pod communication
- Get the IP address of the pod
- Is communication working between pod <=> pod, pod <=> service, pod <=> Kubernetes API server, and kubelet <=> pod? (see the example checks after this list)
- Is communication working across nodes?
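To spot-check these paths, you can exec into a running pod and connect to another pod, a service, and the Kubernetes API server directly. The pod, namespace, and port values below are placeholders; substitute your own, and note that the source pod image must include curl.
# Get the IP address of a pod
kubectl get pod <pod-name> -n <namespace> -o wide
# Pod-to-pod connectivity
kubectl exec -n <namespace> <source-pod> -- curl -sv --max-time 5 http://<pod-ip>:<port>/
# Pod-to-service connectivity
kubectl exec -n <namespace> <source-pod> -- curl -sv --max-time 5 http://<service-name>.<namespace>.svc:<port>/
# Pod-to-API-server connectivity (a 401/403 response still proves the network path works)
kubectl exec -n <namespace> <source-pod> -- curl -skv --max-time 5 https://kubernetes.default.svc/healthz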
Ensure that existing firewall rules are not blocking necessary ports for Calico overlay network traffic.
Review the port requirements.
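For example, on hosts with firewalld you can list the active zone configuration and confirm that the ports Calico requires (such as UDP 4789 for VXLAN) are not blocked; on hosts managed directly with iptables, look for DROP or REJECT rules. These commands are a sketch; adapt them to the firewall tooling in use.
firewall-cmd --list-all
iptables -L -n | grep -i -E 'drop|reject'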
Switch SELinux or AppArmor to permissive mode
Switching to permissive mode helps verify whether SELinux or AppArmor is blocking network operations.
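For example, assuming the standard SELinux and AppArmor utilities are installed on the host:
# SELinux: check the current mode, then temporarily switch to permissive
getenforce
setenforce 0
# AppArmor: list loaded profiles, then put a specific profile into complain mode
aa-status
aa-complain /etc/apparmor.d/<profile>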
Ensure consistent MTU size
Ensure that the MTU size is consistent across all nodes in the cluster to avoid fragmentation issues.
ip link show | grep mtu
More info on determining MTU size
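You can also compare the node interface MTU against the MTU Calico is configured to use. On manifest-based Calico installations this value is commonly stored in the calico-config ConfigMap (the veth_mtu key); this resource name is an assumption, so adjust it for your cluster.
kubectl get configmap calico-config -n kube-system -o yaml | grep -i mtu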
Ensure all Calico components are running
kubectl get po -n kube-system -l 'k8s-app in (calico-node, calico-kube-controllers)'
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6c8697c78c-kbp7v 1/1 Running 0 21h
calico-node-rhcl2 1/1 Running 0 21h
Verify that all pods are Ready and have a status of Running.
calico-node is a DaemonSet pod that runs on every node in the cluster. It is responsible for configuring the network interfaces and routing tables on each node.
calico-kube-controllers is a Deployment that runs on the controller node. It is responsible for managing the Calico networking resources in the cluster.
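If a Calico pod is not Ready, its events and logs usually point to the cause. Substitute the pod name from the previous output:
kubectl describe pod -n kube-system <calico-node-pod>
kubectl logs -n kube-system <calico-node-pod> --tail=100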
Common issues
Overlapping podCIDR and serviceCIDR with the host network CIDR
Verify the configured CIDRs with:
cat /etc/k0s/k0s.yaml | grep -i cidr
podCIDR: 10.244.0.0/17
serviceCIDR: 10.244.128.0/17
The default podCIDR is 10.244.0.0/16 and the default serviceCIDR is 10.96.0.0/12.
View the host routes, excluding Calico interfaces, and confirm that none of them overlap with the pod or service CIDRs:
ip route | grep -v cali
default via 10.152.0.1 dev ens4 proto dhcp src 10.152.0.4 metric 100
10.152.0.1 dev ens4 proto dhcp scope link src 10.152.0.4 metric 100
blackhole 10.244.101.192/26 proto 80
169.254.169.254 via 10.152.0.1 dev ens4 proto dhcp src 10.152.0.4 metric 100
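To compare against the host network, you can also list the addresses assigned to the node interfaces and confirm that none of them fall inside the pod or service CIDRs:
ip -o -4 addr show | awk '{print $2, $4}'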
If there is overlap, consider resetting and reinstalling the application with alternate CIDRs. More info
Incorrect kernel parameter values
Use sysctl to verify that these parameters are set correctly; an example command is shown below the expected values:
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.ip_forward = 1
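For example, you can print the current values of all three parameters in one call:
sysctl net.ipv4.conf.default.arp_filter net.ipv4.conf.default.arp_ignore net.ipv4.ip_forward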
If the values are not set correctly:
- Reset and reboot the installation
- Run the following commands to set the parameters and persist them across reboots:
sysctl -w net.ipv4.conf.default.arp_filter=0
sysctl -w net.ipv4.conf.default.arp_ignore=0
sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.conf.default.arp_filter=0" >> /etc/sysctl.conf
echo "net.ipv4.conf.default.arp_ignore=0" >> /etc/sysctl.conf
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
- Re-run the installation
VXLAN traffic getting dropped
By default, Calico uses VXLAN as the overlay networking protocol, with the vxlanMode set to Always. This mode encapsulates all pod-to-pod traffic in VXLAN packets. If VXLAN packets are filtered by the network for any reason, pods will not be able to communicate with other pods.
As a temporary troubleshooting measure, try setting the mode to CrossSubnet and see if the issue persists. This mode only encapsulates traffic between pods on different subnets with VXLAN:
kubectl patch ippool default-ipv4-ippool --type=merge -p '{"spec": {"vxlanMode": "CrossSubnet"}}'
If this resolves the connectivity issues, there is likely an underlying network configuration problem dropping VXLAN traffic that should be addressed.
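To check the current mode before patching, or to revert to the default once the underlying network issue has been fixed, the same ippool resource can be queried and patched back (assuming the resource name used above):
# Show the current VXLAN mode
kubectl get ippool default-ipv4-ippool -o jsonpath='{.spec.vxlanMode}'
# Revert to full encapsulation after the underlying issue is resolved
kubectl patch ippool default-ipv4-ippool --type=merge -p '{"spec": {"vxlanMode": "Always"}}'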
If you have followed all of the troubleshooting steps above and the issue persists, contact Replicated and include the outputs of the troubleshooting steps along with a support bundle.