Troubleshooting Calico networking issues

Replicated Embedded Cluster uses Calico as the default networking solution. This guide provides step-by-step instructions for troubleshooting Calico-related networking issues in Embedded Cluster.

Possible symptoms

  • Pod stuck in CrashLoopBackOff state with failed health checks
Warning Unhealthy 6h51m (x3 over 6h52m) kubelet Liveness probe failed: Get "http://<ip:port>/readyz": dial tcp <ip:port>: connect: no route to host
Warning Unhealthy 6h51m (x19 over 6h52m) kubelet Readiness probe failed: Get "http://<ip:port>/readyz": dial tcp <ip:port>: connect: no route to host
....
Unhealthy               pod/registry-dc699cbcf-pkkbr     Readiness probe failed: Get "https://<ip:port>/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Unhealthy               pod/registry-dc699cbcf-pkkbr     Liveness probe failed: Get "https://<ip:port>/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...
  • Pod logs contain i/o timeout errors
server APIs: config.k8ssandra.io/v1beta1: Get \"https://***HIDDEN***:443/apis/config.k8ssandra.io/v1beta1\": dial tcp ***HIDDEN***:443: i/o timeout"}

Common checks

Verify pod communication

  • Get the IP address of the pod
  • Is communication working pod <=> pod, pod <=> service, pod <=> Kubernetes API server, and kubelet <=> pod? (See the sketch after this list.)
  • Is communication working across nodes?
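
A minimal sketch of these checks, assuming a running pod named my-pod in namespace my-app, a service named my-service, and a container image that includes wget (all names, ports, and IPs below are placeholders; use a dedicated debug pod if the image has no suitable tools):

# Get the pod IP and the node it is scheduled on
kubectl get pod my-pod -n my-app -o wide

# Pod <=> pod and pod <=> service: probe another pod IP and a service DNS name from inside the pod
kubectl exec -n my-app my-pod -- wget -qO- -T 5 http://<other-pod-ip>:<port>
kubectl exec -n my-app my-pod -- wget -qO- -T 5 http://my-service.my-app.svc.cluster.local:<port>

# Pod <=> Kubernetes API server: any HTTP response (even 401/403) proves network reachability
kubectl exec -n my-app my-pod -- wget -qO- -T 5 --no-check-certificate https://kubernetes.default.svc.cluster.local/healthz

# Kubelet/node <=> pod: from the node hosting the pod, probe the pod IP directly
curl -m 5 http://<pod-ip>:<port>/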

Ensure that existing firewall rules are not blocking necessary ports for Calico overlay network traffic.

Review port requirements
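
As a quick check on each node, the following sketch surfaces firewall rules that may drop overlay traffic. Calico's VXLAN overlay uses UDP port 4789 by default; the firewall-cmd line applies only if firewalld is in use:

# List iptables rules that explicitly DROP or REJECT traffic
iptables -L -n -v | grep -E 'DROP|REJECT'

# If firewalld is running, review the active zone configuration (services, ports, rich rules)
firewall-cmd --list-all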

Switch SELinux or AppArmor to permissive mode

Temporarily switching SELinux or AppArmor to permissive mode helps verify whether they are blocking certain network operations.
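
A sketch of how to switch each to permissive/complain mode temporarily (these changes do not survive a reboot and should be reverted after testing; the AppArmor profile path is a placeholder):

# SELinux: check the current mode, then switch to permissive
getenforce
setenforce 0

# AppArmor (Debian/Ubuntu): list loaded profiles, then put a specific profile in complain mode
aa-status
aa-complain /etc/apparmor.d/<profile>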

Ensure consistent MTU size

Ensure that the MTU size is consistent across all nodes in the cluster to avoid fragmentation issues.

ip link show | grep mtu
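
Calico's VXLAN interface needs an MTU at least 50 bytes smaller than the host NIC MTU to leave room for the encapsulation headers. A quick comparison on one node (replace ens4 with the host's primary interface name; vxlan.calico is Calico's default VXLAN interface name):

# Host NIC MTU, e.g. 1500
ip link show ens4 | grep mtu

# Calico VXLAN interface MTU, expected to be the host MTU minus at least 50, e.g. 1450
ip link show vxlan.calico | grep mtu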

More info on determining MTU size

Ensure all Calico components are running

kubectl get po -n kube-system -l 'k8s-app in (calico-node, calico-kube-controllers)'
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6c8697c78c-kbp7v   1/1     Running   0          21h
calico-node-rhcl2                          1/1     Running   0          21h

Verify that all pods are Ready and have a status of Running.

calico-node is a DaemonSet pod that runs on every node in the cluster. It is responsible for configuring the network interfaces and routing tables on each node.
calico-kube-controllers is a Deployment, typically scheduled on a controller node, that is responsible for managing the Calico networking resources in the cluster.
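
If a pod is not Ready, the following sketch shows the next checks, using the same kube-system namespace and labels as above:

# Confirm the calico-node DaemonSet has a ready pod on every node
kubectl -n kube-system get ds calico-node

# Review recent events and logs from the calico-node pods
kubectl -n kube-system describe pod -l k8s-app=calico-node
kubectl -n kube-system logs -l k8s-app=calico-node -c calico-node --tail=100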

Common issues

Overlapping podCIDR or serviceCIDR with the host network CIDR

Verify the configured CIDRs with:

cat /etc/k0s/k0s.yaml | grep -i cidr
    podCIDR: 10.244.0.0/17
    serviceCIDR: 10.244.128.0/17

The default podCIDR is 10.244.0.0/16 and serviceCIDR is 10.96.0.0/12.

View the host routing table, excluding Calico interfaces, and ensure no CIDRs overlap with the podCIDR or serviceCIDR.

ip route | grep -v cali
default via 10.152.0.1 dev ens4 proto dhcp src 10.152.0.4 metric 100
10.152.0.1 dev ens4 proto dhcp scope link src 10.152.0.4 metric 100
blackhole 10.244.101.192/26 proto 80
169.254.169.254 via 10.152.0.1 dev ens4 proto dhcp src 10.152.0.4 metric 100
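
To list the CIDRs actually assigned to the host interfaces and compare them against the podCIDR and serviceCIDR above:

# IPv4 addresses and prefixes on host interfaces, excluding Calico interfaces
ip -o -4 addr show | grep -v cali | awk '{print $2, $4}'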

If there is overlap, consider resetting and reinstalling the application with alternate CIDRs. More info

Incorrect kernel parameter values

Use sysctl to verify that these parameters are set correctly:

net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.ip_forward = 1
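
A quick way to read the current values in one command:

sysctl net.ipv4.conf.default.arp_filter net.ipv4.conf.default.arp_ignore net.ipv4.ip_forward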

If the values are not set correctly, run the following commands to set them:

sysctl -w net.ipv4.conf.default.arp_filter=0
sysctl -w net.ipv4.conf.default.arp_ignore=0
sysctl -w net.ipv4.ip_forward=1

To persist the settings across reboots, also add them to /etc/sysctl.conf and reload:
echo "net.ipv4.conf.default.arp_filter=0" >> /etc/sysctl.conf
echo "net.ipv4.conf.default.arp_ignore=0" >> /etc/sysctl.conf
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

sysctl -p
Then re-run the installation.

VXLAN traffic getting dropped

By default, Calico uses VXLAN as the overlay networking protocol, with vxlanMode set to Always. In this mode, all pod-to-pod traffic is encapsulated in VXLAN packets. If the VXLAN packets are filtered or dropped by the underlying network, pods will not be able to communicate with each other.

As a temporary troubleshooting measure, set the mode to CrossSubnet and check whether the issue persists. In this mode, only traffic between pods on different subnets is encapsulated with VXLAN.

kubectl patch ippool default-ipv4-ippool --type=merge -p '{"spec": {"vxlanMode": "CrossSubnet"}}'

If this resolves the connectivity issues, this indicates there’s likely an underlying network configuration problem with VXLAN traffic that should be addressed.
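
Once the underlying network issue is addressed, revert the IP pool to the default Always mode:

kubectl patch ippool default-ipv4-ippool --type=merge -p '{"spec": {"vxlanMode": "Always"}}'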

If you have followed all the troubleshooting steps above and the issue persists, contact Replicated with the outputs of the steps above and a support bundle.
