Flannel with VMware

In several installations that combine VMware vNIC drivers with Flannel, we have seen an edge case that requires transmit (TX) checksum offloading to be disabled in the Linux kernel. If offloading is left enabled, the result can be seemingly random network issues between cluster nodes.

The following command, run as root, disables checksum offloading on the flannel.1 interface:

ethtool -K flannel.1 tx-checksum-ip-generic off
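
To confirm that the setting took effect, the feature state can be read back with ethtool -k (lowercase k lists features rather than changing them). The following is a minimal sketch rather than part of the original procedure: the helper name checksum_offload_state is hypothetical, and it assumes the usual "feature: state" layout of ethtool -k output.

```shell
# Read `ethtool -k` output on stdin and print the state ("on" or "off")
# of the tx-checksum-ip-generic feature. Helper name is hypothetical.
checksum_offload_state() {
    awk -F': ' '$1 ~ /tx-checksum-ip-generic$/ { split($2, a, " "); print a[1] }'
}

# Query the interface only if ethtool and flannel.1 are both present.
if command -v ethtool >/dev/null 2>&1 && [ -d /sys/class/net/flannel.1 ]; then
    ethtool -k flannel.1 | checksum_offload_state
fi
```

Once offloading has been disabled, this should print off.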

To make the change persistent, add the following to a new systemd unit flannel-ethtool.service:

[Unit]
Description=Disable vxlan checksum offloading for flannel.1
After=sys-devices-virtual-net-flannel.1.device
Requires=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
RemainAfterExit=yes

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device

See the Red Hat docs or a more concise article here for information about using systemd units.

The simpler, preferred fix is to update the kURL installer and Flannel, because the workaround is built in as of Flannel 0.24.4.
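
To decide whether the manual workaround is still needed, the running Flannel version can be compared against 0.24.4. The sketch below shows only the comparison. The helper name version_ge is hypothetical, it relies on GNU sort -V, and how you obtain the running version depends on your installation and is not shown.

```shell
# Return success if version $1 is greater than or equal to version $2.
# A leading "v" (as in tags such as v0.24.4) is ignored.
version_ge() {
    have="${1#v}"
    want="${2#v}"
    # sort -V orders version strings; if the required version sorts
    # first (or ties), the running version is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example: replace the first argument with the Flannel version you are running.
if version_ge "v0.24.4" "0.24.4"; then
    echo "workaround built in"
else
    echo "manual ethtool workaround needed"
fi
```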

The root cause of this problem is a flaw in the hardware offloading mechanism on VMware platforms, specifically in how the virtual NIC handles checksum calculations. This flaw can produce incorrect checksums for inner packets, that is, the original packets encapsulated within the VXLAN tunnel.

This issue particularly affects modern Linux distributions, which enable checksum offloading by default. When these miscalculated checksums are encountered, the Linux networking stack treats the packets as corrupted and drops them, resulting in network connectivity problems.

The solution is to disable checksum offloading to the virtual NIC so that the Linux kernel calculates checksums in software instead, which keeps packets intact through the VXLAN tunnel.

Create the unit file shown above at /etc/systemd/system/flannel-ethtool.service. Then run sudo systemctl enable flannel-ethtool.service --now to enable the service and start it immediately.
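
The installation steps can be collected into one script. This is a sketch under stated assumptions, not an official installer: the function name print_flannel_ethtool_unit and the --install flag are hypothetical, and it assumes sudo, systemd, and ethtool at /sbin/ethtool as in the unit file from this article. Without --install, the script only defines the function and changes nothing.

```shell
# Print the unit file from this article to stdout (function name is hypothetical).
print_flannel_ethtool_unit() {
    cat <<'EOF'
[Unit]
Description=Disable vxlan checksum offloading for flannel.1
After=sys-devices-virtual-net-flannel.1.device
Requires=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
RemainAfterExit=yes

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device
EOF
}

# Install only when explicitly requested with --install.
if [ "${1:-}" = "--install" ]; then
    print_flannel_ethtool_unit | sudo tee /etc/systemd/system/flannel-ethtool.service >/dev/null
    sudo systemctl daemon-reload
    sudo systemctl enable flannel-ethtool.service --now
    # Confirm that the unit is enabled and the offload feature is off.
    systemctl is-enabled flannel-ethtool.service
    ethtool -k flannel.1 | grep tx-checksum-ip-generic
fi
```

Printing the unit through a function keeps the file content in one place, so it can be reviewed before anything is written to /etc/systemd/system.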