Testing Kubernetes Network Connectivity with the Goldpinger Troubleshoot Collector

Network connectivity issues are among the most challenging problems to diagnose in Kubernetes clusters. Pods can’t reach each other, services are unreachable, or mysterious timeouts occur seemingly at random. The goldpinger troubleshoot collector provides a powerful way to quickly assess your cluster’s network health and identify connectivity problems.

What is Goldpinger?

Goldpinger is a network monitoring tool originally developed by Bloomberg that tests connectivity between nodes in a Kubernetes cluster. It works by deploying pods across your cluster nodes and having them ping each other to create a comprehensive network connectivity matrix.

The troubleshoot framework includes a goldpinger collector that can temporarily deploy goldpinger to your cluster, run connectivity tests, and collect the results in a support bundle for analysis.

How the Goldpinger Collector Works

The goldpinger collector makes a request to the <host>/check_all endpoint. When the collector runs inside a Kubernetes cluster, it makes the HTTP request directly to the goldpinger endpoint (http://goldpinger.<namespace>.svc.cluster.local:80/check_all). Otherwise, it attempts to launch a pod in the cluster, configured with the podLaunchOptions parameter, and makes the request from within that pod’s running container.
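To make the in-cluster request concrete, here is a minimal Python sketch of the same call. It assumes the service is named goldpinger and listens on port 80, as in the URL above; the function names check_all_url and fetch_check_all are hypothetical helpers for illustration, not part of troubleshoot or goldpinger.

```python
import json
import urllib.request


def check_all_url(namespace: str, port: int = 80) -> str:
    """Build the in-cluster check_all URL for a goldpinger service (assumed name: goldpinger)."""
    return f"http://goldpinger.{namespace}.svc.cluster.local:{port}/check_all"


def fetch_check_all(namespace: str, timeout: float = 30.0) -> dict:
    """Fetch and parse the check_all response. Only resolvable from inside the cluster."""
    with urllib.request.urlopen(check_all_url(namespace), timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_check_all("kurl") from a pod inside the cluster would query a goldpinger service in the kurl namespace. From outside the cluster the service DNS name does not resolve, which is why the collector falls back to launching a pod.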

If goldpinger is not installed, the collector will attempt to temporarily install it, and uninstall goldpinger once the collector has completed.

This automatic behavior means the collector works in three scenarios:

  1. Existing goldpinger installation - Queries the existing service (directly when run in-cluster, otherwise from a launched pod)
  2. No goldpinger, running in-cluster - Temporarily installs goldpinger, collects data, then cleans up
  3. No goldpinger, running externally - Launches a pod to make internal requests to temporary goldpinger

Output Files

The collector’s results are stored in the goldpinger/ directory of the support bundle.

The collector generates one of two files:

goldpinger/check_all.json

This file contains the response from the <host>/check_all endpoint, including the full connectivity matrix.

goldpinger/error.txt

If there is an error fetching the results, goldpinger/error.txt contains the error message. The bundle will contain either goldpinger/check_all.json or goldpinger/error.txt, but never both.
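Because only one of the two files is ever present, a script that post-processes a bundle can branch on which one exists. The following is a rough sketch, assuming the bundle archive has already been extracted and that bundle_dir is the directory containing goldpinger/; read_goldpinger_result is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_goldpinger_result(bundle_dir):
    """Return ("ok", parsed_json), ("error", message) or ("missing", None)."""
    base = Path(bundle_dir) / "goldpinger"
    check_all = base / "check_all.json"
    error_txt = base / "error.txt"
    if check_all.is_file():
        # The collector succeeded: parse the connectivity matrix.
        return "ok", json.loads(check_all.read_text())
    if error_txt.is_file():
        # The collector failed: surface the recorded error message.
        return "error", error_txt.read_text().strip()
    # Neither file exists, e.g. the goldpinger collector was not in the spec.
    return "missing", None
```

The "missing" case is not described above; it is included here only so the helper behaves sensibly on bundles that were produced without the goldpinger collector.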

The simplest way to run the goldpinger collector is with a minimal spec:

1. Create the Troubleshoot Spec

cat > goldpinger-spec.yaml << 'EOF'
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: goldpinger
spec:
  collectors:
    - goldpinger: {}
  analyzers:
    - goldpinger: {}
EOF

2. Run the Collector

kubectl support-bundle goldpinger-spec.yaml

That’s it! The collector will:

  • Automatically install goldpinger if it’s not already running
  • Deploy goldpinger pods across your cluster nodes
  • Test connectivity between all nodes
  • Collect the results
  • Clean up the temporary goldpinger installation, if one was created
  • Generate a support bundle with the connectivity data

Understanding the Results

After running the collector, you’ll get a support bundle containing a goldpinger/ directory with connectivity results. Here’s what to look for:

Healthy Single-Node Cluster Example

{
  "hosts": [
    {
      "hostIP": "10.0.0.191",
      "podIP": "10.244.62.25",
      "podName": "ts-goldpinger-jzc4f"
    }
  ],
  "responses": {
    "ts-goldpinger-jzc4f": {
      "HostIP": "10.0.0.191",
      "OK": true,
      "PodIP": "10.244.62.25",
      "response": {
        "podResults": {
          "ts-goldpinger-jzc4f": {
            "HostIP": "10.0.0.191",
            "OK": true,
            "PingTime": "2025-06-13T20:32:08.504Z",
            "PodIP": "10.244.62.25",
            "response-time-ms": 1,
            "status-code": 200
          }
        }
      }
    }
  }
}

Key Metrics to Check

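  • OK: true/false - Whether the check succeeded, reported both per pod and per pod-to-pod result
  • response-time-ms - Response time of the check in milliseconds, a rough indicator of network latency between nodes
  • status-code: 200 - HTTP response indicating successful connectivity
  • Multiple hosts - In multi-node clusters, you should see one entry per node in hosts and in responses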
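The metrics above can be pulled out of check_all.json with a few lines of Python. This is a minimal sketch that relies only on the field names visible in the example (hosts, responses, response, podResults, OK, response-time-ms); iter_pair_results and summarize are hypothetical helper names.

```python
def iter_pair_results(check_all):
    """Yield (source_pod, target_pod, result) for every pod-to-pod check."""
    for source, entry in check_all.get("responses", {}).items():
        pod_results = (entry.get("response") or {}).get("podResults") or {}
        for target, result in pod_results.items():
            yield source, target, result


def summarize(check_all):
    """Count hosts and checks, list failed pairs, and report the slowest response."""
    pairs = list(iter_pair_results(check_all))
    failed = [(s, t) for s, t, r in pairs if not r.get("OK")]
    times = [r["response-time-ms"] for _, _, r in pairs if "response-time-ms" in r]
    return {
        "hosts": len(check_all.get("hosts", [])),
        "pairs": len(pairs),
        "failed": failed,
        "max_response_ms": max(times) if times else None,
    }
```

For the single-node example above, summarize reports one host, one checked pair (the pod checking itself), and no failures.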

Multi-Node Cluster Results

In a healthy multi-node cluster, you’ll see connectivity results between all node pairs:

  • Node A → Node B, Node C, Node D (all OK: true)
  • Node B → Node A, Node C, Node D (all OK: true)
  • And so on…
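One way to verify the full matrix is to check that every pod listed in hosts has a successful result for every pod in hosts. The sketch below assumes the structure shown in the single-node example, where each pod also reports a result for itself; incomplete_pairs is a hypothetical helper name.

```python
def incomplete_pairs(check_all):
    """Return (source, target) pairs whose result is missing or not OK."""
    pods = [h["podName"] for h in check_all.get("hosts", [])]
    responses = check_all.get("responses", {})
    problems = []
    for source in pods:
        entry = responses.get(source) or {}
        results = (entry.get("response") or {}).get("podResults") or {}
        for target in pods:
            # A missing entry is treated the same as a failed check.
            if not results.get(target, {}).get("OK"):
                problems.append((source, target))
    return problems
```

An empty list means every source pod reported OK: true for every target pod. Any pair returned is either a failed check or one that is absent from the response.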

Common Network Issues Goldpinger Can Detect

1. Node Isolation

{
  "OK": false,
  "Error": "connection timeout"
}

Diagnosis: A node can’t reach other nodes, possibly due to firewall rules or network misconfiguration.

2. High Latency

{
  "OK": true,
  "response-time-ms": 2500
}

Diagnosis: Nodes can connect, but response times are high. Values above roughly 1000 ms may indicate network issues.

3. Partial Connectivity

Diagnosis: Some node pairs work fine while others fail. This often indicates asymmetric routing or security group issues.
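The three patterns above can be told apart mechanically. The following sketch is one possible heuristic, not the logic of the goldpinger analyzer: a source pod is treated as isolated when all of its checks against other pods fail, a pair is slow when it succeeds but exceeds a threshold (1000 ms here, taken from the rule of thumb above), and any remaining failures count as partial connectivity. The classify function and its slow_ms parameter are hypothetical.

```python
def classify(check_all, slow_ms=1000):
    """Group results into isolated sources, slow pairs and partial failures."""
    isolated, slow, partial = [], [], []
    for source, entry in check_all.get("responses", {}).items():
        results = (entry.get("response") or {}).get("podResults") or {}
        # Ignore a pod's check against itself when judging reachability.
        others = {t: r for t, r in results.items() if t != source}
        failed = [t for t, r in others.items() if not r.get("OK")]
        if others and len(failed) == len(others):
            isolated.append(source)
        else:
            partial.extend((source, t) for t in failed)
        slow.extend(
            (source, t, r["response-time-ms"])
            for t, r in others.items()
            if r.get("OK") and r.get("response-time-ms", 0) > slow_ms
        )
    return {"isolated": isolated, "slow": slow, "partial": partial}
```

This only looks at outgoing checks per source pod. A pod that the collector could not query at all has no podResults and would need to be handled separately.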

Advanced Configuration

The goldpinger collector supports several configuration options for customizing its behavior:

Custom Namespace

collectors:
  - goldpinger:
      namespace: kurl  # Look for goldpinger in 'kurl' namespace

Custom Goldpinger Image

collectors:
  - goldpinger:
      image: my-registry/goldpinger:custom-tag  # Use custom goldpinger image

Collection Delay

collectors:
  - goldpinger:
      collectDelay: 10s  # Wait 10 seconds after goldpinger starts

Pod Launch Options

When the collector runs outside the cluster and has to make its request from within the cluster, you can customize the pod it launches:

collectors:
  - goldpinger:
      namespace: kurl
      podLaunchOptions:
        namespace: monitoring           # Launch pod in monitoring namespace
        image: alpine:latest           # Use alpine (needs wget)
        imagePullSecret: my-secret     # Use custom image pull secret
        serviceAccountName: goldpinger # Use specific service account

Resources