Gather support bundle hangs indefinitely

Hey,

Issue
KOTS support bundle gathering hangs indefinitely.

Description
Recently we have been on a few troubleshooting calls with customers who were having issues with their embedded clusters. During these calls we noticed that when they were asked to create a support bundle for us, the ‘analyser’ would hang indefinitely, both in the KOTS Admin Console and when run manually via kubectl kots.

I saw there is a similar issue, but in our case it never completed (we waited around 15 minutes, when it usually takes about 1 minute).

Cases
There were a couple of different cases; here are two:

  • rook-ceph was in an unhealthy state (Error), and the analyser was stuck on ‘collecting CEPH data’.
  • The analyser hung when trying to collect sysctl information.

What is our expectation?
The analyser should time out after X seconds if it is unable to collect data, and then move on, ideally still reporting the pod events & logs even if the pod is in an unhealthy state.

Any help appreciated.

This probably warrants a support ticket with Replicated, as it’s not the intended behavior and is likely a bug.

I’d ask, however, whether you’re using the most recent version of Troubleshoot - we have made some changes lately that affect the timeouts. If you’re running this within the admin console, the version of Troubleshoot is tied to the version of kots, so you can run the support bundle from the CLI to get the most recent version of the binary (instructions here).