Log Collection for Crashed and Terminated Pods

Hi all,

Currently, our support-bundle only includes logs from running and recently terminated pods.

However, to troubleshoot and resolve issues effectively, it is also important to have access to logs from pods that have crashed or been terminated.

I am considering using Collectd or Fluentd with local file output.

I would appreciate any other recommendations you may have for log collection solutions.

Thanks,
Thomas

Hello @Thomas_Corvazier :wave:

Are you referring to Pods which are no longer visible with kubectl get pods? If so, you’re correct that you’d want to aggregate the logs somewhere so they persist as Pods come and go.

We do have a suggestion on the Troubleshoot GitHub repo for a collector that could retrieve logs from an aggregator like Elasticsearch/Logstash, but today I believe something like Fluentd with local file output and our CopyFromHost collector would be your best path forward.
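As a very rough sketch of the Fluentd side, you could run Fluentd as a DaemonSet with a config along these lines, assuming your container logs land under /var/log/containers (the usual containerd/Docker layout, which won't hold on every distribution) and that each node has a writable directory such as /var/log/collected. The ConfigMap name, namespace, and paths here are all placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-file-output    # placeholder name
  namespace: logging           # placeholder namespace
data:
  fluent.conf: |
    # Tail container logs. This assumes the common /var/log/containers layout;
    # adjust the path for your distribution.
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/collected/containers.log.pos
      tag kube.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>

    # Write everything to a file on the node so the logs outlive the pods.
    # A support bundle can later pick this directory up with CopyFromHost.
    <match kube.**>
      @type file
      path /var/log/collected/pods
      append true
    </match>
```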

Thanks for your response, Diamon.
Yes, correct, I meant pods deleted by the cluster and not showing in kubectl get pods.
After some investigation, it seems I will need a log collector that works across many Kubernetes distributions. Fluentd seems to make a lot of assumptions about how log files are stored in the cluster (are they in /var/log/containers? JSON or plain text?), so maybe the best option would be a k8s API-based log collector instead?

Sorry @Thomas_Corvazier for the extremely late reply here.

so maybe the best option would be a k8s API-based log collector instead?

I think I’d need some context on what you mean by a k8s API-based log collector. Are you referring to what we have available today - Pod Logs - Troubleshoot Docs? As you have already discovered, this will only collect logs from containers that are still running or visible with kubectl get pods.
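For reference, that collector in a support bundle spec looks roughly like this (the label selector and namespace are placeholders). It reads logs through the Kubernetes API, which is why it only sees pods the API server still knows about:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: pod-logs-example
spec:
  collectors:
    - logs:
        # Only pods matching this selector that the API server still knows
        # about will have their logs collected.
        selector:
          - app=example      # placeholder label
        namespace: default   # placeholder namespace
        limits:
          maxAge: 720h
          maxLines: 10000
```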

To capture logs from pods that no longer exist, you would need some sort of log aggregation. I think your idea about fluentd would be the easiest way to go about this. I imagine that each node would have the logs available at /some/file/path/to/fluentd/logs as you mentioned, and then you could scoop them up with Copy Files and Directories from Hosts - Troubleshoot Docs from each node in the cluster. Does that make sense?
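To make that concrete, the support bundle side could be a sketch like the one below, assuming Fluentd is writing to /some/file/path/to/fluentd/logs on every node. The collector name and image are placeholders; adjust hostPath to wherever your aggregator actually writes:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: node-log-copy-example
spec:
  collectors:
    - copyFromHost:
        collectorName: fluentd-node-logs          # placeholder name
        image: alpine                             # placeholder image for the copy pods
        hostPath: /some/file/path/to/fluentd/logs # wherever Fluentd writes on each node
        timeout: 2m
```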