In this article, I’ll share a workflow that I find very useful for discovering information from logs contained in Support Bundles.
A Support Bundle can contain a great many log files, I might not know exactly which pod slug I need to look for, and the directory structure of any given bundle depends on the spec that was used to generate it. It would be handy to have a way to surface the most relevant information without having to navigate through a Support Bundle by hand.
Enter find
find can be used to search for all kinds of metadata about files and directories. Using find to its fullest is outside the scope of this article, but I’ll demonstrate how I use it in a troubleshooting context.
Let’s say I have a support bundle from a cluster, unzipped in my current working directory and I am investigating a problem with the Flannel component. I know that I want to see logs related to Flannel, but I don’t necessarily know any of the flannel pod names or container names.
Begin using find with some very loose parameters, and then get more focused. Pod logs typically end in .log, so I might start by looking for ALL the log files in the support bundle:
$ find . -type f -iname "*.log"
./cluster-resources/pods/logs/kurl/ekc-operator-7ddb5d6f7-9xjmf/ekc-operator.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry.log
./cluster-resources/pods/logs/kube-flannel/kube-flannel-ds-rgxfw/install-cni.log
./cluster-resources/pods/logs/kube-flannel/kube-flannel-ds-rgxfw/install-cni-plugin.log
./cluster-resources/pods/logs/kube-flannel/kube-flannel-ds-rgxfw/kube-flannel.log
./cluster-resources/pods/logs/default/barnacle-5c648bb78f-c6wnb/sidekiq.log
./cluster-resources/pods/logs/default/barnacle-5c648bb78f-c6wnb/1-redis-wait.log
./cluster-resources/pods/logs/default/barnacle-5c648bb78f-c6wnb/2-postgres-wait.log
./cluster-resources/pods/logs/default/kotsadm-rqlite-0/rqlite.log
./cluster-resources/pods/logs/default/wave-5b797bf69d-bn657/1-redis-wait.log
./cluster-resources/pods/logs/default/wave-5b797bf69d-bn657/3-wave-start.log
./cluster-resources/pods/logs/default/wave-5b797bf69d-bn657/2-postgres-wait.log
./cluster-resources/pods/logs/default/wave-5b797bf69d-bn657/ecfcm.log
./cluster-resources/pods/logs/default/kotsadm-65478df544-vclc7/restore-s3.log
./cluster-resources/pods/logs/default/kotsadm-65478df544-vclc7/schemahero-plan.log
./cluster-resources/pods/logs/default/kotsadm-65478df544-vclc7/kotsadm.log
./cluster-resources/pods/logs/default/kotsadm-65478df544-vclc7/restore-db.log
./cluster-resources/pods/logs/default/kotsadm-65478df544-vclc7/schemahero-apply.log
./cluster-resources/pods/logs/default/kurl-proxy-kotsadm-5c889df745-qvhzh/proxy.log
./cluster-resources/pods/logs/projectcontour/envoy-fr6w5/shutdown-manager.log
./cluster-resources/pods/logs/projectcontour/envoy-fr6w5/envoy-initconfig.log
./cluster-resources/pods/logs/projectcontour/envoy-fr6w5/envoy.log
Finding log files
First, let’s examine the find command so we can reason about what we want to search for:
find . -type f -iname "*.log"
find . - the base command find followed by a dot character (.) tells find to start searching in the current working directory.
-type f - the -type flag tells find what kind of file to search for; in this case f indicates that we want to search only for regular files. Contrast with d, which would indicate directories.
-iname "*.log" - the -iname parameter tells find to match on the file name, case-insensitively, not including the path preceding the file (this is a very important caveat!). The "*.log" pattern is a glob expression that matches all files ending in “.log”.
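The two properties of -iname described above (case-insensitive, file name only) can be checked on a throwaway directory. This is a minimal sketch, assuming a made-up tree; none of these paths come from a real support bundle.

```shell
#!/bin/sh
# Build a tiny, hypothetical directory tree to search.
bundle=$(mktemp -d)
mkdir -p "$bundle/logs/kube-flannel/pod-a" "$bundle/logs/default/pod-b"
touch "$bundle/logs/kube-flannel/pod-a/kube-flannel.log" \
      "$bundle/logs/default/pod-b/APP.LOG" \
      "$bundle/logs/default/pod-b/notes.txt"

# -iname ignores case, so both kube-flannel.log and APP.LOG match.
all_logs=$(cd "$bundle" && find . -type f -iname "*.log" | sort)
echo "$all_logs"

# -iname only looks at the last path component, so a pattern that
# names a parent directory ("pod-a") matches no regular file here.
by_dir=$(cd "$bundle" && find . -type f -iname "*pod-a*")
echo "files matched by directory name: ${by_dir:-none}"

rm -rf "$bundle"
```

The second search comes back empty even though two directories down there is a file under pod-a, which is exactly the caveat to keep in mind when a pod name is all you know.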
Now that we have a high-level view of what this bundle offers, let’s try to narrow the scope. Say I wanted to investigate a problem with the Registry in the embedded cluster.
I know that I have 2 registry pods, indicated by logs/kurl/registry-86d7f5fb54-5kxmt and logs/kurl/registry-86d7f5fb54-m6t9x, and each of those has 3 containers: registry, registry-backup, and restore. Rather than viewing all of these individually, it would be very useful to see the messages from all of these pods at once, preferably organized in a way that makes it easy to follow events in a timeline.
Enter lnav
lnav is a tool for navigating log files. Check out the full documentation on their website. It parses files that are fed into it and builds a “view” of those files based on timestamp data that it parses from each line. Like find, it has many features that are outside the scope of this article, but I’ll touch on a few that I use regularly.
Reading logs with lnav
First, let’s look at a minimal example before we combine it with find, still using the same support bundle from above. Let’s open a single file with lnav:
lnav ./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry.log
You should see an interactive display in your terminal:
2024-04-08T12:49:42 EDT Press ` to focus on the breadcrumb bar
LOG ❭2023-12-13T11:30:20.000❭logfmt_log❭registry.log[0]❭
┌time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized environment │
│time="2023-12-13T16:30:20.129351141Z" level=info msg="redis not configured" go│
│time="2023-12-13T16:30:20.129483754Z" level=info msg="backend redirection disa│
│time="2023-12-13T16:30:20.129512479Z" level=info msg="using inmemory blob desc│
│time="2023-12-13T16:30:20.136312329Z" level=info msg="restricting TLS version │
│time="2023-12-13T16:30:20.136398614Z" level=info msg="restricting TLS cipher s│
│time="2023-12-13T16:30:20.137028934Z" level=info msg="listening on [::]:443, t│
│
│
Files :: Text Filters :: Press TAB to edit
L0 100% ?:View Help
Press e/E to move forward/backward through error messages
You can use the arrow keys & pg-up/pg-down to navigate through lines, especially if lines trail off the end of the screen. If you hit the left arrow from the initial view, you can see the name of the file that presented a given line:
2024-04-08T12:51:59 EDT
LOG ❭2023-12-13T11:30:20.000❭logfmt_log❭registry.log[0]❭
registry.log┌time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized │
registry.log│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized │
registry.log│time=2023-12-13 11:30:20 level=warning msg="Ignoring unrecognized │
...
lnav commands
lnav has a command mode very similar to vim. Commands are entered starting with the : character. Check out the full documentation on lnav for the complete list; in this example let’s focus on my favorite commands: filtering.
Filtering is very powerful in lnav. We are often presented with much more information than is relevant, and lnav’s filters give us a great tool for whittling it away so we are left with only what’s needed to solve a problem.
Here’s a simple example of what filtering looks like in lnav, investigating the kurl-proxy access log:
$ lnav ./cluster-resources/pods/logs/default/kurl-proxy-kotsadm-5c889df745-qvhzh/proxy.log
2024-04-08T13:01:53 EDT Press ` to focus on the breadcrumb bar
LOG ❭2023-12-13T16:30:56.000❭generic_log❭proxy.log[45]❭
│[GIN] 2023/12/13 - 16:33:03 | 200 | 36.098107ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:03 | 200 | 19.024517ms | ***HIDDEN*** | GET "/api/v1/velero" │
│[GIN] 2023/12/13 - 16:33:03 | 200 | 33.007606ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:03 | 200 | 86.568663ms | ***HIDDEN*** | GET "/api/v1/apps" │
│[GIN] 2023/12/13 - 16:33:03 | 200 | 30.352735ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:03 | 200 | 45.456995ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:03 | 200 | 23.25274ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:04 | 200 | 3.060387ms | ***HIDDEN*** | GET "/api/v1/ping?slugs=" │
│[GIN] 2023/12/13 - 16:33:05 | 200 | 30.457194ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:08 | 200 | 42.19222ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/app/manifold/support│
│[GIN] 2023/12/13 - 16:33:08 | 202 | 442.216123ms | ***HIDDEN*** | POST "/api/v1/troubleshoot/supportbundle/app/2Y│
│[GIN] 2023/12/13 - 16:33:08 | 200 | 31.854912ms | ***HIDDEN*** | GET "/api/v1/velero" │
│[GIN] 2023/12/13 - 16:33:08 | 200 | 44.175024ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/supportbundle/2zummc│
│[GIN] 2023/12/13 - 16:33:08 | 200 | 54.798277ms | ***HIDDEN*** | GET "/api/v1/apps" │
│[GIN] 2023/12/13 - 16:33:08 | 200 | 35.875345ms | ***HIDDEN*** | GET "/api/v1/apps" │
│[GIN] 2023/12/13 - 16:33:09 | 200 | 26.289952ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/supportbundle/2zummc│
│[GIN] 2023/12/13 - 16:33:09 | 200 | 14.860932ms | ***HIDDEN*** | GET "/api/v1/ping?slugs=manifold" │
│[GIN] 2023/12/13 - 16:33:10 | 200 | 33.838169ms | ***HIDDEN*** | GET "/api/v1/troubleshoot/supportbundle/2zummc│
Files :: Text Filters :: Press TAB to edit
L178 100% ?:View Help
Press e/E to move forward/backward through error messages
We see there are access records in this log, but we might not be interested in the records of the /ping or /troubleshoot endpoints. We can filter those lines out of this view by using the :filter-out command. From the view, type :filter-out \/ping. Notice as you type \/ping that the lines matching that regular expression start to highlight, giving you an indication of what would be hidden when the filter is executed. Also notice that this is a regular expression, so we might need to escape special characters like / when we want to match them.
Hit Enter, and the filter will be applied. Any lines matching the regular expression \/ping will be hidden from the view. Reset the view by hitting Ctrl+R.
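To get a feel for what a filter-out expression removes before trying it interactively, the same idea can be approximated outside lnav. This sketch uses plain grep as a stand-in, not lnav itself, and the sample lines are invented to resemble the access log above.

```shell
#!/bin/sh
# Made-up access-log lines in the same general shape as the proxy log.
log=$(mktemp)
cat > "$log" <<'EOF'
[GIN] 2023/12/13 - 16:33:03 | 200 | GET "/api/v1/velero"
[GIN] 2023/12/13 - 16:33:04 | 200 | GET "/api/v1/ping?slugs="
[GIN] 2023/12/13 - 16:33:08 | 200 | GET "/api/v1/apps"
[GIN] 2023/12/13 - 16:33:09 | 200 | GET "/api/v1/ping?slugs=manifold"
EOF

# grep -v prints only the lines that do NOT match the expression,
# which is roughly the effect a filter-out has on the view.
remaining=$(grep -v -E '/ping' "$log")
echo "$remaining"

rm -f "$log"
```

Only the velero and apps requests are left. Unlike this one-shot grep, a filter in lnav can be removed again without reopening the file.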
Combining find and lnav for great good
Going back to our examples using find, now let’s try feeding several of our log files into lnav. Let’s say we want to see all the logs related to registry pods. Looking back at our list of all files matching “*.log” from above, we notice that there are several containers in the registry pods:
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry.log
It might be hard to create a filename pattern that matches all 3 of these container names, but we can change our search parameter in find to let us use path names instead. Since all of the registry pods are contained in folders matching registry-*, that might be a more suitable search pattern. Let’s try find . -type f -iwholename "" instead. -iwholename is similar to the -iname option except that it matches against the full path to a file:
$ find . -type f -iwholename "*registry-*/*.log"
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-5kxmt/registry.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry-backup.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/restore.log
./cluster-resources/pods/logs/kurl/registry-86d7f5fb54-m6t9x/registry.log
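The difference between matching on the file name and matching on the full path can be seen side by side on a small test tree. This is a sketch under assumptions: the pod names are hypothetical, and it uses -ipath, which GNU find documents as another name for -iwholename.

```shell
#!/bin/sh
# A made-up tree mimicking two registry pods plus one unrelated pod.
bundle=$(mktemp -d)
for pod in registry-aaa registry-bbb; do
  mkdir -p "$bundle/logs/kurl/$pod"
  touch "$bundle/logs/kurl/$pod/registry.log" \
        "$bundle/logs/kurl/$pod/registry-backup.log" \
        "$bundle/logs/kurl/$pod/restore.log"
done
mkdir -p "$bundle/logs/default/other-pod"
touch "$bundle/logs/default/other-pod/app.log"

# File-name match: restore.log does not contain "registry", so it is missed.
by_name=$(cd "$bundle" && find . -type f -iname "*registry*.log" | wc -l | tr -d ' ')

# Full-path match: the pod directory name is part of what gets matched.
by_path=$(cd "$bundle" && find . -type f -ipath "*registry-*/*.log" | wc -l | tr -d ' ')

echo "matched by file name: $by_name"
echo "matched by full path: $by_path"
rm -rf "$bundle"
```

The file-name search finds 4 of the 6 registry container logs, while the path search finds all 6 and still skips the unrelated pod.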
Finally, we can pass all of these files into lnav using find’s -exec option:
find . -type f -iwholename "*registry-*/*.log" -exec lnav {} +
Breaking this down a bit, we add the -exec option, which takes a few arguments of its own:
-exec cmd {} +
cmd is the command you want to run using the results of the find command
{} is the placeholder for the list of files returned by find
+ or \; is the final argument, which tells find how to call cmd: either once with all of the results appended as arguments (+ mode), or once for each result (\; mode). A trailing ; typed after the + is consumed by the shell as a command separator, so it is harmless but unnecessary.
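The two ways of ending -exec are easy to tell apart by swapping in echo for the real command and counting the lines it prints. A minimal sketch, assuming three throwaway files in a temporary directory:

```shell
#!/bin/sh
# Three throwaway files stand in for log files found in a bundle.
dir=$(mktemp -d)
touch "$dir/a.log" "$dir/b.log" "$dir/c.log"

# With +, find appends all results to one command line,
# so echo runs once and prints a single line.
batched=$(find "$dir" -type f -name "*.log" -exec echo {} + | wc -l | tr -d ' ')

# With \;, find runs the command once per result,
# so echo runs three times and prints three lines.
one_each=$(find "$dir" -type f -name "*.log" -exec echo {} \; | wc -l | tr -d ' ')

printf 'lines printed in batch mode:    %s\n' "$batched"
printf 'lines printed in per-file mode: %s\n' "$one_each"
rm -rf "$dir"
```

For lnav the batch form is the one that matters here: it hands every matching file to a single lnav session, whereas the per-file form would start lnav once for each file in turn.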
Notice how the log lines from the different files are interleaved based on timestamp.
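The idea of interleaving by timestamp can be illustrated with ordinary tools. This is not how lnav works internally (it also detects log formats and parses many timestamp styles); it is only a sketch using sort on two invented files whose lines begin with sortable ISO-8601 timestamps.

```shell
#!/bin/sh
# Two made-up log files, each already in time order.
dir=$(mktemp -d)
cat > "$dir/registry.log" <<'EOF'
2023-12-13T16:30:20Z registry: listening
2023-12-13T16:30:25Z registry: request served
EOF
cat > "$dir/restore.log" <<'EOF'
2023-12-13T16:30:22Z restore: started
2023-12-13T16:30:27Z restore: finished
EOF

# sort -m merges inputs that are already sorted, producing one timeline.
timeline=$(LC_ALL=C sort -m "$dir/registry.log" "$dir/restore.log")
echo "$timeline"
rm -rf "$dir"
```

The merged output alternates between the two files, which is the same reading experience lnav gives for the registry containers, with the file name available for each line via the left arrow.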
Here’s another example, this time using a support bundle containing kURL host-collectors and looking for kubelet logs. I start by looking for any files with “kubelet” in their path, then narrow the search to only the log files I care about, and finally pass those to lnav:
$ find . -type f -iwholename "*kubelet*"
./support-bundle-2023-12-11T15_52_31/host-collectors/diskUsage/var-lib-kubelet.json
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/systemctl-kubelet-status-info.json
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/systemctl-kubelet-status.txt
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/systemctl-cat-kubelet.txt
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/journalctl-kubelet-info.json
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/journalctl-kubelet.txt
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/systemctl-cat-kubelet-info.json
./support-bundle-2023-12-11T13_57_15/host-collectors/diskUsage/var-lib-kubelet.json
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/systemctl-kubelet-status-info.json
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/systemctl-kubelet-status.txt
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/systemctl-cat-kubelet.txt
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/journalctl-kubelet-info.json
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/journalctl-kubelet.txt
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/systemctl-cat-kubelet-info.json
$ find . -type f -iname "journalctl-kubelet*"
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/journalctl-kubelet-info.json
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/journalctl-kubelet.txt
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/journalctl-kubelet-info.json
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/journalctl-kubelet.txt
$ find . -type f -iname "journalctl-kubelet.txt"
./support-bundle-2023-12-11T15_52_31/host-collectors/run-host/journalctl-kubelet.txt
./support-bundle-2023-12-11T13_57_15/host-collectors/run-host/journalctl-kubelet.txt
$ find . -type f -iname "journalctl-kubelet.txt" -exec lnav {} +
2024-04-08T13:55:21 EDT Press ` to focus on the breadcrumb bar
LOG ❭2023-12-04T13:58:26.000❭syslog_log❭[support-bundle-2023-12-11T13_57_15]/journalctl-kubelet.txt[0]❭
┌-- Logs begin at Sat 2023-11-25 07:04:06 CST, end at Mon 2023-12-11 13:58:22 CST. -- │
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.431060 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.431121 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.431947 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.431988 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.432740 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.432808 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.433572 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.433630 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.434333 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.434384 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.435021 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.435073 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.435769 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.435821 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.436492 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.436543 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.437183 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.437234 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.437924 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.437988 2800374 kuberuntime_gc.go:176] "Failed to stop sandbox before remov│
│Dec 04 13:58:26 abcd1234 kubelet[2800374]: E1204 13:58:26.438653 2800374 remote_runtime.go:269] "StopPodSandbox from runtime service│
Files :: Text Filters :: Press TAB to edit
L0 0% ?:View Help
✔ restored session from just now; press CTRL-R to reset session Press e/E to move forward/backward through error messages