Resolving KOTS "Pending Download" Issues

Overview

When KOTS (Kubernetes Off-The-Shelf) is configured for automatic updates, failed downloads (for example, due to Docker Hub rate limits) can leave “Pending download” sequences in the release history. These orphaned entries can accumulate over time, leading to serious performance issues, including high CPU usage.

This article provides a comprehensive guide to identify and resolve these issues in your KOTS deployment.

Understanding the Problem

Root Cause

When KOTS attempts to automatically download an upstream release, various issues can cause the download process to fail, leaving behind incomplete “Pending download” entries in the system.

Impact

  • Performance Degradation: Multiple pending downloads consume system resources, leading to high CPU usage by KOTS and an unusable Admin Console.

  • UI Clutter: The release history becomes polluted with duplicate pending entries.

  • System Instability: In severe cases, accumulated entries can affect overall system stability as a side effect of etcd filling up with KOTS data.

Solution Overview

:warning: CRITICAL: Backup Required

Create a backup before proceeding. Database changes are irreversible.
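
If snapshots (Velero) are configured for your installation, one way to take a full backup before making database changes is with the KOTS CLI. This is a suggestion rather than the only option; <namespace> below is a placeholder for the namespace where KOTS is installed:

kubectl kots backup --namespace <namespace>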

The resolution involves cleaning up pending download sequences from the KOTS database using rqlite commands.

Step-by-Step Resolution Guide

Prerequisites

  • kubectl access to your Kubernetes cluster
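
A quick way to confirm access is to check that the rqlite StatefulSet is visible from your workstation; <namespace> is a placeholder for the namespace where KOTS is installed:

kubectl get statefulset kotsadm-rqlite -n <namespace>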

Clean Up Pending Download Sequences

Step 1: Access the KOTS Database Credentials

First, retrieve the rqlite database credentials:


kubectl get secret kotsadm-rqlite -o jsonpath='{.data.authconfig\.json}' | base64 -d

Make note of the password for the kotsadm user shown in the output. If KOTS is installed in a namespace other than your current context, add -n <namespace> to the kubectl commands in this guide.
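
If you prefer to extract the password directly and have jq available, something like the following works, assuming the default authconfig.json layout (a JSON array of user entries with username and password fields):

kubectl get secret kotsadm-rqlite -o jsonpath='{.data.authconfig\.json}' | base64 -d | jq -r '.[] | select(.username=="kotsadm") | .password'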

Step 2: Connect to the rqlite Pod

Execute into the rqlite pod:


kubectl exec -it statefulsets/kotsadm-rqlite -- bash
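
If the container image does not include bash, the same command with sh should work:

kubectl exec -it statefulsets/kotsadm-rqlite -- sh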

Step 3: Start the rqlite Shell

Once inside the pod, start an rqlite shell using the credentials from Step 1:


rqlite -u kotsadm:<password>

Replace <password> with the actual password retrieved earlier.
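
To confirm that the shell is connected, a simple query such as the following should return the distinct statuses currently in use:

select distinct status from app_downstream_version;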

Step 4: Delete Pending Download Entries
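
Before deleting anything, you can optionally preview the rows that will be affected. The following query reads only columns that the cleanup statements below also reference:

select parent_sequence, status from app_downstream_version where status = 'pending_download';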

Execute the following SQL commands to remove all pending download sequences:


delete from app_version where sequence in (select parent_sequence from app_downstream_version where status = 'pending_download');

delete from app_downstream_version where status = 'pending_download';

Step 5: Verify the Cleanup

You can verify the cleanup by checking if any pending downloads remain:


select count(*) from app_downstream_version where status = 'pending_download';

The result should be 0.

Prevention

To prevent this issue from recurring, disable automatic updates in KOTS until the underlying problem is fixed in a future release.

Troubleshooting Tips

Issue: Cannot connect to rqlite

  • Verify the pod name and namespace

  • Ensure the kotsadm-rqlite secret exists

  • Check pod status with kubectl get pods (see the example commands after this list)
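
For example, the following commands cover all three checks; <namespace> is a placeholder for the namespace where KOTS is installed:

kubectl get pods -n <namespace>

kubectl get secret kotsadm-rqlite -n <namespace>

kubectl get statefulset kotsadm-rqlite -n <namespace>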

Issue: High CPU usage persists

  • Verify all pending downloads were removed

  • Check for other resource-intensive operations

  • Review KOTS logs for other issues (see the example below)
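
One way to review the KOTS logs, assuming kotsadm runs as a Deployment named kotsadm (this can differ by installation), is:

kubectl logs deployment/kotsadm -n <namespace> --tail=200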

Conclusion

By following this guide, you can resolve existing pending download issues that may be causing high CPU usage in your KOTS deployment. Remember to always back up your data before performing database maintenance operations.


This article addresses a known issue where failed automatic updates in KOTS can lead to performance problems. This issue will be fixed in a future release. Always test these procedures in a non-production environment first.

@Evans_Mungai, I would like to clarify whether the value of status in the rqlite database is the same across all environments, or whether it can differ depending on how customers deploy. You mentioned running

select count(*) from app_downstream_version where status = 'pending_download';

but in our env, we found it to be pending:

127.0.0.1:4001> select distinct status from app_downstream_version;
+----------+
| status   |
+----------+
| deployed |
+----------+
| pending  |
+----------+

Thanks,
Migs

Below are all the possible values the status column in a DB row can have. It's also the same for all deployment types.

"unknown"                    // we don't know
"pending_cluster_management" // needs cluster configuration
"pending_config"             // needs required configuration
"pending_download"           // needs to be downloaded from the upstream source
"pending_preflight"          // waiting for preflights to finish
"pending"                    // can be deployed, but is not yet
"deploying"                  // is being deployed
"deployed"                   // did deploy successfully
"failed"                     // did not deploy successfully

In your query, all records in app_downstream_version have the status field set to pending or deployed, which is expected behaviour. The status field transitions through different values as kotsadm processes a sequence. Once processing is complete, the sequence should be in either the deployed or failed status; these are the valid terminal states.

Sequences stuck in any other status for a prolonged period indicate a problem that needs to be investigated. Please raise a support ticket so the Replicated support team can investigate, and be sure to attach a support bundle.
