Best Practices for Migrating a Single-Node Instance from Replicated Native Scheduler to Replicated KOTS

Got another great question today

We’re migrating a Native Scheduler instance to Replicated’s Kubernetes-Based KOTS App Manager.
The end user would like to stop the Native Scheduler processes, install the KOTS version, and then restore all their configuration.
If anything goes wrong, they need the option to roll back the change and go back to running the Native Scheduler version.
What are the best practices for maximizing the chances of success for this? What steps should we follow?

DISCLAIMER

This assumes the deployed app has no state stored on the node in question.
This guide helps migrate internal Replicated state, such as configuration options, but all application state is expected to live in external databases configured via customer-provided ConfigOption parameters.
For applications that store state in embedded databases, you will need to build a plan to migrate that manually as well.

Before you start

  1. Familiarize yourself with the methods for collecting a support bundle and sharing support bundles with the Replicated team. Ensure your team has access to Replicated support and to the shared private GitHub repo used to track support requests.
  2. Familiarize yourself with the process for submitting a support issue to Replicated.
  3. Take a snapshot of the instance in case you need to restore to a new instance.
  4. This guide assumes the application stores no state on the node and that all state is stored externally.

Step-by-step guide

1. Take a Snapshot

If you skipped the “Before you start” steps, go back and do them, including taking a snapshot of the running application.
If you’re using direct-to-disk snapshots, it might be worth backing up the snapshots directory to a separate server.
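A minimal sketch for copying direct-to-disk snapshots off the box before you begin; the path below is an assumption based on a common default, so use whatever directory is configured in your Console snapshot settings:

rsync -avz /var/lib/replicated/snapshots/ backup-host:/backups/replicated-snapshots/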

2. Export Config

Use the replicatedctl CLI to export the application’s configuration options. Store the export in a safe place. See the docs.

Run replicatedctl app-config export --hidden to export all configuration including passwords, or replicatedctl app-config export (without --hidden) to export only non-password items.
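For example, to write the full export (including hidden password values) to a file you can copy off the server; the file name here is arbitrary:

replicatedctl app-config export --hidden > app-config.json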

3. Get your certificates

For Native Scheduler apps, these will be on-disk at $LOCATION, or you can re-provision these from wherever you normally get your certs.
In either case, drop them in a working directory on the server as tls.crt and tls.key (or whatever filenames you prefer).
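A minimal sketch, assuming you have already located or re-issued the cert and key (the source paths are placeholders):

mkdir -p ~/kots-migration
cp /path/to/existing/cert.pem ~/kots-migration/tls.crt
cp /path/to/existing/key.pem ~/kots-migration/tls.key
chmod 600 ~/kots-migration/tls.key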

4. Stop Application

Stop the application via the Replicated UI.

5. Stop Replicated

Use the relevant init command to stop the Replicated processes. For example, on systemd servers, run the following as root or with sudo:

systemctl stop replicated && systemctl disable replicated
systemctl stop replicated-ui && systemctl disable replicated-ui
systemctl stop replicated-operator && systemctl disable replicated-operator

To Do – are there other containers that have to be stopped? Are there other containers that should be stopped?
To Do – as long as the app is stopped, can the replicated containers be left up and running during this process, and only stopped on success?
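Either way, once the services are stopped it’s worth confirming that nothing Replicated-related is still running before moving on. A sketch using standard docker commands; the name filter is an assumption about how the containers are named on your host:

docker ps --format '{{.Names}}' | grep -i replicated || echo "no replicated containers still running"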

6. Install KOTS

Run the Kubernetes (kURL) installer command, something like curl https://k8s.kurl.sh/app-name | sudo bash, where app-name is your application’s installer slug.
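It can be useful to capture the installer output, since it prints the Admin Console details you’ll need in the next step. A sketch, with app-name standing in for your installer slug:

curl -sSL https://k8s.kurl.sh/app-name | sudo bash | tee kurl-install.log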

7. Configure and bootstrap KOTS

Upload your certs, upload a license, and use your exported config options to fill the new config screen.
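If you exported the config to a file in step 2, something like the following gives a quick key/value reference while filling in the config screen. This is a sketch; it assumes the export is a JSON map of option names to objects with a value field, and that jq is installed:

jq -r 'to_entries[] | "\(.key) = \(.value.value)"' app-config.json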

8. Validate the application

Check to make sure everything is working as expected.
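Beyond clicking through the app itself, a couple of standard kubectl checks can confirm the cluster side is healthy (nothing here is Replicated-specific):

kubectl get nodes                  # the node should report Ready
kubectl get pods --all-namespaces  # kotsadm and application pods should be Running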

If something goes wrong

Follow the steps in collecting a support bundle and sharing support bundles with the Replicated team, then submit a support issue to Replicated.

Rolling back

To Do – is there a safe way to do this? I’m guessing the kURL uninstall script will also try to rip out Docker and friends, which might mean you have to re-run the Replicated Native install script to reinstall Docker and friends; would that pick up the previous data dirs?

Proposal (needs sudo)

kubeadm reset
systemctl start replicated && systemctl enable replicated
systemctl start replicated-ui && systemctl enable replicated-ui
systemctl start replicated-operator && systemctl enable replicated-operator
# start the app in the replicated UI
# if anything goes wrong, restart the server and repeat
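Before restarting the Native Scheduler services, it may also be worth confirming that the old data directories survived the teardown. A sketch; /var/lib/replicated is the default data directory on most Native installs, so adjust if yours was customized:

ls -ld /var/lib/replicated && echo "native data dir still present"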

To Do – are there other containers that have to be stopped? Are there other containers that should be stopped?
To Do – as long as the app is stopped, can the replicated containers be left up and running during this process, and only stopped on success?

It is safer to ensure that nothing is running. After the application and the Replicated system services are stopped, remove any leftover containers manually like this:

docker rm -f $(docker ps -aq)

If things continue to go wrong, that’s what the snapshot you made beforehand is for

If things continue to go wrong, that’s what the snapshot you made beforehand is for

Hmm okay but if they can’t get Replicated back up, they probably can’t get the snapshot restored, right?