Got another great question today:
We’re migrating a Swarm Scheduler instance to Replicated’s Kubernetes-based KOTS App Manager.
The end user would like to stop the Swarm Scheduler processes, install the KOTS version, and then restore all their configuration.
If anything goes wrong, they need the option to roll back the change and go back to running the Swarm Scheduler version.
What are the best practices for maximizing the chances of success for this? What steps should we follow?
DISCLAIMER
This assumes the deployed app has no state stored on the node in question.
This will help to migrate internal Replicated state like configuration options, but all app state is expected to be in external databases configured by customer-provided ConfigOption parameters.
For applications that store state in embedded databases, you will need to build a plan to migrate that manually as well.
Before you start
- Familiarize yourself with the methods for collecting a support bundle and sharing support bundles with the Replicated team. Ensure you have access to Replicated from your team, and access to the shared private GitHub repo used to track support requests.
- Familiarize yourself with the process for submitting a support issue at Replicated.
- Take a snapshot of the instance in case you need to restore to a new instance.
- This guide assumes no application state, and that all state is stored externally.
Step-by-step guide
1. Take a Snapshot
If you skipped the “Before you start” steps, go back and do them, including taking a snapshot of the running application.
If you’re using direct-to-disk snapshots, it might be worth backing up the snapshots directory on a separate server.
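As a rough sketch of that backup, the helper below archives a directory into a timestamped tarball and checks that the archive is readable. The function name archive_dir and both paths in the usage comment are hypothetical; substitute the snapshots directory your instance is actually configured to use, then copy the resulting file to the other server with whatever tool you normally use.

```shell
# Sketch only: archive_dir is a hypothetical helper, not part of any Replicated tooling.
# Usage: archive_dir <directory-to-back-up> <directory-to-write-archive-into>
archive_dir() {
  local src="$1" dest_dir="$2"
  local name
  name="$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  # Archive relative to the parent so the tarball contains only the directory name.
  tar -czf "$dest_dir/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive confirms it can be read back before you rely on it.
  tar -tzf "$dest_dir/$name" > /dev/null || return 1
  echo "$dest_dir/$name"
}

# Example (paths are placeholders):
#   archive_dir /path/to/snapshots /root/migration-backup
```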
2. Export Config
Use the replicated CLI to export the application’s configuration options. Store this in a safe place. See the docs.
Run replicatedctl app-config export --hidden to export all configuration including passwords, or replicatedctl app-config export (without --hidden) to export only non-password items.
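Since the --hidden export includes passwords, it may be worth writing it to a file that only the current user can read. Here is a minimal sketch, assuming the command prints the exported config to standard output; export_config and the file path in the usage comment are hypothetical names.

```shell
# Sketch only: export_config is a hypothetical wrapper around the command from this step.
# Usage: export_config <output-file>
export_config() {
  local out="$1"
  # umask 077 inside a subshell so the file is created without group/other access,
  # and the stricter umask does not leak into the rest of the session.
  ( umask 077; replicatedctl app-config export --hidden > "$out" ) || return 1
  # An empty file almost certainly means the export did not work.
  if [ ! -s "$out" ]; then
    echo "export produced an empty file: $out" >&2
    return 1
  fi
}

# Example (path is a placeholder):
#   export_config /root/migration-backup/app-config.export
```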
3. Get your certificates
For Swarm Scheduler apps, these will be on-disk at $LOCATION, or you can re-provision these from wherever you normally get your certs.
In either case, drop them in a working directory on the server at tls.crt and tls.key (or whatever filenames you prefer).
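Before going further, you may want to confirm that the certificate and key you collected actually belong together. The sketch below compares the public key embedded in the certificate with the one derived from the private key, using openssl. It assumes PEM files and an unencrypted key; cert_matches_key is a hypothetical helper name.

```shell
# Sketch only: cert_matches_key is a hypothetical helper.
# Usage: cert_matches_key <certificate-file> <private-key-file>
cert_matches_key() {
  local crt="$1" key="$2"
  local crt_pub key_pub
  # Public key as recorded in the certificate.
  crt_pub=$(openssl x509 -in "$crt" -noout -pubkey) || return 1
  # Public key derived from the private key.
  key_pub=$(openssl pkey -in "$key" -pubout) || return 1
  [ "$crt_pub" = "$key_pub" ]
}

# Example:
#   cert_matches_key tls.crt tls.key && echo "certificate and key match"
```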
4. Stop Application
Stop the application via the Replicated UI.
5. Stop Replicated
Use the relevant init command to stop the replicated processes. For example, on systemd servers, run the following as root or with sudo:
systemctl stop replicated && systemctl disable replicated
systemctl stop replicated-ui && systemctl disable replicated-ui
systemctl stop replicated-operator && systemctl disable replicated-operator
To Do – are there other containers that have to be stopped? Are there other containers that should be stopped?
To Do – as long as the app is stopped, can the replicated containers be left up and running during this process, and only stopped on success?
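To confirm the stop commands in step 5 took effect before installing anything, a small check like the one below can help. It is a sketch that only covers the three systemd units named above; replicated_stopped is a hypothetical helper, and it says nothing about any other containers that may still be running.

```shell
# Sketch only: replicated_stopped is a hypothetical helper.
# Returns 0 when none of the three units from step 5 is active.
replicated_stopped() {
  local unit
  for unit in replicated replicated-ui replicated-operator; do
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
      echo "still running: $unit" >&2
      return 1
    fi
  done
}

# Example:
#   replicated_stopped && echo "all replicated units are stopped"
```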
6. Installing KOTS
Run the Kubernetes installer command, something like curl https://k8s.kurl.sh/app-name | sudo bash
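If you would rather look at the install script before it runs as root, one option is to download it to a file first instead of piping it straight into bash. This is a sketch of that variation, not the documented install path; fetch_installer and the install.sh filename are hypothetical, and the URL is the same placeholder used above.

```shell
# Sketch only: fetch_installer is a hypothetical helper.
# Usage: fetch_installer <url> <output-file>
fetch_installer() {
  local url="$1" out="$2"
  # -f fails on HTTP errors, -sS hides progress but shows errors, -L follows redirects.
  curl -fsSL "$url" -o "$out" || return 1
  # Refuse to continue with an empty download.
  [ -s "$out" ] || return 1
}

# Example:
#   fetch_installer https://k8s.kurl.sh/app-name install.sh
#   less install.sh        # review what it will do
#   sudo bash install.sh
```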
7. Configuring and bootstrapping KOTS
Upload your certs, upload a license, and use your exported config options to fill the new config screen.
8. Validate the application
Check to make sure everything is working as expected.
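One quick, partial check is to list any pods that are not in the Running or Completed state. The sketch below assumes kubectl is available and configured on the node after the install; unhealthy_pods is a hypothetical helper. A pod can report Running and still not be ready, so treat an empty result as a starting point and still test the application itself.

```shell
# Sketch only: unhealthy_pods is a hypothetical helper.
# Prints "namespace/name status" for every pod whose STATUS column
# is neither Running nor Completed. Prints nothing if all pods look fine.
unhealthy_pods() {
  kubectl get pods --all-namespaces --no-headers |
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Example:
#   unhealthy_pods
```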
If something goes wrong
Follow the steps for collecting a support bundle and sharing it with the Replicated team, then submit a support issue.
Rolling back
To Do – is there a safe way to do this? I’m guessing the kurl uninstall script will also try to rip out Docker and friends, which might mean you have to re-run the Replicated Swarm install script to reinstall Docker and friends. Would that pick up the previous data dirs?
Proposal (needs sudo)
kubeadm reset
systemctl start replicated && systemctl enable replicated
systemctl start replicated-ui && systemctl enable replicated-ui
systemctl start replicated-operator && systemctl enable replicated-operator
# start the app in the replicated UI
# if anything goes wrong, restart the server and repeat