KOTS: managed Kubernetes ceph rook not available by default

Hi There,
We are planning to migrate to KOTS.
We are trying it out on DigitalOcean’s Managed Kubernetes. We found that on managed Kubernetes, KOTS does not install the Ceph cluster. Is this the expected behavior, or are we experiencing something strange?
If it is the expected behavior, are we supposed to install Rook Ceph following this and change the default storage class to Rook, or are we supposed to do something else?
I believe that, by default in Replicated, some of the components from the above Rook Ceph installation will be skipped.

Hi Mani – great question.

Reason 1 – cluster operator domain

There are two methods for installing with KOTS. The first is the “embedded cluster”, where the https://kurl.sh platform brings everything needed for the cluster, including storage via rook ceph.

In the alternate mode, “existing cluster”, which is what you’re doing when you install to a managed Kubernetes provider, we expect the cluster operator (e.g. your customer) to bring resources for things like:

  • storage
  • networking
  • ingress

In short, you are shifting the burden of building/maintaining Kubernetes to your end customer, for those customers who want to use it. For example, in AWS EKS, an EBS-based storage class is provided. In GCP, a Google-Cloud-Disk integration is made available. We wouldn’t want to install something like rook-ceph when there’s a tight cloud-provider integration for storage.
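
To see what storage an existing cluster already provides, a rough sketch like the one below picks out the default storage class from the JSON printed by `kubectl get storageclass -o json`. The sample data and the helper name `default_storage_classes` are illustrative assumptions, not part of KOTS; the only thing relied on is the standard Kubernetes annotation `storageclass.kubernetes.io/is-default-class`.

```python
import json

# Standard Kubernetes annotation that marks a storage class as the default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(storage_class_list):
    """Return the names of all storage classes marked as default.

    `storage_class_list` is the parsed JSON of a StorageClass list,
    i.e. a dict with an "items" key.
    """
    names = []
    for item in storage_class_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(metadata.get("name"))
    return names


# Illustrative sample only; a real cluster's output will differ.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {
        "name": "do-block-storage",
        "annotations": {"storageclass.kubernetes.io/is-default-class": "true"}
      }
    },
    {
      "metadata": {"name": "do-block-storage-retain"}
    }
  ]
}
""")

if __name__ == "__main__":
    print(default_storage_classes(SAMPLE))
```

In practice the input would come from saving the `kubectl get storageclass -o json` output to a file and loading it with `json.load`.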

Reason 2 – installation requirements

This is especially true of rook-ceph as it requires privileged mode on any nodes that would provide rook-backed volumes in order to install special kernel-level integrations like rbd that are required to make ceph work. When installing into someone else’s cluster, we find that requiring privileged mode to modify nodes can create a lot of friction or be a total blocker to installing. We don’t have this problem in the “embedded cluster” since the user is bringing bare nodes and kURL is laying down the whole cluster.

Does that help with your question?

Yeah that helps.
So now we have to figure out how to create ReadWriteMany PVCs on top of the cloud provider’s storage classes. I think we might need Rook for that. Is that right?

Hi Mani – yes, if you need ReadWriteMany then you’ll need some kind of distributed filesystem. Rook can help with that, or EFS can be used in AWS, and I believe GCP and Azure also provide an NFS-as-a-service for this.
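
For reference, a ReadWriteMany claim looks the same whichever backend ends up serving it; only the storage class changes. The sketch below builds such a PersistentVolumeClaim as JSON. The claim name `shared-data`, the storage class name `my-rwx-class`, and the helper `rwx_pvc_manifest` are hypothetical placeholders; the storage class must be one whose provisioner actually supports ReadWriteMany.

```python
import json


def rwx_pvc_manifest(name, storage_class, size="10Gi"):
    """Build a PersistentVolumeClaim manifest requesting ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany: the volume can be mounted read-write by many nodes.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical class name; substitute one backed by a shared filesystem.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(rwx_pvc_manifest("shared-data", "my-rwx-class"), indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. If the storage class is backed by plain block storage, the claim would generally not be provisioned, which is the gap a distributed filesystem fills.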

My real recommendation, however, would be to find a way to architect your app so that you don’t need ReadWriteMany volumes. A distributed multi-writer filesystem can be quite unstable depending on your scaling needs, and can quickly become difficult to maintain when you don’t have access to the infrastructure to debug. If you can swing it, building your app on an Object Store (S3-compatible API) is a much more performant and stable way to create a portable storage layer.
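
One way to picture this: the app talks to a small put/get interface keyed by object name, instead of reading and writing paths on a shared volume. The sketch below is only an illustration under that assumption. `ObjectStore`, `InMemoryObjectStore`, and `save_upload` are hypothetical names, and the in-memory backend stands in for a real S3-compatible client.

```python
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Minimal interface the app depends on (hypothetical, for illustration)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list_keys(self, prefix: str) -> List[str]: ...


class InMemoryObjectStore:
    """Stand-in backend; a real deployment would wrap an S3-compatible client."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def save_upload(store: ObjectStore, user_id: str, filename: str, data: bytes) -> str:
    """Store an upload under a per-user key and return that key."""
    key = f"uploads/{user_id}/{filename}"
    store.put(key, data)
    return key


if __name__ == "__main__":
    store = InMemoryObjectStore()
    key = save_upload(store, "u1", "report.txt", b"hello")
    print(key, store.get(key))
```

Because every replica goes through the same interface, no pod needs a shared ReadWriteMany mount, and the backend can differ per customer cluster.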

Thanks for the recommendation. It was really helpful. We will check those.