We are currently supporting a customer installing an embedded cluster setup.
The kURL installer declaration we are using includes some outdated add-ons that we want to upgrade.
Specifically, we are looking into replacing Rook with Longhorn. However, Longhorn uses disk space differently and requests more CPU up front.
Since hardware resource usage appears to differ depending on which kURL add-ons are selected (Longhorn or Rook, for example), is there an official document or guideline that goes deeper into the required system specs and would help us clarify this for our customer?
I don’t have a hard-and-fast answer here, and I don’t believe we currently have a way to calculate this in advance. There’s the brute-force “add up all the CPU and memory requests across all pods” approach, or even just “spin up an instance with some headroom and see what gets used.”
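For the brute-force route, here is a rough sketch rather than an official sizing tool. It assumes you have the JSON produced by `kubectl get pods --all-namespaces -o json` and only handles the common quantity suffixes. The function names are made up for illustration, init containers are ignored, and requests only reflect what pods reserve for scheduling, not what they actually consume.

```python
# Multipliers for the common Kubernetes memory suffixes (binary and decimal).
# Assumption: rarer suffixes (Pi, Ei, P, E, exponent notation) are not needed.
MEM_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def mem_to_bytes(quantity):
    """Convert a memory quantity such as '128Mi' or '1Gi' to bytes."""
    for suffix, factor in MEM_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # plain number of bytes


def sum_requests(pod_list):
    """Total CPU (millicores) and memory (bytes) requested by unfinished pods."""
    cpu_m, mem_b = 0, 0
    for pod in pod_list.get("items", []):
        # Pods that have already finished no longer hold their requests.
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue
        for container in pod.get("spec", {}).get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            cpu_m += cpu_to_millicores(requests.get("cpu", "0"))
            mem_b += mem_to_bytes(requests.get("memory", "0"))
    return cpu_m, mem_b


# Tiny hand-written sample in the shape of the kubectl JSON output.
sample = {
    "items": [
        {
            "status": {"phase": "Running"},
            "spec": {"containers": [
                {"resources": {"requests": {"cpu": "250m", "memory": "128Mi"}}},
                {"resources": {"requests": {"cpu": "0.5", "memory": "1Gi"}}},
                {"resources": {}},  # no requests set, counts as zero
            ]},
        },
        {
            "status": {"phase": "Succeeded"},  # finished job pod, skipped
            "spec": {"containers": [
                {"resources": {"requests": {"cpu": "1", "memory": "1Gi"}}},
            ]},
        },
    ]
}

cpu_m, mem_b = sum_requests(sample)
print(f"CPU requests: {cpu_m}m, memory requests: {mem_b / 2**20:.0f}Mi")
```

To run it against a real cluster, save the kubectl output to a file, load it with `json.load`, and pass the result to `sum_requests`. Comparing the totals from one test install with Rook and one with Longhorn would give a rough baseline for each, to which you would still need to add headroom.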
As an aside, I’m interested to learn why you’re looking to switch from Rook to Longhorn. My understanding is that more of our development effort is going into Rook these days and that it has more recent supported versions available.
Thanks for the input, we weren’t aware that Rook is receiving more focus.
Initially we were planning to upgrade Rook from 1.0.4, but found that newer versions (1.7.x) require an additional disk to be attached specifically for Rook; otherwise the installer fails. We started looking into Longhorn because it offers similar functionality to Rook without this requirement, and because the Longhorn Dashboard works out of the box.
Are there any significant advantages to sticking with Rook, or recommendations we should be aware of? Are there any long-term plans to discontinue Longhorn support?
Thank you for any feedback!
While Longhorn still allows you to use folders rather than block devices, the project does recommend block devices. The reality is that distributed storage is best run on block devices, for both data stability and performance. Rook (Ceph) removed the folder option specifically because it generated more failures and support issues than the project considered acceptable.
We do not currently expect to maintain long-term support for Longhorn. Of course, that could change in the future, and in the interim we will support environments that have used it. For a bit of insight, our experience after deploying and using some Longhorn environments is that it has a number of failure conditions that cause support issues with no identified root cause. Here are two examples of cases we’ve been working on with upstream, without receiving a particularly impressive response.
Currently, nodes deployed with Longhorn in the spec show a high rate of failure after a reboot. Additionally, we’ve seen volume corruption after excessive writes or reboots. We aren’t currently able to reliably recommend a resolution to either.
Our new default recommendation is OpenEBS local-pv for local storage and Rook for distributed storage, with Rook requiring dedicated block devices, as you have stated. This is still a work in progress: the new default spec has already made this change, but we haven’t yet recommended migrations for existing users.
I hope that helps clarify our experience with Longhorn and why we’ve decided to return to Rook, albeit with stricter requirements to ensure it is stable when used.