Error running installation script (Rook 1.0.4 is not compatible with Kubernetes 1.20+)

Hi,
We got the error “Rook 1.0.4 is not compatible with Kubernetes 1.20+” while running the installation script. We have never changed our installer, and everything in it is set to “latest”. Any idea how to fix this?

Best,
Zey


Hi there!

I’m afraid there isn’t a good upgrade path from Rook-Ceph 1.0 to 1.5+ (due to the removal of hostpath storage), and Rook-Ceph 1.0.4 only supports Kubernetes versions up to 1.19. So we can’t point the Rook-Ceph ‘latest’ at a more modern version, and leaving the Kubernetes ‘latest’ at 1.19 would leave security vulnerabilities unpatched.

If you wish to continue using k8s 1.19, you can set the version to 1.19.x.
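For example, a pinned spec would look roughly like this (a minimal sketch: the metadata name here is arbitrary, and you’d keep the rest of your existing add-ons alongside these two entries):

apiVersion: "cluster.kurl.sh/v1beta1"
kind: "Installer"
metadata:
  name: "pinned-1-19"
spec:
  kubernetes:
    version: "1.19.x"
  rook:
    version: "1.0.4"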

Instead, I would recommend removing Rook-Ceph and including Longhorn and MinIO, which will automatically migrate the data from Rook-Ceph as described here.

Thanks! This is really helpful. I have some other questions:

  1. Should we use Longhorn, MinIO, or both? They both seem to be storage solutions.
  2. Is there a latest default installer we could refer to? We have never updated ours since we started using Replicated.

You’d use both: Longhorn provides PVCs, while MinIO uses those PVCs to provide S3-compatible object storage, which is used by other components like the built-in Docker registry and kotsadm.

The latest YAML is viewable in the box on the right at https://kurl.sh/ - at the time of writing it is the following:

apiVersion: "cluster.kurl.sh/v1beta1"
kind: "Installer"
metadata:
  name: "latest"
spec:
  kubernetes: 
    version: "1.21.x"
  weave: 
    version: "2.6.x"
  contour: 
    version: "1.19.x"
  prometheus: 
    version: "0.49.x"
  registry: 
    version: "2.7.x"
  containerd: 
    version: "1.4.x"
  ekco: 
    version: "latest"
  minio: 
    version: "2020-01-25T02-50-51Z"
  longhorn: 
    version: "1.2.x"

Thanks! It seems to be working now, but there is a problem: minio-pv-claim is bound to the original PV we use for our own application. Is there a way to bind the PV back to our application, and to avoid this in the future when we run the script again?

ubuntu@ip-172-31-37-111:~$ kubectl get pvc
NAME                                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
einblick-claim                        Pending   pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3   0                         longhorn       84m
kotsadm-postgres-kotsadm-postgres-0   Bound     pvc-0c7360e5-d0cc-4d4b-b517-742732ef1980   1Gi        RWO            longhorn       84m
ubuntu@ip-172-31-37-111:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
pvc-0c7360e5-d0cc-4d4b-b517-742732ef1980   1Gi        RWO            Delete           Bound    default/kotsadm-postgres-kotsadm-postgres-0     longhorn                3h51m
pvc-45304dd7-a0bf-455b-8e8b-e8a5caeb81e4   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   longhorn                3h51m
pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3   61036Mi    RWO            Delete           Bound    minio/minio-pv-claim                            longhorn                3h51m
pvc-b35c181b-db20-47ca-8075-8e19f3a6a3f9   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   longhorn                3h51m

That is decidedly NOT the normal/desired output when running this script. Would you mind posting kubectl get pvc -o yaml einblick-claim here? I just want to see what StorageClass is specified, etc. Presumably the 61036Mi PV was what was originally backing that?

ubuntu@ip-172-31-37-111:~$ kubectl get pvc -o yaml einblick-claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: "2022-01-04T19:06:03Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    kots.io/app-slug: einblick
    kots.io/backup: velero
  name: einblick-claim
  namespace: default
  resourceVersion: "183359926"
  uid: b074008c-f5e8-4fdb-9dc3-6553f2fc4d9b
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 64G
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3
status:
  phase: Pending

Yeah this PV was originally backing our application (bound to this PVC)

I am still rather confused as to how this all got into this state.
Would you mind uploading a support bundle?

I was relatively sure that I had stomped out all of the obvious bugs in pvmigrate, but evidently not, if it can still route things improperly after the migration. (Perhaps the MinIO PVC was freshly created and was bound while the migration process was in progress? But that would likely have caused errors in pvmigrate itself - did it complete successfully?)

The fact that the PVCs were created 84 minutes ago and the PVs multiple hours ago might be related, if you ran those commands at the same time - but after running pvmigrate they should all match up within a few minutes in my experience.

EDIT: Also, hopefully this is a test instance? If not, we should arrange a time to do some recovery work. (As presumably you’d want the app data back and mounted to the right PVC)
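For the recovery itself, the rough shape would be the sketch below: keep the PV around, free it from the minio claim, and let einblick-claim (which already references it via spec.volumeName) bind to it. This is only an outline; it assumes the MinIO workload is a Deployment named minio in the minio namespace and that the data on that PV is the application data you want back, so let’s walk through it together rather than running it blind on a customer instance.

# Keep the PV even after its current claim is deleted
kubectl patch pv pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Stop MinIO so nothing is writing to the volume (workload name/kind assumed)
kubectl scale deployment -n minio minio --replicas=0

# Release the PV from the minio claim
kubectl delete pvc -n minio minio-pv-claim
kubectl patch pv pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3 --type=json -p '[{"op":"remove","path":"/spec/claimRef"}]'

# einblick-claim already points at this PV via spec.volumeName,
# so it should move from Pending to Bound once the PV is Available
kubectl get pvc -n default einblick-claim

MinIO would then need a fresh PVC of its own, which re-running the installer should recreate.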

Thanks! The support bundle can be downloaded at supportbundle-2022-01-04T21_47_43.tar.gz - Google Drive

Yes, this is a test instance, but we are worried that this could happen again on the customer side, so it would be great if we could figure out a way to recover it.

And just to make sure, the kurl.sh script completed successfully when it did the migration in question?

Do you still have the logs from that run?

Logs: logs.txt - Google Drive

The script seemed stuck at “Found pod einblick-xxxx”, so I deleted the pod and then it continued. Was that an issue?

No, deleting that pod was exactly the right thing to do! pvmigrate should scale down the deployment for the pod, but if there’s a pod disruption budget keeping it running, that can fail.
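If it happens again, a quick way to check whether a PodDisruptionBudget is what’s keeping the pod alive (the names below are placeholders):

# List disruption budgets that could block pvmigrate's scale-down
kubectl get pdb --all-namespaces

# Inspect the one covering the stuck pod, then delete the pod by hand as you did
kubectl describe pdb <pdb-name> -n <namespace>
kubectl delete pod <stuck-pod> -n <namespace>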

And from the logs, it appears that the minio PVC was NOT one of the PVCs being migrated, so that’s the race condition in question! I’ll have to see what can be done on the kURL side to make sure that PVC is bound before the migration starts.

Thanks. One thing that might be helpful: I added Longhorn first and the script got stuck at the same place. Then I stopped that script, added MinIO, and ran the updated script again.

Hmm, might be relevant

Thanks!


Thanks. Do you think it is safe to push a script with both MinIO and Longhorn, or should we wait?

I think this scenario is generally safe, but to be extra sure you could confirm that the minio PVC is mounted properly before deleting your application pod while pvmigrate is running.
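Something like this just before (or while) the script runs would confirm that, using the namespace and claim name from the output earlier in this thread:

# Both the claim and the pod mounting it should be healthy before pvmigrate starts copying
kubectl get pvc -n minio minio-pv-claim
kubectl get pods -n minio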

Sounds good, thanks!

If you DO experience an issue with this, and the minio PVC was already associated with a PV, please let me know! (I don’t know how that would happen, but things don’t need to be understood by me in order to happen…)


Thanks! We did a fresh installation with the old script and tried running the new script to make sure the migration would work for our customers. However, it got stuck at this line:

Found 2 matching PVCs to migrate across 1 namespaces:
namespace: pvc:                               pv:                                      size:
monitoring prometheus-k8s-db-prometheus-k8s-0 pvc-85a00387-076b-455c-a5e0-929adae6d026 10Gi
monitoring prometheus-k8s-db-prometheus-k8s-1 pvc-d54aafc3-dab6-435f-8e1b-1264a485493b 10Gi

Creating new PVCs to migrate data to using the longhorn StorageClass
created new PVC prometheus-k8s-db-prometheus-k8s-0-pvcmigrate with size 10Gi in monitoring
found existing PVC with name prometheus-k8s-db-prometheus-k8s-1-pvcmigrate, not creating new one

Found 1 matching pods to migrate across 1 namespaces:
namespace: pod:
monitoring prometheus-k8s-0

Scaling down StatefulSets and Deployments with matching PVCs
scaling StatefulSet prometheus-k8s from 1 to 0 in monitoring

Waiting for pods with mounted PVCs to be cleaned up
Found pod prometheus-k8s-0 in monitoring mounting to-be-migrated PVC prometheus-k8s-db-prometheus-k8s-0, waiting
All pods removed successfully

Copying data from default PVCs to longhorn PVCs
Copying data from prometheus-k8s-db-prometheus-k8s-0 (pvc-85a00387-076b-455c-a5e0-929adae6d026) to prometheus-k8s-db-prometheus-k8s-0-pvcmigrate in monitoring
waiting for pod migrate-prometheus-k8s-db-prometheus-k8s-0 to start in monitoring

We found that migrate-prometheus-k8s-db-prometheus-k8s-0 is unable to start because it cannot mount the source volume, which is still in use by prometheus-k8s-0. Any idea what we should do here?

Thanks!

Hmm, generally the script should run the Prometheus scale-down/scale-up commands itself, but you can run the following to get past this:

kubectl patch prometheus -n monitoring k8s --type='json' --patch '[{"op": "replace", "path": "/spec/replicas", "value": 0}]'
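Once the copy finishes, scale it back up the same way; judging by the two Prometheus PVs earlier in the thread the install runs two replicas, but double-check what your cluster had before:

kubectl patch prometheus -n monitoring k8s --type='json' --patch '[{"op": "replace", "path": "/spec/replicas", "value": 2}]'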