Hi,
We got this error “Rook 1.0.4 is not compatible with Kubernetes 1.20+” while running the installation script. We have never changed our installer, and everything in it seems to be set to “latest”. Any idea how to fix this?
Best,
Zey
Hi there!
I’m afraid there isn’t a good upgrade path from Rook-Ceph 1.0 to 1.5+ (due to the removal of hostpath storage), and Rook-Ceph 1.0.4 only supports Kubernetes versions up to 1.19. So we can’t bump the Rook-Ceph ‘latest’ to a more modern version, but leaving the Kubernetes ‘latest’ at 1.19 would leave security vulnerabilities unpatched.
If you wish to continue using Kubernetes 1.19, you can pin the version to 1.19.x.
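For example, the relevant part of your installer spec would look something like this (just a sketch of the kubernetes entry; the rest of the spec stays as you have it):

spec:
  kubernetes:
    version: "1.19.x"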
I would instead recommend removing Rook-Ceph and including Longhorn and MinIO, which will automatically migrate data from Rook-Ceph as described here.
Thanks! This is really helpful. I have some other questions:
You’d use both - Longhorn provides PVCs, while MinIO uses PVCs to provide object storage (S3), which is used by some other things like the built-in docker registry and kotsadm.
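As a rough illustration (the claim name here is hypothetical), an application volume backed by Longhorn is just a PVC that references the longhorn StorageClass:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

MinIO itself sits on top of a PVC like this and exposes the S3 API that the registry and kotsadm talk to.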
The latest yaml is viewable in the right-hand box at https://kurl.sh/ - at the time of writing it is the following:
metadata:
  name: "latest"
spec:
  kubernetes:
    version: "1.21.x"
  weave:
    version: "2.6.x"
  contour:
    version: "1.19.x"
  prometheus:
    version: "0.49.x"
  registry:
    version: "2.7.x"
  containerd:
    version: "1.4.x"
  ekco:
    version: "latest"
  minio:
    version: "2020-01-25T02-50-51Z"
  longhorn:
    version: "1.2.x"
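As a side note, the usual way a spec like that gets applied (shown here with the generic “latest” installer id as a stand-in for your own installer URL) is to pipe the generated script to bash on the node:

curl -sSL https://kurl.sh/latest | sudo bash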
Thanks! It seems to be working now, but there is a problem: minio-pv-claim is bound to the original PV we use for our own application. Is there a way to bind the PV back to our application? And how can we avoid this in the future when we run the script again?
ubuntu@ip-172-31-37-111:~$ kubectl get pvc
NAME                                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
einblick-claim                        Pending   pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3   0                         longhorn       84m
kotsadm-postgres-kotsadm-postgres-0   Bound     pvc-0c7360e5-d0cc-4d4b-b517-742732ef1980   1Gi        RWO            longhorn       84m
ubuntu@ip-172-31-37-111:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
pvc-0c7360e5-d0cc-4d4b-b517-742732ef1980   1Gi        RWO            Delete           Bound    default/kotsadm-postgres-kotsadm-postgres-0     longhorn                3h51m
pvc-45304dd7-a0bf-455b-8e8b-e8a5caeb81e4   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   longhorn                3h51m
pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3   61036Mi    RWO            Delete           Bound    minio/minio-pv-claim                            longhorn                3h51m
pvc-b35c181b-db20-47ca-8075-8e19f3a6a3f9   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   longhorn                3h51m
That is decidedly NOT the normal/desired output when running this script. Would you mind posting kubectl get pvc -o yaml einblick-claim here? I just want to see what StorageClass is specified, etc. Presumably the 61036Mi PV was what was originally backing that?
ubuntu@ip-172-31-37-111:~$ kubectl get pvc -o yaml einblick-claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: "2022-01-04T19:06:03Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    kots.io/app-slug: einblick
    kots.io/backup: velero
  name: einblick-claim
  namespace: default
  resourceVersion: "183359926"
  uid: b074008c-f5e8-4fdb-9dc3-6553f2fc4d9b
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 64G
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3
status:
  phase: Pending
Yeah this PV was originally backing our application (bound to this PVC)
I am still rather confused as to how this all got into this state.
Would you mind uploading a support bundle?
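(If you don’t already have a preferred way to generate one, the troubleshoot.sh kubectl plugin is the usual route - a sketch, assuming krew is available and with the spec argument left as a placeholder for whichever support-bundle spec you use:)

kubectl krew install support-bundle
kubectl support-bundle <path-or-URL-of-your-support-bundle-spec>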
I was relatively sure that I had stomped out all of the obvious bugs in pvmigrate, but evidently not if it can improperly route things again after the migration. (perhaps the MinIO PVC was freshly created, and was bound while the migration process was in progress? But that would likely have caused errors in pvmigrate itself - did that complete successfully?)
The fact that the PVCs were created 84 minutes ago and the PVs multiple hours ago might be related, if you ran those commands at the same time - but after running pvmigrate they should all match up within a few minutes in my experience.
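One thing that would help narrow it down is what the 61036Mi PV’s claimRef currently points at - a quick check (just a sketch, using the PV name from your kubectl get pv output) would be:

kubectl get pv pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3 \
  -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\n"}'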
EDIT: Also, hopefully this is a test instance? If not, we should arrange a time to do some recovery work. (As presumably you’d want the app data back and mounted to the right PVC)
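If we do end up doing recovery work, a sensible first protective step (standard Kubernetes practice rather than anything the script does for you) would be to make sure that PV can’t be deleted out from under us while claims get shuffled around:

kubectl patch pv pvc-90c94bd5-55ff-4a8a-b040-efa4f1fb70c3 \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'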
Thanks! The support bundle can be downloaded at supportbundle-2022-01-04T21_47_43.tar.gz - Google Drive
Yeah this is a test instance but we are worried about if this could happen again on our customer side, therefore it would be great if we can figure out a way to recover it.
And just to make sure, the kurl.sh script completed successfully when it did the migration in question?
Do you still have the logs from that run?
Logs: logs.txt - Google Drive
“Found pod einblick-xxxx” seemed stuck, so I deleted the pod and then it continued. Was that an issue?
No, deleting that pod was exactly the right thing to do! pvmigrate should scale down the deployment for the pod, but if there’s a pod disruption budget keeping it running that can fail.
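If you want to check whether a pod disruption budget is what keeps a pod alive on a future run, a generic check (nothing specific to your app assumed here) is:

kubectl get pdb --all-namespaces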
And from the logs, it appears that minio was NOT one of the PVCs being migrated, so that’s the race condition in question! I’ll have to see what can be done on the kURL side to make sure that PVC is bound before the migration starts.
Thanks. One thing that might be helpful: I added Longhorn first and the script got stuck at the same place. Then I stopped that script, added MinIO, and ran the updated script again.
Hmm, might be relevant
Thanks!
Thanks. Do you think it is safe to push a script with both MinIO and Longhorn? Or should we wait?
I think this scenario is generally safe, but to be extra sure you could check that the minio PVC is mounted properly before deleting your application pod while pvmigrate is running.
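For example (using the minio namespace and PVC name from your earlier output), before letting pvmigrate proceed you could confirm:

kubectl get pvc -n minio minio-pv-claim   # STATUS should be Bound
kubectl get pods -n minio                 # the minio pod should be Running with that PVC mounted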
Sounds good, thanks!
If you DO experience an issue with this, and the minio PVC was already associated with a PV, please let me know! (I don’t know how that would happen, but things don’t need to be understood by me in order to happen…)
Thanks! We did a fresh installation with the old script and tried running the new script to make sure the migration would work for our customers. However, it got stuck at this line:
Found 2 matching PVCs to migrate across 1 namespaces:
namespace:   pvc:                                 pv:                                        size:
monitoring   prometheus-k8s-db-prometheus-k8s-0   pvc-85a00387-076b-455c-a5e0-929adae6d026   10Gi
monitoring   prometheus-k8s-db-prometheus-k8s-1   pvc-d54aafc3-dab6-435f-8e1b-1264a485493b   10Gi
Creating new PVCs to migrate data to using the longhorn StorageClass
created new PVC prometheus-k8s-db-prometheus-k8s-0-pvcmigrate with size 10Gi in monitoring
found existing PVC with name prometheus-k8s-db-prometheus-k8s-1-pvcmigrate, not creating new one
Found 1 matching pods to migrate across 1 namespaces:
namespace:   pod:
monitoring   prometheus-k8s-0
Scaling down StatefulSets and Deployments with matching PVCs
scaling StatefulSet prometheus-k8s from 1 to 0 in monitoring
Waiting for pods with mounted PVCs to be cleaned up
Found pod prometheus-k8s-0 in monitoring mounting to-be-migrated PVC prometheus-k8s-db-prometheus-k8s-0, waiting
All pods removed successfully
Copying data from default PVCs to longhorn PVCs
Copying data from prometheus-k8s-db-prometheus-k8s-0 (pvc-85a00387-076b-455c-a5e0-929adae6d026) to prometheus-k8s-db-prometheus-k8s-0-pvcmigrate in monitoring
waiting for pod migrate-prometheus-k8s-db-prometheus-k8s-0 to start in monitoring
We found that migrate-prometheus-k8s-db-prometheus-k8s-0 is unable to start because it cannot mount the source volume, which is in use by prometheus-k8s-0. Any idea what we should do here?
Thanks!
Hmm, generally the script should run the prometheus scale-down/scale-up commands itself - but you can run kubectl patch prometheus -n monitoring k8s --type='json' --patch '[{"op": "replace", "path": "/spec/replicas", "value": 0}]' to get past this.
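Once the migration has finished, the same patch with the original replica count (2, going by the two prometheus PVs in your earlier output) should bring Prometheus back up - the kURL script may also reset it for you when it re-applies the prometheus addon, so this is just the manual route:

kubectl patch prometheus -n monitoring k8s --type='json' \
  --patch '[{"op": "replace", "path": "/spec/replicas", "value": 2}]'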