Error running installation script (Rook 1.0.4 is not compatible with Kubernetes 1.20+)

Cool, thanks! It got past that step but then failed with the following error:

Swapping PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring to the new StorageClass
Marking original PV pvc-a4fbfba9-f5e7-43b3-96b8-bfa5b350bfd0 as to-be-retained
Marking migrated-to PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec as to-be-retained
Deleting original PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring to free up the name
Deleting migrated-to PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring to release the PV
Removing claimref from original PV pvc-a4fbfba9-f5e7-43b3-96b8-bfa5b350bfd0
Removing claimref from migrated-to PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec
Creating new PVC prometheus-k8s-db-prometheus-k8s-0 with migrated-to PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec
failed to swap PVs for PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring: failed to create migrated-to PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring: object is being deleted: persistentvolumeclaims "prometheus-k8s-db-prometheus-k8s-0" already exists

Looks like you just ran into "Object is being deleted" Error during migration · Issue #56 · replicatedhq/pvmigrate · GitHub. I just made a PR to fix it (seeing you hit it made it more urgent): wait for PVCs to be deleted before reusing the PVC's name by laverya · Pull Request #57 · replicatedhq/pvmigrate · GitHub
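
For context, the race is that the old PVC is still terminating when pvmigrate tries to recreate one with the same name. If you want to check by hand before re-running (using the PVC from your log), something along these lines should show whether the old claim is actually gone:

# prints a deletionTimestamp while the old claim is still terminating
kubectl -n monitoring get pvc prometheus-k8s-db-prometheus-k8s-0 -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'

# blocks until the deletion actually completes
kubectl -n monitoring wait --for=delete pvc/prometheus-k8s-db-prometheus-k8s-0 --timeout=120s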

Great, thanks! Is there any workaround we can do for now? Also, after this is merged, when will the change be released?

Hi, I saw you have a new tag here (Tags · replicatedhq/pvmigrate · GitHub). I'm wondering if the change has been reflected in the latest script yet? Thanks!

Indeed, that was released in kURL yesterday! https://kurl.sh/release-notes/v2022.01.18-0

Thanks! I just tried it and it seems to have fixed the previous issue. However, it failed again:

Migrating data from default to longhorn
PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec does not match source SC default, not migrating
PV pvc-5bb88f68-d88d-4f9f-b3da-6635237bb096 does not match source SC default, not migrating
PV pvc-7c182312-4d57-4b05-98f3-5c9794fd3690 does not match source SC default, not migrating
PV pvc-870b07c5-215c-4081-ae29-6d6332e54ba6 does not match source SC default, not migrating

Found 3 matching PVCs to migrate across 1 namespaces:
namespace: pvc:                               pv:                                      size:
monitoring prometheus-k8s-db-prometheus-k8s-0 pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec 0
monitoring prometheus-k8s-db-prometheus-k8s-0 pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec 0
monitoring prometheus-k8s-db-prometheus-k8s-1 pvc-d54aafc3-dab6-435f-8e1b-1264a485493b 10Gi

Creating new PVCs to migrate data to using the longhorn StorageClass
failed to find existing PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec for PVC prometheus-k8s-db-prometheus-k8s-0 in monitoring

The same PVC shows up twice; do you have any idea what the issue could be?

That PV pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec actually appears in there three times: it's also listed as "does not match source SC default, not migrating". Would you mind running kubectl get pvc -A and kubectl get pv and posting the output here?

Is this still the same test instance as before?

Here you go. Yeah it is the same instance.

NAMESPACE    NAME                                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default      einblick-claim                                  Bound    pvc-870b07c5-215c-4081-ae29-6d6332e54ba6   61036Mi    RWO            longhorn       13d
default      kotsadm-postgres-kotsadm-postgres-0             Bound    pvc-7c182312-4d57-4b05-98f3-5c9794fd3690   1Gi        RWO            longhorn       13d
monitoring   prometheus-k8s-db-prometheus-k8s-0              Bound    pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec   10Gi       RWO            longhorn       3h4m
monitoring   prometheus-k8s-db-prometheus-k8s-1              Bound    pvc-d54aafc3-dab6-435f-8e1b-1264a485493b   10Gi       RWO            default        13d
monitoring   prometheus-k8s-db-prometheus-k8s-1-pvcmigrate   Bound    pvc-5bb88f68-d88d-4f9f-b3da-6635237bb096   10Gi       RWO            longhorn       13d
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                      STORAGECLASS   REASON   AGE
pvc-0b327f5e-341c-4fbe-ada0-d35185df8bec   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0              longhorn                13d
pvc-5bb88f68-d88d-4f9f-b3da-6635237bb096   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1-pvcmigrate   longhorn                13d
pvc-7c182312-4d57-4b05-98f3-5c9794fd3690   1Gi        RWO            Delete           Bound      default/kotsadm-postgres-kotsadm-postgres-0                longhorn                13d
pvc-85a00387-076b-455c-a5e0-929adae6d026   10Gi       RWO            Retain           Released   monitoring/prometheus-k8s-db-prometheus-k8s-0              default                 13d
pvc-870b07c5-215c-4081-ae29-6d6332e54ba6   61036Mi    RWO            Delete           Bound      default/einblick-claim                                     longhorn                13d
pvc-a4fbfba9-f5e7-43b3-96b8-bfa5b350bfd0   10Gi       RWO            Retain           Released   monitoring/prometheus-k8s-db-prometheus-k8s-0              default                 13d
pvc-d54aafc3-dab6-435f-8e1b-1264a485493b   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1              default                 13d

Would it be possible for you to test again on a fresh instance? I think the earlier unsuccessful migrations left some conflicting PVs and PVCs behind, and while we could fix that with some manual surgery, it's not something I intend to automate.
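
For reference, the surgery would mostly be clearing the stale claimRef left on the Released, Retain-policy PVs so they can be bound again, roughly along these lines (using one of the volumes from your output):

# drop the stale claimRef so the Released PV goes back to Available
kubectl patch pv pvc-a4fbfba9-f5e7-43b3-96b8-bfa5b350bfd0 --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'

On a fresh instance none of that should be needed.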

Sounds good. Will do that and let you know

We got this error:

failed to scale down pods: pod prometheus-k8s-0 in monitoring mounting prometheus-k8s-db-prometheus-k8s-0 was created at 2022-01-27T16:09:30Z, after scale-down started at 2022-01-27T16:08:36Z. It is likely that there is some other operator scaling this back up

I think prometheus-operator was doing that scaling up/down? Is there a way we could find out which operator is scaling it back up?

Just checked helm list and it is empty

I noticed that if I do

kubectl patch prometheus -n monitoring k8s --type='json' --patch '[{"op": "replace", "path": "/spec/replicas", "value": 0}]'

then it gets changed back to 1 at some point.

Was that running pvmigrate manually, or running it as part of the kURL migration script?

Because the kURL migration script should scale down prometheus itself with the same patch command I showed you.
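
If you want to see what keeps setting it back to 1, walking the ownerReferences chain usually shows it: the pod is owned by the prometheus-k8s StatefulSet, which in turn is owned by the Prometheus custom resource that prometheus-operator reconciles. Assuming the standard kube-prometheus names, something like:

kubectl -n monitoring get pod prometheus-k8s-0 -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
kubectl -n monitoring get statefulset prometheus-k8s -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'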

EDIT:
Actually, let me look at something: a recent change to ekco might mean it gets scaled back up automatically

EDIT2: no, not so far as I can see

We just ran the kURL script as a whole. Something like curl -sSL https://k8s.kurl.sh/einblick-unstable | sudo bash

We deleted the old k8s and removed the whole monitoring namespace. It seems fine now, thanks!

One question: it seems Longhorn is requesting a lot of CPU. We noticed this:

longhorn-system             instance-manager-e-4f7d5612                 1920m (12%)   0 (0%)      0 (0%)           0 (0%)
longhorn-system             instance-manager-r-ff750983                 1920m (12%)   0 (0%)      0 (0%)           0 (0%)

Those two pods request close to 4000m of CPU between them. Is that normal? Is there a way we could make it smaller? (We could edit the pods directly, of course, but we'd like a way that does it automatically.)
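
We also came across Longhorn's guaranteed engine/replica manager CPU settings, which look like the relevant knob (they appear to default to 12% of node CPU each, which matches the 1920m requests above). Assuming this Longhorn version exposes them as settings.longhorn.io objects, would lowering them roughly like this be the supported way to do it?

# confirm the exact setting names for this Longhorn version
kubectl -n longhorn-system get settings.longhorn.io

# reserve 5% of node CPU per engine/replica instance manager instead of the default 12%
kubectl -n longhorn-system patch settings.longhorn.io guaranteed-engine-manager-cpu --type merge -p '{"value": "5"}'
kubectl -n longhorn-system patch settings.longhorn.io guaranteed-replica-manager-cpu --type merge -p '{"value": "5"}'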