Kubernetes cluster is down and reporting: etcdserver: mvcc: database space exceeded

In some early versions of etcd, automatic compaction may be turned off, or its interval may be long enough that the database grows too large between runs.

The etcd key-value store in a Kubernetes cluster keeps a history of every change made to its keys over the life of the cluster. Until that history is compacted, the etcd database keeps growing, and in some cases it grows large enough to require manual compaction. See the etcd documentation on maintenance for more information.

If the etcd database exceeds its space quota, etcd raises an alarm and stops accepting writes, which can cause knock-on outages in the Kubernetes cluster. You may be unable to make changes to the cluster (create new pods, edit deployments, etc.) until this is resolved. When this happens, you may see errors in logs like:

etcdserver: mvcc: database space exceeded
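
Before changing anything, it can help to confirm that the space alarm is actually raised. The following is a minimal sketch, assuming a kubeadm-style cluster with the same pod label and certificate paths used elsewhere in this article; the function names has_nospace_alarm and check_etcd_alarm are hypothetical helpers, not part of kubectl or etcdctl.

```shell
# Succeeds if the `etcdctl alarm list` output read from stdin
# contains a NOSPACE alarm.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Runs `etcdctl alarm list` inside the first etcd pod and checks the result.
# Assumption: label component=etcd and kubeadm certificate paths.
check_etcd_alarm() {
  pod=$(kubectl -n kube-system get pod -l component=etcd \
    --output=jsonpath='{.items[0].metadata.name}')
  kubectl -n kube-system exec "$pod" -- /bin/sh -c \
    "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt alarm list" | has_nospace_alarm
}
```

Calling check_etcd_alarm && echo "NOSPACE alarm is active" reports whether the alarm is set. Replacing alarm list with endpoint status --write-out=table in the same command shows the current database size for that endpoint.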

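If you encounter this problem, you can resolve it by compacting etcd’s key history, defragmenting the database, and then disarming the alarm. The revision passed to compact (3 in the first command below) is a placeholder: replace it with the current revision reported by etcdctl endpoint status, otherwise little or no history is removed.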

kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath='{.items..metadata.name}') -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt compact 3"
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath='{.items..metadata.name}') -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt defrag"
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath='{.items..metadata.name}') -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt alarm disarm"
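
The three steps can also be sketched as a small script that looks up the current revision first, so that compaction actually discards old history. This is a sketch under assumptions, not a definitive procedure: it assumes a kubeadm-style cluster (label component=etcd, certificates under /etc/kubernetes/pki/etcd), targets only the first etcd pod, and uses hypothetical helper names (extract_revision, etcdctl_in_pod, compact_defrag_disarm).

```shell
# Print the first "revision" value found in the JSON produced by
# `etcdctl endpoint status --write-out=json` (read from stdin).
extract_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Run etcdctl with the given arguments inside the first etcd pod.
# Assumption: kubeadm-style pod label and certificate paths.
etcdctl_in_pod() {
  pod=$(kubectl -n kube-system get pod -l component=etcd \
    --output=jsonpath='{.items[0].metadata.name}')
  kubectl -n kube-system exec "$pod" -- /bin/sh -c \
    'ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt "$@"' etcdctl-wrapper "$@"
}

# Compact up to the current revision, defragment, then clear the alarm.
compact_defrag_disarm() {
  rev=$(etcdctl_in_pod endpoint status --write-out=json | extract_revision)
  if [ -z "$rev" ]; then
    echo "could not determine the current etcd revision" >&2
    return 1
  fi
  etcdctl_in_pod compact "$rev" &&
    etcdctl_in_pod defrag &&
    etcdctl_in_pod alarm disarm
}
```

Running compact_defrag_disarm performs all three steps and stops at the first failure. Compaction applies to the whole cluster, but defragmentation only affects the endpoint it is run against, so on a cluster with several etcd members the defrag step would need to be repeated for each member.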