On some early versions of etcd, compaction may not be turned on, or may run at an interval that allows the database to grow too large.
The etcd key-value store in a Kubernetes cluster keeps track of all changes made to keys over the history of the cluster. In some cases, the etcd database can grow too large and may require compaction. See the etcd documentation on maintenance for more information.
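To see how large the database currently is, you can query the endpoint status from inside the etcd pod. The command below is a sketch that reuses the pod selector and certificate paths from the commands later in this section; adjust them if your cluster lays out etcd differently:
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --write-out=table"
The DB SIZE column in the output shows how much space the database currently occupies on disk.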
If the etcd database grows too large, etcd can stop accepting writes, causing knock-on outages in the Kubernetes cluster. You may be unable to make changes to the cluster (create new pods, edit deployments, etc.) until this is resolved. If this happens, etcd should report an alarm and you may see errors in the logs like:
etcdserver: mvcc: database space exceeded
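You can confirm that an alarm (typically NOSPACE) is active by asking etcd to list its alarms; this sketch reuses the same pod selector and certificate paths as the commands below:
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt alarm list"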
If you encounter this problem, you can resolve it by compacting the key-value store up to its current revision, defragmenting the database, and then resetting the alarm.
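Compaction takes the revision to compact up to, which should be the store's current revision rather than a hard-coded number. The following sketch captures that revision into a shell variable; it assumes etcdctl's JSON output contains a "revision" field and that egrep is available on the machine running kubectl (-it is omitted so that only the JSON is captured):
rev=$(kubectl exec $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --write-out=json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
With the revision in hand, run the compaction, defragmentation, and alarm disarm commands: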
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt compact $rev"
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt defrag"
kubectl exec -it $(kubectl -n kube-system get pod -l component=etcd --output=jsonpath={.items..metadata.name}) -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt alarm disarm"
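After the alarm has been disarmed, re-running the alarm list and endpoint status commands above should show no active alarms and a reduced database size, confirming that the space has been reclaimed.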