This is a deep dive into the Replicated Airgapped Kubernetes Appliance (AKA) architecture, and a reference for those installing and supporting these installations.
Installation
The kubernetes-init script brings up a single-node Kubernetes cluster running Replicated in a Deployment. Most of the cluster components are brought up with kubeadm. The kubernetes-init script gathers configuration for kubeadm, runs kubeadm init, adds on Weave, adds on Rook, and then installs Replicated.
Preparation for kubeadm init
These steps are performed by the Replicated AKA installer before kubeadm init is invoked:
- Install Docker on the host. For Ubuntu this will be 1.12.3 from the docker-engine repo, and for RHEL it will be 1.13.1 from the yum docker repo. Also verify that Docker is not configured to use devicemapper in loopback mode; overlay2 is the preferred storage driver where supported. If devicemapper must be used because of the kernel version, it must have a thinpool provisioned with a block device (a quick check is shown after this list). See: Devicemapper Installation Warning
- Install kubelet, kubectl, and kubeadm on the host. These come from deb/rpm packages, which are bundled into a Docker image and loaded onto the customer's machine.
- Write /opt/replicated/kubeadm.conf. This file is generated from the flags and prompts in the kubernetes-init script.
- Disable SELinux. kubeadm is expected to be able to bring up a cluster that works with SELinux enforcing in the 1.14 release; currently it's not possible to run with SELinux enabled. See: Document kubeadm usage with SELinux · Issue #279 · kubernetes/kubeadm · GitHub
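A minimal sketch of these pre-flight checks, assuming standard Docker and SELinux tooling (the installer's actual checks may differ):
docker info | grep -i 'storage driver'   # expect overlay2, or devicemapper backed by a real thinpool
getenforce                               # expect Permissive or Disabled
cat /opt/replicated/kubeadm.conf         # confirm the generated kubeadm config exists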
kubeadm init
The kubeadm config file we created at /opt/replicated/kubeadm.conf is expanded with defaults. The full config can be viewed with kubeadm config view. This is used to configure the following components required for a Kubernetes cluster:
Kubelet
The kubelet section of the kubeadm config is used to create the file /var/lib/kubelet/config.yaml, and then the kubelet systemd service is started.
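To confirm the rendered kubelet config and that the service is running with it (standard paths on a kubeadm-managed node):
cat /var/lib/kubelet/config.yaml
systemctl status kubelet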
Static pods in the control plane (master only)
The kubeadm config is used to customize the command flags passed to the four static pods that make up the control plane: etcd, kube-apiserver, kube-controller-manager, and kube-scheduler. The yaml config for these pods is found in /etc/kubernetes/manifests. Kubelet will run anything in this directory as a static pod.
These static pods run in the kube-system namespace, so once the cluster is running you can view these pods with kubectl -n kube-system get pods and get logs from them in the normal way.
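For example (static pod names carry the host's node name as a suffix, shown here as a placeholder):
ls /etc/kubernetes/manifests
kubectl -n kube-system logs kube-apiserver-<node-name>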
Non-static System Pods
These components are also deployed to the kube-system namespace, but not as static pods. They can be scheduled on worker nodes and can be edited with kubectl. The Kubernetes cluster would still be able to run pods without these services, but DNS and service networking would not work.
- CoreDNS (Deployment)
- Kube-proxy (DaemonSet)
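Both can be inspected directly; kubeadm names them coredns and kube-proxy:
kubectl -n kube-system get deployment coredns
kubectl -n kube-system get daemonset kube-proxy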
Networking
Pod Networking
After kubeadm init has completed, Replicated deploys Weave as the CNI plugin for Kubernetes. Weave is deployed as a DaemonSet in the kube-system namespace. The pod started on each node copies the weave-ipam and weave-net binaries to the /opt/cni/bin directory, where they are called directly by kubelet when creating pod sandboxes.
Weave is responsible for implementing the Kubernetes networking model. It assigns an IP address to every Pod, and ensures IP packets can be routed between pods and between nodes and pods.
IPAM - Weave will assign IPs to pods from the subnet 10.32.0.0/12 unless another subnet was passed to the ip-alloc-range flag of the kubernetes-init script. Weave will set up a routing rule on every host so that all traffic addressed to an IP in the 10.32.0.0/12 subnet is routed to the weave interface. The weave interface is a bridge and can be viewed with ip -d link show weave. All pods on the same host have a virtual ethernet interface pair with the host end attached to the weave bridge. For clustered installs with multiple nodes there will also be a VTEP for each remote node attached to the weave bridge. Traffic destined for a Pod IP on a remote node will be forwarded to the remote weave bridge through the appropriate VTEP and then delivered locally.
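This can be seen from the host with standard iproute2 commands (the subnet shown assumes the default ip-alloc-range):
ip route show | grep weave        # expect a 10.32.0.0/12 route via the weave bridge
ip link show master weave         # veth and vxlan interfaces attached to the bridge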
Service Networking
Most cluster traffic is addressed to a service IP rather than a Pod IP. Kube-proxy is responsible for ensuring that traffic addressed to a service IP gets routed to a Pod IP. A service is essentially an in-cluster load balancer routing traffic to multiple upstreams. You can see the backends available for every service by running kubectl get endpoints <service>.
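For example, the kubernetes service in the default namespace exists in every cluster and points at the apiserver:
kubectl get endpoints kubernetes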
Cluster DNS
CoreDNS allows in-cluster clients to address services by hostname rather than by IP. Every pod gets a simple /etc/resolv.conf with a single nameserver, 10.96.0.10. This is the service IP of the K8s DNS service, which for legacy reasons is still named kube-dns. It resides in the kube-system namespace along with the CoreDNS Deployment. The CoreDNS pods have an /etc/resolv.conf created from the host's. If a request does not match any cluster services, it will be forwarded to the same nameservers serving the host. Note that only the first 2 nameservers and the first 3 search records from the host's /etc/resolv.conf will be used.
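One way to verify resolution from inside the cluster is a throwaway pod (this assumes a busybox image is already loaded on the host, which matters in an airgapped environment):
kubectl run -it --rm dnstest --image=busybox --restart=Never -- nslookup kubernetes.default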
Firewalls
Storage
Storage Checklist
- At least 1 GB of disk space at /var/lib/etcd
- At least 40 GB of disk space at /opt/replicated
- At least 10 GB of disk space at /var/lib/docker
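All three paths can be checked at once:
df -h /var/lib/etcd /opt/replicated /var/lib/docker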
Ceph
Rook will use the directory /opt/replicated/rook on every host as the backing storage for provisioning PersistentVolumes. An OSD will be created for every node to manage this directory. These can be viewed in the rook-ceph namespace. Additionally, three MONs will be created in the same namespace to supervise the cluster. A single MGR will also be created in the rook-ceph namespace to publish the Ceph dashboard.
You can manually configure the Ceph cluster and pool by using kubectl -n rook-ceph edit cluster rook-ceph and kubectl -n rook-ceph edit pool replicapool.
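To list the OSD, MON, and MGR pods described above:
kubectl -n rook-ceph get pods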
Rook
The Rook Operator and agents will be created in the rook-ceph-system namespace. The Rook Agent is a DaemonSet. When each pod starts on a new node, it will copy its FlexVolume plugin binary to /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ceph.rook.io~rook-ceph-system. The plugin will be called by kubelet when mounting volumes for pods, and it sends a request to the kernel to create a new block device backed by Ceph. These block devices can be viewed with lsblk and will be named rbd0, rbd1, etc.
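On a node with attached volumes:
lsblk | grep rbd     # one rbdN device per mounted Ceph-backed volume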
The Ceph dashboard provides information on cluster health and the status of OSDs and MONs. It can be found in the OnPrem Console under the /ceph path.
General Troubleshooting
Check disk space
Kubelet will begin pruning unused images when the system disk usage hits 80%, and will kill running containers at 85%. If either threshold is met, the system may become unrecoverable.
df -h
Check Node Status
kubectl describe node replicated-test-11
Look for any conditions reporting True (e.g. MemoryPressure, DiskPressure).
Check whether all pods are running
kubectl get pods --all-namespaces
Check just the application pods
kubectl get pods --namespace=replicated-<appid>
kubectl describe pod <pod-name> --namespace=replicated-<appid>
Check Docker logs
journalctl -u docker
Check if kubelet is up
sudo systemctl status kubelet
Check kubelet logs
Kubelet runs on every Kubernetes node. If there are errors here, they’re likely to prevent container deployments and/or communication.
journalctl -u kubelet
Get Replicated logs (regular and UI)
kubectl logs -l tier=master -c replicated
kubectl logs -l tier=master -c replicated-ui
Run a command in a weave container to check status
First find the weave pod name running on the node in question:
kubectl -n kube-system get pods -o wide
Then exec into that pod to run status commands:
kubectl -n kube-system exec -it weave-net-92vcs /bin/sh
# ./weave --local status
# ./weave --local status connections