This is a deep dive into the Replicated Airgapped Kubernetes Appliance (AKA) architecture, and a reference for those installing and supporting these installations.
The kubernetes-init script brings up a single-node Kubernetes cluster running Replicated in a Deployment. Most of the cluster components are brought up with kubeadm. The kubernetes-init script gathers configuration for kubeadm, runs kubeadm init, adds on Weave, adds on Rook, and then installs Replicated.
These steps are performed by the Replicated AKA installer before kubeadm init is invoked:
- Install Docker on the host. On Ubuntu this is 1.12.3 from the docker-engine repo, and on RHEL it is 1.13.1 from the yum docker repo. Also check that Docker is not configured to use devicemapper in loopback mode. Overlay2 is the preferred storage driver where supported. If devicemapper has to be used because of the kernel version, it must have a thinpool provisioned with a block device. See the Devicemapper Installation Warning.
- Install kubelet, kubectl, and kubeadm on the host. These come from deb/rpm packages, which are bundled into a Docker image and loaded on the customer’s machine.
- Generate the kubeadm config file at /opt/replicated/kubeadm.conf from the flags and prompts in the kubernetes-init script.
- Disable SELinux. Kubeadm is expected to be able to bring up a cluster that works with SELinux enforcing in the 1.14 release. Currently it is not possible to run with SELinux enabled. See "Document kubeadm usage with SELinux" (kubernetes/kubeadm issue #279).
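The Docker storage-driver check from the steps above can be sketched as a small shell function. This is an illustration, not the installer's actual code: check_storage_driver is a hypothetical name, and the sketch assumes that docker info prints a "Storage Driver:" line and, for loopback devicemapper, a "Data loop file:" line.

```shell
# Hypothetical helper: classify the storage driver from `docker info` text on stdin.
# Prints "devicemapper-loopback" when devicemapper is backed by loop files,
# otherwise the driver name, or "unknown" if no driver line is present.
check_storage_driver() {
  info=$(cat)
  driver=$(printf '%s\n' "$info" | sed -n 's/^ *Storage Driver: *//p' | head -n 1)
  if [ "$driver" = "devicemapper" ] && printf '%s\n' "$info" | grep -qi 'loop file'; then
    echo "devicemapper-loopback"
  else
    echo "${driver:-unknown}"
  fi
}

# Usage on a host with Docker installed:
#   docker info 2>/dev/null | check_storage_driver
```

A result of devicemapper-loopback is the configuration the installer warns against.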
The kubeadm config file we created at /opt/replicated/kubeadm.conf is expanded with defaults. The full config can be viewed with kubeadm config view. This is used to configure the following components required for a Kubernetes cluster:
The kubelet section of kubeadm config is used to create the file /var/lib/kubelet/config.yaml and then the kubelet systemd service is started.
The kubeadm config is used to customize the command flags passed to four static pods that make up the control plane. The yaml config for these pods is found in /etc/kubernetes/manifests. Kubelet will run anything in this directory as a static pod.
These static pods run in the kube-system namespace, so once the cluster is running you can view these pods with kubectl -n kube-system get pods and get logs from them in the normal way.
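If the control plane does not come up, one quick check is whether all four static pod manifests exist on disk. The sketch below assumes the usual kubeadm file names (etcd.yaml, kube-apiserver.yaml, kube-controller-manager.yaml, kube-scheduler.yaml). missing_manifests is a hypothetical helper, not part of kubeadm or Replicated.

```shell
# Hypothetical helper: print the name of each control plane component whose
# static pod manifest is missing from the given directory.
missing_manifests() {
  dir=${1:-/etc/kubernetes/manifests}
  for name in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    [ -f "$dir/$name.yaml" ] || echo "$name"
  done
}

# Usage on a node:
#   missing_manifests            # checks /etc/kubernetes/manifests
#   missing_manifests /some/dir  # checks another directory
```

No output means all four manifests are present.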
These components are also deployed to the kube-system namespace, but not as static pods. They can be scheduled on worker nodes and can be edited with kubectl. The Kubernetes cluster would still be able to run pods without these services, but DNS and service networking would not work.
After kubeadm init has completed, Replicated deploys Weave as the CNI plugin for Kubernetes. Weave is deployed as a DaemonSet in the kube-system namespace. The pod started on each node copies the weave-net binaries to the /opt/cni/bin directory to be called directly by kubelet when creating pod sandboxes.
IPAM: Weave will assign IPs to pods from the subnet 10.32.0.0/12 unless another subnet was passed to the ip-alloc-range flag of the kubernetes-init script. Weave will set up a routing rule on every host so that all traffic addressed to an IP in the pod subnet (10.32.0.0/12 by default) is routed to the weave interface. The weave interface is a bridge and can be viewed with ip -d link show weave. All pods on the same host have a virtual ethernet interface pair with the host end in the weave bridge. For clustered installs with multiple nodes there will also be a VTEP for each remote node attached to the weave bridge. Traffic destined for a Pod IP on a remote node will be forwarded to the remote weave bridge through the appropriate VTEP and then delivered locally.
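When debugging routing, it helps to know whether an address falls inside the pod subnet. The default range 10.32.0.0/12 spans 10.32.0.0 through 10.47.255.255. The following is a minimal sketch of that membership test in plain shell arithmetic. ip_to_int and in_cidr are hypothetical helper names, and the sketch assumes well-formed IPv4 input without leading zeros.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Hypothetical helper: succeed if address $1 lies inside CIDR block $2.
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage:
#   in_cidr 10.40.0.7 10.32.0.0/12 && echo "pod subnet"
```

With the default range, a Pod IP such as 10.40.0.7 matches, while the DNS service IP 10.96.0.10 does not.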
Most cluster traffic is addressed to a service IP rather than a Pod IP. Kube-proxy is responsible for ensuring that traffic addressed to a service IP gets routed to a Pod IP. A service is essentially an in-cluster load balancer routing traffic to multiple upstreams. You can see the backends available for every service by running kubectl get endpoints <service>.
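To list only the backend Pod IPs for a service, one per line, the endpoints query can be wrapped in a small function. This is a sketch that assumes kubectl is on the PATH and already configured for the cluster. service_backends is a hypothetical name, and the namespace argument defaults to default.

```shell
# Hypothetical helper: print the Pod IPs backing a service, one per line.
# $1 is the service name, $2 the namespace (defaults to "default").
service_backends() {
  kubectl -n "${2:-default}" get endpoints "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | tr ' ' '\n'
}

# Usage:
#   service_backends kube-dns kube-system
```

An empty result means the service currently has no ready backends, so traffic sent to its service IP has nowhere to go.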
CoreDNS allows in-cluster clients to address services by hostname rather than by IP. Every pod gets a simple /etc/resolv.conf with a single nameserver, 10.96.0.10. This is the service IP of the K8s DNS service, which for legacy reasons is still named kube-dns. It resides in the kube-system namespace along with the CoreDNS deployment. The CoreDNS pods have an /etc/resolv.conf created from the host’s. If a request does not match any cluster services, it will be forwarded to the same nameservers serving the host. Note that only the first 2 nameservers and the first 3 search records from the host’s /etc/resolv.conf will be used.
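To preview which host resolver entries survive those limits, the sketch below prints the first 2 nameservers and the first 3 search domains from a resolv.conf file. It only mirrors the limits described above and is not how kubelet or CoreDNS builds the file. effective_resolv is a hypothetical name.

```shell
# Hypothetical helper: print the first 2 nameservers and first 3 search
# domains from a resolv.conf file (default /etc/resolv.conf).
# If several "search" lines exist, the last one is used.
effective_resolv() {
  file=${1:-/etc/resolv.conf}
  awk '
    $1 == "nameserver" && ns < 2 { print "nameserver " $2; ns++ }
    $1 == "search" {
      line = "search"
      for (i = 2; i <= NF && i <= 4; i++) line = line " " $i
      s = line
    }
    END { if (s != "") print s }
  ' "$file"
}

# Usage on the host:
#   effective_resolv
```

Any nameserver or search domain missing from this output would not be available for lookups forwarded from the cluster.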
- At least 1 GB of disk space at /var/lib/etcd
- At least 40 GB of disk space at /opt/replicated
- At least 10 GB of disk space at /var/lib/docker
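These minimums can be checked before installing. The following is a minimal sketch, assuming a POSIX df. has_free_gb is a hypothetical helper that walks up to the nearest existing parent directory, since the target paths may not exist yet on a fresh host.

```shell
# Hypothetical helper: succeed if the filesystem holding path $1 has at
# least $2 GB available.
has_free_gb() {
  path=$1
  need_gb=$2
  # The directory may not exist before install; check its nearest parent.
  while [ ! -e "$path" ] && [ "$path" != "/" ]; do
    path=$(dirname "$path")
  done
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Usage:
#   has_free_gb /var/lib/etcd 1    || echo "need 1 GB at /var/lib/etcd"
#   has_free_gb /opt/replicated 40 || echo "need 40 GB at /opt/replicated"
#   has_free_gb /var/lib/docker 10 || echo "need 10 GB at /var/lib/docker"
```

When several of these paths share one filesystem, each check passes independently, so the totals still need to be added up by hand.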
Rook will use the directory /opt/replicated/rook for storage for provisioning PersistentVolumes on every host. An OSD will be created for every node to manage this directory. These can be viewed in the rook-ceph namespace. Additionally, three MONs will be created in the same namespace to supervise the cluster. A single MGR will also be created in the rook-ceph namespace to publish the Ceph dashboard.
The Rook Operator and agents will be created in the rook-ceph-system namespace. The Rook Agent is a DaemonSet. When each pod starts on a new node, it will copy its FlexVolume plugin binary to /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ceph.rook.io~rook-ceph-system. The plugin will be called by kubelet when creating Pod sandboxes and will send a request to the kernel to create a new block device backed by Ceph. These block devices can be viewed with lsblk and will be named
The Ceph dashboard provides information on cluster health and the status of OSDs and MONs. It can be found in the OnPrem Console under the /ceph path.
Kubelet will begin pruning unused images when the system disk usage hits 80%, and will kill running containers at 85%. If either threshold is met, the system may become unrecoverable.
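The two thresholds can be turned into a quick status check. This sketch only classifies a usage percentage against the 80% and 85% figures quoted above. disk_state is a hypothetical name, and which filesystem counts as the system disk on a given host is an assumption left to the operator.

```shell
# Hypothetical helper: classify a disk usage percentage (integer, no % sign)
# against the thresholds described above.
disk_state() {
  pct=$1
  if [ "$pct" -ge 85 ]; then
    echo "eviction"
  elif [ "$pct" -ge 80 ]; then
    echo "image-pruning"
  else
    echo "ok"
  fi
}

# Usage, reading the Capacity column of POSIX df for the Docker directory:
#   disk_state "$(df -P /var/lib/docker | awk 'NR == 2 { sub("%", "", $5); print $5 }')"
```

Alerting well below 80% leaves time to free space before kubelet starts removing images.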
kubectl describe node replicated-test-11
Look for any pressure conditions reported as True (e.g. MemoryPressure, DiskPressure).
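The same check can be scripted by printing each node condition as Type=Status and filtering for the unhealthy ones. This is a sketch: unhealthy_conditions is a hypothetical helper, and it treats Ready as healthy only when True and every other condition as healthy only when not True.

```shell
# Hypothetical helper: read "Type=Status" lines on stdin and print the
# condition types that indicate a problem.
unhealthy_conditions() {
  awk -F= '
    ($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 == "True") { print $1 }
  '
}

# Usage (replace <node-name> with a name from `kubectl get nodes`):
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

No output means the node reports Ready with no pressure conditions.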
kubectl get pods --all-namespaces
kubectl get pods --namespace=replicated-<appid>
kubectl describe pod <pod-name> --namespace=replicated-<appid>
journalctl -u docker
sudo systemctl status kubelet
Kubelet runs on every Kubernetes node. If there are errors here, they’re likely to prevent container deployments and/or communication.
journalctl -u kubelet
kubectl logs -l tier=master -c replicated
kubectl logs -l tier=master -c replicated-ui
First find the weave pod name running on the node in question:
kubectl -n kube-system get pods -o wide
Then exec into that pod to run status commands:
kubectl -n kube-system exec -it weave-net-92vcs /bin/sh
# ./weave --local status
# ./weave --local status connections