DISCOVERY

May 13th, 2019

Orchestrating Containers with Kubernetes Part I: Concepts

Kubernetes

Docker

Container

Container Orchestrator

YAML

I recently wrote a series of articles on Docker and containerized applications. In those articles I worked with a single Docker container at a time. Each container was mapped to a port on its host machines IP, making it accessible from the browser. In my case, the host machine was an EC2 instance.

While the single container approach works, it has a number of limitations and drawbacks. First off, a single container isn't scalable. As web traffic to an application increases, the load becomes too much for a single container to handle. Secondly, there isn't a zero-downtime approach to release a new version of an application. The container running the old version has to stop and a container with the new version has to start in its place. While both these limitations are deal breakers in themselves, the worst part about the single container approach is that its a single point of failure. If the container stops or the application crashes, the entire website goes down. This makes deploying a production application on a single container inadequate.

The tool to solve these issues is a container orchestrator.

Container Orchestrator

A container orchestrator is a tool that manages the lifecycle of containers. Orchestrators handle container scaling, self-healing, load balancing, deployment, and more1. Orchestrators operate the same in all environments, no matter if they are running on bare-metal machines or cloud VMs. They effectively abstract away the underlying infrastructure2. This allows DevOps engineers to focus on the most important aspect of a project - the application itself.

The most popular container orchestrator right now is Kubernetes. The second most popular container orchestrator is Docker Swarm, which I will likely explore in a future post. The rest of this post explains the basic concepts of Kubernetes.

To work with Kubernetes, a cluster is created consisting of a master node and worker nodes. A node can be a bare-metal server or virtual machine. Nodes are often virtual machines hosted in the cloud, such as Amazon EC2 instances.

Master nodes run the Kubernetes Control Plane, which you can think of as the brain of the cluster. Worker nodes run application pods and services. Let's explore the master node first.

As previously mentioned, master nodes run the Kubernetes Control Plane. There are four main pieces of the Control Plane - persistent storage, Kubernetes API, Scheduler, and Controller Manager3.

The Kubernetes API server is a CRUD API that describes and modifies objects in the cluster4,5. Interaction with the Kubernetes API server is often done through a CLI called kubectl. However, since the API server is a REST API, you can also interact with it using HTTP requests.

Objects in a Kubernetes cluster are represented as YAML documents. YAML is a superset of JSON, so you can write documents with JSON as well. Objects represent the desired state of the Kubernetes cluster. An example of an object is a pod, which contains one or more containers. Another example is a replica set, which self-heals and horizontally scales a pod (allowing you to have many pods with the same configuration). The YAML file declares the desired state of the pod or replica set in the cluster.

The most common approach for adding an object to a Kubernetes cluster is to send a YAML file to the API server. This is usually done through the kubectl CLI with the kubectl create -f <filename>.yml or kubectl apply -f <filename>.yml commands. For example, the following YAML file creates a pod object which holds a single container.

# pod.yml # Version of the Kubernetes API. Pods exist in the stable v1 version apiVersion: v1 # What kind of object to deploy (in this case a Pod) kind: Pod # Metadata describing the Pod object metadata: name: nodejs-app labels: app: nodejs-app version: v1.0.0 environment: sandbox spec: containers: # Unique name for the container in the pod - name: nodejs-app # Docker image to use for the container from DockerHub image: ajarombek/nodejs-docker-app:latest # Configure a network port for a container ports: # Port to expose on the container IP - containerPort: 3000 # Protocol defaults to TCP protocol: TCP

The Pod contains a container running a simple Node.js/Express web application. To deploy this object on a Kubernetes cluster, the following kubectl command is run:

kubectl create -f pod.yml

kubectl sends the YAML document to the API server. When the API server receives a YAML document, it stores all the Kubernetes objects it declares in persistent storage as JSON6. Kubernetes uses an etcd key-value store for persistent storage.

The last two pieces of the master node are the Scheduler and Controller Manager. The Scheduler runs an algorithm to determine which node a pod should run on7. The Controller Manager maintains multiple different controllers. Controllers watch the API Server for changes to Kubernetes objects8. Most Kubernetes objects have their own Controller. For example, there is a ReplicaSet controller which watches for changes to ReplicaSet objects. When changes occur, controllers perform operations on the cluster. Their objective is to keep the cluster matching the desired state declared in the Kubernetes objects. The only Kubernetes object that Controller's don't handle are Pods, which are configured by a component called the Kubelet. I will provide details about the Kubelet when discussing worker nodes.

Here is a diagram of the master node and its four main components:

Master Node Components

API Server

The central component of the master node. It's a REST API that stores JSON documents in persistent storage. JSON documents are Kubernetes objects which represent the desired state of the cluster.

Persistent Storage

An etcd key-value store which holds JSON documents. The structure of etcd is similar to a filesystem containing JSON files. Each JSON file represents a Kubernetes object. The etcd database is highly available, making it resistant to failures.

Scheduler

Waits for new pod definitions on the API server and schedules them to worker nodes. The scheduler runs an algorithm to determine which node is the best fit for a pod.

Controller Manager

While the API Server is the central component of the master node, it doesn't do anything besides accept API requests and write JSON to etcd. The controller manager performs the actual work in the cluster based on the desired state exposed by the API Server. It tries to keep the cluster in the desired state.

Worker nodes run applications and services. Worker nodes are equipped with a Kubelet, kube-proxy, and container runtime.

The Kubelet has two main jobs. The first is to register the machine its running on as a worker node for the cluster. The second is to watch the API Server for scheduled Pods9. The role of starting Pods is left to the Container Runtime. Kubernetes can be configured with many different Container Runtimes, the most common being Docker.

The last major piece of a worker node is the kube-proxy. While not actually a proxy server as the name suggests, kube-proxy makes sure that Pods are reachable via a static IP address10. This static IP address is created by a Kubernetes object known as a Service. I will discuss services in more detail in my next Kubernetes article.

Here is a diagram of the worker node and its main components:

Worker Node Components

Kubelet

A component that watches the API Server for Pod objects scheduled to run on its node. The Kubelet notifies the Container Runtime to start Pods after they are scheduled. It also destroys Pods after they are removed from the API Server and tells the Container Runtime to restart containers when they fail health checks.

kube-proxy

Ensures that HTTP requests to a Service object are sent to the proper Pod. These requests can come from the internet or locally within the worker node.

Container Runtime

A container runtime executes a container based off an image. The most common Container runtime is containerd, which is part of Docker. The Docker container system consists of multiple decoupled components, one of which is the container runtime (containerd)11. Other container runtimes can be used instead of Docker.

Here is a diagram of a three node cluster with one master node and two worker nodes:

This article provides a high level view of a Kubernetes cluster. In my next Kubernetes article, I'll walk through creating a simple one node cluster on AWS. I'll also discuss all the basic Kubernetes objects used in an application. I'm also building a full Kubernetes prototype application, which is available on GitHub!

[1] Nigel Poulton, The Kubernetes Book (Nigel Poulton, 2018), 5

[2] Marko Lukša, Kubernetes in Action (Shelter Island, NY: Manning, 2018), 2

[3] Lukša., 310

[4] Lukša., 316

[5] "kube-apiserver", https://bit.ly/2VhkJjp

[6] Lukša., 314

[7] Lukša., 319

[8] Lukša., 322

[9] Lukša., 326

[10] Lukša., 327-328

[11] Poulton., 73