Building Resilient Applications on Kubernetes

Amila De Silva
10 min read · Jun 23, 2024


Introduction

Features that Kubernetes offers, such as the ability to change infrastructure without bringing down the entire cluster, automatic healing of applications, and dynamic scaling based on traffic, have helped organizations realize the benefits of cloud-native applications. However, the dynamic nature that enables these features raises the question of whether Kubernetes is stable enough to run mission-critical, high-throughput applications. For example, the possibility of pods going down at any time suggests that traffic and user experience can be affected. While this can happen without proper configuration, with the right probes, pod distribution strategies, and scaling options, any application can run robustly on Kubernetes. In this post, we discuss approaches to reduce downtime and improve the resiliency of applications running on Kubernetes.

Configure proper health probes

Health probes are the way to inform the Kubernetes control plane whether your application is healthy, ready to receive traffic, or still booting up. While properly configured health probes can improve your application's responsiveness, improper configurations can cause unnecessary downtime. Health probes are among the most frequently misconfigured parts of a deployment, so it's important to understand the role each probe plays.

Purpose of different probes

Readiness Probe: Checked before routing traffic to the pod. If the readiness probe fails, the pod no longer receives traffic.

Liveness Probe: Used to determine whether the container within the pod is running. If the liveness probe fails, the container is restarted.

Readiness Probe

Set the readiness probe to pass when the pod has completely started up and is ready to receive traffic. If you need to warm up the application by priming caches or initializing database connection pools, do so before passing the readiness probe.

You only need a readiness probe if your pod is serving traffic through a Kubernetes service. If you’re executing a task like running a batch job or consuming from a queue, you don’t need a readiness probe.

Readiness probe failure

Fail readiness when

  • The container is overwhelmed by traffic and needs to complete ongoing requests before accepting new traffic.
  • The application receives SIGTERM. We'll discuss this more in a later section.

On Choosing a probe

A good candidate is a dummy endpoint that uses the same thread pools as normal requests. When the request-handling thread pools are fully utilized, requests to the readiness endpoint will also fail, cutting off traffic to the pod.

To further assess the readiness of the application, you can use various metrics, such as the utilization of database connection pools, threads in use, and concurrent requests. When any of these metrics indicate the application’s exhaustion, you can fail readiness to prevent further degradation.

Depending on the type of service, you can use a command, HTTP probe, TCP probe, or gRPC probe. Avoid TCP probes if possible, as they only check whether your service is reachable through the port. Since they don't reach the application layer, they won't give a correct indication of the application's ability to accept new requests. Generally, use HTTP probes for web applications and microservices exposing REST or GraphQL interfaces. For services with a gRPC interface, use gRPC probes.
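
As a minimal sketch, an HTTP readiness probe on a container could look like the following. The /ready path, port, and thresholds are illustrative assumptions and should be tuned for your application.

readinessProbe:
  httpGet:
    path: /ready             # dummy endpoint served by the same thread pools as real requests
    port: 8080
  initialDelaySeconds: 10    # allow time to warm caches and initialize connection pools
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3        # stop routing traffic after three consecutive failures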

A few tips

  • Log the time taken by the readiness check in the application code. This will help you fine-tune the thresholds if probe failures become too frequent.
  • Avoid calling backend services inside the readiness probe, as this can cause the probe to time out and fail. The purpose of the probe is to assert whether the application is ready to take traffic; if a backend service is down, that should be handled through proper error handling in the application.
  • If the application is solely dependent on backend services and their statuses are needed before passing the readiness probe, fetch the statuses asynchronously.

Liveness probe

The liveness probe determines whether the application is running or alive. An application not being ready doesn’t mean it isn’t alive (it may be processing in-flight requests without accepting new ones), but if the application isn’t alive, it can never be ready.

Liveness probe failure

Fail liveness when

  • The container encounters an issue that cannot be resolved without restarting.
  • The application transitions to a frozen state from which it cannot recover by waiting or cooling down, for example a DB connection pool exhausted by a slow or unresponsive database server.
  • Connectivity to a critical backend service (like a database or queue) fails and cannot be re-established without restarting.

On Choosing a probe

In my opinion, a liveness probe should be used only if the application can recover from a restart. A model liveness probe returns a very simple (or hard-coded) response that won't fail unless something is completely wrong with the application.

If your application needs a restart to establish connectivity with backend services (like databases or queues), then checking connectivity to these before passing liveness can be a good practice. However, note that cloud-native applications developed with modern frameworks such as Express or Spring Boot don't typically require restarts to re-establish connectivity; libraries that handle backend connection pools can do this on their own. If you don't see a need for a liveness probe, it's better to avoid one.

Things to note

  • Don’t use the same configurations for liveness and readiness probes. The requirements for the two probes are different, and the actions you take in the readiness probe might cause the liveness probe to fail prematurely.
  • If you must use the same endpoint for both probes, ensure you have different configurations (initialDelay, timeout, period, etc.) for each.
  • Kubernetes doesn't maintain an order while executing the probes, so make sure the readiness probe fails before the liveness probe does. If you're tolerating three failures for the readiness probe, tolerate five for the liveness probe, as in the sketch below.
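
As a hedged sketch, the following configuration makes readiness deliberately more sensitive than liveness, so traffic is cut off before a restart is triggered; the endpoint paths and numbers are assumptions.

readinessProbe:
  httpGet:
    path: /ready             # assumed endpoint backed by the request-handling thread pools
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3        # roughly 15s of failures removes the pod from the Service
livenessProbe:
  httpGet:
    path: /healthz           # assumed endpoint returning a simple, near hard-coded response
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 5        # roughly 25s of failures restarts the container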

Handle Pod Termination gracefully

Another common point where requests can fail is when a pod is terminating. Unless pod termination is handled gracefully, requests sent to a terminating pod can fail.

When the containers running within the pod don't shut down gracefully (i.e., they stop abruptly or crash), in-flight requests are affected. Additionally, the Kubernetes Service object associated with the pod will keep sending traffic until the application exits, causing some requests to receive connection-refused errors.

The way to prevent abrupt failures is to handle the SIGTERM signal properly. To understand how that works, let’s examine the pod termination lifecycle in more detail.

Pod Termination Steps

  1. When the pod is marked for deletion, the kubelet calls the preStop handlers of all containers defined in the pod. Along with this, the countdown of the graceful shutdown period (defined by terminationGracePeriodSeconds) begins.
  2. After the preStop hook finishes, the kubelet issues SIGTERM to all containers in the pod. SIGTERM is a request to terminate the application running in the container; however, the application can keep running even after receiving SIGTERM.
  3. After the graceful shutdown period is over, SIGKILL is issued to all containers. Any applications still running are brought to an immediate halt.
  4. The Kubernetes control plane then removes the pod. After this point, the pod won't be visible through any client.

Normally, SIGTERM is the first signal containers receive when the pod starts terminating; if a preStop handler is provided, it gets called first. In some situations, specifying a preStop handler helps achieve graceful pod termination. Let's first see how to handle SIGTERM and then how a preStop handler achieves the same.

Handling SIGTERM

The surest way to stop receiving traffic is to fail the readiness probe when the pod starts terminating, by catching the SIGTERM signal.

Side note: Handling SIGTERM properly is arguably the single most important thing to improve resiliency. Why? Because it’s through this signal that the application gets notified it’s about to be taken down. The dynamic nature of Kubernetes requires it to terminate pods frequently for various reasons, like rolling out a new application version, scaling down replicas, cluster maintenance tasks, and more. Not handling SIGTERM means you’re paving the way for the dynamic nature of Kubernetes to disrupt the stability of your application.

Upon receiving SIGTERM, applications need to catch it and perform a graceful shutdown by doing the following:

  1. Failing the readiness probe, either by marking a condition explicitly or by closing the request-accepting thread pool.
  2. Letting in-flight requests complete.
  3. Closing any two-way connections like WebSockets or gRPC streams.
  4. Closing database connections and connections to queues.

When the readiness probe fails, the Service object attached to the pod no longer considers the pod's IP when routing requests. This effectively cuts off traffic to the pod immediately.

To receive SIGTERM, the application inside the container should be running as process ID 1. If you use a script to start the application, the script will be the process with PID 1; in that case, you need to save the application's PID and relay SIGTERM to the application using that PID.
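
Putting these pieces together, here is a minimal sketch of a pod spec configured for graceful termination. The container name, image, start command, and timings are hypothetical; it assumes the application itself traps SIGTERM and drains in-flight requests as described above.

spec:
  terminationGracePeriodSeconds: 60              # should exceed the preStop delay plus request-drain time
  containers:
  - name: transaction-service                    # hypothetical container name
    image: example.com/transaction-service:1.0   # hypothetical image
    command: ["node", "server.js"]               # exec form: the app runs as PID 1 and receives SIGTERM directly
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]               # short delay so endpoint removal propagates before SIGTERM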

Tune RollingUpdates

Downtime can also occur when rolling out a new version of the application. When a deployment is already scaled up to its maximum replica count and a rollout happens, some of the pods serving traffic can be taken down before their replacements are ready, causing some requests to fail.

By setting maxUnavailable to 0, you can ensure that new pods are scheduled before taking any of the existing pods down.

strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
  type: RollingUpdate

Note that old pods are taken down as soon as the new pods report Ready; the control plane doesn't check whether external load balancers have actually started routing traffic to them. If you need to defer terminating old pods until the new pods are registered and healthy behind the load balancer, you can use pod readiness gates. Currently, this feature is only supported out of the box by the AWS Load Balancer Controller. Check this link to find out more on using readiness gates with the AWS Load Balancer Controller.
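
For reference, a readiness gate is declared in the pod spec as an extra condition that must become True before the pod counts as Ready. The condition type below is hypothetical; with the AWS Load Balancer Controller, the appropriate condition is injected automatically.

spec:
  readinessGates:
  - conditionType: "example.com/lb-target-healthy"   # hypothetical condition set to True by an external controller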

Define Pod Disruption Budgets

While performing cluster maintenance tasks like updating the Kubernetes version or upgrading the nodes of the cluster, pods can get terminated or rescheduled on different nodes. A Pod Disruption Budget (PDB) guards the deployment against these disruptions by ensuring it retains a minimum number of pods. The following configuration mandates that 50% of the pods remain available during a disruption:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-front-end
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: front-end

If it's a stateless application you're guarding, it's better to use percentages for the minAvailable and maxUnavailable attributes. This is because, over time, the number of replicas needed to run the application can increase, and a PDB defined with absolute numbers can soon become outdated.

If the application is stateful and needs to maintain a minimum number of pods to achieve quorum, then it's better to specify the attributes as absolute numbers.
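
For example, here is a sketch of a PDB for a hypothetical three-replica stateful workload that must keep at least two pods up to preserve quorum; the name and label are assumptions.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-quorum-store          # hypothetical name
spec:
  minAvailable: 2                 # keep two of three replicas so quorum survives a voluntary disruption
  selector:
    matchLabels:
      app: quorum-store           # hypothetical label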

Also note that PDBs aren't considered for all disruptions. For example, deleting a deployment with kubectl doesn't trigger a PDB. Typically, node-related operations such as kubectl drain evict pods through the Eviction API and therefore respect PDBs. This blog post discusses in detail the voluntary disruptions that go through PDBs, and you can refer to the official Kubernetes documentation for more recommendations on defining PDBs.

Distributing Pods across nodes

Should a node-level failure occur, having pods spread across different nodes helps reduce the impact of the failure. You can achieve this by using pod anti-affinity or topology spread constraints.

Pod Anti-affinity

Using anti-affinity rules, you can instruct the Kubernetes scheduler not to schedule two pods on the same node.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - transaction-service
      topologyKey: kubernetes.io/hostname

According to this rule, no two pods with the label app set to transaction-service will run on nodes that have the same value for kubernetes.io/hostname.

Depending on how strictly you need to enforce this behavior, you can choose between requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. With the former, kube-scheduler cannot schedule the pod unless the condition is met; if it can't find a node that isn't already running a transaction-service pod, the pod stays pending until such a node becomes available. With the latter, the scheduler tries to avoid co-locating transaction-service pods, but if a suitable node isn't available, it will relax the condition and schedule the pod anyway.
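
As a sketch, the same rule expressed as a soft preference looks like this; the structure follows the standard Kubernetes schema, and the weight is illustrative.

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                          # strongest preference, but the scheduler may still violate it
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - transaction-service
        topologyKey: kubernetes.io/hostname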

Topology Spread Constraints

Suppose you're running on a managed Kubernetes cluster like EKS or AKS, and you need to distribute your pods evenly across different availability zones. You can use topologySpreadConstraints for that. Availability zones are a mechanism cloud infrastructure providers use to isolate failures; by spreading pods across availability zones, you can make the application highly available.

Even with a proper anti-affinity rule to place pods on different nodes, your applications can still get scheduled like this:

Two pods scheduled to the same Zone

With a topologySpreadConstraint defined like this, you can achieve an even distribution:

topologySpreadConstraints:
- labelSelector:
    matchLabels:
      app: transaction-service
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  matchLabelKeys:
  - pod-template-hash
  whenUnsatisfiable: DoNotSchedule

Pods scheduled into different zones

You might be wondering why you can't achieve the same result with anti-affinity, simply by changing the topology key to topology.kubernetes.io/zone. Unless you carefully configure the policy and parameters, that can restrict scaling of the application. For example, if you take the required anti-affinity rule we saw and only change the topology key, the scheduler will allow at most one transaction-service pod per zone, so with three availability zones your application won't scale beyond three replicas. Therefore, it's better to use affinity and anti-affinity when your purpose is to co-locate a pod with another or to keep certain pods apart, and to use topology spread constraints to distribute pods across nodes, zones, or regions.

Conclusion

Even though it might seem that achieving resiliency and performing zero-downtime deployments with Kubernetes is impossible, with careful planning and proper configuration, the dynamic nature of the platform can be harnessed to run mission-critical applications. By following the strategies explained here — such as configuring proper health probes, handling pod termination gracefully, and defining pod disruption budgets — you can significantly reduce downtime and improve the overall stability of your applications. Additionally, configuring Cluster Autoscalers and Pod Autoscaling can help scale the cluster and pods to handle traffic surges more effectively.

As Kubernetes continues to evolve, staying updated with the latest features and best practices will further strengthen your ability to build and maintain resilient applications.
