Kubernetes Best Practices That Are Easy to Forget

What you should do if you are running K8s

Logging and Monitoring

I can’t stress enough how important logging and monitoring are, so I will talk about the ‘how-to’.

What should we monitor?

  • Every component in the Kubernetes cluster (control plane, worker nodes)
  • DevOps Pipeline
  • Applications
  • Other cloud instances (Virtual Machines, Networks, Storage) — Hardware
  • Administrator activity logs (for all tools)

To decide what to monitor, trace the end-to-end traffic (data) flow from client to application, and make sure you monitor every point the traffic passes through.

You also need to perform regular audits and analyze the logs. (Don’t just sit and wait for the alerting system to fire; by then it’s already too late.) There are a lot of log analysis tools as well: https://logz.io/blog/open-source-monitoring-tools-for-kubernetes/ Remember: logs are only useful if someone actually reads them.

Don’t run a separate logging container (sidecar) for every application container; per-pod sidecars add significant overhead at scale. — https://platform9.com/blog/kubernetes-logging-best-practices/#sidecar
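A common alternative is a single node-level logging agent per node, deployed as a DaemonSet, which reads every pod’s logs from the host instead of running one sidecar per pod. A minimal sketch (the agent image, tag, and names are illustrative, not a recommendation):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluent/fluent-bit:2.2    # illustrative agent and tag
        volumeMounts:
        - name: varlog
          mountPath: /var/log           # container logs live under the host's /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```

One agent per node scales with the number of nodes, not the number of pods, which is where the sidecar overhead savings come from.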

Make your application shut down gracefully

Do you always have downtime when you update/upgrade Kubernetes, even though it supports rolling updates? That may be because your application is not ready for a graceful shutdown.

(How does graceful shutdown work? https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html)

  1. A SIGTERM signal is sent to the main process (PID 1) in each container, and a “grace period” countdown starts (30 seconds by default; see below for how to change it).
  2. On receiving SIGTERM, each container should start a graceful shutdown of the running application and exit.
  3. If a container doesn’t terminate within the grace period, a SIGKILL signal will be sent and the container violently terminated.
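The steps above can be handled in application code. Here is a minimal sketch in Python (the class and method names are hypothetical and not tied to any framework):

```python
import signal
import time

class GracefulApp:
    """Minimal sketch of step 2 above: catch SIGTERM, stop taking new
    work, drain what is in flight, then exit within the grace period."""

    def __init__(self):
        self.shutting_down = False
        # Step 1: Kubernetes sends SIGTERM to PID 1 in the container.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Step 2: flip a flag; the main loop stops accepting new work.
        self.shutting_down = True

    def run(self):
        while not self.shutting_down:
            time.sleep(0.1)  # placeholder for accepting/serving requests
        self._drain()

    def _drain(self):
        # Finish in-flight requests, close connections, flush logs.
        # This must complete before the grace period (default 30s) ends,
        # or the container gets SIGKILL (step 3).
        pass
```

The important detail is that the handler only sets a flag; the actual draining happens in the main loop, so signal handling stays fast and safe.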

So your application must stop accepting new requests, finish the in-flight ones on remaining connections, and close them once the queue is drained. If the application still accepts incoming requests during the graceful shutdown period, consider a preStop handler: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers
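As a sketch (the pod name, image, and timings are made up), a preStop hook and a longer terminationGracePeriodSeconds can be declared like this; the short sleep gives the endpoint time to be removed from the Service before the app receives SIGTERM:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example                       # hypothetical
spec:
  terminationGracePeriodSeconds: 60   # default is 30
  containers:
  - name: app
    image: example/app:1.0            # hypothetical
    lifecycle:
      preStop:
        exec:
          # Small delay before SIGTERM so in-flight traffic stops
          # arriving while the pod is being removed from endpoints.
          command: ["sh", "-c", "sleep 5"]
```

Note that the preStop hook runs inside the grace period, so budget for both the hook and the application’s own drain time.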


Your application should catch the SIGTERM, which means the pod is about to be terminated, and run the right steps to shut itself down gracefully.

Enable the autoscalers

There are three types of autoscaling: the Horizontal Pod Autoscaler, the Cluster Autoscaler, and the Vertical Pod Autoscaler. For higher resiliency, consider enabling all three.
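For example, a Horizontal Pod Autoscaler targeting a hypothetical Deployment named example can be declared with the autoscaling/v2 API (all numbers here are illustrative, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa                # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example                  # hypothetical target
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out above 70% average CPU
```

The Cluster Autoscaler and Vertical Pod Autoscaler are installed per cluster/provider rather than declared per workload, so check your platform’s documentation for enabling them.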

Small container images

Keeping container images small is very important, and it’s not about the number of microservices but the size of the image itself. There are a lot of benefits:

  • Reduced wasted space by removing unnecessary libraries (less storage)
  • Faster builds and faster pipelines
  • A narrower attack surface

Best practices from Google Cloud Architecture Center: https://cloud.google.com/architecture/best-practices-for-building-containers
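One common way to shrink images is a multi-stage build: compile with a full toolchain, then copy only the resulting binary into a minimal base image. A sketch (the Go app, its path, and the tags are hypothetical):

```dockerfile
# Build stage: full toolchain, large image
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server   # hypothetical path

# Final stage: only the static binary, no shell, no package manager
FROM gcr.io/distroless/static
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The final image carries none of the build tooling, which delivers all three benefits above at once: less storage, faster pulls, and far less to attack.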

Readiness and Liveness Probes

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

The kubelet uses liveness probes to know when to restart a container, readiness probes to know when a container is ready to start accepting traffic, and startup probes to know when a container application has started.

With these checks, Kubernetes increases reliability by detecting and recovering from failing pods (especially with readinessProbe and livenessProbe).
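As an illustration (the paths, port, and timings are assumptions, not prescriptions), all three probes can be declared on a container like this:

```yaml
containers:
- name: app                        # hypothetical container
  image: example/app:1.0
  startupProbe:                    # holds off the other probes at boot
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10
  readinessProbe:                  # gates Service traffic to the pod
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
  livenessProbe:                   # restarts a wedged container
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
```

A useful design distinction: readiness failures only remove the pod from load balancing (it can recover), while liveness failures trigger a restart, so keep the liveness check cheap and conservative.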

Ref