Know When a Kubernetes Pod Is Down

Many modern applications run in containers to achieve robust scalability and consistency. Kubernetes is a popular orchestration platform for managing containerized applications. Container environments have many moving parts, and the most basic Kubernetes execution unit is the pod. In a busy application environment, groups of pods, sometimes numbering in the hundreds, are constantly launched and decommissioned to perform tasks. Without the right tools, this added complexity can make monitoring the health of your application infrastructure a nearly impossible task.

SolarWinds® Observability, an advanced monitoring solution, provides deep insights into the performance and health of your Kubernetes pods. In this article, we'll introduce Kubernetes pods, explain how they work, and describe the types of issues that cause them to fail. We'll also give you an overview of how SolarWinds Observability works and demonstrate how it detects pods at risk of failure.

A brief overview of Kubernetes pods

Kubernetes environments are organized across many layers. Configuration, typically written as declarative manifests, is submitted to the cluster through the Kubernetes API, while application code and its software dependencies are packaged into container images. The control plane, which includes the API server, the scheduler, and controllers, decides where workloads should run and relays those instructions to the worker nodes. On each node, an agent called the kubelet starts and manages the pods assigned to it, and those pods run the containers.

You can think of pods as objects hosting one or more containers, along with any associated storage resources and network settings. Pods are flexible micro-environments for running services within your larger application infrastructure. Each pod has its own IP address and can communicate with other pods in the cluster. Pods are ephemeral: they are created and destroyed as workloads change, and controllers such as Deployments scale the number of pod replicas up or down to match demand.

A typical public-facing application built on a microservices architecture could comprise hundreds of containers running on servers in multiple remote locations and carrying out dozens of tasks. At this scale, monitoring the health and state of Kubernetes pods is important, because traffic routed to failed pods can result in elevated error rates returned to callers. The impact could range from degraded performance to a complete outage.

Several common issues can affect application health and reliability. Most of them stem from a simple fact: containers may be virtualized, but they run on physical hardware, and physical hardware, even in the cloud, has limits. While you are unlikely ever to hit a hard capacity limit on a cloud platform, it is easy to allocate too few resources to your nodes or workloads.

With adequate monitoring and alerts, you can catch overloaded nodes and failing pods before they cause an outage. Typical failure scenarios include a pod being evicted because its node has run short of resources, or a node being terminated unexpectedly when a cloud provider reclaims spot instance capacity. Missing dependencies can also cause a container to fail repeatedly on startup, which Kubernetes reports as CrashLoopBackOff, or produce similar errors.
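
As an illustration of how these failure modes surface in the Kubernetes API, the following minimal Python sketch (not part of SolarWinds Observability) scans a single pod object, as found in the `items` list of `kubectl get pods -o json`, for an eviction, an `Unknown` phase, a `CrashLoopBackOff` waiting reason, or a previous `OOMKilled` termination. The status fields it reads are standard Pod fields; the function name `diagnose_pod` and the message wording are hypothetical. Spot instance reclamation is not reported directly on the pod, so an `Unknown` phase is only one possible symptom of a lost node.

```python
from typing import Any


def diagnose_pod(pod: dict[str, Any]) -> list[str]:
    """Return failure signals for one pod object (hypothetical helper).

    `pod` is assumed to be one entry from the "items" list produced by
    `kubectl get pods -o json`; only standard Pod status fields are read.
    """
    name = pod.get("metadata", {}).get("name", "<unknown>")
    status = pod.get("status", {})
    findings: list[str] = []

    # An evicted pod is left in phase "Failed" with status.reason "Evicted".
    if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
        findings.append(f"{name}: evicted ({status.get('message', 'no message')})")

    # Phase "Unknown" means the pod's state could not be obtained,
    # typically because its node stopped communicating.
    if status.get("phase") == "Unknown":
        findings.append(f"{name}: state unknown (node may be unreachable)")

    for cs in status.get("containerStatuses", []):
        container = cs.get("name", "?")
        # A container that keeps crashing is held in CrashLoopBackOff between restarts.
        waiting = cs.get("state", {}).get("waiting", {})
        if waiting.get("reason") == "CrashLoopBackOff":
            restarts = cs.get("restartCount", 0)
            findings.append(f"{name}/{container}: CrashLoopBackOff after {restarts} restarts")
        # lastState records how the previous run of the container ended.
        terminated = cs.get("lastState", {}).get("terminated", {})
        if terminated.get("reason") == "OOMKilled":
            findings.append(f"{name}/{container}: previous run was OOMKilled")

    return findings


if __name__ == "__main__":
    # Hypothetical pod status, trimmed to the fields the function reads.
    pod = {
        "metadata": {"name": "web-1"},
        "status": {
            "phase": "Running",
            "containerStatuses": [{
                "name": "app",
                "restartCount": 5,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled"}},
            }],
        },
    }
    for finding in diagnose_pod(pod):
        print(finding)
```

To check a whole namespace, you could load the output of `kubectl get pods -o json` with `json.load` and call `diagnose_pod` on each entry in its `items` list.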

Monitoring is crucial for maintaining the availability of distributed applications, and the sheer scale of a Kubernetes-backed application infrastructure makes manual monitoring impractical. It has to be automated.

An introduction to SolarWinds Observability

SolarWinds Observability provides AI-powered, single-pane-of-glass monitoring of your infrastructure and applications, no matter where they run. Developers gain visibility into complex Kubernetes workloads deployed across multi-layered stacks of AWS or Azure cloud services, giving them the data they need to track performance, outages, and potentially worrying trends.

SolarWinds Observability covers everything from user-facing availability and latency issues to node- and cluster-level health metrics. It monitors and reports on any factor that could affect application availability.

In addition to traditional infrastructure monitoring, SolarWinds Observability focuses on performance metrics specific to Kubernetes—at the cluster, node, and pod levels.

Detecting a failing pod with SolarWinds Observability

Through pod monitoring, alerting, and dashboards, SolarWinds Observability provides users with the information and alerts they need for faster remediation of application issues.

Pod monitoring

For your Kubernetes-backed application, your focus will be on pod health and performance. Healthy pods generally mean all elements of your workload are doing fine. Underperforming pods indicate an underlying issue that requires your attention.

SolarWinds Observability surfaces comprehensive, easy-to-consume metrics for CPU and memory utilization, pod status, and general system health. Each metric is shown alongside historical data for context, which supports in-depth root-cause analysis.


Alerts and dashboards

SolarWinds Observability gives you a dashboard that shows the current health of every pod you're running, and the platform also retains rich historical data so you can compare current behavior against past baselines. This comparison helps SolarWinds Observability identify patterns and the root causes of changes.

SolarWinds Observability dashboards are highly configurable. You can choose the widgets you want to see, reducing noise so you can focus on what matters most to you. You can also define the specific pod failure or risk conditions that should trigger an alert, ensuring you don't miss significant issues.
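
SolarWinds Observability alerts are configured in the product itself, and the sketch below does not use its API. It is a hypothetical Python illustration of the logic behind a typical pod alert condition: fire only when a metric stays above a threshold for several consecutive samples, which filters out momentary spikes and keeps alert noise down. The `AlertRule` class, its fields, and the metric name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert rule; not a SolarWinds Observability API object."""
    metric: str        # illustrative metric name, e.g. "memory_utilization"
    threshold: float   # a sample breaches the rule when it exceeds this value
    for_samples: int   # number of consecutive breaching samples required

    def should_fire(self, samples: Sequence[float]) -> bool:
        """Return True if the most recent `for_samples` values all exceed the threshold."""
        if self.for_samples <= 0 or len(samples) < self.for_samples:
            return False
        recent = samples[-self.for_samples:]
        return all(value > self.threshold for value in recent)


if __name__ == "__main__":
    rule = AlertRule(metric="memory_utilization", threshold=0.9, for_samples=3)
    print(rule.should_fire([0.95, 0.5, 0.92, 0.93]))  # isolated spikes: False
    print(rule.should_fire([0.5, 0.91, 0.94, 0.97]))  # sustained breach: True
```

Requiring several consecutive breaches trades a little detection latency for fewer false alarms; a single-sample rule would fire on the first spike.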

You can route those alerts through appropriate channels, including email, SMS, and Slack, so SolarWinds Observability alerts fit easily into your existing workflows.

Use cases

To give you a sense of what this might look like, consider a Kubernetes setup with hundreds of pods. Within this infrastructure, you need to find host nodes that are close to running critically short of memory. If you're alerted before a node crosses that critical threshold, you can act quickly to resolve the situation.

When a node runs short of available memory, it can enter a failing state in which an application's pods terminate unexpectedly (for example, through eviction or out-of-memory kills) and far more often than during normal operations. By identifying slow-responding pods or nodes with insufficient memory, an SRE team can take the necessary steps to reprovision pods.
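
To make this scenario concrete, here is a minimal Python sketch, independent of SolarWinds Observability, that flags a node when the kubelet reports the standard `MemoryPressure` node condition or when current usage crosses a chosen fraction of the node's allocatable memory. It assumes `node` is one item from `kubectl get nodes -o json` and that `used_bytes` comes from a separate metrics source such as metrics-server. The 85% default threshold and the helper names `parse_memory` and `memory_alerts` are hypothetical, and the quantity parser covers only common suffixes.

```python
from typing import Any

# Common Kubernetes memory quantity suffixes (binary first, then decimal).
# Assumption: this covers only plain integers with these suffixes, not every
# quantity format Kubernetes accepts (e.g. exponents such as "1e9").
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '8Gi' or '512Mi' to bytes (hypothetical helper)."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already in bytes


def memory_alerts(node: dict[str, Any], used_bytes: int, threshold: float = 0.85) -> list[str]:
    """Flag a node that reports MemoryPressure or whose usage nears allocatable memory."""
    name = node["metadata"]["name"]
    status = node["status"]
    alerts: list[str] = []

    # The kubelet sets the MemoryPressure condition to "True" when node memory is low.
    for cond in status.get("conditions", []):
        if cond.get("type") == "MemoryPressure" and cond.get("status") == "True":
            alerts.append(f"{name}: kubelet reports MemoryPressure")

    # Early warning: usage as a fraction of the memory available to pods.
    ratio = used_bytes / parse_memory(status["allocatable"]["memory"])
    if ratio >= threshold:
        alerts.append(f"{name}: memory at {ratio:.0%} of allocatable")
    return alerts


if __name__ == "__main__":
    node = {
        "metadata": {"name": "node-a"},
        "status": {
            "allocatable": {"memory": "8Gi"},
            "conditions": [{"type": "MemoryPressure", "status": "False"}],
        },
    }
    for alert in memory_alerts(node, used_bytes=int(7.5 * 1024**3)):
        print(alert)  # node-a: memory at 94% of allocatable
```

Checking a usage ratio in addition to the `MemoryPressure` condition gives an earlier warning, since the condition is only set once the kubelet already considers memory low.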


Using Kubernetes to maintain enterprise-scale deployments involves a great deal of complexity. This underscores the value of proactive, continuous, and smart monitoring of your resources.

SolarWinds Observability provides time-sensitive insights that are only possible with a service deeply integrated with your resources, no matter where they run or how complex they are. To discover potential pod-level failures before you hear about them from your users, real-time, comprehensive monitoring and alerts from SolarWinds Observability are essential.

Interested? Sign up for a fully functional 30-day free trial of SolarWinds Observability.


 Edit: Interested in more on this topic? It inspired the webcast Operationalize Kubernetes Performance Monitoring with SolarWinds Observability.
