Observability for Node.js application performance and health

Every Node.js developer wants their application to never experience lag or downtime. But how do you accomplish this? Observability, and the end-to-end visibility it provides, is key to gaining a better understanding of the health of your Node.js application. By measuring key performance metrics and understanding dependencies, you can detect problems such as a memory leak or long-running processes that block the Node.js event loop.

This blog post discusses why you should consider a holistic observability solution for monitoring the performance of Node.js applications. It explores which app metrics matter most and introduces common problems you can solve using some of the features of SolarWinds Observability.


Why Observability Is Key for Improving Node.js App Health

No application is perfect, and issues can arise in any of them. This is why observability is a crucial element for measuring the health and performance of any application. An application crash is easy to detect. However, it’s much harder to detect smaller issues that occur only once per day or involve dependent services or resources. Observability helps you navigate the sea of logs your application produces and find these small issues. Furthermore, observability allows you to shift from reactive to proactive monitoring: through anomaly detection, you can spot patterns that may cause issues in the future.

For example, let’s say you detect a burst of failed login attempts for one of your applications. Such a pattern might indicate that someone is trying to gain unauthorized access to your server. The APM functionality in SolarWinds Observability enables you to detect these types of pattern-based issues, so you can resolve potential problems before they occur. Without it, you are essentially waiting for an issue to arise before you can investigate and solve it.

Resolving an issue after the fact also takes much more time, because you don’t have the context monitoring provides. In other words, you’ll have to search thousands of log lines to find out what exactly happened. Developers can find the root cause of an issue much faster when using an observability solution for application performance monitoring.

Which Metrics to Measure and Why

Here’s a list of four important metrics you should measure to get better insights into the health of your application.

1. Average Request Rate

The average request rate tells you how many requests your application handles over a given period. It’s an important metric for scaling your application. For example, when you notice the number of failed requests increasing, it may mean your backend can’t keep up with the incoming load. You should measure the average request rate and determine your application’s upper limit through stress testing. This way, you can create rules to automatically scale your application based on the request rate.
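
As a rough illustration of the idea above, the following sketch counts requests in a sliding time window and compares the resulting rate with a limit found through stress testing. The names RequestRateMeter and shouldScaleOut, the 60-second window, and the 80% headroom are assumptions made for this example; they are not part of Node.js or SolarWinds Observability.

```javascript
// Hypothetical helper: tracks request timestamps in a sliding window.
class RequestRateMeter {
  constructor(windowMs = 60_000) {
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Call once per incoming request, for example from an HTTP middleware.
  record(now = Date.now()) {
    this.timestamps.push(now);
  }

  // Average requests per second over the most recent window.
  ratePerSecond(now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    return this.timestamps.length / (this.windowMs / 1000);
  }
}

// Hypothetical scaling rule: react once the measured rate reaches a
// fraction (headroom) of the limit determined through stress testing.
function shouldScaleOut(currentRate, testedLimit, headroom = 0.8) {
  return currentRate >= testedLimit * headroom;
}

module.exports = { RequestRateMeter, shouldScaleOut };
```

In a real service you would call record() for every incoming request and evaluate shouldScaleOut() periodically; the window size and headroom are values to tune for your own workload.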

2. Average Request Response Time

The average response time metric tells you how long it takes your app to respond to a given request. This metric contributes to the overall health and performance of your application. The lower your average response time, the better!

However, remember to also measure the extremes. If you only measure the average response time, you might miss important information such as the longest response time, also referred to as “peak response time.” For example, you can create a metric to track each request that takes longer than three seconds to complete.

The average response time alone can hide slow requests. By measuring the slowest requests, you might detect problems, such as memory leaks, you otherwise wouldn’t. Investigating the slowest requests is also time well spent, because it shows you which flows in your application to improve.
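
To make the difference between the average and the extremes concrete, here is a minimal sketch that summarizes a list of response times. The function name summarizeResponseTimes and the nearest-rank 95th percentile are choices made for this example; the three-second threshold mirrors the one mentioned above.

```javascript
// Hypothetical helper: summarizes response times given in milliseconds.
function summarizeResponseTimes(durationsMs, slowThresholdMs = 3000) {
  if (durationsMs.length === 0) return null;

  // Sort a copy so the caller's array is left untouched.
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, d) => acc + d, 0);

  // Nearest-rank 95th percentile: the smallest value that is greater
  // than or equal to 95% of all samples.
  const rank = Math.ceil(0.95 * sorted.length);

  return {
    average: sum / sorted.length,
    p95: sorted[rank - 1],
    peak: sorted[sorted.length - 1],
    // Number of requests slower than the threshold.
    slowCount: sorted.filter((d) => d > slowThresholdMs).length,
  };
}

// Nine fast requests and one very slow one: the average looks moderate,
// while the peak and the slow-request count reveal the outlier.
const summary = summarizeResponseTimes(
  [100, 120, 110, 90, 4500, 80, 100, 100, 100, 100]
);
console.log(summary);
```

In this sample the average is 540 ms, which on its own would not suggest that one request took 4.5 seconds.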

3. CPU Usage

CPU usage is a system-level performance metric. Other important system-level performance metrics include memory utilization and disk usage. CPU utilization refers to the amount of CPU time your application requires to handle a request.

If you detect that some requests use 50% or more of your CPU, you should investigate. You might need more CPU capacity, or you might need to review the code causing the CPU spike, as inefficient code is a common cause.
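
One way to get a first impression of CPU cost from inside the process is the built-in process.cpuUsage() function, which reports user and system CPU time in microseconds. The sketch below is only an assumption-laden illustration: cpuPercent and measureCpu are hypothetical helper names, and the percentage is relative to a single core.

```javascript
// Convert a process.cpuUsage() result (microseconds) into a percentage
// of one CPU core over the elapsed wall-clock time.
function cpuPercent(usage, elapsedMs) {
  const cpuMs = (usage.user + usage.system) / 1000;
  return (cpuMs / elapsedMs) * 100;
}

// Hypothetical helper: runs a synchronous function and reports how much
// CPU time the whole process consumed while it ran.
function measureCpu(work) {
  const startUsage = process.cpuUsage();
  const startTime = process.hrtime.bigint();

  work();

  // Passing the earlier reading returns the difference since then.
  const usage = process.cpuUsage(startUsage);
  const elapsedMs = Number(process.hrtime.bigint() - startTime) / 1e6;
  return { elapsedMs, percent: cpuPercent(usage, elapsedMs) };
}

// Example: a deliberately CPU-heavy loop.
const result = measureCpu(() => {
  let total = 0;
  for (let i = 0; i < 5_000_000; i++) total += Math.sqrt(i);
  return total;
});
console.log(`CPU: ${result.percent.toFixed(1)}% over ${result.elapsedMs.toFixed(1)} ms`);
```

Because the reading covers the whole process, it can exceed 100% when worker threads or the libuv thread pool are busy, and it cannot attribute CPU time to an individual request when several are handled concurrently.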

4. Event Loop Time

Measuring the Node.js event loop time is the number one tip for developers who want to improve application performance. It’s one of the best metrics for detecting bad code design. Because Node.js runs your JavaScript on a single thread, synchronous tasks block this thread.

Synchronous tasks aren’t necessarily bad; however, long-running operations prevent your application from doing anything else in the meantime. A blocked event loop is something you want to avoid: it increases the average response time for other requests, as they also have to wait until the event loop is free again.

If you detect problems with your Node.js event loop time, first look at your code. You may find some long-running synchronous code. If not, you can try using code profiling in SolarWinds Observability to detect time-consuming Node.js tasks.
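
If you want to see event loop delay for yourself before reaching for a full observability solution, Node.js ships a built-in histogram for it in the perf_hooks module. The following is a minimal sketch, not a description of how SolarWinds Observability collects this metric; summarizeDelay and startEventLoopMonitor are hypothetical helper names, and the 20 ms resolution and reporting interval are arbitrary choices.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// The histogram reports nanoseconds; convert the key figures to milliseconds.
function summarizeDelay(histogram) {
  return {
    meanMs: histogram.mean / 1e6,
    maxMs: histogram.max / 1e6,
    p99Ms: histogram.percentile(99) / 1e6,
  };
}

// Hypothetical helper: samples event loop delay and hands a summary to
// onReport at a fixed interval.
function startEventLoopMonitor(reportEveryMs, onReport) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    onReport(summarizeDelay(histogram));
    histogram.reset(); // Start a fresh measurement period.
  }, reportEveryMs);
  timer.unref(); // Do not keep the process alive just for reporting.

  return {
    stop() {
      clearInterval(timer);
      histogram.disable();
    },
  };
}

module.exports = { summarizeDelay, startEventLoopMonitor };
```

A typical use would be startEventLoopMonitor(10000, (stats) => console.log(stats)) at application startup. A maximum or 99th-percentile delay that is much higher than the mean suggests that some synchronous code occasionally blocks the loop.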

Want to get started with a holistic observability solution for application performance monitoring? Sign up for a trial of SolarWinds Observability or check out the interactive SolarWinds Observability demo.
