Network Downtime: What to Monitor?

To catch network issues before they turn into a nasty outage, you need to watch a few important network monitoring parameters. With so many factors, objects, and interfaces, the burning question is: what should you monitor?

Knowing the main causes of network downtime helps, but a lot depends on the design of your network, the devices, the services running, and so on. In general, though, which critical parameters need steady monitoring?


Be the first to know, before it affects your users! Here are a few pointers to some important monitoring parameters:

Availability and Performance Monitoring: Monitoring and analysis of network device and interface availability and performance indicators help ensure that your network is running at its best. Some of the factors that influence good network availability are:

  • Packet loss & latency
  • Errors & discards
  • CPU, memory load & utilization

Detailed monitoring and analysis of this data for your network elements, together with timely alerting on poor network conditions such as slow network traffic, packet loss, or impaired devices, help safeguard your network from unwanted downtime.
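To make the idea of alerting on packet loss and latency concrete, here is a minimal Python sketch. It assumes you already collect one round-trip time per probe (with None for a probe that got no reply). The function names and the default thresholds (5% loss, 150 ms) are illustrative assumptions, not values taken from any particular product.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one polling cycle of probe results.

    rtts_ms has one entry per probe sent: the round-trip time in
    milliseconds, or None if the probe got no reply.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg_latency_ms = mean(replies) if replies else None
    return loss_pct, avg_latency_ms


def check_availability(rtts_ms, max_loss_pct=5.0, max_latency_ms=150.0):
    """Return a list of alert strings (thresholds are illustrative)."""
    loss_pct, avg_latency_ms = summarize_probes(rtts_ms)
    alerts = []
    if loss_pct > max_loss_pct:
        alerts.append(f"packet loss {loss_pct:.1f}% exceeds {max_loss_pct:.1f}%")
    if avg_latency_ms is None:
        alerts.append("no replies received")
    elif avg_latency_ms > max_latency_ms:
        alerts.append(
            f"average latency {avg_latency_ms:.1f} ms exceeds {max_latency_ms:.1f} ms"
        )
    return alerts
```

For example, calling check_availability([10.0, 20.0, None, 30.0]) reports 25% packet loss, while a cycle in which every probe replies quickly returns an empty list.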

Errors & Discards: Errors and discards are two different counters. Errors are packets that were received but couldn't be processed because there was a problem with the packet. Discards are packets that were received with no errors but were dropped before being passed on to a higher-layer protocol.

A large number of errors and discards reported in the interface stats is a clear indication that something has gone wrong. Investigating the root cause will help identify the issue so it can be resolved quickly.
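Interface stats are usually cumulative counters, so what matters is how much they grew between two polls. The Python sketch below shows one way to turn two readings into error and discard percentages. The dictionary keys (in_errors, in_discards, in_packets) are hypothetical names. It also assumes that all three are 32-bit counters that can wrap at most once between polls, and that in_packets counts only packets delivered to higher layers.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: all counters are 32-bit


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Growth between two counter readings, allowing for a single wrap."""
    return (current - previous) % modulus


def error_discard_rates(prev, curr):
    """Percent of inbound packets errored and discarded between two polls.

    prev and curr are dicts of raw counter values with the hypothetical
    keys "in_errors", "in_discards", and "in_packets".
    """
    errors = counter_delta(prev["in_errors"], curr["in_errors"])
    discards = counter_delta(prev["in_discards"], curr["in_discards"])
    delivered = counter_delta(prev["in_packets"], curr["in_packets"])
    total = delivered + errors + discards
    if total == 0:
        return 0.0, 0.0  # idle interface: nothing received in this interval
    return 100.0 * errors / total, 100.0 * discards / total
```

Working with deltas rather than raw totals keeps an old burst of errors from masking, or mimicking, a problem that is happening right now.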

Device Configuration and Change: Non-compliant configuration changes are a major cause of network downtime. Not knowing what changes were made, and when, is even more dangerous for your network. Create a system that monitors and approves device configuration changes before they are applied to production. Setting up alerts to notify you whenever a change is made helps keep device configurations from becoming a cause of network downtime.
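A simple way to detect that a configuration has changed is to compare the running configuration against the last approved copy. The Python sketch below uses the standard library's difflib for that comparison. How you fetch the two configurations is left out, and the function names and the "approved" and "running" labels are assumptions made for illustration.

```python
import difflib


def config_diff(approved, running):
    """Return unified-diff lines between approved and running config text."""
    return list(
        difflib.unified_diff(
            approved.splitlines(),
            running.splitlines(),
            fromfile="approved",
            tofile="running",
            lineterm="",
        )
    )


def changed_lines(approved, running):
    """Only the added and removed lines, without headers or context."""
    body = config_diff(approved, running)[2:]  # skip the two file-header lines
    return [line for line in body if line.startswith(("+", "-"))]
```

An empty result from changed_lines means the device still matches its approved configuration. Anything else is a candidate for an alert or a review.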

Syslog and Trap Messages: Syslog and traps serve separate functions. Syslog messages come in for authentication attempts, configuration changes, hits on ACLs, and so on. Traps are event-based and come in when some device-specific event has occurred, such as an interface logging too many errors or high CPU usage.

The advantage here is that, instead of waiting for your network management system (NMS) to poll for device information, you can be alerted on unusual events based on these syslog and trap messages.
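As a rough illustration of alerting on syslog messages, the Python sketch below decodes the priority field that a syslog message starts with. The priority value is the facility multiplied by 8 plus the severity, and lower severity numbers are more urgent. The function names and the choice to alert at severity 3 (error) or worse are assumptions, and the sample message text is made up.

```python
import re

# Standard syslog severity names, indexed by severity number (0 is most urgent).
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

_PRI = re.compile(r"^<(\d{1,3})>")


def parse_priority(message):
    """Split the leading <PRI> field into (facility, severity)."""
    match = _PRI.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid priority value
        raise ValueError(f"priority {pri} is out of range")
    return divmod(pri, 8)  # PRI = facility * 8 + severity


def should_alert(message, max_severity=3):
    """True when the message is at least as urgent as max_severity."""
    _, severity = parse_priority(message)
    return severity <= max_severity


# Made-up sample: PRI 187 decodes to facility 23 (local7), severity 3 (error).
sample = "<187>interface error threshold exceeded"
```

With the default setting, should_alert(sample) is true, while a notice-level message with priority 189 would be ignored.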

Network Device Health Monitoring: Monitor the state of your key device components, including temperature, fan speed, and power supply, so that you do not find yourself in a network outage caused by a faulty power supply or an overheated router. Set predefined thresholds so that you are alerted every time these values are crossed.
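Alerting "every time a value is crossed" is a little different from alerting on every reading above the limit, which would repeat the same alert on each poll. Here is a minimal Python sketch of the first behavior, assuming a series of temperature readings and a hypothetical limit of 70. The function name is illustrative.

```python
def threshold_crossings(readings, limit):
    """Return (index, value) for each reading that newly exceeds limit.

    A reading counts only when the previous one was at or below the
    limit, so a value that stays high triggers a single alert.
    """
    crossings = []
    above = False
    for index, value in enumerate(readings):
        if value > limit and not above:
            crossings.append((index, value))
        above = value > limit
    return crossings


# Illustrative temperature samples checked against a limit of 70.
alerts = threshold_crossings([60, 72, 75, 68, 71], 70)
```

Here alerts holds two entries, one for the rise to 72 and one for the later rise to 71. The reading of 75 is skipped because the value was already above the limit. For a metric where low values are the problem, such as fan speed, the comparison would be reversed.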

So, start monitoring the critical factors that impact the availability and performance of your network, and minimize network downtime!

To learn more: Unified Network Availability, Fault and Performance Monitoring
