How would you use NPM to discover and alert when a critical device has increasing latency and packet loss--but not so much packet loss that NPM marks it "down"?

I recently had a switch start malfunctioning and it caused a lot of disruption.  Finding it wasn't intuitive or easy because it wasn't down and wasn't showing up in any of my NPM's front page widgets.  A reboot was required to correct the issue immediately, and it may require an IOS upgrade or hardware replacement to prevent this permanently.

It's a simple switch, but it faces some critical equipment and areas.  It's never had much latency--1 ms or less, often, and never as much as 20 ms.

But twelve days ago it started having increasing latency, and four days ago it started having a little packet loss. The loss was never enough to show up on NPM's front page, but the latency was enough to cause increasing complaints to the help desk, and the packet loss was enough to drop a lot of customers' voice, video, and Citrix sessions. I want to avoid being behind this 8-ball in the future by building the right alerts and notifications based on changing patterns of latency and loss, without getting inundated when other devices have temporary-but-normal cases of increased latency and packet loss. In short, I want the right actionable alert, as adatole would say, but I'm not certain how to build it.

Here's how the switch's latency and packet loss history looks:

pastedImage_1.png

I'd like to have started getting alerts when the latency changed from less than 4 ms to consistently more than 20 ms.  I'd like this particular class of device to have more priority for any packet loss so that at 3% loss we'd have seen it in bright red on the front of NPM.

What type of alert & threshold is appropriate to capture these slowly-growing changes over time?
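
To make the condition I'm after concrete, here is a minimal sketch in plain Python. It is not NPM's alert engine or API; the class name, function name, and the three-poll window are assumptions for illustration only. The idea is that latency has to stay above 20 ms for several consecutive polls before it counts, while any single poll at 3% loss or more counts immediately.

```python
from collections import deque


class SustainedBreach:
    """Reports True only after `required_polls` consecutive values exceed `limit`."""

    def __init__(self, limit, required_polls):
        self.limit = limit
        self.window = deque(maxlen=required_polls)

    def update(self, value):
        # Remember whether this poll was over the limit; old polls fall off the window.
        self.window.append(value > self.limit)
        return len(self.window) == self.window.maxlen and all(self.window)


def first_alert_index(latency_ms, loss_pct, latency_limit=20.0,
                      loss_limit=3.0, required_polls=3):
    """Return the index of the first poll that would raise an alert, or None.

    Assumed policy (not from NPM): sustained latency over the limit,
    or packet loss at or above the limit on any single poll.
    """
    latency = SustainedBreach(latency_limit, required_polls)
    for i, (lat, loss) in enumerate(zip(latency_ms, loss_pct)):
        sustained = latency.update(lat)
        if sustained or loss >= loss_limit:
            return i
    return None


if __name__ == "__main__":
    polls_latency = [1, 2, 25, 3, 22, 24, 30, 31]
    polls_loss = [0, 0, 0, 0, 0, 0, 0, 0]
    print(first_alert_index(polls_latency, polls_loss))
```

A single 25 ms spike is ignored in this sketch; only the run of three polls above 20 ms triggers. A longer window would cut noise further at the cost of a later alert.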

  • You may change the alert trigger conditions for that node. But I'd suggest using a custom property that holds a latency threshold for each node; then you will not need to change the alert trigger condition.

  • I agree with JaroslawLadyga. We use custom property thresholds for various volume alerts, completely bypassing the use of default/override thresholds. We do, however, adjust thresholds for various nodes for some alerts. Both ways work well for us. Your alert scope can then be limited to only those nodes with a value in that custom property, out of whatever subset you need to pull.

  • Might I impose on you for a sanitized screen shot example of how you use custom property thresholds this way?

  • Sure thing, rschroeder. It's nothing fancy, and we are using them for volumes, not nodes. (It's also a nice little workaround for managing volumes, as they have been missing from the manage node/interface page for, well, forever...) I'd think you would benefit more from using the default threshold options for your use case, dealing with nodes, as opposed to my method for volumes.

    We need custom properties to exist before we can use them, obviously.

    pastedImage_2.png

    We can use the custom property editor to provision these in bulk, as opposed to very awkwardly clicking through a bunch of things on the manage nodes page, hoping to avoid a wrongful click.

    pastedImage_1.png

    And finally, the simple alert trigger to fire off if the custom property threshold is reached/breached.

    pastedImage_0.png

    Again, nothing fancy here, just a simple comparison to a CP value. And while your scenario is dealing with nodes, you may find the various default node threshold properties better suit your needs, rather than creating your own. Or, maybe not.

    Thank you,

    -Will
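
For anyone who can't see the screenshots, the comparison described above can be sketched roughly like this in plain Python. This is not SolarWinds code or its API; the dictionary keys (`caption`, `avg_response_ms`, `latency_threshold_ms`) are hypothetical names standing in for a node's name, its polled latency, and a custom property holding that node's own threshold. Nodes without a value in the custom property are simply out of scope for the alert.

```python
def nodes_in_breach(nodes):
    """Return captions of nodes whose latency has reached their own threshold.

    `nodes` is a list of dicts; all key names here are hypothetical.
    """
    breached = []
    for node in nodes:
        threshold = node.get("latency_threshold_ms")  # hypothetical custom property
        if threshold is None:
            continue  # no custom property value: node is outside the alert scope
        if node["avg_response_ms"] >= threshold:  # threshold reached or breached
            breached.append(node["caption"])
    return breached


if __name__ == "__main__":
    inventory = [
        {"caption": "critical-switch", "avg_response_ms": 24, "latency_threshold_ms": 20},
        {"caption": "wan-router", "avg_response_ms": 80, "latency_threshold_ms": 150},
        {"caption": "lab-switch", "avg_response_ms": 300},  # no threshold set
    ]
    print(nodes_in_breach(inventory))
```

The point of the design is that one alert definition serves every node, and tuning a device means editing its property value rather than the alert.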

  • Thank you, Will.  I suspect you're correct.

  • We also use a lot of these instead of global thresholds. If requests to change thresholds and other parameters are infrequent, then the custom property method is best in my view.