
Can NPM do Smokeping-like polling and graphing?

Smokeping is an open source tool that pings an IP address (with traditional ICMP or more advanced probes) multiple times per polling interval and records the response time of each ping. When you bring up the web interface, it shows very useful statistics such as average latency, jitter (the difference between the highest and lowest latency), etc., and displays them in a neat graph: a line for the average plus "smoke" showing the responses that fell outside the average. Let's see if I can get an image to use as an example:

[Image: reading_detail.png, an example Smokeping graph]

In this example, it's set to send 20 small pings every 300 seconds. The green dots mark each interval where all 20 pings succeeded. The colored dots and vertical lines mark intervals where some loss was seen. The grey areas above and below the average show the variation in round-trip time across the 20 pings.

I send 10 pings every 60 seconds to get more detail, but it's all configurable.
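To make the per-interval numbers concrete, here is a minimal Python sketch of the statistics described above for a single polling interval. It assumes the round-trip times are already collected, in milliseconds, with `None` marking a lost ping. The names `summarize_interval` and `IntervalStats` are hypothetical; this is not Smokeping's own code. "Jitter" follows the definition used in this post (highest minus lowest latency), and both the median and the mean are computed, since a median is less skewed by a single slow reply.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass
class IntervalStats:
    """Summary of one polling interval (hypothetical structure, not Smokeping's)."""
    sent: int
    lost: int
    loss_pct: float
    median_ms: Optional[float]
    avg_ms: Optional[float]
    min_ms: Optional[float]
    max_ms: Optional[float]
    spread_ms: Optional[float]  # max - min: the "jitter" definition used in the post
    smoke_ms: list[float]  # sorted replies; these would form the grey "smoke" band


def summarize_interval(rtts_ms: Sequence[Optional[float]]) -> IntervalStats:
    """Summarize the pings sent in one interval; None marks a lost ping."""
    replies = sorted(r for r in rtts_ms if r is not None)
    sent = len(rtts_ms)
    lost = sent - len(replies)
    loss_pct = 100.0 * lost / sent if sent else 0.0
    if not replies:
        # Total loss (or nothing sent): there are no latency figures to report.
        return IntervalStats(sent, lost, loss_pct, None, None, None, None, None, [])
    return IntervalStats(
        sent=sent,
        lost=lost,
        loss_pct=loss_pct,
        median_ms=median(replies),
        avg_ms=mean(replies),
        min_ms=replies[0],
        max_ms=replies[-1],
        spread_ms=replies[-1] - replies[0],
        smoke_ms=replies,
    )
```

For example, `summarize_interval([20.1, 21.0, None, 19.8, 25.3, 20.4, None, 20.9, 22.0, 20.5])` reports 2 of 10 pings lost (20% loss), and the eight sorted replies are what would be drawn as smoke around the center line.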

Is there a way to get similar results in NPM? If not, is there at least a way to ping more aggressively and report on latency fluctuations and jitter between Orion and any IP destination (without using IPSLA)?

Thanks for reading.


Eric

  • "Smokeping" is exactly the tool I use to describe what SolarWinds is NOT. You can ratchet the ping interval down to 1 ping every 10 seconds (the default is 120 seconds) or up to as slow as 32767 seconds (about once every 9 hours).

    But aside from those parameters, it's not the right tool for the job you are describing.

  • I see what you're saying. Indeed, there is a big gap between NPM's use of ping and the ideal. Detecting jitter, accurately measuring the percentage of packet loss, distinguishing a node with severe packet loss from one that is simply down, and so on are hard to do with NPM's ping monitoring. But IPSLA is *fantastic* at those things and much more, and SolarWinds' VNQM embraces IPSLA as the testing protocol to provide that visibility. What rules out IPSLA in this scenario?

  • If you have the VNQM module and IPSLA-enabled routers, and you only want to test connectivity to a relatively small number of nodes (my current environment is 10,000 devices; wouldn't want to do that to EVERYTHING, now would I?!?), then that's a fine option.

    But your question was about NPM specifically. That tool is NOT built to use IPSLA, nor is it designed as a heavy-duty ping tester. It sends one ping every 2 minutes, with 3 retries and a 300 ms timeout, and switches to a heavy ping cycle (1 ping every 5 seconds until 10 pings are missed) if the node fails to respond.

    NPM's use case is general monitoring that you can apply to the whole environment. Smokeping, the SW Toolkit, or IPSLA are more for short- to medium-term forensics and diagnostics.

    IM(ns)HO

    - Leon

  • I am aware of the technical limitations of the solution I proposed, but I'm still interested in why it is not a good solution for rednarb specifically.

  • Oops. I didn't realize it was YOU who posted the IPSLA question. I thought I was looking at the OP.

    In the immortal words of Ms. Emily Litella: "Never Mind"

  • I'm not against IPSLA, and I understand the features it brings. I just don't want to license it from either Cisco or SolarWinds. Smokeping is free and, at heart, very simple. I find it hard to live without reasonably accurate, more frequent latency tracking, and I assumed others might feel the same way and have worked something into NPM, you know, for that single pane of glass.

    Thanks for the replies all!

  • Next question: would you want that deeper visibility all the time for all nodes or just on demand for selected nodes?  Or all the time for some nodes but not others?

  • History is usually what matters most to me, since it builds a baseline, so to speak. So for me, it would be on all the time.

    Smokeping just stores its data in RRDs using RRDtool, so it's super simple and really fast, even with a significant amount of history. I'd hate to see the additional load of thousands of pings and all the associated telemetry being stored in MS SQL... but I'm no DBA, so I can only imagine.
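Part of why RRD storage stays fast is that each archive has a fixed number of rows: new values overwrite the oldest, and coarser archives keep consolidated (for example, averaged) values over longer spans. Below is a toy Python sketch of that round-robin idea. The `RoundRobinArchive` class, the archive sizes, and the latency values are all hypothetical; this is not RRDtool's file format or API, and real RRDtool handles things this omits (timestamps, heartbeats, unknown values, and other consolidation functions such as MIN, MAX and LAST).

```python
from collections import deque
from statistics import mean


class RoundRobinArchive:
    """Toy round-robin archive (hypothetical, not RRDtool's format or API).

    Every `steps_per_row` primary samples are averaged into one row, and only
    the newest `rows` rows are kept, so memory use never grows.
    """

    def __init__(self, rows: int, steps_per_row: int) -> None:
        self.data: deque[float] = deque(maxlen=rows)  # oldest row drops off automatically
        self.steps_per_row = steps_per_row
        self._pending: list[float] = []  # samples not yet consolidated into a row

    def update(self, value: float) -> None:
        """Add one primary sample, consolidating once a row's worth is collected."""
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            self.data.append(mean(self._pending))  # AVERAGE-style consolidation
            self._pending.clear()


# Hypothetical layout for one target polled once a minute:
# 1 day at full resolution plus 30 days of hourly averages.
fine = RoundRobinArchive(rows=1440, steps_per_row=1)
hourly = RoundRobinArchive(rows=720, steps_per_row=60)
for minute in range(100_000):
    latency_ms = 20.0 + (minute % 60) / 10  # made-up synthetic values, not real data
    fine.update(latency_ms)
    hourly.update(latency_ms)
```

With one sample per minute, the 1,440-row archive covers one day and the 720-row archive of 60-sample averages covers 30 days, so storage stays at 2,160 values however long polling runs. That fixed size is the design trade-off: old detail is averaged away instead of piling up in a database.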