How does SolarWinds calculate latency and packet loss?

Hi,

I have an alert configured in SolarWinds for latency and packet loss, and I get these alerts every day.

Can I know how SolarWinds calculates the latency of the link and the packet loss? For example, if I have a 10 MB link, I get an alert of 70% latency on the link and sometimes a packet loss alert.

So please help me understand how latency and packet loss are calculated in SolarWinds in relation to the link bandwidth.

  • Latency is not measured or displayed as a percentage in SolarWinds. It is reported in milliseconds, based on how long it takes that node to answer a ping.

    Percent loss is calculated as the number of individual ping packets that went unanswered out of the last 10 attempts. That is why your packet loss will always be a multiple of 10.

    For the most part, I would interpret any alert showing 70% packet loss as the interface probably being completely down for a little over a minute and coming back up before SolarWinds had decided to mark the node as down. Loss that ramps up quickly and then ramps back down indicates an outage; loss that stays in the same range for a sustained period is more likely to be actual packet drops due to a flaky connection.

    How is % Packet Loss calculated?
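
A minimal sketch of the calculation described above, assuming a rolling window of the last 10 ping attempts. The function name `percent_loss` and the boolean history list are hypothetical and used only for illustration; this is not SolarWinds code.

```python
from collections import deque


def percent_loss(ping_results, window=10):
    """Percent of unanswered pings among the most recent `window` attempts.

    `ping_results` is an iterable of booleans, oldest first:
    True means the ping was answered, False means it was lost.
    """
    # A deque with maxlen keeps only the newest `window` results.
    recent = deque(ping_results, maxlen=window)
    if not recent:
        return 0.0
    lost = sum(1 for answered in recent if not answered)
    return 100.0 * lost / len(recent)


# Hypothetical history: 7 of the last 10 pings went unanswered.
history = [True] * 5 + [False] * 7 + [True] * 3
print(percent_loss(history))  # 70.0
```

With a full window of 10 attempts, every possible result is a multiple of 10, which matches the explanation above.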

  • Hi mesverrum, aLTeReGo,

    If the packet loss is always a multiple of 10, how come we see packet loss values like 4% or 11.16% in the node's packet loss graphs?

    Many Thanks!

  • Are you looking at averaged stats from an older time period, or something like hourly averages? The default retention settings hold detailed stats for a week, then average them to hourly values for 30 days, then average those values and hold them for a year. An individual node showing stats that have not been aggregated in some way would never be 11.16% for a single time period.
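
To illustrate why aggregated values stop being multiples of 10, here is a small sketch that averages a few detailed samples into one rolled-up value. The sample lists and the `rollup` helper are hypothetical; the real aggregation in the product may differ in detail.

```python
def rollup(samples):
    """Average a list of detailed packet-loss percentages into one value."""
    return round(sum(samples) / len(samples), 2)


# Each detailed sample is a multiple of 10, but the averages are not.
print(rollup([0, 0, 100]))           # 33.33
print(rollup([0, 10, 0, 30, 0, 0]))  # 6.67
```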

  • Hi mesverrum,

    Please find below the screenshot of the graph I'm looking at for packet loss.

    I'm looking at the packet loss averaged at 5 minutes and the value is 33.33%.

    Many Thanks!

  • The default polling interval is two minutes, so a 5-minute chart would always be an aggregate of at least 2 polls. If you experience packet loss at some point in that 5 minutes, the node enters fast polling every 10 seconds until it stops losing packets. In this case I would probably interpret the value as the device not responding to pings for something like a minute or two during that 5-minute interval. If you want, you could run a report on node availability to see the exact timestamps of each poll, since there are so many moving variables that reading the chart would just be a lot of guessing about what exactly happened in those 5 minutes.

  • Hi All,

    So, the device is set to poll ICMP every 2 minutes. However, when I view the graph at 1-minute intervals, I see response time values 1 minute apart, and in some cases 3 minutes apart.

    How is this happening?

    @mesverrum, aLTeReGo, kindly help! This is really getting confusing.

    [Screenshot: pastedImage_0.png]

    Many thanks!

  • amrita1690, can you post a screenshot of the Polling Engines details for the poller that node is assigned to? [Settings > All Settings > Polling Engines]

    [Screenshot: pastedImage_0.png]

  • Hi aLTeReGo,

    Please find below the screenshot of the poller the node is assigned to.

    [Screenshot: pastedImage_2.png]

  • Based on the screenshot you posted above, the poller is nearing its maximum limit. The good news is that your polling rate is still holding at 100%, but it is likely falling to 99% or lower at times. If you export the chart data, you will likely find that polling is only falling behind its schedule by a second or two. The chart you referenced above bucketizes results into 1-minute buckets based on the settings you selected. This means that if you polled once at 12:01:59 and the next poll was at 12:03:01, there would be a gap at 12:02, even though only 62 seconds had elapsed since the last poll, or only two seconds longer than scheduled. Orion's polling is dynamic based on load and makes every attempt to schedule jobs out evenly, but as you near the poller limit you may begin seeing some very minor variance or delays in polling schedules, because the poller cannot find an available slot at the exact scheduled time. This can also occur when latency is higher between Orion and the node, or when the node being monitored is under heavy load, as these add delays to getting polling results from those devices and leave less time for other jobs to be processed. In almost all cases, adding an Additional Polling Engine will alleviate these issues.
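
The bucketing effect described above can be reproduced with a short sketch. It floors each poll timestamp to the start of its 1-minute bucket and lists the buckets that received no poll. The `empty_buckets` helper and the calendar date are hypothetical; only the 12:01:59 and 12:03:01 times come from the reply above.

```python
from datetime import datetime, timedelta


def empty_buckets(poll_times, bucket=timedelta(minutes=1)):
    """Return start times of buckets, between the first and last poll,
    that contain no poll at all."""
    if not poll_times:
        return []
    epoch = datetime(1970, 1, 1)

    def floor(t):
        # Round a timestamp down to the start of its bucket.
        return t - ((t - epoch) % bucket)

    filled = {floor(t) for t in poll_times}
    gaps = []
    current = min(filled)
    while current <= max(filled):
        if current not in filled:
            gaps.append(current)
        current += bucket
    return gaps


# Two polls only 62 seconds apart (the date itself is arbitrary).
polls = [datetime(2020, 1, 1, 12, 1, 59), datetime(2020, 1, 1, 12, 3, 1)]
elapsed = (polls[1] - polls[0]).total_seconds()
gaps = empty_buckets(polls)
print(elapsed)  # 62.0
print(gaps)     # [datetime.datetime(2020, 1, 1, 12, 2)]
```

The polls land in the 12:01 and 12:03 buckets, so the 12:02 bucket stays empty even though the polls are just over a minute apart.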