NPM Poller Per Second - Gaps in data

We are seeing gaps in data, especially on high-bandwidth interfaces (for example, 200M-500M on a 1G interface). I am trying to figure out if our pollers are tuned right. I went through this doc but I am not sure what I need.

1600 Elements
449 Node Elements
1147 Interface Elements

I am noticing that the SNMP outstanding never goes below 545.

 

Polling Engine monitor stats:

ICMP Statistic polling Index 3034 out of 3034
SNMP Statistic polling Index (1100-2600) out of 3034

SNMP Statistics  PPS: 148
Max SNMP Statistics  PPS: 148

 

The Polls per second tuning states the recommended values:
Maximum Node and Interface Status polls = 64
Maximum Statistics Collections = 105

We collect interface statistics every 1 minute.
We collect node statistics (CPU/MEM) every 5 minutes.

Any ideas? 

  • You are probably over-sampling the statistics data. By increasing the polling ~10X you diminish the scalability of the Orion server ~10X. The adjusted element count (for polling frequency) is ~15K and pollers don't do well with more than ~9K.

    I suggest carefully examining the need to poll so frequently. If there are critical interfaces or nodes where you believe you need to poll at 1 min, I would set those up for 1 min (assuming there are just a handful of these) and return the polling for everything else to the defaults.
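To make the "adjusted element count" idea concrete, here is a rough sketch of the arithmetic. It assumes the adjustment is simply the element count scaled by (default interval / actual interval), and it assumes default statistics intervals of 9 minutes for interfaces and 10 minutes for nodes. Both are assumptions for illustration, not confirmed Orion internals, so check them against your own settings. The result depends heavily on those defaults and will not necessarily match the ~15K estimate above.

```python
def adjusted_element_count(groups):
    """Scale each group's element count by how much faster than default it is polled.

    groups: iterable of (element_count, default_interval_min, actual_interval_min).
    The scaling rule itself is an assumption made for this sketch.
    """
    return sum(count * default / actual for count, default, actual in groups)


groups = [
    (1147, 9, 1),   # interfaces: assumed 9 min default, polled every 1 min
    (449, 10, 5),   # nodes: assumed 10 min default, polled every 5 min
]

adjusted = adjusted_element_count(groups)
print(f"Adjusted element count: {adjusted:.0f}")
```

Moving only a handful of critical interfaces to 1 minute and leaving the rest at the default would bring this number back close to the raw element count.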

  • Our requirement is at least 1-minute interface statistics, and we would really like finer granularity than 1 minute if available. A 5-minute average does not come close to showing the bandwidth peaks in a bursty traffic environment.

    If we need 1-minute statistics, would an additional poller help? When these gaps happen on the busy interfaces, other interfaces show no gaps at all. Is it the case that Orion gets too busy to calculate the large bandwidth numbers?

  • I recommend you run a test on two similar interfaces and poll one at 1 min and the other at 5 min. I have done this to demonstrate to clients that the data is almost identical. You can see shorter peaks with 1 min, but the question there is what is the value of seeing these peaks? Short peaks with a rapid recovery are common and don't impact performance. It is a trade-off between the cost of rapid polling and the value of the data. For LAN interfaces 1 min polling has no value, so you could set the WAN interfaces to 1 min and keep LAN at 9 min and save a lot of $ in extra pollers and data storage. OK - I'm off my soap box now.... ;)

    For the high speed interfaces make sure you are using 64 bit counters. This setting is in the admin manage nodes interface and is set per node. This will keep the counters from experiencing rapid roll-over.

    Your adjusted element count for 1 minute polling is about 8 to 10 times greater than the actual element count, so an additional poller will offload the polling and eliminate the gaps. Here is a doc on gaps and one on Orion Performance that may be helpful.
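As a rough illustration of why 64-bit counters matter on fast interfaces, the sketch below estimates how long an octet counter takes to wrap at a sustained rate. It assumes the standard SNMP interface octet counters (32-bit `ifInOctets` versus 64-bit `ifHCInOctets`) and a constant traffic rate, which real traffic will not have. The function name is made up for this example.

```python
def seconds_to_wrap(counter_bits, bits_per_second):
    """Time for an octet counter of the given width to roll over at a steady rate."""
    max_octets = 2 ** counter_bits
    return max_octets * 8 / bits_per_second


for label, rate in [("200 Mbps", 200e6), ("500 Mbps", 500e6), ("1 Gbps", 1e9)]:
    wrap32 = seconds_to_wrap(32, rate)
    print(f"{label}: 32-bit counter wraps in about {wrap32:.0f} s")

# A 64-bit counter at 1 Gbps takes thousands of years to wrap.
years64 = seconds_to_wrap(64, 1e9) / (365 * 24 * 3600)
print(f"1 Gbps: 64-bit counter wraps in about {years64:.0f} years")
```

At 500 Mbps a 32-bit counter wraps in a little over a minute, and at line rate on a 1G interface it wraps in well under a minute, so a 1-minute poll can miss a rollover entirely.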

  • I agree that in a lot of places 5 minutes would be good. Our data is all time sensitive and bursty. We get haunted by microbursts all day. We will have 1G interfaces running at ~100M of bandwidth, then a microburst can max out the interface for a few seconds before it goes back to ~100M.

    Are the 64-bit counters only available in a certain version? We are still running V9.51 since our last upgrade attempt failed.

  • I found the 64bit counters box.  I will try that this evening.
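To put rough numbers on the microburst scenario described in this thread, here is a small sketch of what a polled average would report. It assumes a 1G interface idling at 100 Mbps with a single 3-second burst at line rate inside the polling window. The 3-second duration is an assumed stand-in for "a few seconds".

```python
def polled_average_mbps(interval_s, baseline_mbps, burst_mbps, burst_s):
    """Average rate over one polling interval containing a single burst."""
    quiet_s = interval_s - burst_s
    return (baseline_mbps * quiet_s + burst_mbps * burst_s) / interval_s


one_min = polled_average_mbps(60, 100, 1000, 3)
five_min = polled_average_mbps(300, 100, 1000, 3)

print(f"1-minute poll reports about {one_min:.0f} Mbps")
print(f"5-minute poll reports about {five_min:.0f} Mbps")
```

Under these assumptions the 1-minute poll shows a bump the 5-minute poll nearly flattens, but neither comes close to showing that the interface was saturated.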