Roughly 5 minute delay before hosts are marked "down" - Normal ?

Hi All, new to NPM, so bear with me if I've made a noob error.

My Cisco devices are monitored using SNMP (and, I presume, ICMP?). If I take down my test switch, it sometimes takes 4 to 6 minutes before NPM marks the node as "down", although it alerts me to "packet loss" on the affected node very quickly.

I've changed the default polling interval from 120 to 60 seconds, with not much luck.

Am I being too impatient, or have I done something wrong?

Thanks in advance,

Ellis

  • Hi,

    On the Settings page, select Polling Settings. At the bottom of that page you will find "Node Warning Level"; changing that value determines after how many seconds a node is considered down.

  • Hi - thanks for your answer.

    As a test I changed the "Node Polling Interval" to 30 seconds and the "Node Warning Level" to 10 seconds. It still took around 3 minutes for NPM to mark the node as down.

    Am I still being too impatient?!

  • EllisD

    The other thing you can check is under Web Console Settings on the Admin page.  By default, the Orion page only refreshes every 5 minutes.

  • With those settings (assuming you are constantly refreshing the page, either manually or by the means kweise mentioned), it should take less than a minute for the node to appear as down. Maybe you should open a support case to investigate this issue.

    PS: if you changed the polling interval for all nodes and you have a lot of them, this might overload your database which in turn might cause this delayed notification.

    By "appear" as down, do you mean the icon staying green instead of turning red, or do you mean an alert being sent out? You also need to adjust the alerts themselves if you want them to trigger sooner or later.

  • I was manually refreshing the web page. Think it's time for a support query to find out what's going on.

    I only changed the polling interval to 30 seconds as a test, it was still slow/sluggish when set to 1 minute, 2 minutes etc.

    By "appear", I meant turn red :-(

    Thanks for your help.

    Ellis

  • isn't there also a setting somewhere that indicates 3 missed polls before it marks it down?

  • That sounds vaguely familiar, but I can't find that setting anymore.  However, there is a setting under Orion Polling Settings for Node Warning Level.  It sets the number of seconds before Orion marks devices as down.  I'm guessing the default is 120 seconds.  It's at the bottom of the page under Calculations & Thresholds.



  • So SW, does that mean you switch to a FAST POLL if a poll is missed, or what is the scenario?
    Thanks

  • Realised that I never thanked you all for your responses.

     

    Transpires the issue was down to the fact that I had ramped up the SNMP polling to every 1 minute.  Poor NPM was getting swamped.

    I have knocked it back to 10 minutes, and nodes now get marked down within the timeframe specified by the "ICMP polling interval" and the "node warning threshold".
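
    For anyone wondering why the SNMP interval mattered so much, here is a rough back-of-the-envelope sketch. It only counts how many polls per minute an interval implies; it is not how NPM schedules its work internally, and the node count of 500 is a made-up figure, not one from my network.

```python
def polls_per_minute(node_count: int, interval_seconds: float) -> float:
    """Average number of polls per minute implied by a polling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return node_count * 60.0 / interval_seconds


# Hypothetical estate of 500 nodes (assumption, not a figure from this thread).
nodes = 500
for label, interval in [("1 minute", 60), ("10 minutes", 600)]:
    rate = polls_per_minute(nodes, interval)
    print(f"SNMP every {label}: {rate:.0f} polls/min")
```

    Moving from 1 minute to 10 minutes cuts the SNMP polling rate by a factor of ten in this model, which fits with the engine no longer being swamped.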

    Support also sent me this useful info:

     A device may drop packets or fail to respond to a poll for many reasons. Should the device fail to respond, the device status is changed from Up to Warning. On the Node Warning Interval tab, you specify how long it will remain in the Warning status before it is marked as Down. During the interval specified, the service performs "fast polling" to continually check the node status.


    Please see below on changing the fast poll interval in Network Performance Monitor


    1. Click on File.
    2. Click on Advanced Settings.
    3. Select the Node Warning Interval tab.
    4. Adjust the scrollbar to suit your network's needs.


    The "Fast Poll" only occurs from the time NPM detects a problem through the "Node Warning Period". Once NPM has determined that the interface or node is DOWN, it marks it as down and reverts to the normal polling interval.
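
    To make the timing concrete, here is a minimal sketch of the Up, Warning, Down sequence described above. It is a simplified model of my own, assuming the failure is first noticed at the next regular poll and the node then sits in Warning for the whole warning interval; the function and parameter names are hypothetical, and it ignores engine load, retries and web console refresh.

```python
def time_to_down(poll_interval_s: float,
                 warning_interval_s: float,
                 failure_offset_s: float) -> float:
    """Seconds from the real failure until the node is marked Down.

    failure_offset_s is how long after the last successful poll the
    node actually failed (0 <= failure_offset_s < poll_interval_s).
    """
    if not 0 <= failure_offset_s < poll_interval_s:
        raise ValueError("failure_offset_s must lie within one poll interval")
    # Wait until the next regular poll notices the failure (Up -> Warning).
    wait_for_next_poll = poll_interval_s - failure_offset_s
    # Then the node stays in Warning, being fast polled, for the full
    # warning interval before it is marked Down.
    return wait_for_next_poll + warning_interval_s


# Test values used earlier in this thread: 30 s polling, 10 s warning level.
print(time_to_down(30, 10, failure_offset_s=0))   # worst case in this model
print(time_to_down(30, 10, failure_offset_s=29))  # failure just before a poll
```

    With the 30 second / 10 second test settings, this model never exceeds 40 seconds, so the roughly 3 minutes I was seeing pointed to something other than the two intervals.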


    Notes:
    - To reduce the amount of packet loss reported by Orion NPM, configure the polling engine to retry ICMP pings a specific number of times before reporting packet loss. To do this, add the string value “Response Time Retry Count” to the Windows Registry under HKEY_LOCAL_MACHINE\SOFTWARE\SolarWinds.Net\SWNetPerfMon\Settings. Set the value data to the number of retries you prefer.


    - You may see events or receive alerts for down nodes that are not actually down. This can be caused by intermittent packet loss on the network. Set the Node Warning Interval to a higher value to avoid these false notifications.
    Please let me know if you require any further information.
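
    In case it helps anyone applying the registry note above, here is a small sketch that just builds the text of a .reg file for that value. The key path and value name are read from the support note (written with backslashes), the retry count of 3 is an arbitrary example rather than a recommended or default value, and I have not confirmed this against every NPM version, so check the key on your own server before importing anything.

```python
def render_reg_file(retry_count: int) -> str:
    """Build .reg file text that sets "Response Time Retry Count".

    The key path comes from the support note quoted in this thread;
    treat it as an assumption for your particular NPM version.
    """
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE\SolarWinds.Net\SWNetPerfMon\Settings"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # The note asks for a string value, so the number is quoted.
        f'"Response Time Retry Count"="{retry_count}"',
        "",
    ]
    return "\r\n".join(lines)


# Example value only; pick the retry count that suits your network.
print(render_reg_file(3))
```

    Printing the text first means you can review it before saving it as a .reg file, which seems safer than scripting a direct registry write.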

     

     

    Thanks,

    Ellis

  • Great info.  I have a question for SW, though.  This is in regard to the registry entry "Response Time Retry Count".  What is the default if this value is not present?