This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.

Taking 5 minutes to be notified that an AP is down

I've monitored the controller and can see that when an AP goes down, the controller knows about it within 30 seconds. I've checked the polling interval on the NPM alert for a down AP, which is set to the default of 1 minute. So why is it actually taking 5 minutes to be notified by email that it is down?

I would assume that NPM is querying the controller about AP status rather than polling the APs directly, so I'm not sure why it's taking so long to be notified. Are there any debug or similar logs I can monitor to see what's going on in real time?

  • There is a fast ping that starts on the first failure; on my system it runs for 180 seconds. I think it works out to an average of half the polling interval plus 180 seconds. The alert condition is then checked every 1 minute, which can add more delay, and on top of that there is any email delivery delay.
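    A worked version of this estimate, as a rough sketch only. It assumes the poster's model: a fast-poll window F (180 s on their system) starts after the first missed poll, the alert condition is evaluated every A seconds, and E is email delivery delay (taken as zero here). P is taken as the 120-second default node polling interval. The symbols, and averaging the alert-check wait as A/2, are illustrative assumptions rather than documented Orion behaviour.

    ```latex
    \begin{aligned}
    T_{\text{avg}} &\approx \tfrac{P}{2} + F + \tfrac{A}{2} + E, \qquad T_{\max} \approx P + F + A + E \\
    P = 120\,\text{s},\ F = 180\,\text{s},\ A = 60\,\text{s},\ E = 0: \quad T_{\text{avg}} &\approx 60 + 180 + 30 = 270\,\text{s}\ (4.5\ \text{min}), \quad T_{\max} \approx 360\,\text{s}\ (6\ \text{min})
    \end{aligned}
    ```

    Under these assumptions the delay falls in the 4.5 to 6 minute range, roughly the 5 minutes reported, though this model only applies if the AP is polled as its own node.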

  • You would need to verify two more sections: did you check the Trigger Condition and Trigger Actions sections of your NPM alert?

    [Screenshot T1.JPG: Trigger Condition settings]

    If it's not set under Trigger Condition, then check whether there is a delay on your email action under Trigger Actions:

    [Screenshot T2.JPG: Trigger Actions settings]

  • I've checked these settings. The first is set to 0 seconds and the second isn't enabled, so neither should be adding any delay.

  • If you haven't done so already, check the timing in your events list to see when the device shows as down and when the alert fired. Also, on the alert, check the first screen to see how often it checks whether the device is down. You may already have looked at all this.

  • If the trigger conditions are not delayed, statistics polling is 60 seconds, and fast polling is set to 0 seconds (i.e. the Node warning level), then I would expect the system to alert you after the first 60 seconds, not 5 minutes. I'd be interested to find out what is causing the delay. If you don't get any pointers from someone here, I'd get in touch with support to have a look.

    If the APs send syslogs or traps, I would suggest alerting on those.

  • The APs do not support SNMP directly, so if I wanted to discover and poll them I could only do it via ICMP/IP, and with several hundred of them, adding them manually isn't really practical. It's a shame that, since NPM already knows about them from the controller, I can't simply import them as nodes and poll them accordingly.

    NPM only knows about the APs because the controllers are nodes polled via SNMP, so AP status is only known by communicating with the controller, which, according to support's response below, happens at the node's statistics collection interval.

    I had this answer from SolarWinds support:

    If the AP is added as its own node in Orion, then the Node Polling Interval (default 120 seconds) will be used.

    If, however, you're monitoring it through your Wireless LAN Controller, then it uses the Statistics Collection Interval (default 10 minutes).

    To collect statistics more often, go to your controller in Manage Nodes and choose Edit Properties on the controller, then change Collect Statistics to a lower number.
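    A rough worked estimate of what this means for alert latency, assuming the AP's status in NPM is refreshed only once per Statistics Collection Interval S, the alert is evaluated every A seconds (1 minute here), and the failure time is random relative to both cycles. S and A are illustrative symbols; email delay and the controller's own detection time (about 30 seconds in the original post) are ignored.

    ```latex
    \begin{aligned}
    T_{\text{avg}} &\approx \tfrac{S}{2} + \tfrac{A}{2}, \qquad T_{\max} \approx S + A \\
    S = 600\,\text{s},\ A = 60\,\text{s}: \quad T_{\text{avg}} &\approx 300 + 30 = 330\,\text{s}\ (5.5\ \text{min}), \quad T_{\max} \approx 660\,\text{s}\ (11\ \text{min}) \\
    S = 120\,\text{s},\ A = 60\,\text{s}: \quad T_{\text{avg}} &\approx 60 + 30 = 90\,\text{s}, \quad T_{\max} \approx 180\,\text{s}
    \end{aligned}
    ```

    On this model, the default 10-minute interval alone accounts for an average delay of around 5 minutes, and lowering Collect Statistics shrinks the delay roughly in proportion, at the cost of more SNMP load on the controller.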

  • The real question might be: if you want to be notified quickly when an AP goes down, is polling the WLC the best way to do it?

    Personally I'd go with an SNMP trap from the controller. I believe the "AP Register" trap will tell you when an AP associates or disassociates.

    If you have good, overlapping wireless coverage, though, I wouldn't think this would be that critical.

  • That's actually how we do it: we add each AP as its own node monitored by ping, and we've given them static IP addresses since they are all lightweight devices. I've found that if something happens to the WLC and it loses its connections, it only sees what is new and what it loses while it's up, not what was lost to begin with. I have alerting set up based on the ping, and that works for us.

  • Nope, I would have to agree that it probably isn't!

    Given that NPM displays info about your thin APs, including their device names and IPs, you would think it would be fairly straightforward to add them as monitored nodes! If I run a discovery and it finds an "unknown device", why not check that against any other devices it knows about, even those discovered via an SNMP query of the controller, and then put the two pieces of information together to resolve the "unknown device"?

  • What type of AP/Controller? Does the controller generate traps indicating the tunnel between the AP and the controller has gone down?

    I use a script launched from the TrapReceiver to set the status of the AP automatically (which saves having to shorten the polling interval). It marks the AP whose IP address is passed in as the command-line argument as unavailable, i.e. down. I have an equivalent script that flags it as up when it returns to service (also triggered by a trap). The periodic polling catches any missed traps.

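    ' Connection string for the Orion database (server, username and password are placeholders)
    Const DB_CONNECT_STRING = "Provider=SQLOLEDB.1;Data Source=server;Initial Catalog=SolarWindsOrion;User ID='username';Password='password';"

    ' The AP's IP address is expected as the first command-line argument
    If WScript.Arguments.Count = 0 Then
        WScript.Echo "Missing parameters"
        WScript.Quit 1
    End If

    Dim apAddress, myConn, myCommand
    apAddress = WScript.Arguments(0)

    ' Open an ADO connection to the database and bind a command to it
    Set myConn = CreateObject("ADODB.Connection")
    Set myCommand = CreateObject("ADODB.Command")
    myConn.Open DB_CONNECT_STRING
    Set myCommand.ActiveConnection = myConn

    ' generate an update statement from the input (apAddress)
    ' myCommand.CommandText = SQL update statement here

    ' Run the update, then release the connection
    myCommand.Execute
    myConn.Close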

    You can extend this idea to set custom properties on nodes or interfaces for other trap-directed alerting.

    Aside: I'm pretty circumspect about sharing scripts that directly modify the database, since they tend to be Orion-version-specific and could be dangerous if used improperly. So I'll leave it to you to work out the correct update statement to generate in the script, if this is an approach you want to take.