12 Replies Latest reply on Oct 16, 2008 5:00 PM by mandg

    Alert not working

    mandg

      I've created an advanced alert to notify me when a node goes down based on the following:


      Status is equal to Down
      Vendor is equal to Windows


      However this doesn't seem to work as I'm not getting alerts when the Windows systems go down. I've used the test feature in the alerts area and successfully received the notifications from this. I've also ensured that the alert is enabled. Is there something obvious that I'm missing?

        • Re: Alert not working
          aLTeReGo

          "Status" should be "Node Status". See the following screenshot


            • Re: Alert not working
              mandg

              Yes, I'm sorry, that was a typo on my part. "Node Status" is what is configured in the query (I don't believe there is a "Status"-only option).


              Perhaps some other obvious item I'm missing? Is anyone else successfully using "Vendor" in their Alert condition?

                • Re: Alert not working
                  aLTeReGo

                  I'm anxious to hear what kind of response you get back from either the Thwack community or Solarwinds themselves. I'd like to do something similar with my alerts, but before I go and break something that's currently working I would like to know if this is supposed to work. Honestly I don't see any reason why it shouldn't, but then again I just use the code, I don't write it. :)


                  • Re: Alert not working
                    qle
                    If your alert is as aLTeReGo posted, it should definitely work. I've defined an alert similar to this and it's been working fine for us. Just to double-check, you have both conditions created and the property to monitor (at the top) is set to "Node"?
                      • Re: Alert not working
                        mandg
                        Well, I think I figured it out and, of course, user error was likely the cause. There was an additional condition in the Alert that specified "Vendor = VMware". Thus, the logic of the Alert was waiting for a 'Windows' system AND a 'VMware' system to be in 'Down' status before triggering the alert. So I stripped out the Vendor = VMware condition and it seems to be working, albeit not in real time.

                        Which brings me to my next question: where can I lower the response time for detecting that a system is down? Currently, when I disable a particular system's network interface, it takes about 4 minutes for me to receive the Alert. A delay that long would miss the majority of system restarts (which is really what I'd like this alert to monitor).

                        Thanks again for the support.
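
A minimal sketch of why the extra condition blocked the alert. It assumes the conditions are combined with AND and checked against one node at a time; the thread does not confirm how Orion evaluates them internally, and the `Node` class, field names, and sample nodes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a monitored node; not an Orion data structure.
    name: str
    node_status: str
    vendor: str


def matches_all(node, conditions):
    """Return True only if every (field, expected) pair holds for this one node."""
    return all(getattr(node, field) == expected for field, expected in conditions)


# The alert as originally configured, including the stray VMware condition.
original = [("node_status", "Down"), ("vendor", "Windows"), ("vendor", "VMware")]
# The alert after removing the stray condition.
fixed = [("node_status", "Down"), ("vendor", "Windows")]

nodes = [
    Node("win01", "Down", "Windows"),
    Node("esx01", "Down", "VMware"),
    Node("win02", "Up", "Windows"),
]

original_hits = [n.name for n in nodes if matches_all(n, original)]
fixed_hits = [n.name for n in nodes if matches_all(n, fixed)]

print(original_hits)  # [] because no single node has two vendors
print(fixed_hits)     # ['win01']
```

Under this assumption the original alert can never fire, since a node has only one vendor. If the intent were "Windows or VMware", the two vendor conditions would need to be combined with OR rather than AND.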

                          • Re: Alert not working
                            qle
                            When you test it, by disabling the interface, how long does it take for it to show down in Orion?
                              • Re: Alert not working
                                mandg

                                Well, Orion is still set at the default 4 minute interval for the page to refresh. But I manually kept refreshing after disabling the interface and, at about 120 seconds, it went to 'Warning'; then, after another 60-120 seconds, it changed to 'Down' status and I received an alert (email) within seconds. So it seems that NPM is the slowest link. I have changed the poll interval for NPM from 120 seconds down to 60. Is there some setting that puts the node in the 'Down' state immediately after 60 seconds instead of 'Warning' (as seen in Orion)? (Though I think the 'Warning' is related to packet loss...)

                                  • Re: Alert not working
                                    Assigning an SNMP trap to the switch interface going down would be the basic answer... but I think that you are monitoring virtuals with this alert (Vendor=VMware)? If so, I would look at the Virtual Center's SNMP trap abilities.

                                    2 cents dropped, see if wish comes true...
                                    • Re: Alert not working
                                      tdanner

                                      Orion pings each node every [Node Polling Interval - default 120 seconds]. If no response is received, the node status switches to Warning and we start polling it every 10 seconds. This continues until either we get a response - node status switches back to Up and we resume normal polling - or the "node warning interval" (default 120 seconds) expires and we mark the node Down.

                                      You can configure the node polling interval on a per-node basis and you can set the default node polling interval for new nodes in System Manager under File > NetPerfMon Settings > Polling. You can change the node warning interval in System Manager under File > Advanced NetPerfMon Settings.
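
The sequence described above can be sketched as a small timeline calculation. This is an illustration under stated assumptions, not Orion's implementation: regular polls are taken to fall at multiples of the polling interval, ping timeouts are ignored, the node never answers again after failing, and Down is marked at the first 10-second re-poll at or after the warning interval expires. The function name and parameters are hypothetical.

```python
def status_timeline(fail_at, polling_interval=120, warning_interval=120, fast_poll=10):
    """Return [(seconds, status), ...] for a node that stops answering at fail_at > 0."""
    timeline = [(0, "Up")]

    # The first regular poll at or after the failure gets no reply -> Warning.
    # (-(-a // b) is integer ceiling division.)
    warning_at = -(-fail_at // polling_interval) * polling_interval
    timeline.append((warning_at, "Warning"))

    # While in Warning, re-poll every fast_poll seconds until the
    # warning interval has expired, then mark the node Down.
    t = warning_at
    while t - warning_at < warning_interval:
        t += fast_poll
    timeline.append((t, "Down"))
    return timeline


# Interface disabled 5 seconds after a poll, with the default settings.
print(status_timeline(5))
# [(0, 'Up'), (120, 'Warning'), (240, 'Down')]

# Same failure with the polling interval lowered to 60 seconds.
print(status_timeline(5, polling_interval=60))
# [(0, 'Up'), (60, 'Warning'), (180, 'Down')]
```

With the defaults, a failure just after a poll takes roughly 240 seconds to reach Down, which is in line with the delay of about 4 minutes reported earlier in the thread.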

                                      • Re: Alert not working
                                        The Node Warning Interval determines the length of time a node stays in the warning state. This is configured from the System Administrator Console:
                                        File >>> Advanced Settings >>> Node Warning Interval
                                        • Re: Alert not working
                                          casey.schmit
                                          mandg wrote: "I manually kept refreshing after disabling the interface and, at about 120 seconds, it went to warning, and then another 60-120 seconds, it changed to 'Down' status and I received an alert (email)."

                                          You got the first part, changing the status polling interval to 60 seconds. The second part of what you're looking for is under 'Advanced Settings' in System Manager. Go to the 'Node Warning Interval' tab. From there you can reduce the amount of time that a node will sit in the warning state before going to the down state.
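
As a rough guide to how the two settings combine, here is a back-of-the-envelope sketch. It assumes time-to-Down is bounded by the warning interval (failure just before a poll) and by polling interval plus warning interval (failure just after a poll), and it ignores ping timeouts and the 10-second re-poll granularity. The function names and the 90-second target are hypothetical.

```python
def detection_bounds(polling_interval, warning_interval):
    """Return (best_case, worst_case) seconds from failure to Down status."""
    best = warning_interval                      # failure lands just before a poll
    worst = polling_interval + warning_interval  # failure lands just after a poll
    return best, worst


def max_warning_interval(target_seconds, polling_interval):
    """Largest warning interval that keeps the worst case within target_seconds."""
    return max(0, target_seconds - polling_interval)


print(detection_bounds(120, 120))    # (120, 240) with the defaults
print(detection_bounds(60, 120))     # (120, 180) after lowering only the polling interval
print(max_warning_interval(90, 60))  # 30, for a hypothetical 90-second target
```

Lowering only the polling interval trims the worst case but leaves the best case unchanged; the warning interval has to come down as well before short outages such as restarts become visible. A very short warning interval may also mark nodes Down after brief packet loss, so the value is a trade-off.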