12 Replies Latest reply on Oct 24, 2012 2:26 PM by jeryn

    Getting node down alert even after unmanaging the node

    dilu

      Even after we unmanage our nodes, we are still getting node-down alerts. When we checked, we found that the node table is being updated properly: "Unmanagefrom", "UnmanageUntil", and "unmanaged" all contain the expected data. Still, the node does not go into the "unmanaged" state. The applications under it go into an unknown state at the scheduled downtime, and SNMP stops collecting data from the specified time. Currently the "unmanaged" field is set to true in the node table, yet the node status is still green. We also cannot unmanage it through the Orion portal because the "unmanage" button is greyed out. This is now happening for more nodes. Any idea what the issue may be? A quick response would be much appreciated. Our NPM version is 10.2.2 and SAM is 5.0.1. Both were recently upgraded, and the issue started after we upgraded SAM (from APM 4.0) last week.

        • Re: Getting node down alert even after unmanaging the node
          dilu

          Also, I noticed that we set the unmanage start time to 13:00, but in the DB it was recorded as 13:16. Why does it behave this way?
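The gap between the requested and the stored start time matters if a node goes down in between. The sketch below is a hypothetical helper, not SolarWinds code: it assumes the unmanage window is stored as a start and an end timestamp (the two unmanage date fields mentioned in the first post) and that the window is half-open. The function name, the date, and the end time are made up for illustration; only the 13:00 and 13:16 start times come from the thread.

```python
from datetime import datetime


def in_unmanage_window(now, unmanage_from, unmanage_until):
    """Return True if `now` falls inside the scheduled unmanage window.

    Assumption: the window is half-open, [unmanage_from, unmanage_until).
    This is an illustrative check, not how Orion evaluates the fields.
    """
    return unmanage_from <= now < unmanage_until


# Hypothetical date and end time; the two start times are from the thread.
requested_start = datetime(2012, 10, 1, 13, 0)
stored_start = datetime(2012, 10, 1, 13, 16)
window_end = datetime(2012, 10, 1, 15, 0)

# A node that goes down at 13:05 is covered by the requested window
# but falls outside the window that was actually stored.
outage = datetime(2012, 10, 1, 13, 5)
covered_as_requested = in_unmanage_window(outage, requested_start, window_end)
covered_as_stored = in_unmanage_window(outage, stored_start, window_end)

# Size of the discrepancy between what was requested and what was stored.
start_offset = stored_start - requested_start
```

Under these assumptions, any event in the first 16 minutes of the intended downtime would still be treated as happening on a managed node.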

            • Re: Getting node down alert even after unmanaging the node
              Chrystal Taylor

              Have you opened a support ticket with SolarWinds?  If not, then I would recommend doing so.

               

              Thanks,

               

              Chrystal Taylor

              http://www.loop1systems.com

                • Re: Getting node down alert even after unmanaging the node
                  dilu

                  Yes, we have a case open with support. We had severe end-user performance issues: many of the views would not load, reports would not complete, and the site would get stuck and could not be navigated. We raised a case, and support suggested that we upgrade our NPM and APM versions. We upgraded NPM from 10.1.0 to 10.2.2 and APM 4.0.1 to SAM 5.0.1. After the upgrade we are getting more issues. Now, rather than looking into and resolving this particular issue, support is suggesting that we move the web server from our primary server to an independent server, move from RAID 5 to RAID 10, increase the statistics polling interval, decrease the data retention, upgrade all other modules (IPAM, NTA, etc.) to the latest version, and upgrade NPM to 10.3. Personally, I am not convinced by that approach. We have 1 primary server, 5 additional pollers, and a SQL server. Please find the element statistics below.

                  Server                  N01       N04         G01         N05         NP01        GP01
                  Type of Polling Engine  Primary   Additional  Additional  Additional  Additional  Additional
                  Polling Engine Version  2011.2.2  2011.2.2    2011.2.2    2011.2.2    2011.2.2    2011.2.2
                  Polling Completion      99.97     99.95       99.15       99.97       100         100
                  Elements                7084      5962        1692        5126        240         211
                  Network Node Elements   293       390         163         432         28          22
                  Volume Elements         511       5029        1133        3545        174         156
                  Interface Elements      6280      543         396         1149        38          33
                  Polling rate            57%       28%         8%          27%         1%          1%
                  Job wait                16138513169484337
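The digits in the statistics above run together, so any split into six values per row is an assumption. One reading is internally consistent: for each polling engine, the node, volume, and interface counts add up to the element count. The sketch below checks that reading. The variable names are made up for illustration, and the "Job wait" row is left out because nothing in the post indicates how to split it.

```python
# Per-engine counts, read as six values per row from the statistics above.
# Assumption: the run-together digits split as shown here.
engines = ["N01", "N04", "G01", "N05", "NP01", "GP01"]
elements = [7084, 5962, 1692, 5126, 240, 211]
nodes = [293, 390, 163, 432, 28, 22]
volumes = [511, 5029, 1133, 3545, 174, 156]
interfaces = [6280, 543, 396, 1149, 38, 33]

# Collect any engine where nodes + volumes + interfaces != elements.
mismatches = [
    name
    for name, total, n, v, i in zip(engines, elements, nodes, volumes, interfaces)
    if n + v + i != total
]

# Total monitored elements across all six polling engines.
total_elements = sum(elements)

print("mismatches:", mismatches)
print("total elements:", total_elements)
```

With this split every engine balances, which is a reasonable sign that the row boundaries were read correctly.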

                  We recently started getting a web site error (after the upgrade). It says the process is being deadlocked. Any suggestions?

              • Re: Getting node down alert even after unmanaging the node
                Richard Nicholson

                Please go into the Polling Settings: Settings > Polling Settings.

                 

                Scroll down to find "Allow alert actions for unmanaged objects".

                 

                Make sure this isn't checked. If it is, you will be alerted on all unmanaged objects that fall under any of your alert rules or conditions.

                 

                Hope this helps you out.

                  • Re: Getting node down alert even after unmanaging the node
                    dilu

                    I checked the polling settings but was not able to find "Allow alert actions for unmanaged objects". We are using NPM 10.2.2.

                    • Re: Getting node down alert even after unmanaging the node
                      jeryn

                      Not to move in on the OP's thread, but I seem to be having a similar issue. I have two main alerts that simply alert me when either a node status is not equal to Up or an application status is not equal to Up. I then have a condition in the Alert Suppression tab to suppress the alert when the node/application status is equal to UnManaged.

                       

                      I am still getting alerts for all unmanaged nodes. I checked the setting you listed to make sure, and it is not checked, so I do not think that is the problem. Any other ideas why unmanaged nodes and applications are being reported on?

                       

                      EDIT:

                       

                      Also, just to clarify: the alerts I am getting for the unmanaged node and application are not saying they are down. Emails are just being sent out stating that they are unmanaged, and one alerted as unknown while it was unmanaged, for some reason.

                        • Re: Getting node down alert even after unmanaging the node
                          Richard Nicholson

                          Can we see the triggers and conditions you are using? I would assume right now it's safe to say that the OP is using a condition stating:

                           

                          Node Status is not equal to Up.

                           

                          This would cause what you are seeing, since the alert states that any status other than Up should be alerted upon. I stay away from this and use this instead:

                           

                          Node Status is equal to Down

                           

                          This way I only report when the node is down. I don't care about Warning, Unmanaged, or anything in between up/down for that matter. I have interface/error alerts that I use for watching the health of a router. I keep my up/down alerts just that: only an Up/Down condition to alert upon.
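The difference between the two conditions above can be sketched in a few lines. This is a hypothetical model, not Orion's alert engine: the status list is limited to the names that come up in this thread, and the function names are made up for illustration.

```python
# Status names mentioned in this thread; not an exhaustive list of Orion statuses.
STATUSES = ["Up", "Down", "Warning", "Unknown", "Unmanaged"]


def not_equal_to_up(status):
    """Broad trigger: 'Node Status is not equal to Up'."""
    return status != "Up"


def equal_to_down(status):
    """Narrow trigger: 'Node Status is equal to Down'."""
    return status == "Down"


# Which statuses would fire each trigger in this model.
fires_broad = [s for s in STATUSES if not_equal_to_up(s)]
fires_narrow = [s for s in STATUSES if equal_to_down(s)]

print("not equal to Up:", fires_broad)
print("equal to Down:  ", fires_narrow)
```

In this model the broad condition matches Unmanaged along with every other non-Up status, while the narrow one matches Down only.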

                           

                          This is just a guess until I can see both the OP's alert triggers and conditions as well as yours, jeryn.

                            • Re: Getting node down alert even after unmanaging the node
                              jeryn

                              Pretty much, I have two alerts I am having issues with. This is how they are set up.

                               

                              Alert 1:

                               

                              Trigger Type - Node

                              Trigger alert when ALL of the following apply

                                   Node status is not equal to up

                               

                              Suppress Alert when:

                              Trigger alert when ALL of the following apply

                                   Node status is equal to UnManaged

                               

                              Alert 2:

                               

                              Trigger Type - Application

                              Trigger alert when ALL of the following apply

                                   Application status is not equal to up

                               

                              Suppress Alert when:

                              Trigger alert when ALL of the following apply

                                   Node status is equal to UnManaged

                               

                               

                              I could indeed change the node alert to only alert when the node goes down, or go into more detail and probably do something like:

                               

                              Trigger Type - Node

                              Trigger alert when ANY of the following apply

                                   Node status is equal to DOWN

                                   Node status is equal to WARNING

                                   Node status is equal to UNKNOWN

                               

                              However, with the application alert, I cannot do anything like that. I only have the option of UP so it pretty much has to be a "Not equal to UP" trigger. If there is a better way to do it, I am all ears!

                               

                              Thanks,

                              Jeryn

                              • Re: Getting node down alert even after unmanaging the node
                                jeryn

                                Is the above information what you were looking for, or were you looking for something else? Should the above statements be working fine? Either way, they will get tested again over the weekend. I did go ahead and change the node alert, like you suggested, to only report when the node is down. I can't do that with the application alert, however. I'll update this post on Monday after I see how it handles them over the weekend.

                                • Re: Getting node down alert even after unmanaging the node
                                  jeryn

                                  Just to update one more time in case anybody else comes across this post. I was able to get both of the alerts to work correctly now with the UnManaged state.

                                   

                                  For the node, it was as simple as making the node alert only when its status is equal to Down, rather than whenever it is not equal to Up.

                                   

                                  For the application alerts, I did as follows:

                                   

                                  Trigger Type - Application

                                  Trigger alert when ALL of the following apply

                                       Application status is not equal to up

                                   

                                  Suppress Alert when:

                                  Trigger alert when ALL of the following apply

                                       Node status is equal to UnManaged

                                       Application status is equal to UnManaged


                                   

                                  Now, what got me was that when I was setting the suppression conditions and tried to create the line "Application status is equal to UnManaged", the drop-down box only had UP in it, so I figured that was the only option I had to work with. I then realized I could actually click inside the drop-down box and type whatever I wanted in there. Whether this is as intended, a bug, or me just missing what should have been a simple solution, I'm not sure. Either way, it's what resolved the issue for me. Hope this helps anybody else having problems like this.
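The application alert described above can be modelled as a trigger plus a suppression check. This is only a sketch of the condition logic as written in the post, not of how Orion evaluates alerts internally. The status strings and the function name are assumptions for illustration.

```python
def should_alert(app_status, node_status):
    """Model of the application alert above: fire on the trigger unless suppressed.

    Assumption: statuses are plain strings such as "Up", "Down",
    "Unknown", and "Unmanaged".
    """
    # Trigger: Application status is not equal to Up.
    trigger = app_status != "Up"
    # Suppression uses ALL, so both conditions must hold at the same time.
    suppress = node_status == "Unmanaged" and app_status == "Unmanaged"
    return trigger and not suppress


# A few combinations of (application status, node status).
cases = {
    ("Up", "Up"): should_alert("Up", "Up"),
    ("Down", "Up"): should_alert("Down", "Up"),
    ("Unmanaged", "Unmanaged"): should_alert("Unmanaged", "Unmanaged"),
    ("Unknown", "Unmanaged"): should_alert("Unknown", "Unmanaged"),
}

for (app, node), fires in cases.items():
    print(f"app={app:<10} node={node:<10} alert={fires}")
```

Because the suppression is an ALL group, this model still alerts when the node is unmanaged but the application reports some other non-Up status, such as Unknown.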


                                  Thanks,

                                  Jeryn

                                    • Re: Getting node down alert even after unmanaging the node
                                      Richard Nicholson

                                      If an application is on a node and the node becomes down/unmanaged, I believe it is a known dependency that the application can't function and should go into an Unreachable status. This could be why your suppression didn't seem to work. In theory you shouldn't have to suppress application alerts that are caused by the parent node being down.

                                       

                                      Also, I would be careful about using just "Application status is not equal to Up". This means that the whole application has to drop for the trigger to happen. What about when just 1 component of an application drops?

                                       

                                      Change your trigger type to APM: Application Component.  You still get all the logic of APM: Application Name, but you have the added conditions of Application Component which can be used as the trigger for individual components inside a template.

                                        • Re: Getting node down alert even after unmanaging the node
                                          jeryn

                                          I have been getting emails about applications being unmanaged when I unmanage a node for a restart. The only way I have found to resolve this issue is the setup in my post above. If they should not be sending out emails by default, then something else is going on.

                                           

                                          The way I have my application monitor set up, it seems to be reporting even if one component in the application goes down. Take the following emails for example:

                                           

                                          Capture.PNG

                                          The only downside I think I might run into is that once this alert fires, the application has already alerted for having an issue. If another part of this application goes down before it gets reported as UP again, I am not 100% sure whether it will alert me a second time for the new components that are having issues.

                                           

                                          Should it not be sending me alerts like this?
