
Getting node down alert even after unmanaging the node

Even after we unmanage our nodes, we are still getting node down alerts. When we checked, we found that the node table is being updated properly: "Unmanagefrom", "UnmanageUntil" and "unmanaged" are all updated with the expected data. But the node still does not go into the "unmanaged" state. The applications under it go into an unknown state at the scheduled downtime, and SNMP stops collecting data from the specified time. Currently the "unmanaged" field is set to true in the node table and the node status is still green. We cannot unmanage the node through the Orion portal either, because the "unmanage" button is greyed out. This is now happening on more nodes. Any idea what the issue may be? A quick response is much appreciated. Our NPM version is 10.2.2 and SAM is 5.0.1. Both were recently upgraded. The issue started after we upgraded SAM last week (from APM 4.0).

12 Replies

Please go into the Polling Settings.  Settings>Polling Settings.

Scroll down to find "Allow alert actions for unmanaged objects".

Make sure this isn't checked. If it is, you will be alerted on every unmanaged object that falls under any of your alert rules or conditions.

Hope this helps you out.

Not to move in on OP's thread, but I seem to be having a similar issue. I have two main alerts that simply alert me when either a Node Status is not equal to Up or an Application Status is not equal to Up. I then have the Alert Suppression tab set to suppress the alert when Node / Application status is equal to UnManaged.

I am still getting alerts for all unmanaged nodes. I checked the setting you listed and it is not checked, so I do not think that is the problem. Any other ideas why unmanaged nodes and applications are being reported on?

EDIT:

Also, just to clarify: the alerts I am getting for the unmanaged node and application do not say they are down. Emails are simply being sent out stating that they are unmanaged, and one alerted as unknown while it was unmanaged for some reason.

Can we see the triggers and conditions you are using? I would assume right now it's safe to say that the OP is using a condition stating:

Node Status is not equal to Up.

This would cause what you are seeing, since the alert states that any status other than Up should be alerted upon. I stay away from this and use this instead:

Node Status is equal to Down

This way I only report out when the node is down.  I don't care about Warning, Unmanaged, or anything in between up/down for that matter.  I have interface/error alerts I use that are for watching the health of a router.  I keep my up/down alerts just that.. Only an Up/Down condition to alert upon.
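To illustrate the difference between the two conditions, here is a minimal Python sketch. It is not Orion code: the status list is an assumption made up for illustration, and the two functions only mimic the "is not equal to Up" and "is equal to Down" comparisons.

```python
# Illustrative status names only; the real set of Orion statuses may differ.
STATUSES = ["Up", "Down", "Warning", "Unknown", "Unmanaged", "Unreachable"]


def fires_not_equal_up(status):
    """Mimics the trigger 'Node Status is not equal to Up'."""
    return status != "Up"


def fires_equal_down(status):
    """Mimics the trigger 'Node Status is equal to Down'."""
    return status == "Down"


# Which statuses would raise an alert under each condition?
not_up = [s for s in STATUSES if fires_not_equal_up(s)]
only_down = [s for s in STATUSES if fires_equal_down(s)]

print("not equal to Up fires for:", not_up)
print("equal to Down fires for:", only_down)
```

Under these assumptions the first condition also fires for Unmanaged, while the second fires for Down only.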

This is just a guess until I can see both the OP's alert triggers and conditions as well as yours.

Just to update one more time in case anybody else comes across this post. I was able to get both of the alerts to work correctly now with the UnManaged state.

For the node, it was as simple as making the node alert only when its status is equal to Down, rather than whenever it is not equal to Up.

For the application alerts, I did as follows:

Trigger Type - Application

Trigger alert when ALL of the following apply

     Application status is not equal to up

Suppress Alert when:

Trigger alert when ALL of the following apply

     Node status is equal to UnManaged

     Application status is equal to UnManaged
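The trigger and suppression rules listed above can be sketched in Python as follows. This is only a rough model under the assumption that an alert is sent when the trigger matches and the suppression does not; the function name and the plain string statuses are hypothetical and not part of any SolarWinds API.

```python
def should_alert(app_status, node_status):
    """Rough model of the application alert described above (an assumption)."""
    # Trigger: ALL of the following apply -> application status is not Up.
    triggered = app_status != "Up"
    # Suppression: ALL of the following apply -> node AND application
    # are both UnManaged.
    suppressed = node_status == "UnManaged" and app_status == "UnManaged"
    return triggered and not suppressed


# A few combinations to see how the rule behaves.
for app, node in [("Down", "Up"), ("UnManaged", "UnManaged"),
                  ("Unknown", "UnManaged"), ("Up", "Up")]:
    print(f"application={app}, node={node} -> alert: {should_alert(app, node)}")
```

Because the suppression uses ALL, this model would still alert for an application that reports Unknown on an unmanaged node, so it may be worth checking which status the application actually reports during the unmanage window.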


What got me was that when I was setting the suppression conditions and tried to create the line "Application status is equal to UnManaged", the drop-down box only had Up in it, so I figured that was the only option I had to work with. I then realized I could click inside the drop-down box and type whatever I wanted in there. Whether this is as intended, a bug, or me just missing what should have been a simple solution, I'm not sure. Either way, it's what resolved the issue for me. Hope this helps anybody else having problems like this.


Thanks,

Jeryn

If an application is on a node and the node goes down or becomes unmanaged, I believe it is a known dependency that the application can't function and should go into an Unreachable status. This could be why your suppression didn't seem to work. In theory you shouldn't have to suppress application alerts that are caused by the parent node being down.

Also, I would be careful about using just "Application status is not equal to Up". This means that the whole application has to drop for the trigger to happen. What about when just one component of an application drops?

Change your trigger type to APM: Application Component.  You still get all the logic of APM: Application Name, but you have the added conditions of Application Component which can be used as the trigger for individual components inside a template.

I have been getting emails about applications being unmanaged when I unmanage a node for a restart. The only way I have found to resolve this is what I described in my post above. If they should not be sending out emails by default, then something else is going on.

The way I have my application monitor set up, it seems to report even if only one component in the application goes down. Take the following emails for example:

[Attached screenshot: Capture.PNG]

The only downside I think I might run into is this: once the alert fires, the application has already alerted for having an issue. If another part of the application goes down before it gets reported as Up again, I am not 100% sure whether it will alert me a second time with the new components that are having issues.

Should it not be sending me alerts like this?

Is the above information what you were looking for, or were you looking for something else? Should the above statements be working fine? Either way, they will get tested again over the weekend. I did go ahead and change the node alert like you stated so that it only reports Down. I can't do that with the application alert, however. I'll update this post on Monday after I see how it handles them over the weekend.

I have two alerts I am having issues with. This is how they are set up.

Alert 1:

Trigger Type - Node

Trigger alert when ALL of the following apply

     Node status is not equal to up

Suppress Alert when:

Trigger alert when ALL of the following apply

     Node status is equal to UnManaged

Alert 2:

Trigger Type - Application

Trigger alert when ALL of the following apply

     Application status is not equal to up

Suppress Alert when:

Trigger alert when ALL of the following apply

     Node status is equal to UnManaged

I could indeed change the node alert to only alert when the node goes down, or go into more detail and probably do something like:

Trigger Type - Node

Trigger alert when ANY of the following apply

     Node status is equal to DOWN

     Node status is equal to WARNING

     Node status is equal to UNKNOWN

However, with the application alert, I cannot do anything like that. I only have the option of UP so it pretty much has to be a "Not equal to UP" trigger. If there is a better way to do it, I am all ears!

Thanks,

Jeryn

I have checked in Polling Settings. I was not able to find "Allow alert actions for unmanaged objects". We are using NPM 10.2.2.


I also noticed that we set the unmanage start time to 13:00, but in the DB it is recorded as 13:16. Why is it behaving this way?

Have you opened a support ticket with SolarWinds?  If not, then I would recommend doing so.

Thanks,

Chrystal Taylor

http://www.loop1systems.com


Yes, we have a case open with support. We had severe end-user performance issues: many views would not load, reports would not complete, and the site would hang so that we could not navigate. We raised a case, and support suggested upgrading our NPM and APM versions. We upgraded NPM from 10.1.0 to 10.2.2 and APM 4.0.1 to SAM 5.0.1. Since the upgrade we are seeing more issues. Now, rather than looking into this particular issue and resolving it, support is suggesting that we move the web server from our primary server to an independent server, move from RAID 5 to RAID 10, increase the statistics polling interval, decrease data retention, upgrade all other modules (IPAM, NTA, etc.) to the latest version, and upgrade NPM to 10.3. Personally, I am not convinced by that approach. We have 1 primary server, 5 additional pollers and a SQL server. Please find the element statistics below.

Server                    N01        N04         G01         N05         NP01        GP01
Type of Polling Engine    Primary    Additional  Additional  Additional  Additional  Additional
Polling Engine Version    2011.2.2   2011.2.2    2011.2.2    2011.2.2    2011.2.2    2011.2.2
Polling Completion        99.97      99.95       99.15       99.97       100         100
Elements                  7084       5962        1692        5126        240         211
Network Node Elements     293        390         163         432         28          22
Volume Elements           511        5029        1133        3545        174         156
Interface Elements        6280       543         396         1149        38          33
Polling rate              57%        28%         8%          27%         1%          1%

Job wait (values for the six engines, unseparated as posted): 16138513169484337
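As a quick sanity check on the statistics above, the per-engine element counts can be cross-checked in a few lines of Python. This sketch assumes the six engines are N01, N04, G01, N05, NP01 and GP01 and that the pasted digit runs split per engine as shown; with that reading, each "Elements" total equals nodes plus volumes plus interfaces.

```python
# Per-engine figures as read from the posted statistics (the column split
# and the engine names are an assumed reading of the pasted text).
engines = ["N01", "N04", "G01", "N05", "NP01", "GP01"]
nodes = [293, 390, 163, 432, 28, 22]
volumes = [511, 5029, 1133, 3545, 174, 156]
interfaces = [6280, 543, 396, 1149, 38, 33]
elements = [7084, 5962, 1692, 5126, 240, 211]

# Check that nodes + volumes + interfaces matches the reported total.
all_match = True
for name, n, v, i, e in zip(engines, nodes, volumes, interfaces, elements):
    computed = n + v + i
    all_match = all_match and (computed == e)
    print(f"{name}: computed={computed}, reported={e}, match={computed == e}")

total = sum(elements)
print("total elements across all engines:", total)
```

With this split, most of the elements sit on N01, N04 and N05, and N04 and N05 carry far more volume elements than the other engines.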

Since the upgrade we have also started getting a web site error. It says the process is being deadlocked. Any suggestions?