
Delayed Alerts

Hi everyone,

Our company has a client who has NPM/SAM installed and has asked us not to disclose their identity. The issue is that whenever a node in their network goes down or has any problem, the alert is sent 4-8 hours later. When my superiors discussed it with me, my reaction was: what? I worked with NPM/SAM at my previous job here in Pakistan, where our client was in the US, and whenever there was an issue of any kind, the alert was sent within 30 seconds. So I asked my superiors about the re-discovery schedule and the alert trigger timing. They said they had tried and checked everything; nothing looked out of the ordinary and everything seemed normal. They asked me to ask you guys for help, because some of you might have faced the same issue and could help me identify the root cause, or point me to details that would help find it. Looking forward to hearing from all the thwackians...

CourtesyIT DanielleH aLTeReGo Jfrazier familyofcrowes jeremymayfield wabbott patrick.hubbard bsciencefiction.tv and others who could help.

  • Can you post up a copy of the alert? It sounds like there's something mis-configured in the alert itself.

  • That does sound a bit odd. Given that this is a multi-continent/multi-timezone instance, I would be interested in knowing what Network Time Protocol (NTP) source and time zone you are using, and whether all your devices use the same ones. Also, I am wondering how prevalent this issue is. Is it just a couple of devices, 25 or so devices, or 100 devices spread across various time zones? Also, can you provide more insight into your logging architecture and design?
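
To illustrate why the time zone question matters, here is a minimal Python sketch, assuming you can read a "node went down" timestamp from the device and an "alert sent" timestamp from the server. The function names, timestamps, and UTC offsets are all hypothetical and are not taken from NPM. It shows how two clocks in different zones can make a 30-second delay look like a 5-hour one.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def naive_gap(down_local: str, alert_local: str) -> timedelta:
    """Wall-clock difference, ignoring time zones (what a quick glance gives you)."""
    return datetime.strptime(alert_local, FMT) - datetime.strptime(down_local, FMT)

def true_latency(down_local: str, down_offset_h: float,
                 alert_local: str, alert_offset_h: float) -> timedelta:
    """Real delay, after attaching each clock's UTC offset to its timestamp."""
    down = datetime.strptime(down_local, FMT).replace(
        tzinfo=timezone(timedelta(hours=down_offset_h)))
    alert = datetime.strptime(alert_local, FMT).replace(
        tzinfo=timezone(timedelta(hours=alert_offset_h)))
    # Subtracting two aware datetimes compares them on a common UTC basis.
    return alert - down

# Hypothetical values: device logs in UTC, server stamps alerts in UTC+5.
down_ts, alert_ts = "2015-03-01 10:00:00", "2015-03-01 15:00:30"
print(naive_gap(down_ts, alert_ts))             # 5:00:30
print(true_latency(down_ts, 0, alert_ts, 5))    # 0:00:30
```

If the true latency is still measured in hours once both timestamps are on the same basis, time zones are not the explanation and the delay is real.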

  • The first few things that came to my mind were: it's impossible, it cannot be that late, are you serious? Either something is misconfigured, the re-discovery schedule is set to 4 hours or more, or the system is so under-resourced that it takes that long to generate the alert. That said, I don't think alert generation has anything to do with re-discovery.

  • One more thing: when the devices were added, they might have had IPs in another subnet, e.g. 172.16.x.x, and later their subnet changed to 10.x.x.x or 192.168.x.x, so they are up but their status shows as down.
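
A quick way to test the stale-IP theory is to compare the addresses recorded for each monitored node against the subnets actually in use today. The sketch below uses only Python's standard `ipaddress` module; the node names, addresses, and subnet list are made-up examples, and in practice you would export the real list from the monitoring system.

```python
import ipaddress

def stale_addresses(monitored: dict, current_subnets: list) -> list:
    """Return node names whose recorded IP is outside every subnet in use today."""
    nets = [ipaddress.ip_network(s) for s in current_subnets]
    stale = []
    for name, ip in monitored.items():
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            stale.append(name)
    return stale

# Hypothetical inventory: one node still recorded with its old 172.16.x.x address.
nodes = {
    "core-sw-01": "172.16.4.10",
    "edge-rtr-02": "10.20.0.1",
    "ap-03": "192.168.1.7",
}
in_use = ["10.0.0.0/8", "192.168.0.0/16"]

print(stale_addresses(nodes, in_use))  # ['core-sw-01']
```

Any node this reports is being polled at an address it no longer has, so it would show as down no matter how healthy the device is.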

  • So I just discussed this with the client. They said they have installed NPM on VMware, and the problem has now become so severe that sometimes no alert is generated for a whole day when a node goes down or comes back up. He said that when they restart the VM, everything returns to normal, and after some time it gets bad again. CourtesyIT, can you be more specific about what kind of logs I need to ask the client for, and where they can be collected from?

  • OK, basically when the node goes down, NPM should trigger an alert. There are various ways an alert can be triggered and the notification sent. Is the customer waiting for an email, or is the delay being noticed in the alerts section in NPM? Can you supply a screenshot?

  • I have asked about all the possible things that could cause the alerts to be delayed, and asked for screenshots. Let's wait for the reply, and then I will share it with you guys.

  • Another cause could be that the system specification is too low for the number of nodes. It has a lot to do with system resources: for a short while after the VM starts up, alerts are triggered successfully because RAM is free, but after some time, when there are many alerts, the system starts to get jammed because of heavy RAM utilization. They might not have the amount of RAM required for that many nodes.
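
The "fine after a restart, then slowly worse" pattern is what you would expect from any system that receives work faster than it can process it. The toy Python model below is only an illustration of that reasoning, not a description of how NPM works internally. The rates are invented numbers chosen to show how a backlog can turn into hours of delay.

```python
def backlog_over_time(arrivals_per_min: int, capacity_per_min: int, minutes: int) -> list:
    """Pending work at the end of each minute for fixed arrival and processing rates."""
    backlog = 0
    history = []
    for _ in range(minutes):
        # Work that cannot be processed this minute carries over to the next.
        backlog = max(0, backlog + arrivals_per_min - capacity_per_min)
        history.append(backlog)
    return history

def delay_minutes(backlog: int, capacity_per_min: int) -> float:
    """How long a newly queued item waits behind the existing backlog."""
    return backlog / capacity_per_min

# Hypothetical undersized server: 150 items/min arrive, 100 items/min get processed.
overloaded = backlog_over_time(150, 100, minutes=480)   # 8 hours of uptime
print(delay_minutes(overloaded[0], 100))    # 0.5 minutes right after restart
print(delay_minutes(overloaded[-1], 100))   # 240.0 minutes after 8 hours

# Hypothetical adequately sized server: the backlog never builds up.
healthy = backlog_over_time(80, 100, minutes=480)
print(max(healthy))                         # 0
```

A restart empties the backlog, which would explain why alerts arrive on time for a while and then drift out to hours again.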

  • Hi Fazl

    If you have configured the alert correctly and you are still not getting the alert, or the alert is delayed, can you tell me how many elements you are polling from your main poller, and what the CPU utilization of the server is at the time of polling?


    fazl azeem, Level 14