6 Replies Latest reply on Jun 22, 2017 5:28 PM by shaun_9999

    Alerts not being triggered.

    shaun_9999

      Good morning fellow thwackers.

      I am having an issue with two servers that run at 100% for two hours in the early morning.  This is expected, as they do processing jobs for an external client.  I changed some alerting for all nodes (including these ones) within our estate.

      I set the trigger delay to 15 minutes (condition must exist for more than 15 minutes before the alert is triggered) rather than the 2 minutes that was previously set.
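As a rough illustration of what a "condition must exist for more than 15 minutes" rule implies, here is a minimal Python sketch. It is not how SolarWinds evaluates alerts internally; the function name `sustained_breach`, the 90% threshold, and the 5-minute polling interval are all assumptions made for the example. The point it shows is that a single polled sample below the threshold resets the clock, so a job that dips briefly may never satisfy a 15-minute hold.

```python
from datetime import datetime, timedelta


def sustained_breach(samples, threshold, hold):
    """Return the timestamp at which a sustained-condition alert would fire.

    samples   -- iterable of (timestamp, value) pairs in chronological order
    threshold -- value at or above which the condition counts as breached
    hold      -- timedelta the breach must strictly exceed before firing
    Returns None if the condition never holds for longer than `hold`.
    """
    breach_start = None
    for ts, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = ts  # first breached sample starts the clock
            if ts - breach_start > hold:
                return ts  # condition has existed for more than `hold`
        else:
            breach_start = None  # any sample below threshold resets the clock
    return None


if __name__ == "__main__":
    start = datetime(2017, 6, 22, 2, 0)  # hypothetical job start time
    # Hypothetical CPU readings polled every 5 minutes, with one dip to 85%.
    readings = [100, 100, 100, 85, 100, 100, 100]
    samples = [(start + timedelta(minutes=5 * i), v) for i, v in enumerate(readings)]
    print(sustained_breach(samples, threshold=90, hold=timedelta(minutes=15)))
```

If the real polling interval or threshold differs, the same reasoning applies: it may be worth checking the polled values for these nodes to see whether any sample drops below the threshold during the two-hour window.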

       

      Here is an example of the server running at 100% during the two hours the jobs are processed.  Irrespective of how many resources it has, the job will take 100% of all resources.

      These servers (there are 8 in total) are basically hosting solutions by themselves:

      256 GB RAM, 3 CPUs/12 cores, and they run on fast fibre storage.  We monitor them because sometimes the application that processes the nightly jobs gets a little stuck and does not stop the jobs, so you have to restart the Apache service.  I know you can set services to restart automatically, but we can't do that as the client doesn't want to rely on an automatic process in the background.  So sometimes, maybe four times a year, we restart the service.

       

      We also have a 3rd party that takes calls during the night and at weekends.  The trigger was set to 3 minutes (which seemed a little low to me); apart from that, everything is a mirror of the alerts.  However, we are not getting the trigger when the processing job is running.

      A test run works fine for the eight servers, so I can send a test email alert.  But in a real-world environment we never see the alert.

       

      I think the old alerting is a little broken somewhere.  When I turn on the old alerting, it sends an email every few minutes to tell you everything is fine.  Again, they all have the same blanket config, so it can't be anything in the configuration.

      Has anyone any advice on this?

        • Re: Alerts not being triggered.
          d09h

          When troubleshooting email alerts (especially when expected emails don't appear), I've occasionally configured the alert to create a SolarWinds event, or write to a text file (or both).  In at least one case, my mail process was getting blocked, and I eventually proved it.  The evidence from the NPM event and the text file was helpful in getting others to assist in determining who was breaking the email piece and how.  By creating an event or text log, I was able to prove or disprove that alerting was working.  Perhaps you could try something similar.
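For anyone wanting to try the text-file approach with an external script, a minimal Python sketch follows. It assumes the alert can be configured to run a script and pass it a node name and message; the function name `log_alert`, the argument order, and the default log path are hypothetical and not part of any SolarWinds API. Each call appends one timestamped line, so the file becomes a record of when the alert actually fired, independent of email delivery.

```python
import sys
from datetime import datetime, timezone
from pathlib import Path


def log_alert(log_path, node, message, now=None):
    """Append one timestamped line to a plain-text alert log and return it."""
    now = now or datetime.now(timezone.utc)
    line = f"{now.isoformat()} node={node} {message}\n"
    # Append mode keeps earlier entries, so the file accumulates evidence.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line


if __name__ == "__main__":
    # Hypothetical invocation: script.py <node> <message> [log file]
    if len(sys.argv) >= 3:
        path = sys.argv[3] if len(sys.argv) > 3 else "alert_debug.log"
        log_alert(path, sys.argv[1], sys.argv[2])
```

If the log line appears but the email does not, the problem is on the mail side; if neither appears, the alert condition itself is not being met.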

            • Re: Alerts not being triggered.
              shaun_9999

              I have an action to place a log in netperfmon.  Since this was implemented (21 days ago) we have not received the log to capture this event, which is why I know that the alert isn't being triggered.  I also implemented it on the mits alert, which is how I know that alert was triggered.

               

              The SMTP server is the same (we have one SMTP server), and we also receive alerts for other servers with high CPU usage.

              My only other option is to generate a separate alert for these nodes, but that's something I have tried to move away from, as the old environment had an alert for every action on every individual node.