5 Replies Latest reply on Oct 1, 2012 10:34 AM by Fred Lipton

    Alert messages stuck in queue

    Fred Lipton

      Had an embarrassing moment just now: alert emails were apparently *stuck* on the NPM server and went out all at once instead of when they were actually generated.  The alerts monitor the available space on selected database volumes, with separate alerts for different space levels.  They tested successfully and triggered successfully when first implemented.  Today, right after I cleared alerts on the web console, about 15 of these alerts arrived all at once. Some were not accurate for this moment in time, which suggests to me that they were accurate when generated but the conditions had since been alleviated.

       

      This is bad on several fronts: an alert was triggered but the message didn't reach the necessary recipients, preventing timely action; a slew of notices went out to a rather large group unnecessarily; and, most damaging, corporate confidence in NPM and alerting has taken a hit.  I need to respond with an explanation and an action plan to restore confidence, and I need the forum's help.

       

      Has this happened to others?  What could be causing alert emails to get stuck on the server?  Can it be fixed?

       

      Please, and thanks...Fred

        • Re: Alert messages stuck in queue
          mharvey

          The best place to start in determining a cause and potential fix would be the AlertingService.log (or the swalert.log if you are on 10.2 or earlier).  This should have some errors (maybe in log.1, .2, or later) that show why the emails may have been jammed.  It could have been an issue connecting to Exchange, or timeouts on the alert service while trying to queue the actions.  Depending on what's in the logs, you can try things like increasing timeouts, connecting to Exchange via IP, or even making sure that mail-filter AV is disabled on the Orion server (too many emails can be seen as spam and then just held).
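
          As a rough illustration of that first step, here is a minimal Python sketch that scans the current log and its rotated copies for lines that look like errors. The keyword list, the log directory you pass in, and the assumption that rotated copies share the base file name are all guesses on my part; adjust them to whatever your logs actually contain.

```python
import re
from pathlib import Path

# Assumed keywords; the real log wording may differ.
PATTERN = re.compile(r"error|exception|timeout|smtp|mail", re.IGNORECASE)

def find_suspect_lines(log_dir, base_name="AlertingService.log"):
    """Return (file name, line number, text) for lines matching PATTERN."""
    hits = []
    # The glob picks up the current log plus rotated copies such as .1, .2
    for path in sorted(Path(log_dir).glob(base_name + "*")):
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PATTERN.search(line):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

          Pass base_name="swalert.log" on 10.2 or earlier. The directory is left as a parameter on purpose, since the log location differs between installs.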

           

          Regards,

          Matthew Harvey

          Loop1 Systems

          http://www.loop1systems.com

            • Re: Alert messages stuck in queue
              Fred Lipton

              Thanks, Matt. That makes sense. A very good place to start, appreciate the tip. I'll report findings in a follow-up. Now, why do these things always seem to happen on Friday afternoons?

               

              Via PDA. Pls forgive typos

              • Re: Alert messages stuck in queue
                Fred Lipton

                The answer appears to be Exchange related.  The targeted distribution list wasn't created until after the alert was configured and enabled, so the alerts were generated properly but couldn't be delivered and were stuck in the queue.  Apparently, when I acknowledged the alerts in the web console, they were kicked loose.

                 

                The log location was a bit difficult to find as it's configured as 'hidden'.  Once the log was found and examined, the cause became clear.

                 

                Thanks for the guidance!..Fred

                 


              • Re: Alert messages stuck in queue
                gregatkins

                What could be happening is that your SQL server is using a transaction log instead of writing directly to the database.  Let's say it takes SQL 2 hours to get to the portion of the log where an alert would be generated. It writes that log transaction to the database and generates the alert. The only problem is that the condition occurred 2 hours ago.

                 

                We had that happen once when we were migrating database backups off our SolarWinds server. The SQL server disk was getting hammered, and we were also extremely low on space (<1 GB). We had alerts going out 8 hours late, once the SQL server caught up after the transfer was done and the space was freed up.

                 

                You may want to use SolarWinds to investigate what was going on with the server at the time of those alerts.
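
                If you can get the trigger time and the send time for each alert (from the logs or an export), a quick check like this Python sketch would show how late the messages really were. The record layout, the field names "triggered" and "sent", and the 15-minute threshold are all made up for illustration; nothing here comes from the product itself.

```python
from datetime import datetime, timedelta

def stale_alerts(records, max_lag=timedelta(minutes=15)):
    """Return the records whose send time trails the trigger time by more than max_lag."""
    return [r for r in records if r["sent"] - r["triggered"] > max_lag]

# Hypothetical sample data: one alert sent promptly, one sent 8 hours late.
sample = [
    {"name": "vol-low-10pct", "triggered": datetime(2012, 9, 28, 9, 0),
     "sent": datetime(2012, 9, 28, 9, 1)},
    {"name": "vol-low-5pct", "triggered": datetime(2012, 9, 28, 9, 0),
     "sent": datetime(2012, 9, 28, 17, 0)},
]
late = stale_alerts(sample)
```

                A large gap on many alerts at once points toward a backlog like the one described above, rather than a problem with any single alert definition.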