14 Replies Latest reply on Apr 20, 2017 3:27 PM by mandevil

    alert if the monitor is off for one instance

    gschultz@wbmi.com

      Scenario: A production SQL Server instance monitored by SolarWinds DPA was rebooted during a recent patch cycle. DPA did not recover automatically, and I did not know the monitor had failed to restart collection and re-sync until a user informed me.

       

      I would like an alert from DPA if a single instance is not getting collected.

       

      I set up two alerts:

           Database Instance Availability - seems to check only for a heartbeat from the instance, not from the monitor

           Database Instance Monitor Errors - only fires if there is an error, not if the monitor itself is offline.

       

      What am I missing?

      Thanks,

      Gary

        • Re: alert if the monitor is off for one instance
          mandevil

          The instance availability alert should alert you when a monitored instance goes down. I've tested in our environment, but would suggest you test in yours.

          Might you have a dev instance you can use for this?

            • Re: alert if the monitor is off for one instance
              gschultz@wbmi.com

              I am looking for an alert if the Monitor is offline for an instance - we had a reboot of the client, but the monitor did not come back up, while the client itself was up and running just fine.   The alert above will only fire if the client itself is down.

                • Re: alert if the monitor is off for one instance
                  mandevil

                  If I understand correctly, the monitor was stopped (likely because the instance was down and unresponsive). When the instance came back up, the monitor remained in the stopped state. Are you wanting to monitor the monitor? One thing I've done is schedule a job that periodically runs a query against the DPA repository to set the command column to START. Something like:

                   

                  update COND set command='START' where...

                   

                  Or if you want ALL of them started, just don't include the where clause.

                  If you don't want things to be started automagically, you could have the job run a select of command and status and send you the results periodically so you know the current state.

                  Nothing exists to do this out of the box currently though.

                    • Re: alert if the monitor is off for one instance
                      skenow

                      Hi, thanks for your post. We are looking to build this same alert. One of our database monitors stopped recently while the instance was up and happy, and we were unaware of it for a while before someone brought it to our attention.

                       

                      Would you be willing to share the command and the tables where this information can be found?

                       

                      thanks so much in advance!

                       

                      skenow

                        • Re: alert if the monitor is off for one instance
                          mandevil

                          I'm not sure how you plan to schedule the job (SSMS, a cron job, or something else), but here's what you will want to do:

                           

                          Take a look at COND in the DPA repository. That table has the info you need, and updates made directly to it (be careful, of course) can cause action to be taken, such as starting monitoring programmatically.

                           

                          select NAME, COMMAND, STATUS from COND;

                           

                          Should give you something like this:

                          name             command  status
                          DPASQL2K5        START    STARTED
                          DPASQL2K8        START    STARTED
                          DPASQL2K8R2-WRG  START    STARTED

                           

                          If the command is 'START' and the status is 'STOPPED', you know the monitor is supposed to be up, but it's down.

                          Hope that gets you going in the right direction!
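The command/status comparison described above can be sketched as a single query. This is only an illustration: the temporary table `#COND` is a stand-in for the repository's COND table, its column types and the sample rows are assumptions, and the `MONITOR_STATE` labels are made up for readability. The 'STOP' command value is the one mentioned later in this thread for monitors that were turned off explicitly.

```sql
-- Stand-in for the DPA repository table COND (column types are assumed).
CREATE TABLE #COND (NAME varchar(100), COMMAND varchar(20), STATUS varchar(20));

INSERT INTO #COND (NAME, COMMAND, STATUS) VALUES
    ('DPASQL2K5',       'START', 'STARTED'),
    ('DPASQL2K8',       'START', 'STOPPED'),  -- hypothetical failed monitor
    ('DPASQL2K8R2-WRG', 'STOP',  'STOPPED');  -- hypothetical deliberate stop

-- Label each monitor by comparing what was requested with what is happening.
SELECT NAME, COMMAND, STATUS,
       CASE
           WHEN COMMAND = 'START' AND STATUS = 'STOPPED' THEN 'should be running but is stopped'
           WHEN COMMAND = 'STOP'  AND STATUS = 'STOPPED' THEN 'stopped on purpose'
           ELSE 'no mismatch detected'
       END AS MONITOR_STATE
FROM #COND
ORDER BY NAME;

DROP TABLE #COND;
```

To try it against a real repository, drop the stand-in table and point the SELECT at COND itself, ideally on a dev instance first.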

                            • Re: alert if the monitor is off for one instance
                              skenow

                              hi Mandevil,

                              Yes, thank you very much. I finally figured it out just as I got your email.

                              But your answer raises another question: if the monitor has really stopped, shouldn't the status show a stopped state?

                              The query reported a started state while the monitor had really stopped. Would that indicate the reporting was off and that something needs to be reset in the database for it to report accurately?

                               

                              Also, can this be set up as a custom alert, or only as a SQL job?

                               

                              thanks much! skenow
                                

                                • Re: alert if the monitor is off for one instance
                                  mandevil

                                  Yes, this can be set up using any kind of logic or schedule (I won't get into those specifics as there are MANY ways one could implement this).

                                  What I meant is that the command and status *should* be consistent. If the command is set to 'START' and status is 'STOPPED', likely something has gone wrong and I'd look in the error logs.

                                    • Re: alert if the monitor is off for one instance
                                      skenow

                                      ok thank you for the clarification

                                      • Re: alert if the monitor is off for one instance
                                        skenow

                                        OK, so I created a custom alert in DPA to detect when monitoring has stopped on a server. The alert type is described as "Executes a user-defined SQL statement that will return one or more name/numeric value pairs", and I used this SQL statement:

                                        select NAME, COMMAND, STATUS from [ignite].[COND] WHERE [STATUS] = 'STOPPED'

                                         

                                        I also set the Notification Policy to "Notify when level is not normal" and added two servers for testing, one with the monitor on and one with it off.

                                         

                                        The first few tests of the alert gave correct results: one status Normal, one status Broken. Now, no matter how I test it (servers on or off), both servers come back in a Broken state, even though the monitors are on.

                                        Running the statement directly in SSMS brings back the correct results.

                                        None of the existing SQL alert types accurately describes the statement I'm running, and as far as I know there is no way to create a new one.

                                         

                                        Can you help me work out what the issue may be? I'm lost on this, and we need this set up to detect when monitoring has stopped on our servers.

                                         

                                        thanks in advance for any help you may provide me.

                                          • Re: alert if the monitor is off for one instance
                                            mandevil

                                            I see a couple of issues. Try this SQL using a custom multi-numeric return.

                                             

                                            select NAME+' Monitor is stopped', 1 from [ignite].[COND] WHERE [STATUS] = 'STOPPED' and [COMMAND] = 'START'

                                             

                                            This will prevent you from getting alerted when the monitoring was turned off explicitly (command set to STOP), which I'm assuming would be a false positive for you.

                                            Also, the format of the multi-numeric alert can really only handle two outputs (one alphanumeric and one numeric value).
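As a rough illustration of that two-column shape, here is the same query run against a stand-in table. The temporary table `#COND`, its column types, the sample rows, and the column aliases are all assumptions made for the sketch. The second query is a hypothetical variation, not something confirmed for DPA: it collapses the result to a single name/value pair whose value is the number of unexpectedly stopped monitors.

```sql
-- Stand-in for the repository table (the thread uses [ignite].[COND]); types are assumed.
CREATE TABLE #COND (NAME varchar(100), COMMAND varchar(20), STATUS varchar(20));

INSERT INTO #COND (NAME, COMMAND, STATUS) VALUES
    ('DPASQL2K5',       'START', 'STARTED'),
    ('DPASQL2K8',       'START', 'STOPPED'),  -- hypothetical failed monitor
    ('DPASQL2K8R2-WRG', 'STOP',  'STOPPED');  -- hypothetical deliberate stop

-- One row per affected instance: column 1 is the label, column 2 is the number.
-- With the sample rows this returns a single row: 'DPASQL2K8 Monitor is stopped', 1
SELECT NAME + ' Monitor is stopped' AS ALERT_NAME, 1 AS ALERT_VALUE
FROM #COND
WHERE STATUS = 'STOPPED' AND COMMAND = 'START';

-- Variation: always one row, with the count of affected instances as the value.
-- With the sample rows this returns: 'Monitors stopped unexpectedly', 1
SELECT 'Monitors stopped unexpectedly' AS ALERT_NAME, COUNT(*) AS ALERT_VALUE
FROM #COND
WHERE STATUS = 'STOPPED' AND COMMAND = 'START';

DROP TABLE #COND;
```

A third column, such as the COMMAND or STATUS text itself, would not fit the one-label-plus-one-number format described above.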

                                              • Re: alert if the monitor is off for one instance
                                                skenow

                                                Thanks very much, that seems to work well. It was working when only one server monitor was down, but after I added another one in a stopped state, it fails to send the alert message. I created an alert group and assigned the appropriate groups, etc. to receive alerts, but I get no emails. Any ideas on why not?

                                                 

                                                and again thank you so much in advance !

                                                  • Re: alert if the monitor is off for one instance
                                                    mandevil

                                                    When you query the COND table, what does it show for command and status for each instance? The script I added will only flag instances that *should* be monitored but are not for some reason (like an error). Make sure you aren't expecting an alert if you stopped a monitor manually because the command in that case would be set to 'STOP' which gets excluded.

                                                      • Re: alert if the monitor is off for one instance
                                                        skenow

                                                        Hi - I took off the last part ([COMMAND] = 'START') because we want to know if it's stopped no matter the reason. It now reports each instance that is in a stopped state, but I don't always get an email, and I'm not sure why.

                                                        Also, the more instances I add to the alert, the more I get an email for each instance, with the stopped ones listed in the body. I will copy one below. Can the alert be set so that it reports only the stopped instances, by instance name, and not all the others?

                                                         

                                                        Alert: Susie Test Monitor Custom SQL Alert - Multiple Numeric Return

                                                        Database Instance: DPSHxxxxxxxx

                                                        Execution Time: Thursday - April 20, 2017 09:40:46

                                                        View Alert Status: http://mcl-swdpa1:8123/iwc/alertMain.iwc

                                                        Alert Parameters:

                                                          Description: <Not Specified>

                                                        TEST for monitor is down please investigate

                                                        Parameter: Fxx-Sxxxxx Monitor is stopped
                                                          Value: 1

                                                        This message was system-generated. Do not reply to this message

                                                         

                                                        If I set it to monitored databases, it reports BROKEN. If I set it to repository, it sends me this message, but only once, not each time the alert runs. I would expect to keep receiving alerts until the stopped condition is fixed.

                                                        Also, my HIGH alert level is set to 4 min and 10 max, and the policy is set to "Notify when level is not Normal".

                                                        I've read through the online documentation in the administrator's guide, but it does not give good examples or how-to setup information, so I'm at a loss here. Are there other docs to look at?

                                                         

                                                        thank you very much,