6 Replies Latest reply on Sep 30, 2009 3:55 PM by g33kfu

    SQL 2008 Cluster monitoring

      Hi,

      We have a SQL 2008 Cluster that we are monitoring with ipMonitor. When we fail an instance over from node 1 to node 2, the monitors carry on working happily. However, when we fail back to the original node (node 1), ipMonitor starts to generate these alerts. Any idea why?

      cmsdbs (SQL Server (MSSQLSERVER))

      status: Call was canceled by the message filter; oserror: 0x80010002

      alert: 1 from a maximum of 1

      critical failures: 1

        • Re: SQL 2008 Cluster monitoring

          This seems to be intermittent. It generates an alert, sometimes a couple, for random services / drives that are being monitored on the cluster.

          Perhaps our timings are too sensitive / too short for the monitors, or is it something with WMI?

          Any recommendations / advice would be appreciated.

            • Re: SQL 2008 Cluster monitoring
              Peter.Cooper

              Rhodan,

              Forgive me... but can you tell me what monitor type is spitting out the error message? You did mention WMI, but I didn't want to assume.

              Edit: Added "type" next to monitor.

                • Re: SQL 2008 Cluster monitoring

                  Yes, we are using WMI for these monitors because they are monitoring mount points. RPC does not seem to pick those up; it only picks up the root drive. WMI is able to pick up the mount points on the root drive.

                    • Re: SQL 2008 Cluster monitoring

                      OK, the failure can happen when failing over to either node. It's intermittent, and different monitors fail each time, sometimes one, sometimes a couple.

                      I set the monitor to basically test twice before alerting, and that seems to have fixed it. I notice they do change to warn, and on the second test they change to up. Might be something to do with WMI by the looks of it.

                        • Re: SQL 2008 Cluster monitoring
                          Peter.Cooper

                          Rhodan, gotcha.

                          My first thought was that you may have been using a user experience ADO monitor, thanks for clearing that up. It does sound like WMI doesn't fail over gracefully enough. I can only guess why it does that. WMI stacks get better the more modern the OS.

                          You're following the correct approach: Increase the number of failures per notification. You also have the option to increase the "delay while warn".
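
                          To illustrate the idea of requiring more than one failed test before a notification goes out, here is a minimal sketch in Python. It is not how ipMonitor is implemented internally; the class name `MonitorState`, the `failures_before_alert` field, and the "up" / "warn" / "down" strings are all hypothetical and only mirror the behaviour described in this thread (first failure moves the monitor to warn, a passing retest moves it back to up).

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    """Hypothetical model of a monitor that alerts only after repeated failures."""

    failures_before_alert: int = 2  # assumed threshold: test twice before alerting
    consecutive_failures: int = 0
    status: str = "up"

    def record(self, test_passed: bool) -> bool:
        """Record one test result and return True if an alert should fire now."""
        if test_passed:
            # Any passing test clears the failure streak.
            self.consecutive_failures = 0
            self.status = "up"
            return False

        self.consecutive_failures += 1
        if self.consecutive_failures < self.failures_before_alert:
            # Not enough failures yet: hold in "warn" and stay quiet.
            self.status = "warn"
            return False

        # Threshold reached: alert once, on the transition into "down".
        should_alert = self.status != "down"
        self.status = "down"
        return should_alert
```

                          With a threshold of 2, a single transient failure right after a failover (fail, then pass) never produces an alert, which matches what Rhodan saw after changing the setting.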

                            • Re: SQL 2008 Cluster monitoring
                              g33kfu

                              I believe this is an issue with the performance counters. We have seen similar things with SQL 2005 clusters. If you fail over from A-->B it's fine, but if you then go back to A it won't work. If you were to reboot the host between the failovers, it would be OK. You can verify that this is the same issue by doing your A-->B and then B-->A failover, and then going into the Windows Performance Monitor to see if the SQL items are there. We have not found any real solution to the problem.
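
                              If you would rather script that check than click through Performance Monitor, something along these lines might work. This is only a sketch and a swapped-in technique: it uses the Windows `typeperf -q` command to list the installed counters and looks for the SQL Server counter objects, assuming they are named `SQLServer:...` for a default instance and `MSSQL$<instance>:...` for a named instance. The function names `sql_counter_paths` and `list_local_counters` are made up for this example.

```python
import subprocess

# Assumed naming: default instances register "SQLServer:<object>",
# named instances register "MSSQL$<instance>:<object>".
SQL_COUNTER_MARKERS = ("\\SQLServer:", "\\MSSQL$")


def sql_counter_paths(typeperf_output: str) -> list[str]:
    """Return the counter paths in `typeperf -q` output that belong to SQL Server."""
    paths = []
    for line in typeperf_output.splitlines():
        line = line.strip()
        if any(marker in line for marker in SQL_COUNTER_MARKERS):
            paths.append(line)
    return paths


def list_local_counters() -> str:
    """Run `typeperf -q` on the local node (Windows only) and return its output."""
    result = subprocess.run(
        ["typeperf", "-q"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

                              Run `sql_counter_paths(list_local_counters())` on the active node after the A-->B and B-->A failovers. An empty list would suggest the SQL counters are missing, which is the symptom described above.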