This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.

Monitoring an Active/Passive HA Application

In spite of the awesomeness I've seen posted on Thwack for a myriad of issues, I have yet to find a really elegant solution for active/passive application monitoring. I do have some ideas for when we move to SAM 6.2 (or higher), but here are my challenges. Any ideas?

1) An application runs on 2 nodes in an active/passive configuration. The application monitor on the passive side always shows down.

2) An application team requests an alert only when both the primary and the secondary application are down, but the application template has been applied to multiple HA pairs. I've used the code below in the past, when the application names were similar but unique across the HA clusters. It won't work, though, unless there is a way to sort the nodes into distinct HA clusters when they share an application template.

I should mention that we do use a common naming convention (not the names below; they have been changed to protect the innocent!), and HA pairs are always named SERVER01/SERVER02, SERVER11/SERVER12, etc.
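Given that naming structure, one way to tell HA clusters apart without maintaining groups is to derive a pair key from the node name. The following is a hypothetical Python sketch, not SolarWinds functionality. It assumes the last character of the name identifies the member (1 = primary, 2 = secondary), so SERVER01/SERVER02 share the key SERVER0 and SERVER11/SERVER12 share SERVER1. A cluster is then treated as down only when no member reports Up. This also means a passive node showing down on its own raises no alert.

```python
from collections import defaultdict


def ha_pair_key(caption: str) -> str:
    """Hypothetical helper: assumes the last character of the node name
    identifies the member (1 = primary, 2 = secondary), so SERVER01 and
    SERVER02 share the key 'SERVER0'."""
    return caption[:-1]


def clusters_fully_down(statuses: dict[str, str]) -> list[str]:
    """Return the HA pair keys where no member reports 'Up'.

    `statuses` maps node caption -> component status for one application.
    A passive node reporting 'Down' is expected, so a cluster counts as
    down only when every member is down.
    """
    any_up: dict[str, bool] = defaultdict(bool)
    for caption, status in statuses.items():
        key = ha_pair_key(caption)
        any_up[key] = any_up[key] or status == "Up"
    return sorted(key for key, up in any_up.items() if not up)


statuses = {
    "SERVER01": "Up", "SERVER02": "Down",    # active/passive: healthy
    "SERVER11": "Down", "SERVER12": "Down",  # both members down: alert
}
print(clusters_fully_down(statuses))  # ['SERVER1']
```

The same "any member Up" rollup is what the SQL alert needs to compute, just grouped by pair key instead of by component name alone.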

I bet adatole would like to take a crack at this one ;)

SELECT APM_AlertsAndReportsData.ComponentID AS NetObjectID,
       APM_AlertsAndReportsData.ComponentName AS Name
FROM APM_AlertsAndReportsData
INNER JOIN Nodes WITH (NOLOCK)
    ON Nodes.NodeID = APM_AlertsAndReportsData.NodeId
INNER JOIN APM_ApplicationCustomProperties WITH (NOLOCK)
    ON APM_AlertsAndReportsData.ApplicationId = APM_ApplicationCustomProperties.ApplicationID
-- GTUP: for each APP1 component, the number of distinct HA nodes on which it is Up.
-- Grouped by ComponentName only, so this works only when each HA cluster uses a distinct component name.
LEFT OUTER JOIN (
    SELECT COUNT(DISTINCT APM_AlertsAndReportsData.NodeId) AS UpCount,
           APM_AlertsAndReportsData.ComponentName
    FROM APM_AlertsAndReportsData
    JOIN Nodes WITH (NOLOCK) ON Nodes.NodeID = APM_AlertsAndReportsData.NodeId
    WHERE APM_AlertsAndReportsData.ComponentName LIKE 'APP1%'
      AND APM_AlertsAndReportsData.ComponentStatus = 'Up'
      AND Nodes.Caption LIKE '%HANode_APP1%'
    GROUP BY APM_AlertsAndReportsData.ComponentName
) AS GTUP ON GTUP.ComponentName = APM_AlertsAndReportsData.ComponentName
WHERE Nodes.Prod_State <> 'PROD'
  AND Nodes.n_mute <> 1
  AND APM_ApplicationCustomProperties.a_mute <> 1
  AND APM_AlertsAndReportsData.ComponentName LIKE 'APP1%'
  -- This check indicates that no application in the HA cluster is Up
  AND (GTUP.UpCount = 0 OR GTUP.UpCount IS NULL)
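One way to remove the dependence on unique component names is to group the up-count subquery by an HA pair key as well as by component name. Components with the same name on different HA pairs are then counted separately. The following is a minimal sketch run against an in-memory SQLite database from Python, not a drop-in Orion alert. The Nodes/Components tables are hypothetical, simplified stand-ins for Nodes and APM_AlertsAndReportsData. The custom-property filters (Prod_State, n_mute, a_mute) are left out. The pair key assumes the SERVER01/SERVER02 convention (caption minus its last character). In T-SQL, that key expression would be written LEFT(Caption, LEN(Caption) - 1) instead of SQLite's substr().

```python
import sqlite3

# Hypothetical, simplified stand-ins for the Orion Nodes table and the
# APM_AlertsAndReportsData view; only the columns this query needs.
SCHEMA = """
CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY, Caption TEXT);
CREATE TABLE Components (
    ComponentID INTEGER PRIMARY KEY, NodeId INTEGER,
    ComponentName TEXT, ComponentStatus TEXT);
"""

# Pair key = caption minus its last character (SERVER01/SERVER02 -> SERVER0).
# The Up count is grouped per (pair key, component name), so identically
# named components on different HA pairs no longer share a count.
QUERY = """
SELECT c.ComponentID, c.ComponentName, n.Caption
FROM Components c
JOIN Nodes n ON n.NodeID = c.NodeId
LEFT JOIN (
    SELECT substr(n2.Caption, 1, length(n2.Caption) - 1) AS PairKey,
           c2.ComponentName,
           COUNT(DISTINCT c2.NodeId) AS UpCount
    FROM Components c2
    JOIN Nodes n2 ON n2.NodeID = c2.NodeId
    WHERE c2.ComponentStatus = 'Up'
    GROUP BY substr(n2.Caption, 1, length(n2.Caption) - 1), c2.ComponentName
) AS up
  ON up.PairKey = substr(n.Caption, 1, length(n.Caption) - 1)
 AND up.ComponentName = c.ComponentName
WHERE c.ComponentName LIKE 'APP1%'
  AND (up.UpCount = 0 OR up.UpCount IS NULL)  -- no member of this pair is Up
ORDER BY n.Caption
"""


def fully_down_components(nodes, components):
    """Return (ComponentID, ComponentName, Caption) rows that would alert."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Nodes VALUES (?, ?)", nodes)
    conn.executemany("INSERT INTO Components VALUES (?, ?, ?, ?)", components)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


nodes = [(1, "SERVER01"), (2, "SERVER02"), (3, "SERVER11"), (4, "SERVER12")]
components = [
    (101, 1, "APP1 Service", "Up"),    # active member of pair SERVER0
    (102, 2, "APP1 Service", "Down"),  # passive member: expected Down
    (103, 3, "APP1 Service", "Down"),  # both members of pair SERVER1 Down
    (104, 4, "APP1 Service", "Down"),
]
for row in fully_down_components(nodes, components):
    print(row)
```

Only the SERVER11 and SERVER12 components are returned. SERVER02 is down too, but its partner SERVER01 is Up, so that pair stays quiet.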

  • I did think about using groups, but it feels like a management nightmare: we monitor 12,000+ nodes, and having to track which nodes are in an HA cluster for a specific application feels like opening Pandora's box. The solution needs to scale to a large enterprise.

    Just when you thought it was going to be easy, eh?!?

  • I devised this query to find the required 01/02 pairs (in this specific case, only 01/02 pairs are needed). I just need to figure out how to slot it into the query above.

    -- Characters 1-13 of the caption: name shared by both members of a pair.
    -- Characters 14-15: member suffix ('01' or '02').
    SELECT TOP 10 HA1.Caption, HA1.Node, HA2.Node, HA2.Caption
    FROM
      (SELECT Caption, LEFT(Nodes.Caption, 13) AS Node
       FROM Nodes WITH (NOLOCK)
       WHERE RIGHT(LEFT(Nodes.Caption, 15), 2) = '01') HA1
    INNER JOIN
      (SELECT Caption, LEFT(Nodes.Caption, 13) AS Node
       FROM Nodes WITH (NOLOCK)
       WHERE RIGHT(LEFT(Nodes.Caption, 15), 2) = '02') HA2
    ON HA1.Node = HA2.Node

  • Creating the groups may be time-consuming at first, but maintaining them afterwards should take minimal effort, provided you are using dynamic groups.

  • Yeah, I agree. As is often the case, though, the problem needs an immediate fix, while doing it right requires an enterprise-scale solution.

    We are definitely working towards using groups, with dynamic queries to populate them.
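The 01/02 pairing query above can be tested without touching the production database by running the same self-join against a throwaway SQLite table from Python. This sketch makes several assumptions. The captions are invented. Every caption is exactly 15 characters long (a 13-character shared prefix plus a 2-digit suffix). SQLite's substr() stands in for T-SQL's LEFT()/RIGHT(). TOP 10 and WITH (NOLOCK) are dropped because SQLite does not support them, and HA2.Node is omitted because the join makes it identical to HA1.Node.

```python
import sqlite3

# Hypothetical captions following the assumed layout: 13-character shared
# prefix + 2-digit member suffix ('01' primary, '02' secondary).
CAPTIONS = ["NYCAPP1WEBSRV01", "NYCAPP1WEBSRV02",
            "LDNAPP1WEBSRV01", "LDNAPP1WEBSRV02",
            "SFOAPP1WEBSRV01"]  # no '02' partner, so it is not paired

# SQLite equivalents of the T-SQL expressions (valid for 15-char captions):
#   LEFT(Caption, 13)            -> substr(Caption, 1, 13)
#   RIGHT(LEFT(Caption, 15), 2)  -> substr(Caption, 14, 2)
PAIR_QUERY = """
SELECT HA1.Caption, HA1.Node, HA2.Caption
FROM (SELECT Caption, substr(Caption, 1, 13) AS Node
      FROM Nodes WHERE substr(Caption, 14, 2) = '01') AS HA1
INNER JOIN
     (SELECT Caption, substr(Caption, 1, 13) AS Node
      FROM Nodes WHERE substr(Caption, 14, 2) = '02') AS HA2
  ON HA1.Node = HA2.Node
ORDER BY HA1.Node
"""


def find_ha_pairs(captions):
    """Return (primary caption, shared prefix, secondary caption) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nodes (Caption TEXT)")
    conn.executemany("INSERT INTO Nodes VALUES (?)", [(c,) for c in captions])
    pairs = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return pairs


for primary, node, secondary in find_ha_pairs(CAPTIONS):
    print(node, primary, secondary)
```

Because the pairing is an inner join, a node without a partner (SFOAPP1WEBSRV01 here) silently drops out. That may be worth checking separately before relying on the pairs for alerting.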