0 Replies Latest reply on Sep 8, 2011 6:18 PM by Gerritt Dunn

    Alert management interface or utility to track Alert details\specifics

    Gerritt Dunn

      Apologies in advance for the length here......

      Precipitating this question\discussion is the 100+ alerts I have inherited in a production NPM environment, where the enterprise infrastructure spans 25+ autonomous Districts with 250+ unique US and international sites. Six pollers: 3k+ nodes\12k+ interfaces\8k+ volumes\20k+ elements. All major SW modules.

      At present, the NPM production alerts I am trying to manage are primarily 'simple', not 'custom', alerts. Of benefit to me (and I presume others as well) would be some way to easily identify\track details\specifics across the full scope of production alerts, for elements buried beneath multiple drill-downs such as:

      1. Alert recipients (due to consistent, ongoing MACs, I need to frequently revisit\revise this element across Alerts)
      2. Trigger conditions (identify constructs and logic, including 'Property to Monitor' specific conditions, etc.)
      3. Variable strings in use (per alert! these can change with Orion releases and 'break' alerts, best would be to be able to globally find\replace when such changes occur.)
      4. Alert suppression  (i.e., details as above)
      5. Comments field (Used by us to track changes by Admin with date\timestamps, since there is no high level Admin logging details\timestamp in the toolkit at large.)
      6. Alert times (manage Weekday and Off Hours\Weekend time coverages)
      7. Alert Trigger Actions (in detail)
      8. Alert Reset Actions (in detail)

      Basically: deconstruct Alerts into a flat interface, or at the least, display the full unique Alert details in a single screen.
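
A minimal sketch of what such a flat view could look like, assuming the alert definitions can be obtained as XML in the shape of the sample definition at the end of this post. The attribute names (`AlertName`, `DOW`, `TriggerAction`, `Parameter1`, and so on) come from that sample; the trimmed `SAMPLE_EXPORT` text, the function names, and the reading of `TriggerAction="False"` as a reset action are assumptions made for illustration. Whether recipients are stored in `Target` for e-mail actions is not confirmed by the sample, so `Target` is simply carried through as-is.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for an alert definition file; attribute names follow the
# sample at the end of this post, values are shortened for illustration.
SAMPLE_EXPORT = """<AlertDefinitions>
  <AlertDefinition AlertName="Component warning or critical" Enabled="True"
      StartTime="12:00:00 AM" EndTime="11:59:59 PM" DOW="1,2,3,4,5,6,7"
      SuppressionQuery="" ObjectType="APM: Component">
    <AlertActions>
      <AlertAction TriggerAction="True" ActionType="NPMEventLog" SortOrder="1"
          Target="" Parameter1="Component ${ComponentName} is ${ComponentStatus}"/>
      <AlertAction TriggerAction="False" ActionType="NPMEventLog" SortOrder="1"
          Target="" Parameter1="Component ${ComponentName} is ${ComponentStatus}"/>
    </AlertActions>
  </AlertDefinition>
</AlertDefinitions>"""

ALERT_FIELDS = ["AlertName", "Enabled", "StartTime", "EndTime", "DOW", "ObjectType"]
ACTION_FIELDS = ["ActionType", "SortOrder", "Target", "Parameter1"]


def flatten_alerts(xml_text):
    """Return one flat dict per alert action (trigger or reset)."""
    rows = []
    root = ET.fromstring(xml_text)
    for alert in root.iter("AlertDefinition"):
        base = {name: alert.get(name, "") for name in ALERT_FIELDS}
        # A non-empty SuppressionQuery is taken to mean suppression is configured.
        base["Suppressed"] = "yes" if alert.get("SuppressionQuery") else "no"
        for action in alert.iter("AlertAction"):
            row = dict(base)
            # Assumption: TriggerAction="True" is a trigger action, "False" a reset action.
            row["Phase"] = "trigger" if action.get("TriggerAction") == "True" else "reset"
            row.update({name: action.get(name, "") for name in ACTION_FIELDS})
            rows.append(row)
    return rows


def to_csv(rows):
    """Render the flat rows as CSV text, e.g. for opening in Excel."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_alerts(SAMPLE_EXPORT)
print(to_csv(rows))
```

One row per action keeps trigger and reset actions side by side under the same alert name, which is roughly the single-screen view described above; other fields can be added by extending the two field lists.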

      Questions\Discussion point

      1. Does\has anyone created a unified Alert management interface to track details\specifics out of an enterprise level NPM Alert population?
      2. Or even a report to try and easily track Alert details?
      3. The benefit sought here is to avoid manually deconstructing the full scope of alerts to track details in, say, Access or Excel.
      4. Have I missed the toolkit GUI alert manager?
      5. I have reviewed Thwack for relevant content, and have an open ticket for this same request.
      6. Included here are possibly Alert (feature) enhancements, but I will let SW dig these out.

      Feature enhancements

      1. Alert GUI interface check box for "Include the generating alert name in the message body."
      2. Some way to track Admin changes: author, date of last change, etc. (Some way to log the changes made.)
      3. Some way to re-run the "Alert manager GUI" for updating alert changes, or better, be addressable in real time.
      4. Interact with Alerts globally or individually with a search constraint to make relevant variable changes. 
      5. Some form of search function against the full scope of active alerts per detail field.
      6. Some central Customer portal URL listing feature requests\features\enhancements in development for future release, to remove any question about past requests already tabled.
      7. (Nice to have) Some form of Boolean tracking to generate a compilation Alert for repetitive alerts, based on the number of occurrences of a unique alert condition per timeframe, with resets.
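
The search and global find\replace ideas in items 4 and 5 can be sketched outside the product as follows. This is only an illustration under stated assumptions: the `alert_messages` inventory is hypothetical (in practice it might be filled from the `Title` and `Parameter1` attributes of each alert action), the "Node down" alert, the `${Status}` variable, and the replacement name `Caption` are invented for the example, and nothing here writes changes back to Orion.

```python
import re

# Matches variables written in the ${Name} form used in alert messages.
VARIABLE_PATTERN = re.compile(r"\$\{([^}]+)\}")

# Hypothetical inventory: alert name -> message strings taken from its actions.
alert_messages = {
    "Component warning or critical": [
        "Component ${ComponentName} on Node ${NodeName} is ${ComponentStatus}",
    ],
    "Node down": [
        "Node ${NodeName} is ${Status}",
    ],
}


def variables_by_alert(messages):
    """Map each alert name to the sorted list of variables it uses."""
    return {
        name: sorted({v for text in texts for v in VARIABLE_PATTERN.findall(text)})
        for name, texts in messages.items()
    }


def alerts_using(messages, variable):
    """List the alerts whose messages reference the given variable."""
    return sorted(
        name for name, used in variables_by_alert(messages).items() if variable in used
    )


def rename_variable(messages, old, new):
    """Return a copy of the inventory with ${old} replaced by ${new} everywhere."""
    old_token, new_token = "${" + old + "}", "${" + new + "}"
    return {
        name: [text.replace(old_token, new_token) for text in texts]
        for name, texts in messages.items()
    }


print(variables_by_alert(alert_messages))
print(alerts_using(alert_messages, "NodeName"))
```

Because `rename_variable` returns a new inventory rather than editing in place, the before and after versions can be compared to review exactly which alerts a variable change would touch.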

      A bit of background

      I have already visited the SQL Server DB to see if this request is (easily?) actionable in a complex report.

      From what I can interpret, the NPM alert as constructed is a SQL query built in parts from Designer inputs, with client-side table joins and targeting stored procedures on the DB side. For example, here is the SQL statement pulled from a very simple, basic alert: "Alert me when a component goes into warning or critical state." Parsing this back into the details of the Alert tab selections and drill-downs (e.g., the Trigger Action message body) is pretty cryptic (to me).

      <?xml version="1.0" encoding="UTF-8"?>
      <AlertDefinitions><AlertDefinition AlertDefID="{5EB4B441-F3D6-4DF1-AA39-A59B4B5191AB}" AlertName="Alert me when a component goes into warning or critical state" AlertDescription="This alert will write to the event log when an component goes into warning or critical state and when an component comes back up again." Enabled="True" StartTime="12:00:00 AM" EndTime="11:59:59 PM" DOW="1,2,3,4,5,6,7" TriggerQuery="SELECT APM_AlertsAndReportsData.ComponentID AS NetObjectID, APM_AlertsAndReportsData.ComponentName AS Name
        FROM Nodes INNER JOIN APM_AlertsAndReportsData ON (Nodes.NodeID = APM_AlertsAndReportsData.NodeId)
        WHERE 
        (
         (Nodes.Status &lt;&gt; '2') AND
         (
          (APM_AlertsAndReportsData.ComponentStatus = 'Critical') OR
          (APM_AlertsAndReportsData.ComponentStatus = 'Warning')
         )
        )" TriggerQueryDesign="&lt;QUERY&gt;&lt;KIND&gt;1&lt;/KIND&gt;&lt;COMPLEX&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;CONNECTIVE&gt;1&lt;/CONNECTIVE&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;SIMPLE&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;ALIAS&gt;&lt;/ALIAS&gt;&lt;ADVANCED&gt;0&lt;/ADVANCED&gt;&lt;COMPARISON&gt;5&lt;/COMPARISON&gt;&lt;FUNCTION&gt;0&lt;/FUNCTION&gt;&lt;SORT&gt;0&lt;/SORT&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;LEFTSIDEKIND&gt;2&lt;/LEFTSIDEKIND&gt;&lt;RIGHTSIDEKIND&gt;1&lt;/RIGHTSIDEKIND&gt;&lt;COMPARISONATTRIBUTES&gt;&lt;/COMPARISONATTRIBUTES&gt;&lt;FUNCTIONATTRIBUTES&gt;&lt;/FUNCTIONATTRIBUTES&gt;&lt;LEFTFIELDPATH&gt;Network Nodes.Node Status.Node Status&lt;/LEFTFIELDPATH&gt;&lt;RIGHTFIELDPATH&gt;&lt;/RIGHTFIELDPATH&gt;&lt;LEFTVALUETYPE&gt;0&lt;/LEFTVALUETYPE&gt;&lt;LEFTVALUE&gt;&lt;/LEFTVALUE&gt;&lt;LEFTCAPTION&gt;Node Status&lt;/LEFTCAPTION&gt;&lt;RIGHTVALUETYPE&gt;8&lt;/RIGHTVALUETYPE&gt;&lt;RIGHTVALUE&gt;2&lt;/RIGHTVALUE&gt;&lt;RIGHTCAPTION&gt;Down&lt;/RIGHTCAPTION&gt;&lt;/SIMPLE&gt;&lt;COMPLEX&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;CONNECTIVE&gt;2&lt;/CONNECTIVE&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;SIMPLE&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;ALIAS&gt;&lt;/ALIAS&gt;&lt;ADVANCED&gt;0&lt;/ADVANCED&gt;&lt;COMPARISON&gt;0&lt;/COMPARISON&gt;&lt;FUNCTION&gt;0&lt;/FUNCTION&gt;&lt;SORT&gt;0&lt;/SORT&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;LEFTSIDEKIND&gt;2&lt;/LEFTSIDEKIND&gt;&lt;RIGHTSIDEKIND&gt;1&lt;/RIGHTSIDEKIND&gt;&lt;COMPARISONATTRIBUTES&gt;&lt;/COMPARISONATTRIBUTES&gt;&lt;FUNCTIONATTRIBUTES&gt;&lt;/FUNCTIONATTRIBUTES&gt;&lt;LEFTFIELDPATH&gt;APM Component Monitors.Component Status&lt;/LEFTFIELDPATH&gt;&lt;RIGHTFIELDPATH&gt;&lt;/RIGHTFIELDPATH&gt;&lt;LEFTVALUETYPE&gt;0&lt;/LEFTVALUETYPE&gt;&lt;LEFTVALUE&gt;&lt;/LEFTVALUE&gt;&lt;LEFTCAPTION&gt;Component 
Status&lt;/LEFTCAPTION&gt;&lt;RIGHTVALUETYPE&gt;8&lt;/RIGHTVALUETYPE&gt;&lt;RIGHTVALUE&gt;Critical&lt;/RIGHTVALUE&gt;&lt;RIGHTCAPTION&gt;Critical&lt;/RIGHTCAPTION&gt;&lt;/SIMPLE&gt;&lt;SIMPLE&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;ALIAS&gt;&lt;/ALIAS&gt;&lt;ADVANCED&gt;0&lt;/ADVANCED&gt;&lt;COMPARISON&gt;0&lt;/COMPARISON&gt;&lt;FUNCTION&gt;0&lt;/FUNCTION&gt;&lt;SORT&gt;0&lt;/SORT&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;LEFTSIDEKIND&gt;2&lt;/LEFTSIDEKIND&gt;&lt;RIGHTSIDEKIND&gt;1&lt;/RIGHTSIDEKIND&gt;&lt;COMPARISONATTRIBUTES&gt;&lt;/COMPARISONATTRIBUTES&gt;&lt;FUNCTIONATTRIBUTES&gt;&lt;/FUNCTIONATTRIBUTES&gt;&lt;LEFTFIELDPATH&gt;APM Component Monitors.Component Status&lt;/LEFTFIELDPATH&gt;&lt;RIGHTFIELDPATH&gt;&lt;/RIGHTFIELDPATH&gt;&lt;LEFTVALUETYPE&gt;0&lt;/LEFTVALUETYPE&gt;&lt;LEFTVALUE&gt;&lt;/LEFTVALUE&gt;&lt;LEFTCAPTION&gt;Component Status&lt;/LEFTCAPTION&gt;&lt;RIGHTVALUETYPE&gt;8&lt;/RIGHTVALUETYPE&gt;&lt;RIGHTVALUE&gt;Warning&lt;/RIGHTVALUE&gt;&lt;RIGHTCAPTION&gt;Warning&lt;/RIGHTCAPTION&gt;&lt;/SIMPLE&gt;&lt;/COMPLEX&gt;&lt;/COMPLEX&gt;&lt;/QUERY&gt;" ResetQuery="SELECT APM_AlertsAndReportsData.ComponentID AS NetObjectID, APM_AlertsAndReportsData.ComponentName AS Name
        FROM Nodes INNER JOIN APM_AlertsAndReportsData ON (Nodes.NodeID = APM_AlertsAndReportsData.NodeId)
        WHERE NOT 
        (
         (Nodes.Status &lt;&gt; '2') AND
         (
          (APM_AlertsAndReportsData.ComponentStatus = 'Critical') OR
          (APM_AlertsAndReportsData.ComponentStatus = 'Warning')
         )
        )" ResetQueryDesign="SIMPLE" SuppressionQuery="" SuppressionQueryDesign="&lt;QUERY&gt;&lt;KIND&gt;1&lt;/KIND&gt;&lt;COMPLEX&gt;&lt;TAG&gt;&lt;/TAG&gt;&lt;CONNECTIVE&gt;1&lt;/CONNECTIVE&gt;&lt;CHECKED&gt;1&lt;/CHECKED&gt;&lt;/COMPLEX&gt;&lt;/QUERY&gt;" TriggerSustained="0" ResetSustained="0" LastExecuteTime="9/8/2011 2:14:48 PM" ExecuteInterval="60" BlockUntil="9/8/2011 2:14:49 PM" ResponseTime="0" LastErrorTime="9/3/2011 5:39:07 PM" LastError="System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
         at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
         at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
         at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
         at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
         at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
         at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
         at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
         at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
         at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
         at AlertingEngine.CheckAlert.UpdateRowsThatAreReset()" ObjectType="APM: Component" IgnoreTimeout="True"><AlertActions><AlertAction ActionDefID="{4449897B-5293-4402-86E9-CB3A59E386FF}" AlertDefID="{5EB4B441-F3D6-4DF1-AA39-A59B4B5191AB}" TriggerAction="True" ExecuteIfAcknowledged="True" TimeOffset="0" RepeatInterval="0" StartTime="12:00:00 AM" EndTime="11:59:59 PM" DOW="1,2,3,4,5,6,7" SortOrder="1" ActionType="NPMEventLog" Title="NetPerMon Event Log : Component  ${ComponentName} on Application ${ApplicationName} on Node ${NodeName} is ${ComponentStatus}" Target="" Parameter1="NetPerMon Event Log:  Component  ${ComponentName} on Application ${ApplicationName} on Node ${NodeName} is ${ComponentStatus}" Parameter2="" Parameter3="" Parameter4="" NetObjectType=""/><AlertAction ActionDefID="{DB86F4B0-1372-4EE8-B887-4A2F9F5E1AE1}" AlertDefID="{5EB4B441-F3D6-4DF1-AA39-A59B4B5191AB}" TriggerAction="False" ExecuteIfAcknowledged="True" TimeOffset="0" RepeatInterval="0" StartTime="12:00:00 AM" EndTime="11:59:59 PM" DOW="1,2,3,4,5,6,7" SortOrder="1" ActionType="NPMEventLog" Title="NetPerMon Event Log : Component  ${ComponentName} on Application ${ApplicationName} on Node ${NodeName} is ${ComponentStatus}" Target="" Parameter1="NetPerMon Event Log:  Component  ${ComponentName} on Application ${ApplicationName} on Node ${NodeName} is ${ComponentStatus}" Parameter2="" Parameter3="" Parameter4="" NetObjectType=""/></AlertActions></AlertDefinition></AlertDefinitions>
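
The `TriggerQueryDesign` attribute above is itself XML, escaped inside the attribute value, and it can be turned back into a readable condition. The sketch below is a rough attempt, not a documented format: the code-to-text mappings (`CONNECTIVE` 1 = AND, 2 = OR; `COMPARISON` 0 = "=", 5 = "<>") are inferred only by comparing this one sample's designer XML with its generated SQL, the `DESIGN` string is a shortened copy of the sample, and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Mappings inferred from the single sample above; other codes are unknown.
CONNECTIVES = {"1": "AND", "2": "OR"}
COMPARISONS = {"0": "=", "5": "<>"}

# Shortened TriggerQueryDesign value, already unescaped. When the outer file
# is parsed with ElementTree, Element.get() returns it in this decoded form.
DESIGN = (
    "<QUERY><KIND>1</KIND><COMPLEX><CONNECTIVE>1</CONNECTIVE>"
    "<SIMPLE><COMPARISON>5</COMPARISON><LEFTCAPTION>Node Status</LEFTCAPTION>"
    "<RIGHTCAPTION>Down</RIGHTCAPTION></SIMPLE>"
    "<COMPLEX><CONNECTIVE>2</CONNECTIVE>"
    "<SIMPLE><COMPARISON>0</COMPARISON><LEFTCAPTION>Component Status</LEFTCAPTION>"
    "<RIGHTCAPTION>Critical</RIGHTCAPTION></SIMPLE>"
    "<SIMPLE><COMPARISON>0</COMPARISON><LEFTCAPTION>Component Status</LEFTCAPTION>"
    "<RIGHTCAPTION>Warning</RIGHTCAPTION></SIMPLE>"
    "</COMPLEX></COMPLEX></QUERY>"
)


def render(node):
    """Render a SIMPLE or COMPLEX designer node as a readable condition."""
    if node.tag == "SIMPLE":
        code = node.findtext("COMPARISON", "")
        op = COMPARISONS.get(code, f"<comparison {code}>")
        left = node.findtext("LEFTCAPTION", "")
        right = node.findtext("RIGHTCAPTION", "")
        return f"{left} {op} {right}"
    code = node.findtext("CONNECTIVE", "")
    joiner = CONNECTIVES.get(code, f"<connective {code}>")
    # Only nested conditions and groups are rendered; TAG, CHECKED, etc. are skipped.
    parts = [render(child) for child in node if child.tag in ("SIMPLE", "COMPLEX")]
    return "(" + f" {joiner} ".join(parts) + ")"


def describe(design_xml):
    """Turn a designer XML string into a one-line condition summary."""
    # ResetQueryDesign in the sample is the plain word SIMPLE, not XML.
    if not design_xml.lstrip().startswith("<"):
        return design_xml
    group = ET.fromstring(design_xml).find("COMPLEX")
    return render(group) if group is not None else ""


print(describe(DESIGN))
# (Node Status <> Down AND (Component Status = Critical OR Component Status = Warning))
```

Unknown codes are printed as placeholders such as `<comparison 3>` rather than guessed, so gaps in the inferred mappings show up in the output instead of silently producing a wrong condition.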