Version 1



    I'm really new at SQL, and it's probably going to become more apparent once you see my query. I encourage any of you SQL guru's out there to help me optimize this. =)


    I made a query to populate a report as a Top 10 by month with the following:


    Column 1 - The Top 10 most common Alert Names.

    Column 2 - The quantity of alerts per Alert Name.

    Column 3 - The object that is the Top Offender per Alert Name.

    Column 4 - The quantity of alerts for the Top Offender, per alert name.


    Here's an example end result. This indicates that our top error is errors above threshold, and it has triggered 125 times in the last month. The top offender is PHX1-RW1's Gi0/1, and it has generated 58 of these alerts.


    Alert NameQTYObjectNameTotal
    ${NodeName} ${IfName} is displaying ${InErrorsThisHour} errors above the threshold of 2000.125N-PHX1-1NW-RW1 - GigabitEthernet0/1 · NAME=N-PHX1-58
    Interface ${NetObjectName} for node ${NodeName} has TX utilization above 90%.86N-PHX1-1NW-FE2 - Adaptive Security Appliance 'Ethe38
    Interface ${NetObjectName} for node ${NodeName} has RX utilization above 90%.84N-PHX1-1NW-FE2 - Adaptive Security Appliance 'Ethe34
    NTA: Excessive Bulk Class Drops - ${PolicyName} on ${NodeName} has exceeeded threshold.77N-DFW1-1AB22-RW159
    TX - Node ${NodeName} has a response time above the 60ms threshold.46N-DFW1-1AB22-SS2U448
    Interface ${NetObjectName} for node ${NodeName} has excessive RX utilization.42N-PHX1-1NW-FE2 - Adaptive Security Appliance 'Ethe14
    Interface ${NetObjectName} for node ${NodeName} has excessive TX utilization.40N-PHX1-1NW-FE2 - Adaptive Security Appliance 'Ethe22
    AZ - Node ${NodeName} has a response time above the 100ms threshold.30N-PHX2-1NW-PU14
    NTA:${Node.Location} QOS Utilization exceeds threshold - ${ClassName}11N-DFW1-1AB22-RW110
    UPS Battery Run Time Low Alert5upsAdvBatteryCapacity on N-DFW2-3N-PU15


    Here's the query:


    With a (AlertDefID, QTY) AS
      SELECT TOP 10 AlertDefID, count(*) AS QTY FROM [dbo].[AlertLog]
      WHERE LogDateTime BETWEEN DATEADD(Month, -1, getdate()) and getdate()
      AND Message LIKE 'Success%'
      GROUP BY AlertDefID
    c (AlertDefID, QTY, ObjectName, Total) AS
      SELECT DISTINCT TOP 10000 b.AlertDefID, a.QTY, b.ObjectName, count(*) AS Total FROM [dbo].[AlertLog] AS b
      JOIN a AS a ON a.AlertDefID = b.AlertDefID
      WHERE b.LogDateTime BETWEEN DATEADD(Month, -1, getdate()) and getdate()
      AND b.Message LIKE 'Success%'
      GROUP BY b.AlertDefID, a.QTY, b.ObjectName
    d (Row, AlertName, QTY, ObjectName, Total) AS
      SELECT TOP 10000 ROW_NUMBER() OVER (PARTITION BY c.ALertDefID ORDER BY c.AlertDefID) AS Row, def.AlertName, c.QTY, c.ObjectName, c.Total
      FROM c
      JOIN [dbo].[AlertDefinitions] AS def ON c.AlertDefID = def.AlertDefID
    SELECT TOP 10 d.AlertName, d.QTY, d.ObjectName, d.Total FROM d WHERE d.Row = '1'


    So here's the idea:


    a = table to define the top 10 AlertDefID and the quantity of the alerts.

    c = This table helps us identify the quantity of alerts per ObjectName, per AlertDefID. This is TOP 10,000 because the result is a Cartesian product (basically, TOP 10 Devices per AlertDefID), the AlertDefID is repeated all over the place. We'll group by the AlertDefID (so all the repeats are together), Order by the a.QTY(reminder: this is the totals of each respective alert), then order by the total instances of each respective ObjectName.

    d = This adds a "Row" column on the "c" table, and this provides a numeric identifier that we can filter on that is based on the total instances of a specific Alert ID. Because "c" provides a Cartesian product, we need this column so we can filter out the hundreds of duplicate entries per alert. So now, if we look at alert ID "45ac" and see 10 entries, they'll be numbered 1 through 10, and this occurs for every unique Alert ID. In addition to this, we've now created an identifier that we can use that tells exactly where the highest value for the alert is, because we ordered by QTY and Total in "c". Also in "d", we convert AlertDefID to AlertName, so our final output is easier to interpret.

    27. This is the actual query after all the tables are built. This pulls everything but the row values from "d" and filters to include specifically the rows that include the Row value of 1 (our top offender per alert).


    Let me know what you guys think!