Version 7

    Here is an example of using SWQL to build a view that displays problematic nodes (servers) with issues in one or more of the following areas:

     

    •    Node Status (column name: CONN) - (1 = Up, 2 = Down; other statuses are ignored)

    •    Node Response Time (column name: M_SECS) - in milliseconds (> 0, or -1 when the node is Down). If M_SECS > 500: Warning, If M_SECS > 1000: Critical

    •    Node CPU Load (column name: C_LOAD) - in percentage (between 0 and 100). If C_LOAD > 95: Warning, If C_LOAD > 98: Critical, If C_LOAD = 100: Down

    •    Node Memory Usage (column name: R_LOAD) - in percentage (between 0 and 100). If R_LOAD > 95: Warning, If R_LOAD > 98: Critical, If R_LOAD = 100: Down

    •    Node Highest Volume Usage (column name: V_PERCENT) - in percentage (between 0 and 100). If V_PERCENT > 95: Warning, If V_PERCENT > 98: Critical, If V_PERCENT = 100: Down

    •    Node Hardware Components worst Status (column name: HW_Status) - (UP, Undefined, Unknown, Warning, Critical, n/a)

    •    Node Application worst Status (column name: APP_Status) - (UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a)

     

    Alert-swql.jpg

     

    So that the worst (highest priority) conditions are shown at the top of the list, I gave each status a different score and each column a different weight, then calculated the total weighted score as the priority. Here is the calculation:


    •    wConn (Connection), scores: Down - 1000, Up - 0; Weight 1.00
    •    wTime (Response Time), scores: > 1000ms - 80, > 500ms - 10, Other - 0; Weight 0.75
    •    wCPU (CPU Load), scores: 100% - 600, > 98% - 80, > 95% - 10, Other - 0; Weight 1.00
    •    wRAM (Memory Load), scores: 100% - 600, > 98% - 80, > 95% - 10, Other - 0; Weight 1.00
    •    wVol (MAX(Volume Usage)), the highest volume usage of all volumes on a node, scores: 100% - 600, > 98% - 80, > 95% - 10, Other - 0; Weight 0.75
    •    wHW (Hardware Status (worst value)), the worst HW component status of a node with HW monitoring enabled, scores: Critical - 80, Warning - 10, Up - 0, Other - 1; Weight 0.50
    •    wApp (Application Status (worst value)), the worst application status of a node with application monitors assigned, scores: Down - 600, Critical - 80, Warning - 10, Up - 0, Other - 1; Weight 0.50
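The score table above can be sketched in Python as a quick way to check which score a given reading would receive. This is only an illustration of the arithmetic, not the SWQL from the attached files; the function and dictionary names are made up for this sketch, and treating every unlisted status (including n/a) as "Other" is an assumption.

```python
# Hypothetical helper names; only the numeric scores come from the list above.

def score_connection(conn):
    """wConn: CONN 2 (Down) scores 1000; 1 (Up) and anything else scores 0."""
    return 1000 if conn == 2 else 0


def score_response_time(ms):
    """wTime: > 1000 ms scores 80, > 500 ms scores 10, otherwise 0."""
    if ms > 1000:
        return 80
    if ms > 500:
        return 10
    return 0  # also covers -1, the value reported for a Down node


def score_percent(value):
    """wCPU, wRAM, wVol: 100% scores 600, > 98% scores 80, > 95% scores 10."""
    if value >= 100:
        return 600
    if value > 98:
        return 80
    if value > 95:
        return 10
    return 0


HW_SCORES = {"critical": 80, "warning": 10, "up": 0}
APP_SCORES = {"down": 600, "critical": 80, "warning": 10, "up": 0}


def score_status(status, table):
    """wHW / wApp: look up the status; any unlisted status scores 1 (assumption)."""
    return table.get(status.strip().lower(), 1)
```

For example, `score_percent(99)` returns 80 and `score_status("Unknown", APP_SCORES)` returns 1.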

     

    Maximum Total Weighted Score (excluding wConn): 80*0.75 + 600*1.00 + 600*1.00 + 600*0.75 + 80*0.50 + 600*0.50 = 2050

     

    Priority = ROUND((t1.wTime*0.75 + t1.wCPU*1.0 + t1.wRAM*1.0 + t1.wVol*0.75 + t1.wHW*0.5 + t1.wApp*0.5)/2.05 + t1.wConn*1.00, 2)

     

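    With wConn excluded, the Priority value is between 0 and 1000. A node that is Down adds a further 1000 (wConn), so it is listed at or above every node that is Up.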
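To see how the weights, the 2.05 divisor, and wConn fit together, here is a minimal Python sketch of the Priority formula. It assumes the per-column scores have already been worked out; the dictionary layout and function names are hypothetical and are not taken from the attached query.

```python
# Weights and maximum scores as listed in the calculation above.
WEIGHTS = {"wTime": 0.75, "wCPU": 1.0, "wRAM": 1.0,
           "wVol": 0.75, "wHW": 0.5, "wApp": 0.5}
MAX_SCORES = {"wTime": 80, "wCPU": 600, "wRAM": 600,
              "wVol": 600, "wHW": 80, "wApp": 600}


def max_weighted_total():
    """Largest possible weighted sum, excluding wConn."""
    return sum(MAX_SCORES[name] * weight for name, weight in WEIGHTS.items())


def priority(scores, w_conn=0):
    """Mirror of the Priority expression: weighted sum / 2.05 + wConn * 1.00."""
    weighted = sum(scores.get(name, 0) * weight
                   for name, weight in WEIGHTS.items())
    return round(weighted / 2.05 + w_conn * 1.00, 2)
```

Dividing by 2.05 is the same as scaling the 2050 maximum to 1000, so `priority(MAX_SCORES)` gives 1000.0, and a node with only wCPU = 80 and wApp = 10 gives `priority({"wCPU": 80, "wApp": 10})` = 41.46.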

     

    You can change the scores and weights to meet your requirements.

     

    Steps:

    • Create a view and add a “Custom Query” resource.

    swql.jpg

     

    • In the view, edit the Custom Query resource:
    • In the Custom SWQL Query box, paste the code from the attached file “thwack-swql-alerts.txt”.
    • Enable search, and in the Search SWQL Query box, paste the code from the attached file “thwack-swql-alerts-withSearch.txt”.

     

    Done!

     

    Using Search:

     

    •    By Node Name
    To display only one node, or a group of nodes with similar names, type the node name or part of the name in the search box and click the Search button.
    •    By Connection Status
    To display only nodes in DOWN status, type “n 1” (with a space between n and 1) in the search box and click the Search button.
    •    By CPU or RAM or Volume usage
    To display only nodes with CPU, RAM, or Volume usage above a certain level, use the following:

         o    “c 80” (CPU usage above 80%)
         o    “r 80” (Memory usage above 80%)
         o    “v 80” (Volume usage above 80%)
    •    By Hardware Status
    To display only nodes with a certain hardware status, type “h status” (‘status’ can be one of the following: UP, Undefined, Unknown, Warning, Critical, n/a).
    •    By Application Status
    To display only nodes with a certain application status, type “a status” (‘status’ can be one of the following: UP, Unmanaged, Unknown, Unreachable, Warning, Critical, Down, n/a).
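The search shortcuts above amount to a one-letter prefix followed by a value. As a rough illustration of that convention only, here is a small Python sketch that splits a search string into a field and a value. It is not how the attached search query is written (that logic is in SWQL and is not shown here); the function name and field labels are hypothetical.

```python
# One-letter prefixes described in the list above, mapped to made-up labels.
PREFIXES = {
    "n": "connection",
    "c": "cpu",
    "r": "ram",
    "v": "volume",
    "h": "hardware",
    "a": "application",
}
NUMERIC_FIELDS = {"cpu", "ram", "volume"}


def parse_search(text):
    """Return (field, value); anything without a known prefix is a node name."""
    cleaned = text.strip()
    parts = cleaned.split(None, 1)  # split on the first run of whitespace
    if len(parts) == 2 and parts[0].lower() in PREFIXES:
        field = PREFIXES[parts[0].lower()]
        value = parts[1].strip()
        if field in NUMERIC_FIELDS:
            try:
                return field, float(value)  # usage threshold in percent
            except ValueError:
                return "node_name", cleaned  # e.g. "c abc" is not a threshold
        return field, value
    return "node_name", cleaned
```

With this sketch, `parse_search("c 80")` returns `("cpu", 80.0)` and `parse_search("web01")` returns `("node_name", "web01")`. One consequence of the prefix convention is that a node name starting with a single letter and a space, such as “a server”, would be read as a status search.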

     

    You can customise the query to meet your requirements.

     

    Thanks to Alex Soul for the post https://thwack.solarwinds.com/docs/DOC-174568, which was very helpful!

     

    ===========================

     

    Update: As Alex suggested, I have updated the query, and the new files are attached. Thanks Alex!

     

    ===========================

     

    Update: 11/March/2015

     

    I have added two additional columns for the Alert Prioritising Dashboard.

    One column is AlertTime; the other is Acknowledge (Ack). The Ack column is clickable. Right-click it and open it in a new window to view or acknowledge an alert.

    Please see the additional document at https://thwack.solarwinds.com/docs/DOC-176727

     

    alert-000.jpg

     

    ============================

     

    Update: 11/11/2015

     

    The original query is for NPM and SAM. If you only need the NPM (network nodes) part, I have created two more queries for network devices only.

    The files "networkNOC-ForThwack.txt" and "InterfaceNOC-ForThwack.txt" are attached.

     

    "networkNOC-ForThwack.txt" is for network devices (NPM) only.

    001.jpg

    "InterfaceNOC-ForThwack.txt" is for network interfaces only.

    002.jpg

    Both are limited to Vendor = 'Cisco'; you can change this to meet your requirements.