CPU Monitoring With Process Snapshot (PowerShell)

Version 2

    I have always had trouble with the out-of-the-box CPU alerting. By the time the process poller ran, whatever had been spiking my CPU was back down, so the emails always showed a high total usage next to a top 10 process list with very normal usage.

     

    This is a PowerShell script that reads the Process\% Processor Time performance counter. Total usage is calculated by subtracting the Idle process's usage (divided by the number of CPU cores) from 100. If this value is greater than the threshold, the top 10 process names and their usage are selected from the same data. This is the key point: the top 10 returned processes accurately reflect what was actually going on during a CPU spike.
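
    To make the arithmetic concrete, here is a minimal sketch that runs the same calculation on hand-made sample objects instead of live counter data. Get-CpuSummary and the sample values are hypothetical and only illustrate the idea; they are not part of the template.

```powershell
# Sketch only: Get-CpuSummary is a hypothetical helper, not part of the template.
function Get-CpuSummary {
    param(
        [Parameter(Mandatory)] [object[]] $Samples,  # objects with InstanceName and CookedValue
        [Parameter(Mandatory)] [int] $Cores,         # processor count used to scale values to 0-100
        [int] $Top = 10
    )
    # Total usage = 100 minus the Idle instance's value scaled by the processor count.
    $Idle  = $Samples | Where-Object { $_.InstanceName -eq 'idle' } | Select-Object -First 1
    $Total = [math]::Round(100 - ($Idle.CookedValue / $Cores), 2)

    # Rank the remaining instances from the SAME sample set, so the list matches the total.
    $Busy = $Samples |
        Where-Object { $_.InstanceName -ne 'idle' -and $_.InstanceName -ne '_total' } |
        Sort-Object -Property CookedValue -Descending |
        Select-Object -First $Top -Property InstanceName,
            @{ Name = 'Usage'; Expression = { [math]::Round($_.CookedValue / $Cores, 2) } }

    [pscustomobject]@{ Total = $Total; Processes = @($Busy) }
}

# Made-up readings for a machine with a processor count of 4.
$Samples = @(
    [pscustomobject]@{ InstanceName = 'idle';     CookedValue = 40.0 }
    [pscustomobject]@{ InstanceName = '_total';   CookedValue = 400.0 }
    [pscustomobject]@{ InstanceName = 'sqlservr'; CookedValue = 300.0 }
    [pscustomobject]@{ InstanceName = 'w3wp';     CookedValue = 60.0 }
)
$Summary = Get-CpuSummary -Samples $Samples -Cores 4
```

    With these numbers the Idle instance reads 40, so total usage is 100 - 40 / 4 = 90, and the two remaining processes come out at 75 and 15.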

     

    The Top 10 processes are written as a HTML table to the component's Message field.
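
    As a rough illustration of the table format, the sketch below builds the same single-line HTML table from a list of objects. ConvertTo-ProcessTable is a hypothetical helper name, and the HTML encoding of process names is an optional extra that the original script does not do.

```powershell
# Sketch only: ConvertTo-ProcessTable is a hypothetical helper, not part of the template.
function ConvertTo-ProcessTable {
    param([object[]] $Processes)  # objects with InstanceName and Usage properties

    $Rows = foreach ($Process in $Processes) {
        # Encode the name so characters such as & or < cannot break the markup.
        $Name = [System.Net.WebUtility]::HtmlEncode([string]$Process.InstanceName)
        "<tr><td>$Name</td><td>$($Process.Usage)</td></tr>"
    }
    # Keep everything on one line so it can be written out as a single Message value.
    '<table><tr><th>Name</th><th>CPU</th></tr>' + (-join $Rows) + '</table>'
}

$Html = ConvertTo-ProcessTable -Processes @(
    [pscustomobject]@{ InstanceName = 'a&b'; Usage = 75.5 }
)
Write-Host "Message: $Html"
```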

     

    Notes: the template has a critical threshold of 90. If you want to change the threshold, you also need to change the argument passed into the script to match. This is because there is no way to pass the critical threshold variable directly into the script.
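
    Because the threshold reaches the script as a plain argument, a missing or mistyped value is easy to overlook. The sketch below shows one possible way to read it defensively. Get-Threshold and its fallback of 90 (chosen to match the template's default) are assumptions, not something the template ships with.

```powershell
# Sketch only: Get-Threshold is a hypothetical helper, not part of the template.
function Get-Threshold {
    param(
        [object[]] $Arguments,   # typically the script's $args
        [double] $Default = 90   # assumed fallback, matching the template's critical threshold
    )
    $Parsed = 0.0
    # Accept the first argument only if it parses as a number.
    if ($Arguments.Count -gt 0 -and
        [double]::TryParse([string]$Arguments[0], [System.Globalization.NumberStyles]::Float,
            [System.Globalization.CultureInfo]::InvariantCulture, [ref]$Parsed)) {
        return $Parsed
    }
    return $Default
}

$FromArgument = Get-Threshold -Arguments @('85')   # 85
$FromMissing  = Get-Threshold -Arguments @()       # 90
$FromGarbage  = Get-Threshold -Arguments @('abc')  # 90
```

    Inside the monitor script this would be called as $Threshold = Get-Threshold -Arguments $args.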

     

    Prerequisites: WinRM configured on the target server. This script must be executed in Remote Host mode. See the SolarWinds Configuring and Integrating PowerShell (PDF) reference.
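
    Before assigning the template, it can help to confirm that WinRM actually answers on the target. The commands below are a quick check under the assumption that you run them from the polling server with an account that has rights on the target; the host name server01 is a placeholder.

```powershell
# Placeholder host name; replace with the monitored server.
$Target = 'server01'

# Confirms that the WinRM service on the target responds.
Test-WSMan -ComputerName $Target

# If it does not respond, remoting can be enabled ON THE TARGET from an elevated prompt:
# Enable-PSRemoting -Force

# Verifies that the counter the script relies on can be read remotely.
Invoke-Command -ComputerName $Target -ScriptBlock {
    (Get-Counter '\Process(idle)\% Processor Time').CounterSamples |
        Select-Object InstanceName, CookedValue
}
```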

     

    Here are the script contents:

    # The first script argument is the critical threshold; it must match the template's threshold.
    $Threshold = [double]$args[0];
    $Counter = "\Process(*)\% Processor Time";
    $Data = Get-Counter $Counter;
    # Process counters scale to 100 per logical processor, so count logical processors
    # and sum across sockets (a multi-socket server returns one WMI object per socket).
    $Cores = (Get-WmiObject -Class Win32_Processor -Property NumberOfLogicalProcessors | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum;
    # Total usage is 100 minus the Idle instance's share.
    $Idle = $Data.CounterSamples | Where-Object {$_.InstanceName -eq "idle"};
    $CPUUsage = [math]::Round(100 - ($Idle.CookedValue / $Cores), 2);
    Write-Host "Statistic: $CPUUsage";
    If ($CPUUsage -gt $Threshold) {
        # Take the top 10 from the same sample set so the list matches the reported total.
        $Processes = $Data.CounterSamples | Where-Object {$_.InstanceName -ne "idle" -and $_.InstanceName -ne "_total"} | Sort-Object -Descending -Property CookedValue | Select-Object -First 10 -Property InstanceName, @{Name="Usage";Expression={[math]::Round(($_.CookedValue / $Cores),2)}};
        $Message = "<table><tr><th>Name</th><th>CPU</th></tr>";
        ForEach ($Process in $Processes) {
            $ProcName = $Process.InstanceName;
            $Usage = $Process.Usage;
            $Message += "<tr><td>$ProcName</td><td>$Usage</td></tr>";
        }
        $Message += "</table>";
        Write-Host "Message: $Message";
    }

     

    EDIT: I had planned on uploading the alert I use as well but it doesn't seem that I can do two documents in the same post... Here is a quick how-to:

     

    Add a new alert. Name it whatever (mine is High CPU Load with Top 10 Processes).

     

    In the Trigger condition, alert on Component. For the first condition select Application -> Application Name -> Is Equal To -> CPU Monitoring With Process Snapshot (PowerShell)

    Add another single value comparison and make it Component -> Status -> Is Equal To -> Critical

     

    For Reset Condition choose Reset when condition is no longer true. For Time of Day choose always enabled.

     

    For trigger actions I use this for the message:

    High CPU usage on ${N=SwisEntity;M=Application.Node.Caption}.

     

    Then add an email action.

     

    Subject: High CPU usage on ${N=SwisEntity;M=Application.Node.Caption}.

    Body: CPU usage on ${N=SwisEntity;M=Application.Node.Caption} was ${N=SwisEntity;M=MultipleStatisticData.NumericData} at ${N=Alerting;M=AlertTriggerTime;F=DateTime}.

     

    Top 10 processes:
    ${N=SwisEntity;M=MultipleStatisticData.StringData}

     

    Alert details: ${N=Alerting;M=AlertDetailsUrl}

     

     

    And you're done. Now you have a CPU monitor that will alert you with a list of the top 10 processes from the instant the high load was detected.