How are you monitoring your server performance?

I'm in the process of setting up performance monitors for my Windows server team. My goal is to create meaningful alerts... that is, an alert means there is a real problem and not just another CPU/Mem spike. I thought I'd share what I'm doing and see if anyone wants to do the same.

Proc-queue

Monitor sampling time = 120 sec
Queue length greater than: 20 (this is for 1 x Xeon 5550 2.67 GHz)
Alert to trigger if monitor is critical > 5 minutes
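
For anyone who wants to reason about the "critical for more than 5 minutes" rule outside the monitoring tool, here is a minimal Python sketch. It is not how any particular product evaluates alerts; the function names are made up for illustration, and it assumes "critical duration" means the time between the first and the latest poll of the current critical streak.

```python
SAMPLE_INTERVAL_SEC = 120   # monitor sampling time
QUEUE_THRESHOLD = 20        # critical when the queue length is greater than this
SUSTAIN_SEC = 5 * 60        # alert only when critical for more than 5 minutes


def critical_seconds(samples, threshold=QUEUE_THRESHOLD, interval=SAMPLE_INTERVAL_SEC):
    """Seconds between the first and the latest poll of the current critical streak.

    `samples` is a list of queue-length readings, oldest first (hypothetical input).
    """
    streak = 0
    for value in reversed(samples):   # walk back from the newest sample
        if value <= threshold:        # streak ends at the first non-critical poll
            break
        streak += 1
    return max(0, streak - 1) * interval


def should_alert(samples, sustain=SUSTAIN_SEC):
    """True when the current critical streak has lasted longer than `sustain` seconds."""
    return critical_seconds(samples) > sustain


print(should_alert([25, 30, 22]))       # three critical polls span 240 s
print(should_alert([25, 30, 22, 21]))   # four critical polls span 360 s
```

Under that assumption, a 120 sec sampling time means four consecutive critical polls are needed before the alert fires, so a single spike never pages anyone.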

Process monitoring

Monitor sampling time = 300 sec
ASP worker process percent CPU > 90
ASP worker process Mem (disabled, since the value is a percentage of total memory, which on a server with 40 GB is meaningless)
Process Handle Count > 10,000 (need time to burn in on this one, 10,000 might need to increase)
Process Thread Count > 500 (same comment as above)
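
The thresholds above can be written down as a small check, which is handy for testing them against exported samples before turning alerts on. This is only a sketch: the dictionary keys are hypothetical, and treating `w3wp.exe` (the IIS worker process) as the "ASP worker process" is an assumption, not something stated in this thread.

```python
# Assumption: the IIS worker process stands in for the "ASP worker process".
ASP_WORKER_NAMES = {"w3wp.exe"}

CPU_PERCENT_LIMIT = 90       # applies to the ASP worker process only
HANDLE_COUNT_LIMIT = 10_000  # burn-in value, may need to increase
THREAD_COUNT_LIMIT = 500     # burn-in value, may need to increase


def breaches(proc):
    """Return (metric, value, limit) tuples for every threshold one process sample exceeds.

    `proc` is a dict with hypothetical keys: name, cpu_percent, handle_count, thread_count.
    The memory check is left out on purpose, matching the disabled monitor.
    """
    found = []
    if proc["name"].lower() in ASP_WORKER_NAMES and proc["cpu_percent"] > CPU_PERCENT_LIMIT:
        found.append(("cpu_percent", proc["cpu_percent"], CPU_PERCENT_LIMIT))
    if proc["handle_count"] > HANDLE_COUNT_LIMIT:
        found.append(("handle_count", proc["handle_count"], HANDLE_COUNT_LIMIT))
    if proc["thread_count"] > THREAD_COUNT_LIMIT:
        found.append(("thread_count", proc["thread_count"], THREAD_COUNT_LIMIT))
    return found


sample = {"name": "w3wp.exe", "cpu_percent": 95, "handle_count": 4200, "thread_count": 612}
for metric, value, limit in breaches(sample):
    print(f"{sample['name']}: {metric} {value} > {limit}")
```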

(as I get more info I'll add to the list... oh... here is a link I've been using as a guide: http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx?pr=blog#id0120047 )

gbutler

Modified process monitoring to exclude the Tivoli process... it normally runs a handle count of 75k+. According to IBM this is OK.

Adjusted my process monitors to report the "WMI commandline" in the "message" field. This has been helpful for developers in pinpointing the offending process.
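
To illustrate the two changes above (skipping a known-noisy process and putting the command line in the alert message), here is a rough Python sketch. The excluded process name and the dictionary keys are hypothetical; in a real setup the command line would come from whatever the monitor reads via WMI.

```python
# Hypothetical process name; substitute whatever the agent is actually called.
EXCLUDED_PROCESSES = {"example_tivoli_agent.exe"}


def alert_messages(procs, limit=10_000, excluded=EXCLUDED_PROCESSES):
    """Build one message per process whose handle count exceeds `limit`.

    Processes in `excluded` are skipped entirely. Each `proc` is a dict with
    hypothetical keys: name, pid, handle_count, command_line.
    """
    messages = []
    for proc in procs:
        if proc["name"].lower() in excluded:
            continue  # known to run a high handle count, so never alert on it
        if proc["handle_count"] > limit:
            messages.append(
                f"{proc['name']} (PID {proc['pid']}): handle count "
                f"{proc['handle_count']} > {limit}; command line: {proc['command_line']}"
            )
    return messages


procs = [
    {"name": "example_tivoli_agent.exe", "pid": 101, "handle_count": 75_000,
     "command_line": "example_tivoli_agent.exe -service"},
    {"name": "w3wp.exe", "pid": 202, "handle_count": 12_500,
     "command_line": 'w3wp.exe -ap "ExamplePool"'},
]
for message in alert_messages(procs):
    print(message)
```

With several worker processes sharing the same executable name, the command line is what tells them apart, which is why it is worth having in the message.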

Kiemat

I generally like to see my Processor Queue Length metric under 2. If it is over 2 for 5 minutes at a time, that server might be overburdened.

I look at the following for system performance monitoring on Windows machines:

· Disk Queue Length

· Processor Queue Length

· InterruptsPersec

· Page File Usage

· PercentDPCTime

· PageReadsPersec

Some of these are good for general information on server load, for example Processor Queue Length and Page Reads per Second. Others have been useful for determining where to look for issues, such as Percent DPC Time and Interrupts per Second. I do not alert on these; I only use them for historical reference.

BWB8771

Are you guys still around, and maybe willing to help me with similar questions?

I'm trying to set up the "canned" Exchange performance monitors, but our Exchange server is also our MS Small Biz Server 2011, so I'm not sure that (for example) the Processor Queue Length threshold should still be set at 2.

Using the canned, default thresholds, I'm getting emailed to death.

gbutler

Yes... still around. Not working much with SW these days. For us, after a long struggle with WMI/SNMP and Windows, we have pretty much given up and are focusing our performance monitoring on Microsoft's native System Center suite, which seems to have matured of late.

The issue we have is that the WMI and SNMP services are not reliable, which makes our monitoring unreliable too. If you only have a few servers, WMI might be just fine.

Regarding your question: your alert thresholds can certainly be adjusted so that the only alerts you get are when you actually have a problem.