cancel
Showing results for 
Search instead for 
Did you mean: 

High CPU on a Database Server—Should You Be Concerned?

Level 10

Back when I used to be an on-call DBA, I got paged one morning for a database server having high CPU utilization. After I punched the guy who setup that alert, I brought it up in a team meeting—is this something we should even be reporting on, much less alerting on. Queries and other processes in our use CPU cycles, but frequently as a production DBA you are the mercy of some third party applications “interesting” coding decisions causing more CPU cycles than is optimal.

Some things in queries that can really hammer CPUs are:

  • Data type conversions
  • Overuse of functions—or using them in a row by row fashion
  • Fragmented indexes or file systems
  • Out of date database statistics
  • Poor use of parallelism

Most commercial databases are licensed by the core—so we are talking about money here. Also, with virtualization, we have more options around easily changing CPU configurations, but remember overallocating CPUs on a virtual machine leads to less than optimal performance.  At the same time CPUs are a server’s ultimate limiter on throughput—if your CPUs are pegged you are not going to get any more work done.

The other angle to this, is since you are paying for your databases by the CPU, you want to utilize them. So there is a happy medium of adjusting and tuning.

Do you capture CPU usage over time? What have you done to tune queries for CPU use?

27 Comments
theflyingwombat
Level 9

In the past what we have done is take readings during peak and non-peak hours to see what we can expect to be "normal" CPU levels. Using these readings you could set a custom alert value in NPM to send an alert when CPU is above the normal level for certain amount of time. The CPU can be high while not effecting performance but it can also be high and adversely effect users. If you are able to find the specific limit where it starts effecting users you can run reports to see how often you are hitting this limit and if it would be wise to start throwing more CPU cores at the server.

zackm
Level 15

Generally speaking, I am not a fan of any CPU alerts on Windows boxes (unless it's the Ultimate CPU Alert!) But especially so for databases for all of the reasons you mention here.

Guidance behind CPU alerts from my perspective to clients has been on 2 distinct tracks usually:

1) If you do not have SAM or DPA; then use NPM to trend your CPU utilization over time. Usually, this results in a trend that your median CPU load is around 95%+. This obviously is not really that helpful on the surface unfortunately because it just tells you 'what', not 'why', which leads me to #2.

2) Buy SAM and/or DPA and use them to get root-cause analysis on your CPU woes. Use the 'Ultimate CPU Alert' referenced above and see if there is actually a problem, or if you are running a bunch of wonderfully "right-sized" servers and you can brag to your boss about the cost savings you have provided when it comes to your annual review! If there is a problem, use AppInsight and/or DPA to find out what is actually causing the problem. (Of course, the ideal would be to use SAM and DPA )

tcbene
Level 11

The answer to this sounds very similar to the previous baseline discussion.  You really have to do some research with your system to determine what is normal and what is outside the norm and base the alerts off of those findings.  Otherwise you might find yourself chasing anomalies.

byrona
Level 21

We don't treat the alert as an immediate escalation, we use the high CPU alert as a step to begin triage and part of that process is to look at the CPU utilization over a several day/week/month period of time and see if the high CPU being detected is normal or not.  Often times we will see SQL servers have an extended period of high CPU at the same regular cycle indicating some type of database maintenance or report.  Either way, it always goes back to the baseline and determining what is normal for the specific system in question.

aaron.j.denning
Level 12

We don't treat these as high alerts unless there are a huge number of them going off at once this is so our DBA's don't throw a hissy fit.

syldra
Level 12

There is a clear distinction in my mind between "high CPU" and "100% CPU".

High CPU : if you run over your baseline for an extended period of time, there could be something to investigate. Maybe your load has changed and you have to update your baseline ? Maybe there is a process going wild ? Maybe some users are running jobs they don't usually run because their mandate has changed ? When you know your load and you know nothing is wrong, a steady high CPU usage only means the servers have been sized correctly.

100% CPU : 100% CPU (sustained for an extended period) has to be investigated. As the load cannot go over 100%, if you steadily hit that limit, the server could need a little upsizing. Can't know for sure unless you investigate.

Jfrazier
Level 18

100+% CPU is possible on an uncapped AIX LPAR....

syldra
Level 12

Pffff... show off...

Jfrazier
Level 18

Another thing to watch with your high cpu utilization is the processor queue.  If there are things queued up to get processor time and it banging away at 100% or close to it, then you may need more resources.

Keeping long term cpu utilization data can be troublesome especially in a virtual environment where you can add and drop vCPU's almost at a whim these days.  So the numbers become skewed because you have no way to know the number of cpu's at that point in time...so your data is useless.  Not to mention the roll-ups Solarwinds applies over time, that will also muddy the water.  Do you have a process that has an affinity for one specific cpu ?   Is it pegged and the other loping along ?  Your over all average no longer tells the general story but rather only part of the story.

The best way to approach this would be a data warehouse where all the detailed metrics are stored for long periods of time.  Then you can do that sort of base-lining and see how many cpus you had and how each one was running at various points in time.  Now you can quantify the results of your tuning with relevant data....

Please see https://thwack.solarwinds.com/ideas/2637

Jfrazier
Level 18

Sorry...just been bit by that one before..

syldra
Level 12

Haha, I can imagine the head scratching the first time you see this.

Jfrazier
Level 18

yep...once I understood how it worked it made sense.

It just messes with the heads who don't understand it.

lhoyle
Level 10

Alerting on CPU utilization no, WAN utilization, YES (at my previous job, due to a latency sensitive primary business app). We would see CPU utilization charts during the business days, and I think we all realized the high CPU was at night when maintenance and re-indexing cycles ran.

cahunt
Level 17

Not quite my area, but monitoring would be good to utilize. I would alert in pre critical situations, mainly to have someone investigate and go from there...

jdanton
Level 10

The 100%+ LPAR thing is awesome--one of the perfmon counters we use in the Windows world (as someone mentioned above) is Processor Queue Length. Use PerfMon to Diagnose Common Server Performance Problems

I like the commentary about the importance of baselining--one of the things I did at a prior job was to build a performance data warehouse which was extremely beneficial to show trends and spikes in the process.

mharvey
Level 17

We capture CPU to make sure that the server doesn't trend out at 100%, but one thing we've run into oftentimes is our DBAs will configure SQL to take almost 99.5% of the available memory on the server leaving very little for the OS and other applications to run.  However, when we've configured SQL properly and watch the server for a while we are able to better trend out the CPU usage on the server and determine for those servers what "high usage" is and we have then been able to tailor our alerting to match this for these specific servers.

jdanton
Level 10

Typically SQL Server should be only allocated 80% of the physical memory on a server, but fellow SQL Server MVP Jonathan Kehayias has a formula here: How much memory does my SQL Server actually need? - Jonathan Kehayias

mharvey
Level 17

Thanks for the link.  I'll keep this handy.  It's been a fun task trying to get the DBA's to understand that SQL doesn't need to have close to 100% allocation of physical memory.  I mean I know they want it to perform well, but that's a bit much for anything.

lfaulkner
Level 9

Collecting data used for trending to determine actionable alerts - logical path.

jbiggley
Level 14

Every admin of any decent size Solarwinds environment should be voting up your suggestion -- just saying.  While they are at it, they could vote up the publish DB schema as well.

https://thwack.solarwinds.com/ideas/4687

(Thread hijacking complete!)

patrick.mchenry
Level 11

Our database alerts are based on CPU usage over time and not really based on processes

clubjuggle
Level 13

Generally speaking we are monitoring, but not alerting on, database CPU cycles. That way we have the data available if our DBAs need it.

jim.couch
Level 11

I'm thankful our DBAs have their own monitoring to do what they will. 

leejosephr
Level 7

For me it all depends. If you database server is virtual on VMware lets say, a high CPU doesn't bother me as much unless it is affecting performance. In our environment we have around 250 database servers. I have an alert setup to send me an email when the CPU has been pegged for over an hour. I rarely see an alert for high CPU, maybe 1 - 3 a week.

To go back to my virtualization statement. If the server is virtualized it may show in Windows and the monitor that it is pegged out at 100%, but that is actually running efficiently. In a virtualized server you may only give the server 4 vCPU's but when it is making CPU requests it actually goes to the least busiest core on any of the processors on the host it sits on. This gives the server a faster CPU response time because the hyper visor can determine  the best route for that cycle to go. Now if the server is under allocated that can cause a problem as well so there is a fine balance between it, but in general I don't care that much about high CPU unless whatever the server is servicing is affected.

goodzhere
Level 14

I think you really need to do some data collection over an extended period of time.  Then you may be able to determine what is really going on.  Or at least be able to move in the right direction.

brianflynn
Level 12

My answer : This is a work harder vs smarter question.  Nobody cares how hard you work as long as the job gets done satisfactorily.


In the decade I managed SQL Servers, I found that CPU utilization is not immediately relevant to performance.  It's immediately relevant to capacity planning.  My users never cared what percentage of CPU the database was using as long as it was performing well for them.  For that, you need something that monitors the end user experience measured in wall clock time.  And if that shows trouble, CPU utilization can be a cause of it, but certainly not the only cause and sometimes users are perfectly satisfied when the CPU utilization was high.

So what did I do if I saw high CPU utilization?  I first checked to see if anyone was experiencing performance degradation.  If they were, I urgently sought a solution that improved performance regardless of what it did for CPU utilization.  If they were not experiencing performance degradation, I took a more casual approach to analyzing my capacity trend line and laying out long term options.

jkump
Level 15

Great discussion.

About the Author
Joseph D'Antoni is an Senior Architect and SQL Server MVP with over a decade of experience working in both Fortune 500 and smaller firms. He is currently Solutions Architect for SQL Server and Big Data for Anexinet in Blue Bell, PA. He is frequent speaker at major tech events, and blogger about all topics technology. He believes that no single platform is the answer to all technology problems. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. .