Can your database run a four-minute mile?

Jogging is my exercise. I use it to tune out noise, focus on the problem at hand, avoid interruptions, and stay healthy. Recently, I was cruising at a comfortable nine-minute pace when I was passed by four elite runners, and it felt like I was standing still. It got me thinking about health versus performance and how they are related. I came to the conclusion that they are related, but more like distant cousins than siblings.

I can provide you with data that indicates my health status: blood pressure, resting heart rate, BMI, percentage of body fat, current illnesses, and so on. Given all that, tell me: can I run a four-minute mile? That question can’t be answered with the data I provided, because now I’m talking about performance rather than health.

As it relates to databases, we can look at health metrics: CPU utilization, I/O stats, memory pressure, and so on. However, those can’t answer the question of how your databases and queries are performing either. I’d argue that both health AND performance monitoring and analysis are important and can impact each other, but they really do answer different questions.
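
To make the distinction concrete, here is a minimal sketch of a host-level health check in Python. It assumes the psutil package, and the function name is mine; the point is that none of these numbers tells you which queries are waiting, or what they are waiting on.

```python
# A host-level "health check" snapshot (assumes the psutil package).
# These numbers tell you whether the box is suffering -- they do not tell
# you which queries are waiting, or what they are waiting on.
import psutil

def health_snapshot():
    memory = psutil.virtual_memory()          # memory pressure
    disk_io = psutil.disk_io_counters()       # cumulative I/O counters
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),  # CPU over a 1-second sample
        "memory_percent": memory.percent,
        "disk_reads": disk_io.read_count,
        "disk_writes": disk_io.write_count,
    }

if __name__ == "__main__":
    print(health_snapshot())
```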

Health is a very mature topic, and pretty much all database monitoring solutions offer this visibility. Performance is another story. I love this definition of performance from Craig Mullins as it relates to databases: “the optimization of resource use to increase throughput and minimize contention, enabling the largest possible workload to be processed.” Interestingly, I believe this definition would be widely accepted, yet approaches to achieving it with monitoring tools vary widely. While I agree with the definition, I’d add “in the shortest possible time” to the end. If you agree that a time component needs to be considered with regard to database performance, now we’re talking about wait-time analysis. Here’s a white paper that goes into much more detail on this approach and why it is the correct way to think about performance.
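
To give a flavor of what that looks like in practice, here is a rough wait-time sampling sketch. It assumes a SQL Server instance and the pyodbc driver; the connection string, sampling interval, and helper names are illustrative, not taken from the white paper. The idea is to take two snapshots of the cumulative sys.dm_os_wait_stats view and difference them, so you can see where the instance actually spent its time during that interval.

```python
# Rough wait-time sampling sketch (SQL Server and pyodbc assumed).
# Take two snapshots of the cumulative wait statistics and report where the
# instance accumulated wait time in between -- that delta, not CPU or memory
# counters, answers "what is the workload waiting on?"
import time

import pyodbc

CONN_STR = (  # illustrative connection string -- adjust for your environment
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;Trusted_Connection=yes"
)

WAIT_SQL = """
SELECT wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;
"""

def snapshot(cursor):
    """Return {wait_type: cumulative wait_time_ms} at this moment."""
    cursor.execute(WAIT_SQL)
    return {wait_type: wait_ms for wait_type, wait_ms in cursor.fetchall()}

def top_waits(interval_seconds=60, top_n=10):
    """Top wait types by wait time accumulated over the sampling interval."""
    conn = pyodbc.connect(CONN_STR)
    try:
        cursor = conn.cursor()
        before = snapshot(cursor)
        time.sleep(interval_seconds)
        after = snapshot(cursor)
    finally:
        conn.close()

    deltas = {w: after.get(w, 0) - before.get(w, 0) for w in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    for wait_type, wait_ms in top_waits():
        print(f"{wait_type:<45} {wait_ms:>10} ms")
```

In practice you would also filter out benign idle wait types and attribute the remaining wait time to individual statements, which is exactly where a purpose-built monitoring tool earns its keep.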

Stop chasing red herrings and stop treating symptoms. Get to the root cause of database performance issues using the right approach.

For more on this topic, check out my webcast recording, Database Performance on Tap. Feel free to comment below with your thoughts, questions, or ideas for my next webcast!

  • Very appropriate discussion. We often pigeonhole ourselves by looking only at performance or only at health. Both are vital to tracking the overall quality of your network and user experience.

  • When I initially implemented Orion monitoring, I'd inherited a network full of problematic hardware with bad firmware. When something went down, there wasn't a Dependencies feature in NPM to reduce alerts, and a large flood of pages would go out.

    The flood was so large that it overwhelmed and disabled the local paging company--our Orion could effectively DoS them, resulting in reduced (or no!) pages going to emergency professionals, law enforcement, firefighters, doctors, etc.

    Eventually I was able to replace the bad network gear with new brands and models that didn't fail, and Orion NPM was upgraded with the option to reduce alerts through creating Dependencies.

    But we migrated off pagers in favor of alternative communications solutions, and the pager company isn't keeping up with the times--I don't know how long they'll be around. Certainly they earned a reputation for poor reliability and bad scalability as a result of page floods creating inadvertent and unexpected Denial of Service situations against them--and not only from my environment.

    I think it would be easy to overwhelm a target device with too much monitoring, but anyone who did so would have to be either mighty ignorant or filled with malicious intent.

  • I think it depends on what your body (servers and apps) can take. As long as the impact stays low, frequent sampling can be a good thing - but I agree, it's key that the impact stays low (if you have a dodgy knee, think carefully before choosing high-impact exercises). It would be a bit ironic if a performance monitoring tool actually caused a performance issue.

    The main idea I had was to say that you have to be measuring the right things in order to answer certain performance questions. I've got a more extensive version, which I may post some day, that outlines just what you need to satisfy all parts of the definition of performance.

    I worked with a couple of guys who ran ultramarathons (Leadville 100: http://www.leadvilleraceseries.com/leadvilletrail100run/). Just the time commitment they put in was astounding!

    Nice parallel though!

  • I think about that white paper's contents in terms of a few ultramarathon runners I've become aware of. These are people who routinely run races (or practice runs) of 50 to 100 miles (or more!) at a time.

    Ultramarathon - Wikipedia

    The 10 Most Ridiculously Impressive Ultra-Marathoners of All Time | Complex

    The ramifications are significant. General statistics for ultramarathoners show lower rates of high blood pressure and fewer cases of heart disease or stroke than the general population, but also more breathing problems.

    http://well.blogs.nytimes.com/2014/01/15/what-ultra-marathons-do-to-our-bodies/?_r=0

    Is there a parallel between ultramarathon running and IT monitoring, and the impact of intense monitoring on servers and apps? I think so.

    Most of the successful runners are in it for the long distance, and most must run slower than a typical marathon runner.  They just run longer at that slower pace to prevent damage.

    Maybe less polling is better than more... at least in extreme cases.
