The importance of baselining - and baselining often!

Level 11

For the most part, database performance monitoring tools do a great job at real-time monitoring. By that I mean alerting us when certain counter thresholds are reached, such as Page Life Expectancy dropping below 300 or Memory Pages per Second climbing too high.  Although this is definitely crucial to have set up within our environment, hard alerts pose a problem of their own.  How do we know that reaching a page life expectancy of 300 is a problem?   Maybe this is normal for a certain period of time, such as month-end processing.

This is where the baseline comes into play.  A baseline, by definition, is a minimum or starting point used for comparisons.  In the database performance analysis world, it’s a snapshot of how our databases and servers are performing when not experiencing any issues at a given point in time.  We can then take these performance snapshots and use them as a starting point when troubleshooting performance issues.  For instance, consider a few of the following questions…

  1. Is my database running slower now than it was last week?
  2. Has my database been impacted by the latest disk failure and RAID rebuild?
  3. Has the new SAN migration impacted my database services in any way?
  4. Has the latest configuration change/application update impacted my servers in any way?
  5. How has the addition of 20 VMs into my environment impacted my database?

With established baselines we are able to quickly see, by comparison, the answers to all of these questions.  But let’s take this a step further and use question 5 in the following scenario.

Jim is comparing how his database server is performing now against a baseline he took a few months back.  Since that baseline was taken, 20 new VMs have been added to his environment.  He concludes, with the data to back him up, that his server is indeed running slower: he is seeing increased read/write latency and increased CPU usage.  So is the blame really to be placed on the newly added VMs?   Well, that depends.  What if something else is going on right now that is causing the latency to increase?  Say month-end processing and backups are happening now and weren't when the older baseline was captured.

We can quickly see that baselines, while important, are really only as good as the timing of when you take them.  Comparing a period of increased activity to a baseline taken during a period of normal activity is not very useful at all.
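
To make Jim's problem concrete, here is a minimal Python sketch of comparing a current reading against a baseline taken during the same kind of period. The latency figures, the period labels ("normal", "month_end") and the function names are all hypothetical, invented for illustration; they don't come from any particular monitoring product.

```python
from statistics import mean, stdev

# Hypothetical read-latency samples (ms), labelled by the kind of period
# in which each baseline was captured. These numbers are made up.
BASELINES = {
    "normal": [4.0, 5.0, 6.0, 5.0, 5.0],
    "month_end": [11.0, 12.0, 13.0, 12.0, 12.0],
}

def z_score(value, samples):
    """How many sample standard deviations `value` sits from the baseline mean."""
    return (value - mean(samples)) / stdev(samples)

def compare(value, period, baselines=BASELINES):
    """Compare a current reading against the baseline for the same period type."""
    return round(z_score(value, baselines[period]), 2)

# A 12.5 ms reading taken during month end:
vs_normal = compare(12.5, "normal")        # far outside the "normal" baseline
vs_month_end = compare(12.5, "month_end")  # well within the month-end baseline
```

Against the "normal" baseline the same reading looks like a serious regression; against the month-end baseline it looks routine. That is the trap Jim risks falling into with his 20 new VMs.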

So this week I ask you to simply tell me about how you tackle baselines.

  1. Do you take baselines at all?  How many?  How often?
  2. What counters/metrics do you collect?
  3. Do you baseline your applications during peak usage?  Low usage?  Month end?
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time?
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment?

You don’t have to stick to these questions – let's just have a conversation about baselining!

24 Comments
MVP

baselines are an extremely useful tool, but it really depends what your end goal is.

There can be a tendency to view anything outside the norm to be a critical event and the default baseline thresholds can get incredibly thin.

We had one UPS that was configured just using baselines - its battery temp thresholds were set using "Use recommended thresholds"

Battery temp was a pretty steady 27 - warning was 27.3, critical was 27.8

The question is - do you want to get alerts at 2am when temp fluctuates by half a degree?

When you know the answer to that question you can decide when to use those recommended thresholds
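
One way to picture the thin-threshold problem is a sketch like the one below. It derives warning and critical thresholds from baseline samples, then applies a minimum margin so a very steady metric doesn't alert on tiny fluctuations. The sigma multipliers, the `min_margin` parameter and the sample temperatures are assumptions made for this example, not a description of how any product computes its recommended thresholds.

```python
from statistics import mean, stdev

# Hypothetical battery temperature readings for a very steady UPS.
UPS_TEMPS = [26.9, 27.0, 27.1, 27.0, 27.0]

def thresholds(samples, warn_sigma=2.0, crit_sigma=3.0, min_margin=0.0):
    """Return (warning, critical) thresholds above the baseline mean.

    Each threshold is the mean plus a multiple of the standard deviation,
    but never closer to the mean than the given minimum margin
    (the critical threshold uses twice that margin).
    """
    base = mean(samples)
    spread = stdev(samples)
    warn = base + max(warn_sigma * spread, min_margin)
    crit = base + max(crit_sigma * spread, 2 * min_margin)
    return warn, crit

# Purely statistical thresholds sit a fraction of a degree above normal.
thin_warn, thin_crit = thresholds(UPS_TEMPS)

# With a 3-degree floor, half-degree wobbles no longer page anyone at 2am.
wide_warn, wide_crit = thresholds(UPS_TEMPS, min_margin=3.0)
```

The margin itself is still a judgment call: it encodes the answer to "do I want to be woken up for this?", which the baseline alone can't tell you.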

Level 13

1. We're unfortunately very haphazard about taking baselines. It's something we need to do regularly, and we're not. One of my "snowy August day" project ideas has been how to do this better using the tools we have.

2. Bandwidth usage, drive usage, memory, etc.

3. Varies here. Much of it is average usage, which is of limited usefulness. The 95% lines in NPM are helpful.

4. Yes, and in some cases, yes.

5. Not sure. If it can, we're not using this capability.
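
The point in answer 3 about averages versus 95% lines can be illustrated with a short sketch. It uses a simple nearest-rank percentile, which is an assumption for this example and not necessarily how NPM calculates its 95th percentile line; the utilization samples are made up.

```python
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]

# Hypothetical link utilization (%): mostly quiet, with sustained busy periods.
utilization = [10] * 16 + [90] * 4

average = mean(utilization)        # pulled down by the quiet samples
p95 = percentile(utilization, 95)  # reflects what the busy periods look like
```

Here the average suggests a lightly loaded link, while the 95th percentile shows that the busy periods run close to saturation.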

Level 12

Ditto on the haphazard.  The server team has a full plate and we are the ones who would have to set it up.  We have no full time DBA.

Level 14

  1. Do you take baselines at all?  How many?  How often?  Yes, before/after each integration period
  2. What counters/metrics do you collect?  Bandwidth util., latency, memory, CPU, all software/hardware versions, etc.
  3. Do you baseline your applications during peak usage?  Low usage?  Month end?  Not doing apps yet
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time?  For the most part
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment?  Yes, but we don't have SAM for apps yet

But to sum it up, baselining is important.  You always need a reference point to go back to or to see just how things are going in your environment.  Baselines will show you where you need to go in the future.  They are very important for determining growth rates and sometimes for showing just how bad things are, especially when you need supporting data to give to upper management.

MVP

Baselines while good have other uses...you need to compare your baselines over time so you can see growth.

But you will need to be able to notate changes to the environment at each baseline so you can see how/why it may change or not.

It ties heavily into capacity planning...

Level 11

Still working on our baseline. I think the hardest part is getting all the parties (VoIP, Security, Wireless, etc.) involved so that we can settle on certain settings.

Level 9

We do some initial baselines of network configs and server performance.  We need to add additional modules to fully monitor all the servers and storage.

Level 11

For sure!   I'm a big supporter of tweaking your alerts to fit your needs.  Once we start to get too many alerts from too many systems we end up filtering these messages, and not seeing the important one when it does come in!

Level 11

Certainly they are something that is only a priority when we need them, right?

Level 11

You seem to have a very diligent baseline process - kudos to you Christopher!

Level 11

Great point!  They absolutely are key in capacity planning!

Level 11

Anything that requires multiple people to agree on common ground will always be a challenge!  And I think you are on to something in stating that it's the biggest challenge!  Thanks for the comment.

Level 11

Well, you are off to a good start!

Level 17
  1. Do you take baselines at all?  Not to the full extent of what this means. We have enough data to 'figure it out' but having some clear definitions per proper working state is still in the works. Move, Shake & Change constantly! And I have nothing to do with VMs and most applications.. other devices measure those
  2. What counters/metrics do you collect? Any network related statistic I can get my hands on
  3. Do you baseline your applications during peak usage?  Low usage?  Month end? Not I... there could be someone who does. But not I.
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time? Unknown, as we have too many other tools for this not to be an option.
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment? I think most other entities can show you what it was at this time on another day, and possibly the average of last X days/months, etc.
Level 11

1.  As for baselining, the tool we use is collecting and baselining on an ongoing basis.  That said, it is just for packet throughput and not specific to DB performance.

2.  Throughput, traffic, connections, etc

3.  Constantly but within a limited set of production subnets.

4.  Yes solely on monitoring and it does show trending over time.

5.  Yes, generally when there is an issue I'll pull up a chart that shows the current values of whatever metric I am concerned with on top, and on the bottom of the graph it shows me normal data from yesterday, last week, or last month (averaged out).  Many times, this gives me a good indication of where to look next.

Jim
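
The "what is normal for this time" comparison described in answer 5 above can be sketched roughly as follows. Historical samples are bucketed by weekday and hour, and the average for the matching bucket becomes the reference for a current reading. The function names and sample values are hypothetical, and real tools may bucket and average differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def build_hourly_baseline(history):
    """Average historical (timestamp, value) samples per (weekday, hour) slot."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: mean(values) for slot, values in buckets.items()}

def normal_for(ts, baseline):
    """Return the baseline value for the slot matching `ts`, or None if no history exists."""
    return baseline.get((ts.weekday(), ts.hour))

# Hypothetical throughput samples (Mbps) from two earlier Mondays.
history = [
    (datetime(2024, 1, 1, 9), 40.0),  # Monday 09:00
    (datetime(2024, 1, 8, 9), 60.0),  # Monday 09:00, one week later
    (datetime(2024, 1, 1, 2), 5.0),   # Monday 02:00
]
baseline = build_hourly_baseline(history)

# What is normal for Monday 09:00, based on previous Mondays?
expected = normal_for(datetime(2024, 1, 15, 9), baseline)
```

A reading taken at Monday 09:00 is then judged against other Monday mornings, not against the quiet 02:00 samples.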

Level 12
  • Do you take baselines at all?  How many?  How often? We are kinda slow at this; we don't take them as much as we should.
  • What counters/metrics do you collect? All the network stats and metrics
  • Do you baseline your applications during peak usage?  Low usage?  Month end? Our application team would do this, not me.
  • Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time? We have a ton of tools so I couldn't answer this.
  • Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment? Yes we can see what is normal all week.
Level 11

1. Not on a regular basis. If I think of it, I will go in and run the baseline.

2. Traffic, Errors, Latency, Utilization

3. None

4. Yes, Yes

5. Yes we can

Level 11
  1. Do you take baselines at all?  Yes,  Initial installation, prior to major changes.
  2. What counters/metrics do you collect?  Software versions, memory, connection speeds.
  3. Do you baseline your applications during peak usage?  Low usage?  Month end? Not really doing anything with apps yet.
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time?  Pretty much.
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment?  When enough time is spent analyzing the data, mostly used during troubleshooting.
Level 9

Most people baseline once and forget about it; you're absolutely right that it should be done often. Networks and applications change over time, sometimes faster than we can tell!

Level 10

We don't baseline in maybe the "normal way", we rely on our network monitoring solution.  With historical data, we can pretty accurately get an idea of levels of use for a resource as well as get notifications if usage levels are lower or higher than normal.  I do believe having the historical data is perhaps the most important, because a user saying something is "slow" doesn't always mean the application itself is slow, there are a lot of other factors.  This data is invaluable for eliminating possibilities. 

Level 15

This is absolutely accurate!

One of the best enhancements that SW could make immediately would be recalculating ALL baselines in intervals like you currently can with interfaces.

EDIT:

Well I didn't see an existing feature request so...

https://thwack.solarwinds.com/ideas/4496

Level 11

Thanks for the great advice.

Level 10

1. I think using baselines is very useful because you can make comparisons between different environment conditions and diagnose. I try to use them whenever I can or remember.

2. On servers, hard drive space and I/O queue; on network devices, response time

3. I think this is an opportunity area for me

4. This is my only reference point for baselines, but it is enough to show the trends.

5. Obviously Orion

Level 15

This post was helpful.

About the Author
I started in my early days (1996), while I was still at college, in the Technical Support area of the Direccion de Sistemas e Informatica of the UANL (Universidad Autonoma de Nuevo Leon), which is one of the biggest public universities of Mexico. There, I grew from a total novice in the IT world to a Jr Network Engineer and eventually to the Engineer in charge of the management and operation of RedUANL (that's what we called the university's network). That's where I first suffered the pains and enjoyed the pleasures of being the network manager of a big and complex network. Currently I'm with the best Latin American Solarwinds Channel Partner, Iscor Soluciones, as the Sr Pre- and Post-Sales Engineer in charge of the Solarwinds brand inside Iscor. At Iscor, we've been partners with Solarwinds since 2001 and are growing every day with new challenges and new projects that make our day-to-day work fun and enriching.