
Application Performance Management (APM): What's good enough?

Level 13

On the surface, application performance management (APM) is simply defined as the process of maintaining acceptable user experience with respect to any given application by "keeping applications healthy and running smoothly." The confusion comes when you factor in all the interdependencies and nuances of what constitutes an application, as well as what “good enough” is.

APM epitomizes the nature vs. nurture debate. In this case, nurture is the environment: the infrastructure and networking services, as well as composite application services. Nature, on the other hand, is the code-level elements formed by the application's DNA. The complexity of nature and nurture also plays a huge role in APM, because one can nurture an application using a multitude of solutions, platforms, and services. Similarly, the nature of the application can be coded using a variety of programming languages and runtime services. Regardless of nature or nurture, APM strives to maintain good application performance.

And therein lies the million-dollar APM question: what is good performance? And, similarly, what is good enough in terms of performance? Since every data center environment is unique, "good" can vary from organization to organization, even within the same vertical industry. The key to successful APM is to have proper baselines, trend reporting, and tracing to help ensure that Quality of Service (QoS) is always met without paying a premium in terms of time and resources while trying to continuously optimize an application that may be equivalent to a differential equation.
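
To make the baseline idea concrete, here is a minimal sketch (illustrative Python only, not tied to any particular APM product; the sample data and function names are made up) of deriving a "normal" ceiling from historical response times and flagging anything that falls outside it:

```python
# Minimal baseline sketch: derive "normal" from history, then compare new
# samples against it instead of against a hand-picked static threshold.
from statistics import mean, stdev

def build_baseline(samples, sigmas=3):
    """Return a ceiling of mean + N standard deviations over historical values."""
    values = [ms for _, ms in samples]
    return mean(values) + sigmas * stdev(values)

def exceeds_baseline(current_ms, baseline_ms):
    """True when the latest observation falls outside the expected range."""
    return current_ms > baseline_ms

history = [("10:00", 180), ("11:00", 210), ("12:00", 195), ("13:00", 205)]
baseline = build_baseline(history)
print(exceeds_baseline(480, baseline))  # True -- a spike well outside normal
```

Recomputing that baseline on a schedule is what keeps "good enough" anchored to your own environment rather than to a number borrowed from someone else's data center.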

Let me know in the comment section what good looks like with respect to the applications that you’re responsible for.

13 Comments

It's funny - when I came on board at my current shop, there was no APM of any kind in place. There were days when the ticket queues didn't fill with performance complaints, and days when they were filled to bursting. We simply had. no. visibility.

So probably the first 45 days or so were spent establishing baselines - not only for the day-to-day 'what can we expect' kind of stuff, but also for the increased load on a few key apps at month-end.

The last four years have been a continuous expansion and refinement of not only our monitoring, but of the applications' performance levels themselves - because by exposing and measuring apps and supporting services, we surfaced performance problems and bottlenecks.

And you're right on regarding nature vs. nurture, too. Some of the visibility we gained showed us basic constraints in a bunch of our (mostly legacy and third-party) apps that are not surmountable at this time. That can be a challenging mental place to be in as a monitoring and measurement person, since most of the time when we expose a performance or operational limit, problem, or constraint, our next step is RESOLUTION. Sometimes, given the business need or objective, there isn't one that makes the APM numbers look any rosier.

We recalculate baselines when app issues are exposed and resolved when possible. For the problem-child apps described above, we've at least defined 'normal' - and what kind of performance our business users are going to see. It doesn't make anyone any happier, but sometimes that's the deal - and at least we know what to expect (and pay close attention when those baselines are exceeded).
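
For what it's worth, the shape of that is roughly this (illustrative Python only; the "month-end starts on the 26th" rule and the names are assumptions, not our production policy):

```python
# Keep separate baselines for ordinary days and the month-end crunch, so a
# busy month-end isn't judged against a quiet mid-month Tuesday.
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

def period_for(day: date) -> str:
    """Classify a day so month-end load gets its own notion of 'normal'."""
    return "month_end" if day.day >= 26 else "day_to_day"

def build_baselines(samples, sigmas=3):
    """samples: iterable of (date, response_time_ms) -> one ceiling per period."""
    buckets = defaultdict(list)
    for day, ms in samples:
        buckets[period_for(day)].append(ms)
    return {p: mean(v) + sigmas * stdev(v)
            for p, v in buckets.items() if len(v) > 1}
```

Rebuilding those ceilings after an exposed issue gets resolved is the "recalculate baselines" step described above.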

I like to imagine that the evolution of applications (and their management tools) ultimately results in fewer products and fewer vendors, each one doing more things well.  The evolution process eliminates antiquated, obsolete, poorly performing solutions.  This "survival of the fittest" view should be universally supported by management and staff and customers, all of whom want greater value for their dollar, simpler systems to deploy and manage, and fewer complaints requiring triage and action.

But I'm always disappointed, because observation shows this evolution isn't happening. Rather than seeing fewer solutions (each having become more reliable, more affordable, and more powerful) competing for top market share, there are more and more folks trying to reinvent or improve the wheel.  Instead of seeing simpler deployments that require less oversight and management, things continually become more complex, requiring greater specialization by support teams and increased training and understanding by users.

APM ought to have boiled down to two or three excellent products and strategies and processes by now, and that's not the case.  Organizations are easily swamped with "opportunities" from vendors of APM solutions, and can become bogged down with too many options and too many in-house or purchased tools.  This can happen through changes in staff and management: someone is unfamiliar with the current product, but they know and like a different product, so they buy the other product and leave the original one to languish.  Then that employee moves on, and new folks may select yet another parallel or duplicate product, unaware of the company's pre-existing suite of tools.  That is the result of a lack of leadership, a lack of training dollars, and a lack of time.

When organizations do not allocate budget and time to bring existing and new-incoming people up to speed with the solutions the company already owns, employees become disenfranchised by being expected to be productive without being given what they need--training and great tools and time to learn them, and a test environment for them to safely make mistakes in.  Honestly, how can they be expected to reduce their new-employee-learning-curve and begin being more productive as soon as possible in this environment?

I work in the health care sector, and five 9s of uptime looks good.  Six 9s looks better.  More efficient use of capital dollars is "encouraged."  The same goes for staff time spent on anything.  Of course, improved customer satisfaction is expected year over year, yet training budget and time aren't as accessible as capital dollars for new hardware.  And the new hardware requires great training to properly select, design, deploy, and to create monitors that have value.  Are you seeing a vicious cycle yet?
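
For context, the arithmetic behind those targets is small enough to fit in a throwaway sketch (plain Python, nothing vendor-specific):

```python
# How much downtime each availability level actually allows per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999), (6, 0.999999)]:
    allowed = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines ({availability}) -> {allowed:.1f} minutes of downtime per year")

# Five 9s allows roughly 5.3 minutes a year; six 9s, only about half a minute.
```

Chasing that last digit is exactly where the capital, staffing, and training costs climb the fastest.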

Here, as in any industry, "The absolute best you can do is the minimum acceptable level of performance."  Management is not afraid to play the "this is a life-and-death environment" card to give you a sense of the import of anyone's job.  Similarly, accounting types are present who insist on staying true to the budget and continually question whether the hardware or service we recommend is really necessary.  They ask if it can be obtained for a lower cost by adopting a different hardware vendor or support model, while vendors like Cisco refuse to provide a level playing field for hardware pricing with identical discounts available to every VAR.  The circle is closed when administrators question why a service or site became unavailable as you try to accommodate the accountants' demands to spend less and get by without training dollars and test labs.

Sum it all up:  K.I.S.S.  Make K.I.S.S. your company's mantra and keep the decision makers away from their industry peers so they aren't exposed to twenty different methods of monitoring a service, and don't let each demand their own proprietary solution, which will balloon the budget unnecessarily and require far more support than adopting a single platform and a single suite of monitoring tools.

There will remain room for some best of breed apps and monitors, but they must be vetted with a very critical eye, since adopting them means more one-offs, more support costs, more training costs, ad infinitum.

Up / Available / Fast / Affordable is "good".  Down / Unavailable / Slow / Expensive is "bad".  Do what it takes to make all things  "good."

Of course!

MVP

Maybe the question is what is normal, or what is expected.

Expectation is usually several factors removed from what is possible, let alone what is normal.

But for sure, baselines demonstrate what is normal.

Identifying bottlenecks allows us to progressively improve what is possible, which allows the normal to improve.

Repeat as necessary

MVP

Nice article. I totally agree with you on the following point: "The key to successful APM is to have proper baselines, trend reporting, and tracing to help ensure that Quality of Service (QoS) is always met without paying a premium in terms of time and resources while trying to continuously optimize an application that may be equivalent to a differential equation."

Baseline thresholds play an important role for business-critical systems. In the past, when we set a generic threshold at the alert level for all monitors in our environment, we were always asked why we couldn't change the baseline on a per-node basis. We then went with custom attributes, using a defined baseline for specific sets of nodes (application threshold alerting was based on node custom attributes), after which we started using the baseline threshold mechanism introduced by SolarWinds. It's wonderful to see the Orion suite evolving at such a pace. Perhaps one day data analytics will work in conjunction with APM, and APM will be intelligent enough to periodically set a proper baseline by itself (without any human intervention) based on historical and future trend data.
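
To illustrate the idea (this is not how the Orion baseline feature is implemented internally; it's just a sketch, and the percentile choice and names are assumptions), a per-node threshold derived from each node's own history might look like:

```python
# Derive one threshold per node from that node's own history, instead of a
# single generic alert level shared by every monitor in the environment.
import math
from collections import defaultdict

def percentile(values, pct):
    """Simple nearest-rank percentile, e.g. pct=95 for the 95th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def per_node_thresholds(samples, pct=95):
    """samples: iterable of (node_name, metric_value) -> one ceiling per node."""
    by_node = defaultdict(list)
    for node, value in samples:
        by_node[node].append(value)
    return {node: percentile(values, pct) for node, values in by_node.items()}
```

Re-run something like this on a schedule against fresh history and the thresholds adjust themselves periodically, which is the hands-off behaviour the data analytics wish above is reaching for.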

Level 20

SAM seems to do a pretty good job at this... I went for years without APM which later became SAM... now it's hard to imagine getting along without it!

Exactly my thought....'how did we figure lots of stuff out without this again?'

Level 21

Application performance can be very subjective.  Much like others have already mentioned, we use baselines to determine whether performance is good enough.  When a client tells us an application is "slow," we compare the performance at the time it was slow to the performance at times it wasn't, and see what the delta is; we use Orion for this.  Oftentimes I find that a user's perception of an application doesn't correlate with its actual performance based on the data we have.
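
A rough sketch of that comparison (illustrative Python only; the sample windows and function names are made up) might look like:

```python
# Compare response times from the window when the client reported "slow"
# against a normal window and report the delta.
from statistics import mean

def delta_report(slow_window_ms, normal_window_ms):
    """Absolute and relative change between a 'slow' window and a normal one."""
    slow_avg, normal_avg = mean(slow_window_ms), mean(normal_window_ms)
    return {
        "normal_avg_ms": normal_avg,
        "slow_avg_ms": slow_avg,
        "delta_ms": slow_avg - normal_avg,
        "delta_pct": (slow_avg - normal_avg) / normal_avg * 100,
    }

print(delta_report([220, 240, 210], [205, 215, 225]))
```

A negligible delta is exactly the case where the data shows the user's perception doesn't match the measured performance.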

Level 15

I'm with byrona.

To a certain degree, APM is tied to expectation. What is the expectation for performance? Because the reality is that fast and slow are all relative.

25 years ago, when my mom retired, I was working for a company that was constantly disposing of hardware, and I was constantly upgrading her home PC with faster and faster components. Pretty soon the apps were lightning fast! Clicking from one screen to the next was seamless. It was a thing of beauty. And my mom would still complain that the computer was too slow.

ME: What do you mean your computer is too slow?

MOM: Yes. It's still too slow.

ME: How? It's almost faster than mine at this point.

MOM: It still takes over 2 minutes to boot up every time.

ME: <smh>

25 years ago, home PCs took a good long time to boot up. Hardly anything could be done about that. Expectations...

Level 13

I too am with byrona

Completely agree with byrona: user application reporting is very subjective, and matching it to monitoring is very tricky.

I'm happily anticipating NPM 12.2 and its associated suite of upgraded modules coming here this fall.

That should improve APM in my shop!

Level 12

byrona, well done for this.

About the Author
Mo Bacon Mo Shakin' Mo Money Makin'! vHead Geek. Inventor. So Say SMEs. vExpert. Cisco Champion. Child please. The separation is in the preparation.