6 Replies Latest reply: Feb 14, 2013 10:54 AM by agbagwell

The Application Monitoring Shift

chriswahl

As we continue to stride forward into a more virtualized world, end-to-end visibility into our applications has gone from a nice-to-have to a can't-live-without. Much like the movie "Inception", we keep adding layer upon layer of nesting to our data centers, burying the data deeper in the software stack. No longer can you simply trace a network cable by hand to a server to troubleshoot an issue on a network port - more often than not, the server is a virtual machine sitting on a random blade within a mesh of chassis.

 

The monitoring shift is here, but are we all ready for it? We're moving from watching needles that point to CPU and memory usage, power consumption, and disk capacity to looking at data from a more application-centric standpoint. Metrics like jitter, latency, response time, sessions, and cluster utilization now drive business objectives and IT budgets. The user experience matters more than ever, and as IT professionals, we are expected to understand and map application performance onto a complex network of nested and virtualized hardware.
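To make metrics like latency and jitter concrete, here is a minimal sketch of how response-time samples from an application probe could be summarized. The function name and the simplified jitter definition (mean absolute difference between consecutive samples) are my own illustration, not something from this thread.

```python
import statistics

def summarize_latency(samples_ms):
    """Summarize response-time samples (milliseconds): mean latency
    plus a simplified jitter figure, computed as the mean absolute
    difference between consecutive samples."""
    mean_ms = statistics.fmean(samples_ms)
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    jitter_ms = statistics.fmean(diffs) if diffs else 0.0
    return {"mean_ms": mean_ms, "jitter_ms": jitter_ms}

# Example: four probe samples from a hypothetical application tier
stats = summarize_latency([20.0, 22.0, 19.0, 30.0])
```

A real tool would track these per tier and alert on trends, but even this much shows why the data is application-centric: the numbers describe what the user feels, not what the hardware is doing.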

 

I'm curious: Have you made the shift from watching hardware "speeds and feeds" to monitoring the application holistically, across all of its tiers?

 
  • Re: The Application Monitoring Shift
    byrona

    As a cloud service provider, I am finding this to be less of a "shift" and more of an "addition".  Monitoring the speeds and feeds is still necessary for us, in addition to the added items you mention.  Since part of our billing is based on bandwidth usage, we still need to monitor that.  In a cloud environment we try to "right-size" systems with regard to CPU and memory resources, so it has become even more important to watch those data points on virtual systems, as they are configured to run much closer to their maximums.

     

    We are still working through the complexities of the application level and user experience performance monitoring and how to map those issues to something we can tweak or fix.

    • Re: The Application Monitoring Shift
      chriswahl

      byrona - Actually, I like your spin on this. I'm definitely not suggesting that we ignore those hardware-level metrics. I do feel they are becoming less ... "relevant", especially with virtualization constantly shifting workloads around to use resources more efficiently. I'm of the opinion that we'll continue to make monitoring additions (using your word there) toward looking at the application and at real, live user-experience values, and then work our way down from there only when something is found to be broken or off.

      • Re: The Application Monitoring Shift
        byrona

        It almost sounds like a complete reversal of the classic troubleshooting methods where you would begin at the Physical layer (OSI Model) and work your way up.  Maybe it's time for an update to the OSI model so that virtualization is included?

  • Re: The Application Monitoring Shift
    netlogix

    I think the way Chris said "continuing to add layer upon layer of nesting" is really the key.  In order to keep up with the layers of demand we have to build the layers of monitoring.

     

    Also, as byrona said, with the drive to do more with less, we are having to push our systems to the brink of overload. Our cushions are getting smaller, so we have to learn about shifting loads much faster.  The ability to track and predict loads for right-sizing is also very important, and we need the metrics to know what is acceptable so we can react quickly and know where resources can be borrowed.
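    The shrinking-cushion point can be sketched in a few lines: compute the remaining headroom on a host and flag it when a recent peak eats too far into it. The 15% minimum-headroom threshold is an illustrative placeholder, not a value anyone in the thread proposed.

```python
def headroom_pct(used, capacity):
    """Remaining cushion as a percentage of total capacity."""
    return 100.0 * (capacity - used) / capacity

def needs_rebalance(samples, capacity, min_headroom_pct=15.0):
    """Flag a host whose recent peak usage leaves less cushion than
    the (illustrative) minimum headroom, so loads can be shifted
    before the host is overloaded."""
    return headroom_pct(max(samples), capacity) < min_headroom_pct

# Example: recent CPU% samples against a host capacity of 100
flag = needs_rebalance([60.0, 70.0, 90.0], 100.0)
```

    Right-sizing then becomes a question of where that borrowed capacity comes from, which is exactly why the metrics have to exist before the alert fires.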

     

    The model of troubleshooting is at the point where you need a way to see all the standard metrics immediately.  The counters on each network interface the app uses should already be mapped out and alerted on; the CPU/RAM load of every supporting server, switch, and firewall, the network latency and jitter, and so on should already be at your fingertips before (or as) you are notified of the issue.  And the users shouldn't be the ones telling you about a problem - you should already know by the time they do, so you can start getting details from them to fill in any gaps the monitoring systems may have for that situation.
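    One way to get ahead of the users is a synthetic probe: time a request yourself and classify it against thresholds so monitoring raises the first alarm. This is only a sketch of the idea; the threshold values and function names are assumptions of mine, not part of the thread.

```python
import time
import urllib.request

def classify_sample(elapsed_ms, status, warn_ms=500.0, crit_ms=2000.0):
    """Map one probe sample to an alert level. The thresholds are
    illustrative placeholders, not values from the discussion."""
    if status >= 500 or elapsed_ms > crit_ms:
        return "critical"
    if elapsed_ms > warn_ms:
        return "warning"
    return "ok"

def probe(url, timeout_s=5.0):
    """Time one HTTP GET against the app, returning (elapsed_ms,
    HTTP status), so a scheduler can feed classify_sample()."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
        status = resp.status
    return (time.monotonic() - start) * 1000.0, status
```

    Run something like this every minute per tier and the "users shouldn't be the ones to tell you" goal becomes testable rather than aspirational.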

     

    I think this move is making our jobs way harder!  Each time we gain the ability to do something better or faster, the bar is raised even higher, so management expects even more without giving us the resources to do it.

    • Re: The Application Monitoring Shift
      chriswahl

      Steve B - The statement "the users shouldn't have been the ones to tell you of an issue" hits the nail on the head. If we continue to monitor only hardware "speeds and feeds", this will be pretty hard to do, as many user experience issues are difficult to pin down at such a coarse level.

       

      Also, the requirement from management to do more with less is one I don't see going away anytime soon. Your best bet is to tie a documented business case, with financial return, to a plan for implementing more robust monitoring. If they say no at that point, it's on them to justify the performance or application issues to the higher-level executives.

  • Re: The Application Monitoring Shift
    agbagwell


    I agree with Chris; we need to be ahead of the user's experience - we should know before they do that there is a problem.  I think virtualization monitoring will need to be layered on top of our traditional monitoring, along with a user experience dashboard that has internal and external hooks - that would be sweet.