Whose Fault Is It Anyway?

Hi there! I’m Michael Stump, a technology consultant with a keen focus on virtualization and a strong background in systems and application monitoring. I hope to spark some discussion this month on these topics and more.


Last month, I published a post on my personal blog about the importance of end-to-end monitoring. To summarize, monitoring all of the individual pieces of a virtualization infrastructure is important, but it does not give you all of the information you need to identify and correct performance and capacity problems. Just because each individual resource is performing well doesn’t mean that the solution as a whole is functioning properly.
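To make that last point concrete, here is a minimal sketch in Python. It is illustrative only: the component names, latency figures, thresholds, and the 30 ms end-to-end budget are all made-up assumptions, not measurements from any real environment or output from any monitoring product. It shows how every layer can pass its own check while the path as a whole still misses its target.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One layer in the path, judged only against its own threshold."""
    name: str
    latency_ms: float      # hypothetical measured latency
    threshold_ms: float    # hypothetical per-team alert threshold

    def healthy(self) -> bool:
        return self.latency_ms <= self.threshold_ms


def diagnose(path, e2e_budget_ms):
    """Compare per-component health with the end-to-end result."""
    total = sum(c.latency_ms for c in path)
    return {
        "components_ok": all(c.healthy() for c in path),
        "total_ms": total,
        "e2e_ok": total <= e2e_budget_ms,
    }


# Hypothetical numbers: each team sees green on its own dashboard.
PATH = [
    Component("storage", latency_ms=18.0, threshold_ms=20.0),
    Component("network", latency_ms=4.5, threshold_ms=5.0),
    Component("hypervisor", latency_ms=9.0, threshold_ms=10.0),
]

# Assumed budget for what the application's users will tolerate.
RESULT = diagnose(PATH, e2e_budget_ms=30.0)
print(RESULT)
```

In this sketch, each component sits just under its own threshold, yet the combined latency exceeds the assumed budget. Only the end-to-end check catches it.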


This is where end-to-end monitoring comes in. You’re likely familiar with all of the technical benefits of e2e monitoring. But let’s talk about the operational benefits of this type of monitoring: reducing finger-pointing.


In the old days of technology, the battle lines between server and network engineers were well-understood and never crossed. But with virtualization, it’s no longer clear where the network engineer’s job ends and the virtualization engineer’s job begins. And the storage engineer’s work is now directly involved in both network and compute. When a VM starts to exhibit trouble, the finger-pointing begins.


“I checked the SAN, it’s fine.”

“I checked the network, it’s fine.”

“I checked vSphere, it’s fine.”


Does this sound familiar? Do you run into this type of finger-pointing at work? If so, share a story with us. How did you handle the situation? Does end-to-end monitoring help with this problem?

  • Breaking through the silo walls between IT departments remains a goal for my organization.  I'm trying to tempt other groups into dipping a toe into the Orion waters, to see what my team can offer the other teams. QoE might be the wedge in the door for us . . .

  • Being a virtualization engineer is a specialized skill set.  The engineers who will rise to the top are the ones whose experience goes wide, far and deep.  Unfortunately, those people are the exception and not the rule.  There is a lot of trust given to those who manage so many resources, and as such organizations are hesitant to let the silos down too far because they believe it's already too much for one person to manage.  End-to-end monitoring is absolutely important, and this includes the client connecting in to consume applications.

    Virtualization engineers are often handcuffed to their single console and held accountable for the performance and health of the environment. Yet they have no visibility past the front door.  The virtual machines do not exist without the network.  The network doesn't have a purpose without the virtual machines.  And without either of those, the SAN does nothing but spin disks.  Virtualization has special network needs, depending on destination and purpose.  Often tried-and-true network profiles get applied with no regard for performance or for packet profiles and requirements, and that is an accident waiting to happen; link lights ain't right!

    Traffic needs to be visible all the way from the disk to the client.  SANs should present volumes in cluster-relatable diagrams and metrics.  Network switching and its traffic need to be visible and monitored: dropped packets, average packet sizes, switch CPU and memory, port reliability, etc.  VM monitoring should focus on drilling back down into the SANs with metrics like logical disk I/O latency (both read and write), page operations, memory swap, removal/addition, and integration health.  It's a Symphony of Destruction and fun every day to manage.

    In closing:  CTOs, CIOs - Make sure your virtualization engineers have, at minimum, read access (syslog/SNMP/CLI) to both the SANs and all switches in their environments.  And if the only way you can do that is SolarWinds, so be it. They need it; otherwise you can't hold them accountable for any real problems.

  • "In the old days of technology, the battle lines between server and network engineers were well-understood and never crossed. But with virtualization, it’s no longer clear where the network engineer’s job ends and the virtualization engineer’s job begins. And the storage engineer’s work is now directly involved in both network and compute. When a VM starts to exhibit trouble, the finger-pointing begins."

    It is very interesting to me how things have changed.  Virtualization has made it a necessity to end the "silo-ing" of information in order for an enterprise to succeed and move forward.  I've always hated "turf" in IT.  In 14 years of IT, I have seen many situations that could have been resolved quickly and efficiently but were not because someone would get upset if another member of the team stood on their cheese.  I am fortunate now to be in an environment where cross-training and the sharing of information is highly encouraged.  That M.O. has fostered an environment where problems get solved quickly and are not often repeated.

  • michael stump In my organization (I hope others are different), almost without exception, when a VM had trouble, people approached us about a "network problem". I'll admit that one time it was due to HP OpenView SNMP crashing the switch CPU. But most of the time, it wasn't. Well, you can say it's a network problem if you consider the SAN a "network".

    I guess when people point the finger, they have to start somewhere, so why not the network, right?
