Helping customers to solve their problems first

Level 9

One of the good parts of my job is helping customers benefit from implementing virtualization management solutions. In most cases they start looking for solutions only after they experience problems: slow VMs, slow applications, you know the story.

Then what? Fire in the house. The first person to get the call is the “guy who manages” the whole system: my client. Users complain of slow applications or poor network performance. By then it is, of course, already too late to prevent the problem, especially if there is no virtualization management suite in place that can not only monitor, but also predict with “what if” scenarios.

Wouldn't it be better to fix problems before they occur? Slow VM performance has a source somewhere, and it would be nice to be alerted that, for example, “In one month's time you’ll run out of space in your datastore”, or “In 2 months the capacity of your 2 brand new ESXi hosts will not be sufficient” because you’re creating 5 new VMs per day…
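To make the “you’ll run out of space in one month” kind of alert concrete, here is a minimal sketch of the arithmetic behind it. It assumes you already collect periodic usage samples for a datastore; the function name, the sample format and the numbers are hypothetical and not taken from any particular management suite. It fits a straight line through the samples and extrapolates to the datastore's capacity, which is only a rough model of real growth.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days left before a datastore fills up.

    samples: list of (day_index, used_gb) pairs, e.g. one per week.
    Fits a least-squares line through the samples and extrapolates it
    to capacity_gb. Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)

    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None  # no growth trend, so no projected run-out date

    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - last_day)


# Hypothetical weekly samples: usage grows by 35 GB per week.
weekly_usage = [(0, 700), (7, 735), (14, 770), (21, 805)]
remaining = days_until_full(weekly_usage, capacity_gb=1000)
print(f"Datastore projected to be full in about {remaining:.0f} days")
```

A real suite will use more history and smarter models, but even a straight-line projection like this is enough to raise a warning weeks before the fire starts.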

Admins usually have to firefight at least two things (and often many more):

  • Internal problems on applications they run on their infrastructure.
  • Problems related to VM sprawl where the virtualization system becomes inefficient and the only solution would be to throw in more hardware.

One of the first functions of every virtualization management suite is the ability to get the “full picture”: to see, for example, that half of the client’s VMs are performing poorly because they’re over- (or under-) provisioned with CPU or memory, or that every second VM is running with snapshots.

Other problems can arise from poor performance of existing hardware, which is often out of warranty and too old to do the job required.

Now what? Buy first and optimize later? No, that’s not what I would usually advise. The optimization phase consists of helping the client solve their problem first and then advising them to implement a solid virtualization management solution. The client thanks you not only for your help, which saved his bacon, but also for the advice you gave him to save him from future problems.
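As a rough illustration of that “full picture” check, the sketch below flags VMs that look over- or under-provisioned, or that carry snapshots. Everything in it is an assumption made for the example: the `VMStats` record, the 20% and 85% thresholds, and the sample values are hypothetical, and in practice the figures would come from whatever your management suite exports.

```python
from dataclasses import dataclass


@dataclass
class VMStats:
    """Hypothetical per-VM averages exported from a monitoring tool."""
    name: str
    avg_cpu_pct: float
    avg_mem_pct: float
    snapshot_count: int


def review_vm(vm, low=20.0, high=85.0):
    """Return a list of findings for one VM.

    low/high are assumed utilisation thresholds, not vendor guidance:
    below `low` the resource looks over-provisioned, above `high`
    it looks under-provisioned.
    """
    findings = []
    for label, value in (("CPU", vm.avg_cpu_pct), ("memory", vm.avg_mem_pct)):
        if value < low:
            findings.append(f"{label} over-provisioned ({value:.0f}% average use)")
        elif value > high:
            findings.append(f"{label} under-provisioned ({value:.0f}% average use)")
    if vm.snapshot_count > 0:
        findings.append(f"{vm.snapshot_count} snapshot(s) present")
    return findings


# Hypothetical inventory.
inventory = [
    VMStats("web01", avg_cpu_pct=5.0, avg_mem_pct=92.0, snapshot_count=2),
    VMStats("db01", avg_cpu_pct=50.0, avg_mem_pct=60.0, snapshot_count=0),
]
for vm in inventory:
    for finding in review_vm(vm):
        print(f"{vm.name}: {finding}")
```

The thresholds are the part to tune per environment; the point is simply that the report covers every VM at once instead of one complaint at a time.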

Level 9

I often see that when systems are designed and deployed, the focus is on how the system will function and who interacts with it (users, accounts, permissions, etc.). Then when - not IF, but when - a problem occurs with the system(s), we go into firefight mode. If we considered system monitoring during the design and deployment phase, the firefight could often be held off, or at least lessened.

The system I'm in charge of is small compared to most network systems being monitored by users on Thwack!, but the principle is still the same. The design and deployment consisted of getting the system up and running, getting users trained, and granting access, and only then dealing with how to effectively monitor the system. If we had at least included monitoring from the start, many headaches could have been prevented and some mistakes probably mitigated as well.

I definitely agree.  Proper monitoring is often an afterthought, and usually comes after the fire.

Level 12

The client thanks you not only for your help, which saved his bacon, but also for the advice you gave him to save him from future problems.

Interesting topic, vladan. Yes, the client will thank you because you identified the "fire in the house" before it started at all...

Level 10

Mmmm bacon. Sorry it's lunch time and that's all I can think about right now.

Level 12


Level 10

I like the Fire Fighter reference. Here we refer to ourselves as Smoke Jumpers since we are constantly moving from one hot spot to another.

Level 11

It's better to avoid the fire...

Level 9

Avoiding "fires" in the first place is definitely key. If you're able to predict when a fire might start, you can be well ahead of the game.

Level 10

Much as in real life, planning for fire season cannot control or contain every possibility of failure.

Level 10

I completely agree that monitoring should be a part of the design phase.  Other things to consider are the strategic plan of the company and the direction that operations wants to go.  As an engineer, the only way to design a robust network is to know where we want to be in "x" number of years and plan out to that date, knowing there will be advancements in technology, but that we are buying the technology available on the budget given and monitoring the network based upon those requirements.  Once the engineers have planned the network and estimated the costs, it is time to go back to operations and let them know whether the budget is going to fall short of the plan... so often we live in a world of silos, and it just doesn't make for an effective way of doing business.  Although I have digressed significantly from the original question, let me pull this all together: in order to effectively monitor the network, engineering and operations must both know the "destination".  Yes, things change on a day-to-day basis, but with a bit of historical review and statistical analysis, we can probably project future needs with a decent amount of accuracy and minimize the number of fires we are containing / extinguishing.

Level 12

Everyone always forgets about monitoring; it's more of an afterthought. But it can help so much before the fire starts, during the fire, and after the fire is put out...

Level 10

Much as in real life, it's a good idea to take all your logs & keep them somewhere secure!

Level 9

Level 8

Thumbs up guy...

Level 12

Working without virtualization monitoring simply results in being reactive instead of having the opportunity to be proactive.

Level 9

Without proactive monitoring, it's like being blindfolded. Yep. The more you anticipate, the more peace you have in the end ... Users happy!

I'm sure that most software developers and vendors think this way as well, since it's another feature that adds value to a software product.

Thanks all for the replies!

Level 18

Part of the challenge is that when the VM admins build out the virtual servers, they don't spec out additional resources for monitoring....that has to be part of the equation.

Level 13

A problem I run into too often is that, after I assist with troubleshooting the problem and advise the customer on next steps and future planning, they do not follow through.  This then leads to a repeat of the issue, which is now a network issue since I "did something" to fix it the last time it occurred.

Level 7

Two thumbs up!

Level 21

While I certainly think Monitoring is a critical piece of managing an infrastructure and helps provide visibility into what is going on allowing you to stop potential issues before they begin, I stop just shy of calling it "Proactive".  The reason for this is that when your monitoring system does detect a problem it's after the problem has occurred; customers and/or users always can and usually will detect the problem before the monitoring system does.  If I go as far as calling the monitoring service "Proactive", then management is upset when customers call about a problem before it is detected by the monitoring system.

On a more personal level, I certainly think it's a proactive step (and absolutely necessary) to add monitoring to your infrastructure.  I also think it's important for customers to understand that a monitoring system itself also requires ongoing management and tuning if they expect to get useful information out of it.

Level 9

Somehow we won't have to go through some of the possible failures if we actually avoid them in the first place.

It's always better to fix problems before they occur.

"Prevention is better than Cure"

Level 14

I don't need to add to the case for including monitoring, as this is accepted as fundamental.

What I would say is that there is a distinction between using your monitoring tools for firefighting and being pro-active. We see a lot of our customers using Orion very much in the firefighting role, with limited pro-active utilisation. For me this stems from knowledge and time: having someone able to use the solutions to report in an efficient manner, and then having the time to work with the report data and the interactive GUIs to identify issues before they impact the health and performance of the services fed by the IT infrastructure. Spending the time on the pro-active work invariably leads to time saved in the future, let alone fewer service-affecting issues.

Level 10

Indeed, the first thing to do when a client calls to complain of poor performance of servers and applications in a virtual environment is to help the customer resolve whatever issue they are experiencing, then propose a better solution and advise on how to prevent the same problem from recurring. That way your client will see you as a trusted adviser rather than one who shies away from the problem.

Level 7

Agreed.  Monitoring is a tool to help be proactive, but unless someone is taking the time to do trend analysis and look for issues, the customer is still going to see the problem first.  We refer to it as a tool and try not to overuse the word proactive because, in all honesty, we don't have the resources to sit someone down and just perform trend analysis.

Level 10

Very common sense, I agree.  An infrastructure optimization strategy is often not even used any more.  Identify the ways in which your environment is reactive, and identify the steps to get to a proactive environment.  Too many times we forget that the client just needs to print.

Level 10

Virtualization is emulation.  This is often forgotten and is so important.  So often, if the hypervisor upgrade were followed up with a step 2, most problems could be avoided.  Step 2 is the IC.  Even auditor engineers at HP miss this.  If you run Hyper-V, put your seatbelt on and start checking the damn versions.  Eliminating the mismatches will stabilize your environment.  Then you can go home early and everyone likes you.  OMG, and with Cisco UCS, given the price tag on that thing, you would think people would check the status page.

Level 15

Thanks for sharing.  Good discussion!

Level 12

nice job to help other people

About the Author
Virtualization blogger and IT engineer, living on Reunion Island (France). Trying to help others on their journey to all-virtual...