The Cost of Monitoring - with the wrong tool (part 1 of 2)

or “It’s not just the cost of buying the puppy, it’s the cost of feeding the puppy that adds up”

This is a story I've told out loud many times, on conference calls, and around water coolers. But I've never written it down fully until now. This is the story of how using the wrong tool for the job can cost a company so much money it boggles the mind. It's a story I've witnessed more than once in my career, and heard anecdotally from colleagues over a dozen times.

Before I go into the details, I want to offer my thoughts on how companies get into this situation.

The discipline of monitoring has existed since the first server came online and someone wanted to know if it was "still up." Sophisticated monitoring tools have been around for over two decades, yet at most companies they are still implemented as if for the first time. Some of this has to do with inexperience: either the monitoring team is young or new and hasn't experienced monitoring at other companies, or the company itself is new and has just grown to the point where it needs monitoring. Or there's been sufficient turnover that the people on the job now are so removed from those who implemented the previous system that, for all intents and purposes, the solution or situation at hand is effectively "new."

In those cases, organizations end up buying the wrong tool because they simply don't have the experience to know what the right one is. Or more to the point…the right ONES. Because monitoring in all but the smallest organizations is a heterogeneous affair. There is no one-stop shop, no one-size-fits-all solution.

But that's only part of it. In many cases, the cost of monitoring has bloated beyond all reason due to the effect known as "a dollar auction". Simply put, the barrier to using better tools is the unwillingness to walk away from all the money sunk into purchasing, deploying, developing, and maintaining the first.

And that leads me back to my story. A company hired me to improve their monitoring. Five years earlier, they had invested in a monitoring solution from one of the "big three" solution providers. Implementing that solution took 18 months and 5 contractors ($1 million in contractor fees, plus $1.5 million for the actual software and hardware). After that, a team of 9 employees supported the solution: setting up new monitors and alerts, installing patches, and just keeping the toolset up and running. Aside from the staff cost, the company paid about $800,000 a year in maintenance.
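To put those numbers in perspective, here's a back-of-envelope five-year cost sketch in Python. The implementation, software/hardware, and maintenance figures come straight from the story above; the fully-loaded per-employee cost is a made-up placeholder, not a figure from the client.

```python
# Back-of-envelope 5-year total cost of ownership for the incumbent tool.
# Figures marked "assumption" are illustrative placeholders only.

implementation = 1_000_000   # contractor fees for the 18-month rollout
software_hw    = 1_500_000   # software licenses and hardware
maintenance    = 800_000     # annual vendor maintenance
staff          = 9           # employees supporting the toolset
cost_per_head  = 100_000     # assumption: fully-loaded annual cost per employee
years          = 5

tco = implementation + software_hw + years * (maintenance + staff * cost_per_head)
print(f"5-year TCO: ${tco:,}")  # → 5-year TCO: $11,000,000
```

Even with a conservative staffing assumption, the recurring costs dwarf the initial purchase price; that's the "cost of feeding the puppy."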

With this solution they were able to monitor most of the 6,000 servers in the environment: a blend of Windows, Unix, Linux, and AS/400 systems. They could also perform up/down (ping) monitoring for the 4,000 network devices, but they encountered serious limitations monitoring network hardware, non-routable interfaces, and other elements.

Meanwhile, the server and application monitoring inventory (the actual monitors, reports, triggers, and scripts) showed signs of extreme "bloat." They had over 7,000 individual monitoring situations, and around 3,000 alert triggers.

This was the first company I'd encountered where the monitoring and network teams weren't practically best friends, and even the server monitoring was showing signs of strain. Some applications weren't well-monitored, either because the team was unfamiliar with them or because the tool couldn't get the data needed.

Part of the problem, as I mentioned earlier, was that the company had invested a lot in the tool, and wanted to "get their money's worth." So they attempted to implement it everywhere, even in situations where it was less than optimal. Because it was shoehorned into awkward situations, the monitoring team spent inordinate amounts of time not only making it fit, but keeping it from breaking.

KEY IDEA: You don't get your money's worth out of an expensive tool by putting it into as many places as you can, thereby making it more expensive. You get your money's worth by using each tool in a way that maximizes the things it does well and avoids the things it does not do well.

NOTE: This is a continuation of my Cost of Monitoring series. The first installment can be found here: The Cost of (not) Monitoring

Stay tuned for part 2, which I will post on January 20th, to see how we resolved this situation.

edit LJA20150116: forgot to include the link to explain a dollar auction.

Anonymous
  • You bring up a good point - sometimes switching tools simply isn't an option. And in those cases, you do what all IT Pros do - you MAKE IT WORK. It's great when you can, and when it's the right choice.

    But there comes a point when the "make it work" effort (along with the lack of functionality) costs more than any imagined savings - when it costs even more than an entirely different tool would. That was the case I made to the client above - they were spending $800,000 per year on maintenance for a tool that required 9 people to maintain. And using that tool required the management of 7,000 monitors and 3,000 alerts, all for 6,000 devices. Not to mention that 4,000 network devices were all but unmanaged.

    It should be no surprise that part 2 is going to mention SolarWinds (hey, this *is* Thwack, right?).

    How much SolarWinds could you buy for $800,000? In fact, turn it around: If I told you that you needed to monitor hardware and applications for 10,000 devices, how much SolarWinds would you need and what would it cost? What would your savings be (versus $800k) in year one alone? And that doesn't take into account year 2, when the SolarWinds maintenance cost drops to 20% of list.

    You are right, sometimes you simply have to work with what you have. But sometimes the cost of using what you have (in dollars, time, features, etc) is far greater than the cost to walk away.
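    To make that comparison concrete, here's a rough cumulative-cost sketch in Python. The $800,000 incumbent maintenance and the year-2 "20% of list" maintenance rate come from my comment above; the replacement's list price is a hypothetical placeholder, not a quote for any actual product.

```python
# Rough recurring-cost comparison: incumbent maintenance vs. a replacement
# priced at a hypothetical list price with 20%-of-list annual maintenance
# starting in year 2. The list price below is an assumption, not a quote.

incumbent_maintenance = 800_000   # per year, from the story
list_price            = 300_000   # assumption: hypothetical replacement cost
maint_rate            = 0.20      # year-2+ maintenance, as a fraction of list

def cumulative_cost(years):
    """Return (incumbent, replacement) cumulative spend after `years` years."""
    incumbent = incumbent_maintenance * years
    replacement = list_price + maint_rate * list_price * (years - 1)
    return incumbent, replacement

for y in (1, 3, 5):
    inc, rep = cumulative_cost(y)
    print(f"year {y}: incumbent ${inc:,} vs. replacement ${rep:,.0f}")
```

    Run it with your own list price; with almost any plausible number, the gap widens every year the incumbent's maintenance bill recurs.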

  • Currently I'm on the other side of the coin.

    Trying to make one solution fit because the company doesn't want to invest in more adequate solutions. It doesn't matter if it is or isn't the right tool - it's the tool we have, so make it work!

  • Good series so far. I have had very similar experiences. I was with an organization that spent millions on a monitoring solution that they also tried to use for everything. If it did not have a capability, and there was another module for that product that performed the needed function, they bought it. It did not matter whether the product did a good job with that function, and it was not tested against other products. They wanted the "Network Monitoring TOOL." The tool did some good things, and it was a more advanced network monitoring tool, but it was very expensive and took a lot to maintain. Eventually, I was able to get them away from this mindset and move them to better tools for specific functions. It took me about 4 or 5 years, a lot of data collection, and a lot of briefs to do so. It is not easy to "change" in some places, especially when the original implementers have a lot of pull. They don't want their "baby" messed with.

  • This has been enlightening. After 30+ years in IT, dealing with all sizes of business, it is funny to note how circular concepts are and how every few years the same arguments come back to the surface. There truly is no one-shot solution for all things, and integrating the best tool is a tough sell, but the rewards are worth it. Looking forward to seeing how your situation resolves.

  • I wasn't being ironic at all. While our goal at SolarWinds is to provide feature-rich tools, we know we can't be all things to all people nor can we provide every single function under the sun (at least not at an affordable price point).

    Any reasonably sized enterprise is going to have multiple toolsets, if only for the purpose of a sanity check. I have never been at a company where there wasn't overlap between software. At that point, you want to decide which tool is your primary for a particular feature, and which is the backup.