
Too Many Tools

MVP

I'll make an assumption: if you're a Thwack user and you're reading this post, you've got an interest in systems and applications monitoring. Or maybe you just really want those points. Or that Jambox. Whatever's cool with me.

But if you're a tools geek, this post is for you.

Tool suites aspire to have the ability to monitor everything in your infrastructure, from hardware, through networking and virtualization, right up to the user. But to be honest, I've never seen a single solution that is capable of monitoring the entire environment. Many times this is not a technological problem; organizational structures often encourage multiple solutions to a single problem (e.g., every team purchases a different tool to monitor their systems, when a single tool (or a subset of tools) would suffice). Other times, tool sprawl is the result of staff or contractor turnover; everyone wants to bring the tools they know well with them to a new job, right?

Tool sprawl is a real problem for IT shops. The cost alone of maintaining multiple tools can be staggering. But you also run into the problem of too many tools. You'll likely end up monitoring some systems many times over, and will most certainly miss other systems because there's confusion over who is monitoring what. As a result, you'll end up with more tools than you need, and the job still won't be done.

How do you manage or prevent tool sprawl at work? Do you lean on a single vendor, like SolarWinds, to address every monitoring need that arises? Or do you seek out best-of-breed, point solutions for each monitoring need, and accept the reality that there is no single pane of glass?

50 Comments
Level 8

We've struggled with this.  One of the challenges with monitoring is justifying the budget for the appropriate tooling.  SolarWinds provides a suite of products with very broad coverage for almost everything in our datacenter; however, without a defined scope to monitor (meaning 100% coverage by default), the pricing puts it out of reach.  So we pick and choose our battles, using guerrilla tactics to justify more spending.  If I can stand up a functioning open-source solution that hits around 60-80% of the monitoring objectives for long enough, people become reliant on it and see value in funding the remaining percentage.  This also depends on how your organization does budgeting.  For us, spare human cycles are free, but buying a tool, or more importantly increasing operational expenses, is difficult.  Now, I realize this is counter to your question about preventing sprawl, but it actually helps in the long run by legitimizing the expense of a single-vendor solution.

Another point to consider is that sometimes it isn't the expense that keeps you from a single pane of glass, but rather the vendors you buy from.  I inherited a storage solution that supports its own monitoring tools but doesn't expose that same data to other vendors.  It's not SolarWinds' fault that it can't get the data.

Level 11

Ah, good old tool sprawl.

I've been part of a push to standardize on a single set of monitoring tools (SolarWinds) from a multitude of separate systems.  Right now we have HP OpenView, GFI Server Monitor, SolarWinds, and Zenoss.  Each monitors its own set of systems and reports back to the appropriate team.  Unfortunately, the push toward SolarWinds as a single pane of glass has been difficult at best.  See, the HP-UX teams don't want to expose their systems to SolarWinds, because it would mean giving access to data they like to keep in-house.  Additionally, that team doesn't want to give a non-HP-UX admin (me) credentials to pull data from their boxes.  Our server team purchased an SLX license for SAM and has added most of their servers to monitoring, but hasn't tuned it to deliver the same detail that GFI does.  Because it doesn't work 'the same,' they don't use it... meaning GFI is still chugging along.

Finally, we have Zenoss.  This one is odd because it was set up by an old Linux admin who doesn't work for the company anymore.  Nobody knows how to administer it, but it updates data in their documentation wiki, so it stays running.  At some point it's probably going to die from not being maintained, and the team that uses its data will have to figure out a way to get it from somewhere else.

So, to answer the actual question: yes, we're trying to prevent tool sprawl, mostly in an effort to get our helpdesk and NOC teams set up with the ability to see issues as soon as they arise, without having to log into four different systems.

MVP

That's interesting. It makes me think that a small number of tools, when properly selected, can manage and monitor everything. Maybe the "one tool to rule them all" idea should be buried.

You reminded me that avoiding sprawl doesn't mean having only one tool; it means controlling the tools you acquire and using the ones you've purchased to their fullest extent. And incidentally, storage vendors that don't expose performance data via standard methods have no place in today's infrastructure. Just sayin'.

Level 8

It also depends on where you define your objectives.  There are various domains to monitor: environmental, physical, virtual, asset, storage, network (wired & wireless), OS, application (i.e., Exchange, SQL Server, etc.), and service (website availability/performance).  Different teams may have a stake in their particular domain, but it's all part of one rich tapestry.  The application and service domains often face the brunt of the problems because they're the customer-facing domains and are impacted by the domains they depend on.  So if you define your monitoring objectives only within a single domain, you only get part of the picture.  This is as much a political problem as a technical one.  It's important to be able to understand and analyze the metrics at each level because they affect one another.  Additionally, it's helpful to break down the silos of IT and merge tools to make analysis more thorough and quicker.

MVP

It's tough when open source is in the mix, because it's hard to argue with a free tool that provides useful data. But tool sprawl doesn't have to mean one tool to monitor everything; using a small number of tools is a completely logical approach, assuming that each tool was selected for a specific purpose.

MVP

+1 for monitoring services, not just devices. It's like that other post in geek speak that said a customer reported a problem, but the capacity of the link was only at 20%, so there wasn't a problem. Obviously, something made that customer pick up the phone to complain. Saying that there's no problem will just make the situation worse.

I'm all for breaking down silos. In this age of hyperconvergence, IT can't survive with hypo-converged org charts.

Level 8

Also, just because you tear down an IT silo by merging your visibility doesn't mean you have to tear down your responsibility chains.  For example, a DBA cares greatly about the performance of his/her hardware, but if a disk fails, that's not their problem; that falls to the operations team.  If cross-domain responsibilities are creating too much clutter (especially for management), then it is useful to use a tool that can provide multiple user interfaces.

Level 8

Open source is a tricky beast.  Each organization treats it differently.  Depending on how accounting is done, human time can be money.  Additionally, commercial vendor support may or may not be a requirement.  It can be good for guerrilla tactics on a proof of concept, but it can also backfire on you when it comes to human time accounting and priority setting.

Level 12

When I first got to this new company, the tool sprawl was unreal:

Nagios, WhatsUp Gold, IP monitoring, NetFlow, NPM, Alertra, Opsview, with only bits and pieces being monitored and no process around any of it. We had about 100 devices in NPM when I started, with no active alerting or reporting. We decided to consolidate, put process around monitoring, and also rely on our DC partner and what they offer.

We now have NPM, NTA, IPAM, SAM, and WPM, plus Site24x7 for web uptime and availability (wish WPM would do this).

Level 10

As ideal as having everything managed through one single pane of glass would be, it's just not realistic. An example: you have a carpenter's hammer and a ball-peen hammer. They are both hammers, yes, and both will drive a nail into wood, but the ball-peen hammer is designed for metalworking rather than woodworking.

We use Lansweeper and SolarWinds. Both are very good software, but as in the previous example, each has its own function, and while some of their functionality overlaps, it's not enough to rely solely on one or the other. A company we're merging with uses WhatsUp Gold. I've used it before and I know for a fact that SolarWinds is better. I'm fairly certain that once we're all merged and the dust settles, the WhatsUp Gold monitoring will be discontinued, since there's a lot of overlap between the two and SolarWinds has a lot more utility beyond just monitoring network status.

Level 13

I've been on both sides of the single/multiple vendor argument and watched the "single pain (sic) of glass" dream die more often than it succeeds.  The "use a single vendor" idea works only as long as nobody is allowed to purchase anything outside of that vendor, which is impossible, since no single vendor can handle everything (especially bleeding-edge third-party products or the latest versions).  So people just start redefining what a monitoring tool is to management so they can get their favorite monitoring tool under some other name, like "performance analysis tool," and then secretly use it for monitoring, especially if they're under different areas of management.  There are also too many vendors who provide their own tools that the administrator likes to (or has to) use to manage their environment; those tools pull *all* the data they need, the admins trust them more than some third-party monitoring tool, and they have complete control of them.  Never mind the admins who write monitoring scripts in the cold dark of night and quietly email themselves sacred texts.

I've found that the "easiest" way to tie it all together is to get a good event manager, send all the alerts from third-party monitoring products, scripts, etc. through it, and let the individual owners of the monitoring applications do whatever they want on their side.  Then you at least have a single place to go for the events/alerts that "matter enough to tell everyone," as long as the admins send them in.  The event manager then sends the alerts out (possibly to a separate notification product and ticketing system).  If the tool admins don't send any events/alerts in, they eventually get caught (some outage occurs) and have to answer for it... by being forced to send those "important enough" alerts.  If you need to block/manage/enhance/correlate alerts, you can do it all in one place and not have to deal with the tool admins.  If the tool admins want to send themselves low-level "not so important" alerts, they can do so and no one really minds.
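To make that concrete, here's a rough sketch of the kind of intake such an event manager could offer. This is only an illustration; the field names, port, and tool names are made up. The idea is simply that anything that can POST JSON gets a single place to send its alerts:

```python
# Rough sketch only: a tiny "event manager" intake that accepts alerts from any
# tool as JSON over HTTP and normalizes them into one common shape. The field
# names (source, severity, message, host) and the port are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def normalize(raw: dict) -> dict:
    """Map whatever a tool sends into one common alert record."""
    return {
        "source": raw.get("source", "unknown-tool"),
        "severity": str(raw.get("severity", "info")).lower(),
        "message": raw.get("message") or raw.get("msg") or "no message supplied",
        "host": raw.get("host", "unknown-host"),
    }

class AlertIntake(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = json.loads(self.rfile.read(length) or b"{}")
        event = normalize(raw)
        # In a real deployment this would be forwarded to notification and
        # ticketing; here we just print the normalized event.
        print(event)
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AlertIntake).serve_forever()
```

In practice the normalized events would feed notification, ticketing, and correlation rather than being printed, but that single funnel is where the blocking/enhancing/correlating can live in one place.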

I have yet to see a nice way of handling performance data or live status/health/reporting across several monitoring tools in a single place.  I was at a place that tried a hand-built solution, and it was painful, as there were multiple vendor databases to handle.  I haven't seen anyone yet who wants to pay the time and effort to put this together in a user-friendly fashion.

I've also seen (and done this myself) places that built their own "single place to go" by developing that piece of software/website and integrating it with all the tools they can, but that's a chore to maintain.  Usually these end up as "single panes of glass that contain most of what we care about at this time, but not everything."  When they have dedicated developer(s), it seems to work out.

Also: vendors that offer "we can do everything" always seem to cost a LOT of money up front, which gets paid because the business has justification to spend it at that moment in time... but years later, all those incredible licensing costs start getting questioned, some new technology comes along that needs monitoring, the "we can do everything" vendor slaps a huge price tag on it (because you have been paying such a high premium on everything so far), and eventually the business/IT says "why are we paying for this?"... and the tool sprawl begins.

Our sprawl is a matter of each group wanting to own its own tools.  Sometimes it isn't even different tools; for example, we have three different groups who each own their own instance of Tripwire.  Everybody says they want that single pane of glass; however, everyone also wants to own that single pane of glass.  One View to Rule Them All.

MVP

How did you go about replacing one tool with another? Any lessons learned that you'd care to share? Consolidating tools is never as easy as it sounds.

MVP

I hear you re: WUG. I replaced it with Orion NPM, SAM, and NCM last year at one of my customer sites.

Great point about each tool having a specific purpose. There's nothing wrong with owning and using specialized tools. But there is a problem when a single environment has multiple "enterprise" monitoring solutions. It's like having more than one watch; you're never really sure what time it is.

MVP

...eventually the business/IT says "why are we paying for this?" and the tool sprawl begins...

Exactly. A colleague of mine is wrapping up an enterprise-wide tools assessment for a mid-sized IT shop, and he's found over 80 tools in use. Because all of the purchases came from different managers and departments, there was never any sanity checking to determine whether there was any overlap.

The in-house portal, or dashboard, or single pane of glass can be useful, unless you end up crippling the functionality of the underlying tools.

MVP

Everybody says they want that single pane of glass, however, everyone also wants to own that single pane of glass.

Totally agree. For this reason, I'm partial to a dedicated monitoring team that has no emotional ties to the technologies or solutions to be monitored. It's a way to avoid the in-fighting over who owns the portal.

Level 7

GOOD oNE


MVP

THANK yOU

Level 19

I just got an email from a salesman stating that his product suite offers "a single dashboard of truth". Just shoot me.

Level 12

At my previous job I managed a NOC out of the Philippines, and we were a heavy SolarWinds shop; we even used loop a few times for health checks and training. So, coming into the new job, there was tool sprawl everywhere and no solid process or ownership. I started putting a few one-pagers together for my director and the CIO on how consolidation and proper controls and process could help with troubleshooting on incident bridges, and help with availability, which was tanking at the time.  We got a budget in place, got buy-in from upper leadership (key), and built a roadmap for what we wanted to accomplish and in what time frame. By getting down to a single pane of glass and making some investment, we got a handle on our environment.


Once leadership bought in, it was smooth sailing. They saw the ROI immediately, and how consolidation onto one platform could improve availability.

Level 12

Sometimes we lean on a single vendor, but many people have different ideas and opinions about single-vendor versus best-of-breed for monitoring their applications. If you have a large team in charge of monitoring in an organization, almost everyone has an idea of what works best for them.

Level 9

Although I agree with single-team monitoring, there are real instances where a single team does not make sense. In my position, having our home office monitor my local pit mesh network really does no good. I get inaccurate ping times and inaccurate up/down status, and our home office is annoyed when our entire site appears to go down (a lost link between home office and our office, not the entire network down).

Once we deployed NPM locally, we could really use its functionality to enhance and improve our network. Home office is no longer annoyed by nodes they have no control over, although I'm sure they would (at times) like to see our network stats.

MVP

You're right. Reachability problems are annoying when you're dealing with a remote monitoring solution. A single, dedicated team doesn't always make sense. It depends on the business model, too. If you've got branch offices that act as independent entities, with their own IT department and budget, centralized monitoring is probably useless. But in a campus environment, which is still distributed but kinda local, you can benefit from centralized monitoring.

Level 11

As if 'truth' ever came from salesmen...

Level 11

bsciencefiction.tv wrote:

Everybody says they want that single pane of glass, however, everyone also wants to own that single pane of glass.

That is a bit of a problem everywhere that doesn't have a dedicated monitoring team.  Admins want to own the tools that monitor their systems, which makes sense, but everyone also has different opinions as to which tool monitors best.  Even if you can standardize on a specific tool/toolset, infighting over control can definitely become an issue.

Level 10

Repeat after me: "No tool is perfect."  I say this often, usually when I hear "SolarWinds won't do this or that!"  Until your monitoring solution has been defined, baselined, and exceptions created, they will all suck.  It's a point I often need to repeat.  That being said, you can get pretty close to a single pane of glass as long as all the tools have a web interface.  We have embedded several tools using views and maps.  Alerts can be managed via syslog or SNMP traps.
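As a small, hedged example of the syslog route (the collector address and alert text below are placeholders, not anything from this thread): a script or secondary tool that can't talk to the main platform directly can still forward its alerts to a central syslog collector that the platform already watches.

```python
# Minimal sketch: forward an alert from any script or secondary tool to a
# central syslog collector that the main monitoring platform already receives.
# The collector address (192.0.2.10) and the alert text are illustrative only.
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("side-tool-alerts")
logger.setLevel(logging.WARNING)

# Most collectors listen on UDP 514; adjust to your environment.
handler = SysLogHandler(address=("192.0.2.10", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.warning("disk usage on db01 exceeded 90% of capacity")
```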

MVP

I would agree that the "single pane of glass" perspective and one tool to manage everything (aka a suite) is pretty much overhyped sales speak, because in practice it doesn't work.

Every shop is different in their mix of technologies and requirements.

Most of the suites are a compromise to bundle many things under one umbrella.  They do all these things but none of them well.

Makes it an easier sell to management...but not the engineer who has to work with the tool.

Computer Associates is probably the worst about doing an end run around the engineers and pitching their suite to upper management to get the sale.

Now with that said...I believe every shop needs to determine what framework they are going to operate under, be it Solarwinds, HP Openview, CA Unicenter, Patrol, etc. 

It needs to cover 80% of the monitoring and alerting needs in a scalable and maintainable fashion.  The rest needs to be handled by appropriate point solutions to fill those gaps.

In the end, tool sprawl can be a concern, but if managed appropriately, it is not.

Level 13

Another reason for tool sprawl that I've seen many places is simply a lack of knowledge about what is already available.

Often, new solutions are purchased or built when an already in-place solution could fulfill a given need, because not everyone knows the existing solution exists or what it's capable of, or, as mentioned in the original post, because of a lack of familiarity with the embedded solutions.

 

One such example of overlapping solutions in my environment is our Infoblox IPAM capabilities, Orion IPAM, our NAC/IDS products, and the Microsoft SCCM/SMS/WSUS/whatever products.  All of these tools give us information about which workstations are in our environment.  Meanwhile, we have another group that has brought in yet another workstation data-gathering tool to assist with our Win7 migration.

MVP

Ah... that is why one team should do all the monitoring and alerting; otherwise each team (DBAs, Windows server team, Unix team, network team, etc.) ends up with its own tools and monitors, with a hodgepodge of things running: some overlapping, and other gaps where no one is watching.  There is no way to qualify what is being watched or to ensure any consistency in how it is watched.

MVP

It'd be awesome to automate this process, so that provisioning a virtual machine includes a workflow to add monitoring. I mean, vCOps would pick it up, but other tools that don't interface with vCenter might miss it.

I'm on the fence about discovery-style solutions for picking up new devices. I like that NPM scans subnets each night for new devices, but in my experience no one follows up to properly handle the new devices that are discovered (or the old ones that are re-discovered). It'd be more interesting, and accurate, to just have vCAC deal with it.
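As a rough sketch of what that provisioning hook could look like, assuming the orionsdk Python client from the SolarWinds Orion SDK (the hostname, credentials, and node properties below are placeholders, and an SNMP node would need more fields than shown):

```python
# Rough sketch, not a production workflow: when a VM is provisioned, register it
# with Orion so it never falls through the monitoring cracks. Assumes the
# 'orionsdk' Python package; the server name, credentials, and property values
# are placeholders for illustration.
from orionsdk import SwisClient

def add_node_to_monitoring(ip_address: str, caption: str) -> None:
    swis = SwisClient("orion.example.com", "svc_provisioning", "CHANGE_ME")
    props = {
        "IPAddress": ip_address,
        "Caption": caption,
        "EngineID": 1,            # polling engine to assign the node to
        "ObjectSubType": "ICMP",  # simple up/down polling; SNMP needs more fields
    }
    node_uri = swis.create("Orion.Nodes", **props)
    print(f"Added {caption} ({ip_address}) to monitoring: {node_uri}")

# Called from the VM provisioning workflow (e.g., a vCAC/vRealize hook):
# add_node_to_monitoring("10.0.0.42", "newly-provisioned-vm")
```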

Level 13

I agree that monitoring and alerting should be centralized, but keeping the tools and data restricted to that central group is what causes the other groups to procure their own solutions and duplicate the data.  Making other groups aware that data exists, and providing access to that data, may help prevent sprawl and provide more uniform data gathering.

MVP

I think it's helpful to remember that software is a solution. Before you can talk about solutions, you really should understand your problem. If the problem is that it's too expensive to have 100 tools, then by all means look at suites. But if the problem is that you don't have any good monitoring in place, then you need to define what will be monitored before you start signing up for free trials.

Level 13

Plus, with all of these monitoring things, you can get it monitoring stuff and even alerting on things BUT the real value is once you have put in the time to customize it. You’ve got the views you need bookmarked. You’ve got summaries set up for everyone. You’ve got all the account restrictions so you don’t worry about accounting finding out stuff about the core servers. Monitoring by itself is great, but it needs to be targeted at what you need unless you’re just trying to make large piles of data.

MVP

We provide customized views to the various teams....so they can see the data and have a dashboard to help them in their daily work/troubleshooting ventures.

Level 10

I completely agree with you, wbrown. Most organizations today spend too much money on duplicate tools. This is partly a result of people being familiar with different tools than others. Also, some organizations end up with multiple tools having duplicate functions because of a change of IT personnel: the newly hired boss wants to show they're working, so they won't want anything to do with what the previous IT manager put in place. And some do it just to fatten their pockets.

Level 10

We use several tools to monitor different aspects of our networks. We currently use SolarWinds, NetMRI, PacketDesign, and Splunk to monitor networks, routing, and security, and SatMonics and SIMs to monitor our RF networks.

Level 13

With my employer it has basically been leaning on different vendors for different things. If you need networking you go here... computers/servers are over here...Monitoring is SW...software solutions are blah. We are a windows environment, but for everything else it's a bit of a sprawl.

Level 17

michael stump: you have just peeked into my world. Sprawl is one of those things that just is. The tool sprawl isn't the real issue.

It's the question of who, and what, and when (at times). The idea is that all these tools should leave no holes, yet this massive tool sprawl ends up extremely confined in a lot of the areas where the tools get implemented. Then someone asks, "Why wasn't this being monitored?"

Sprawl recoils as the tools' sprawl uncoils into each new place... or... as Wichita falls, so falls Wichita Falls.

Level 9

We had a horrible case of tool sprawl. Our network admins were spread out and all had a different preference of network tool. We combined to a single unit and determined that it was not advantageous, let alone cost effective, to maintain multiple tools that were only being partially utilized. We turned to Solarwinds to centralize our monitoring tools and so far, it has been great.

Level 10

This subject always makes me think of the Three Mile Island incident:

"As coolant flowed from the core through the pressuriser, the instruments available to reactor operators provided confusing information. There was no instrument that showed the level of coolant in the core. Instead, the operators judged the level of water in the core by the level in the pressuriser, and since it was high, they assumed that the core was properly covered with coolant. In addition, there was no clear signal that the pilot-operated relief valve was open. As a result, as alarms rang and warning lights flashed, the operators did not realize that the plant was experiencing a loss-of-coolant accident. They took a series of actions that made conditions worse by simply reducing the flow of coolant through the core."

Year two - Case Studies, Three mile island disaster, Centre of Risk for Health Care Research and Pra...

The monitoring systems available to operators there had not been designed with actual real-world usage in mind. When things started going wrong, the operators were literally trying to troubleshoot and fix issues with sparse information while lights flashed and klaxons sounded throughout the room!

Moderation and elegance in all things, I think, is the solution.  Don't display 30 monitors when 5 drillable-down monitors will do.  And when things go wrong, you should have set up your monitors to tell you what is wrong, not just what isn't working.

Level 13

Isn't it the goal of the tool vendor to force you into tool sprawl?

Level 13

Or to get their suite

Level 19

Only if their tools are not already in place. If they have tools in place their goal shifts to preventing tool sprawl.

MVP

Good point. No one should want to contribute to growing sprawl. Sprawls of all sorts are accepted as things to be avoided. I'm happy when I can consolidate my investments in tools with a single suite, to be certain. It's just that the golden ideal of buying a single suite and expecting it to monitor / manage everything is unicorn bacon.

Level 13

Well, Unicorn bacon sounds….Magically Delicious….

Level 12

I work with a lot of clients that are using several different vendors' products with quite a bit of duplication and, in some cases, I am brought in for the sole purpose of helping them consolidate or migrate (to Solarwinds of course!).

Having said that, as others have alluded to, there are instances where having some specialized tools is justifiable.  I think it is more important simply to know what you have and the capabilities available within it, in order to limit those instances.

Level 10

My experience is that no one vendor or suite of tools does everything the way we want it. We use SolarWinds extensively as our primary EMS, but I still much prefer good old MRTG for basic traffic and availability graphing. Much as I love SolarWinds, I don't care for its graphing.

Level 11

Hmm, we are using ManageEngine for some servers and Orion for prod servers. We are planning to migrate all servers to Orion soon...

Level 15

Interesting discussion.  I think we all bring tools from our past lives into our environment.  The joy is combining them into a usable toolbelt with the rest of the team members. 

An interesting topic, hit in numerous locations.

One Company's Journey Out of Darkness: Part I - What tools do we have?