
One Company's Journey Out of Darkness, Part III: Justification of the Tools

Level 10

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey and the problems solved, insights gained, as well as unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.

Blog Series

One Company's Journey Out of Darkness, Part I: What Tools Do We Have?

One Company's Journey Out of Darkness, Part II: What Tools Should We Have?

One Company's Journey Out of Darkness, Part III: Justification of the Tools

One Company's Journey Out of Darkness, Part IV: Who Should Use the Tools?

One Company's Journey Out of Darkness, Part V: Seeing the Light

One Company's Journey Out of Darkness, Part VI: Looking Forward

As organizations roll out network management software and extend that software to a number of teams, they begin to gain additional insights that weren't visible before. These additional insights enable the business to make better decisions, recognize more challenges and/or inefficiencies, etc.

For this customer, one of the areas in which we were able to vastly improve visibility had to do with the facilities team. This manufacturing site has its own power station and water plant, among other things, to ensure that manufacturing isn't ever disrupted. In working on other projects with the team, it became obvious that the plant facilities team was in the dark about network maintenance issues, etc. This team would mobilize into "outage mode" whenever the network was undergoing maintenance. After spending time with this team and understanding why they had to react the way that they do, we were able to extend a specific set of tools to them that would make them aware of any outages, give them insight into when/why certain devices were offline, and provide visibility into when the network would come back online. This increased awareness of their needs, combined with additional visibility from network tools, has reduced the average cost of an outage significantly as well as solved some communication challenges between various teams. We were also able to give them a dashboard that would help discern between network and application level issues.

This is a brief example of how we can all start to build the case for network management tools and do so in a business-relevant way. Justifying these tools has to be about the business rather than simply viewing red/yellow/green or how hard a specific server is working. A diverse team can help explain the total business impact better than any single team could. For admins looking to get these tools, look for some of these business-impacting advantages:

Reduced Downtime

We always seem to look at this as network downtime; however, as in the example above, there are other downtime issues to be aware of, and all of these can impact the business. Expanding the scope of network-related issues can increase the perceived value of any networking tool. Faster time to resolution through the added visibility is a key contributor to reduced downtime. Tools that allow you to be proactive also have a very positive effect on downtime.


Supportability

This seems rather self-explanatory; however, enabling the helpdesk to be more self-sufficient through these tools can reduce the percentage of escalated tickets. These tickets typically carry a hefty price and also impact the escalation team's ability to work on other issues.

Establish and Maintain Service Level Agreements

Many organizations talk about SLAs and expect them from their carriers, etc., but how many are offering this to their own company? I'd argue very few do this, and it is something that would benefit the organization as a whole. An organization that sees IT as an asset will typically be willing to invest more in that group. As network admins, we need to make sure we are providing value to the company. Predictable response and resolution times are a good start.
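As a rough illustration of tracking "predictable resolution times," an internal SLA can be expressed as the share of tickets resolved within a target window. The sketch below is only a minimal example; the function name `sla_compliance`, the four-hour target, and the ticket durations are all hypothetical and not taken from the customer described here.

```python
from datetime import timedelta


def sla_compliance(resolution_times, target):
    """Return the fraction of tickets resolved within the target window."""
    if not resolution_times:
        raise ValueError("need at least one ticket to compute compliance")
    met = sum(1 for t in resolution_times if t <= target)
    return met / len(resolution_times)


# Hypothetical resolution times for four tickets (illustrative only).
tickets = [
    timedelta(hours=1),
    timedelta(hours=3),
    timedelta(hours=5),
    timedelta(hours=2),
]

# Hypothetical internal SLA: resolve within four hours.
target = timedelta(hours=4)

rate = sla_compliance(tickets, target)
print(f"Resolved within SLA: {rate:.0%}")
```

Reporting a number like this on a regular basis is one way to show the business what the tools are delivering.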

Impact on Staff

Unplanned outages are a massive drain on resources; from help desk to admins to executives, everyone is on edge. These also often carry the financial impacts of overtime, consulting fees, etc., in addition to some of the intangibles like work/life balance.


Very good... there is one area that you haven't touched on that is very important: scalability.

If it doesn't scale well then your supportability goes down and your impact on staff goes up.

Level 10

Getting buy-in from the business is crucial. If they see IT as an investment rather than just a cost-prohibitive expense, it helps resolve many inter-departmental struggles. I couldn't agree with you more about maintaining service level agreements internally. Treating internal users as if they are your customers is the best way to show the value that you're adding to the business.

Level 14

Great write up.  Must have ROI.


The other "extra" is the staff time needed to make the tool provide its ROI.

Many toolsets rely on admin skill to get the best out of them, but a less skilled person will get much less ROI.

Level 10

You could definitely add root cause analysis in there. No matter how proactive you are or how much monitoring you have, stuff is still going to break. Of course, this isn't really something the powers that be usually like to hear, but it's the cold truth. The tools not only help you reduce downtime, but also help you pinpoint the trouble spot. Using all the data collected, from alert times to syslog collections, you can not only accurately pinpoint the root cause, but also build an accurate timeline of the outage and all cascading outages as a result. This helps you evaluate the problem and how you may want to build it out in the future (back to proactivity). It also makes the powers that be a little happier.

My organization has had a Network Service SLA for internal clients/customers for at least the last twelve years that I've been here.  Maybe this is more the norm in a 7x24 health care environment than in other businesses?  They may have "off hours" during which maintenance can be performed?  Or may be unable or unwilling to pay O.T. for after-hours work?

On the other hand, a local public utility has a completely different sort of SLA for network outages--their network outages are to be addressed during business hours only, and the network people must coordinate with the union electricians and other union staff for a switch or router to be removed from a rack and replaced or serviced.  That SLA seems all about the employees and not about the clients.  While I'd like those kinds of demands on my time over the 7x24 on-call I'm accustomed to, I think the internal customer is not as well served in that environment as they are in my business.

Level 10

I agree with you, Jfrazier: scalability is key. Any product or tool has to be scalable to avoid being overwhelmed by usage growth.

The reduced downtime and the goal of 5 9's is a powerful tool.  Trying to convince the CIO to spend money to save money is not always an easy task. But the points you make on impact of the tool are handy in that conversation.


For many the panacea of 5 9's is unobtainable because of how they classify downtime or they have services hosted on single servers with no failover/high availability solutions in place.

Even then, if they count every single outage against the 5 9's rather than only outages of the service... they are likely doomed to failure.
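To put numbers on that point, here is a small sketch of the arithmetic. Five 9's (99.999%) leaves roughly 5.26 minutes of downtime per year, so how outages are counted matters a great deal. The outage log below is entirely hypothetical, and the helper names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def downtime_budget_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime allowed in the period at a given availability."""
    return period_minutes * (1.0 - availability)


def availability(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Availability achieved given the total downtime in the period."""
    return 1.0 - downtime_minutes / period_minutes


FIVE_NINES = 0.99999

# Hypothetical outage log: (component, minutes down, was the service down?)
outages = [
    ("switch-a", 30.0, False),  # redundant path kept the service up
    ("server-1", 4.0, True),    # single server, so the service went down
]

# Counting every component outage versus only service-affecting ones.
component_minutes = sum(minutes for _, minutes, _ in outages)
service_minutes = sum(minutes for _, minutes, hit in outages if hit)

print(f"Five 9's budget: {downtime_budget_minutes(FIVE_NINES):.2f} min/year")
print(f"Counting every outage: {availability(component_minutes):.5%}")
print(f"Counting service outages only: {availability(service_minutes):.5%}")
```

With these made-up figures, counting only service-affecting downtime stays inside the five 9's budget, while counting every component outage does not.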

Level 12

Great write-up by sv_neal.

Level 17

Excellent write up, great details.


I'm with you on the "Supportability" part. With the right tools the helpdesk wouldn't have to transfer 80% of the calls to some system admin, which would allow the system admin to make better tools so that the helpdesk can resolve even more stuff.

Level 10

Thanks for the feedback as always.  Point well taken.

Level 10

Sometimes we are our own worst enemy or our own worst customer.

Level 10

I agree that unplanned outages can also have a huge financial impact.


Within our organisation we inform everyone of planned outages. We schedule these outages for when they will least impact other departments.

But the worst part is that not many people read the outage notifications, and then they complain when they are affected by the outage.

Level 17

So many cheap electronic setups with some of the facilities devices. We've run into many devices having issues with broadcasts blowing out the TCP stack. Isolated little networks reside across the board for these darn things.

On top of that, many folks, if they are even watching the gear, may only be checking a ping to the programmed IP, or using default community strings and passwords... but hey, at least they are monitoring in some way rather than not at all.

Tools and Access are the key to quick remediation and service.

Level 20

I'm using some of this info now...