
Fiefdoms, Silos, and Plumbing

Level 11

In the world of information technology, those of us tasked with providing and maintaining the networks, applications, storage, and services for the rest of the organization are increasingly under pressure to provide more accurate, or at least more granular, service level guarantees. The standard quality of service (QoS) mechanisms we have used in the past are becoming increasingly inadequate for the disparate types of traffic we see on the wire today. Continuing to provide services in a guaranteed, deliberate, measurable, and ultimately very accurate manner is going to require different tools and additional focus on increasingly all-encompassing ecosystems. Simply put: our insular fiefdoms are not going to cut it in the future. So, what are we going to do about the problem? What can we do to increase our end-to-end visibility, tracking, and service level guarantees?

One of the first things we ought to do is make certain that we have, at the very least, implemented some baseline quality of service policies. Separating video, voice, regular data, high-priority data, control plane traffic, and so on seems like the kind of thing that should be a given, but every day I am surprised by another network that has deployed what little QoS it does have very poorly. Often I see video and voice in the same class, and no class for control plane traffic; my guess is no policing either, but that is another topic for another day. If we cannot succeed at the basics, we most certainly should not be attempting anything more grandiose until we can fix the problems of today.
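
To make that baseline concrete, here is a minimal sketch of what "separating classes" looks like from an endpoint's perspective: marking traffic with different DSCP values so routers along the path can queue voice, video, and bulk data differently. The class-to-DSCP assignments below are common conventions (EF for voice, AF41 for interactive video, CS6 for control plane), not anything prescribed here, and the snippet assumes an IPv4 socket on a Unix-like host.

```python
import socket

# Common (conventional, not mandated) DSCP values per traffic class.
DSCP_EF = 46    # Expedited Forwarding -- voice
DSCP_AF41 = 34  # Assured Forwarding 41 -- interactive video
DSCP_CS6 = 48   # Class Selector 6 -- control plane traffic

def open_marked_socket(dscp: int) -> socket.socket:
    """Open a UDP socket whose packets carry the given DSCP marking.

    The 6-bit DSCP value lives in the upper bits of the ToS byte,
    hence the shift by 2. IP_TOS is honored on Unix-like systems.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock

# Voice and bulk traffic get different markings, so a properly
# configured network can queue and police them separately.
voice_sock = open_marked_socket(DSCP_EF)
bulk_sock = open_marked_socket(0)  # best effort
```

Of course, marking only matters if the devices in the path trust and act on it, which is exactly the kind of per-class policy too many networks deploy poorly.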

I have written repeatedly on the need to break down silos in IT, to get away from the artificial construct that says one group of people controls only one area of the network and has only limited interaction with other teams. Many times, as a matter of fact, I see silos so deep and ingrained that the different departments do not actually converge, from a leadership perspective, until the CIO. This unnecessarily obfuscates the full network picture from pretty much everyone. Server teams know what they have and control, storage teams are the same, and on down the line it goes, with nobody really having an overall picture of things until you get far enough into the upper management layers that the fixes become political and die by the proverbial committee.

In order to truly succeed at providing visibility into the network, we need to merge the traditional tools, services, and methodologies we have always used with the knowledge and tools of other teams. Application visibility, hooks into virtualized servers, storage monitoring, wireless, security, and everything in between need to be viewed as one cohesive structure on which service guarantees may be given. We need to stop looking at individual pieces, applying policy in a vacuum, and calling it good. When we do that, it is most certainly not good, or good enough.

We really don't need QoS so much as we need full application visibility from start to finish. Do we care about the plumbing systems we use day to day? Not really; we assume they work effectively, and we do not spend a lot of time contemplating the mechanisms and methodologies of said plumbing. In the same way, nobody for whom the network is merely a transport service cares about the inner workings of that system; they just want it to work. The core function of the network is to provide a service to a user. That service needs to work all of the time, and it needs to work as quickly as it is designed to work. It does not matter to a user who is to blame when their particular application quits working, slows down, or otherwise exhibits unpleasant and undesired tendencies; they just know that somewhere in IT, someone has fallen down on the job and abdicated one of their core responsibilities: making things work.

I would suggest that one of the things we should certainly be implementing is a monitoring solution that can not only tell us what the heck the network routers, switches, firewalls, and so on are doing at any given time, but one in which applications, their use of storage, their underlying hardware (virtual, bare metal, containers), and their performance are measured as well. Yes, I want to know what the core movers and shakers of the underlying transport infrastructure are doing, but I also want visibility into how my applications are moving over that structure, and how that data becomes quantifiable as it relates to the end-user experience.
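
As a sketch of what that application layer adds on top of device polling, here is a small Python probe that times a full request/response cycle, which is the number a user actually feels, rather than merely checking whether an interface is up. The endpoints are hypothetical placeholders; a real monitoring platform would pull them from its own inventory and correlate the timings with device, virtualization, and storage metrics.

```python
import time
import urllib.request

# Hypothetical service endpoints; a real tool would discover these
# from its inventory rather than a hard-coded list.
SERVICES = {
    "intranet": "http://intranet.example.com/health",
    "crm": "http://crm.example.com/health",
}

def measure_response(url: str, timeout: float = 5.0) -> dict:
    """Time a complete HTTP request/response cycle for one service."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except Exception as exc:
        # A failure here is itself a data point worth correlating
        # against device-level alerts.
        return {"url": url, "ok": False, "error": str(exc)}
    elapsed_ms = (time.monotonic() - start) * 1000
    return {"url": url, "ok": status == 200, "latency_ms": round(elapsed_ms, 1)}

for name, url in SERVICES.items():
    print(name, measure_response(url))
```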

If we can get to a place where this is the normal state of affairs rather than the exception, using an application framework that brings everything together, we'll be one step closer to knowing what the heck else to fix in order to support our user base. You can't fix what you don't know is a problem, and if all groups are in silos, monitoring nothing but their fiefdoms, there really is not an effective way to design a holistic, network-wide solution to the quality of service challenges we face day to day. We will simply do what we have always done: deploy small solutions, in a small way, to larger problems, then spend most of our time tossing crap over the fence to another group with an "it's not the network" thrown in as well. It's not my fault, it must be yours. And at the end of the day, the users just want to know why the plumbing isn't working and the toilets are all backed up.

14 Comments

AppStack: It's like the Force. Binds everything together

MVP

Your suggestion is a good one. Solutions do exist and have for a while now. The challenge is the cost (software and hardware), the complexity, and the talent pool needed to set up and administer a full end-to-end solution. Some aspects need ARM (Application Response Measurement), which requires that the application be coded to provide internal transaction start and end times, plus a tool to capture all that data, followed by correlation.
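
To illustrate the instrumentation Jfrazier is describing: the application itself brackets each transaction with start and end timestamps and emits a record a collector can correlate across tiers. The Python sketch below shows the shape of the idea only; it is not the actual ARM API, which is a C-based Open Group standard.

```python
import time
import uuid

def timed_transaction(name: str):
    """Decorator that records start/end times for a named transaction,
    roughly what ARM formalizes. Simplified illustration, not the ARM API."""
    def wrap(func):
        def inner(*args, **kwargs):
            txn_id = uuid.uuid4().hex  # correlation handle for this transaction
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.monotonic() - start) * 1000
                # A real agent would ship this record to a collector for
                # cross-tier correlation; here we simply print it.
                print(f"txn={name} id={txn_id} elapsed_ms={elapsed_ms:.1f}")
        return inner
    return wrap

@timed_transaction("place_order")
def place_order():
    time.sleep(0.05)  # stand-in for the real work

place_order()
```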

Level 14

Well put, SomeClown...

Playing catch up is tough... Jfrazier is right... The solutions exist, but it will always boil down to cost, complexity, and internal capabilities to make it work. The smaller the shop, the more of a "vision" it really is.

As far as the blame game... We are just easy targets... End users just want it to work... We want it to work well and have the ability to quickly pinpoint where we went off the rails...

As an old friend of mine says, "they want cake... they don't give a dam* how it was made."

When I read this topic, I intuited a parallel between department silos and QoS. It was so unexpected that I started examining it and drawing some conclusions. Soon it was more than a comment; it was a "paper", and I've submitted it separately at the link below.

A parallel exists between QoS categories and departmental silos.

I read this blog post and I can't help but think of DevOps and SDN and the momentum that they bring with them. I, for one, have a love/hate relationship with silos. In my 20+ years of IT (almost exclusively IT Support & Service) I have seen where silos work and where they fail. Oftentimes I have witnessed silos that weren't parallel. From a collaboration perspective they conjoined at the bottom (the individual contributor level), bowed out at the director/VP level, and then met up as they all reported to the CIO. Things got done, and when things broke it was easy to drive accountability from a Continuous Improvement perspective.

I am of the belief that silos and collaboration can exist in the same space.

MVP

Amen!   And coming from an atheist, that's something!  🙂

I've worked in companies that had great cross-functional working teams and ones that had bad ones. At a prior job, I loved the interaction between server folks and the network team. Before I started there, there was a mindset of every problem being "the network". After getting things standardized and implementing SolarWinds Orion (my first experience with the product), we were able to get to a state where we would work together to interpret problems and come up with solutions to them. Rollouts were always fully vetted in a change management setting and had people from the appropriate teams on them to handle all aspects of a major change or implementation of a product. I really enjoyed working there until the corporate headquarters of another company came in and wanted to close down the datacenter I was working at AND get rid of SolarWinds in favor of the very inferior "Spectrum" solution they had up and running. A big part of that was that they didn't really sign on to the whole "redundancy = uptime" concept. Rather than have Orion pinpoint a device that was out in a fully redundant environment, where the outage usually didn't cause a big disruption, they preferred to spend the money on "Spectrum" to suppress the plethora of alerts the "child nodes" generated, find the node that was bad, and get a tech out to replace it. This was before Orion really had a good way of doing this. But, as a result of this (and not wanting to move to Chicago), I moved on.

The current job I'm at is one of the worst I've seen. Lots of little fiefdoms and silos where groups in charge of different things don't really even talk to each other at all. Server teams don't talk to network teams. The times I've tried to talk to the server team, responses are quite often non-existent without getting a manager involved. The application programmers rarely listen to end users and definitely don't write their applications for high-latency environments. Some of their apps might take 5-15 minutes to load in the morning, and they consider that to be acceptable. Solutions are pushed down from on high without any regard to whether or not they will work well, if at all. It is very frustrating to work in; luckily I don't work at our HQ, so I'm out of the political line of fire, not to mention a lot of the work I have been doing is integrating, standardizing, and bettering security - things that don't need me to interface with others much. Although one of the main network engineers quitting earlier this year has definitely raised my frustration level, since I have to get into this miasma more and more.

They decide they want to save money by dismantling the MPLS network that served a group of sites from an acquisition, then wonder why the VoIP quality has gone down when all we have are direct internet connections with no QoS. Admittedly they are fast internet connections, but even that doesn't alleviate the need for QoS. So now I hear that a higher-up in the voice group is telling the board that we should go with VoIP in another 400 sites, citing how well it worked. But he's referring to how well it worked when we had MPLS, before we went to DIA links. Not sure if he's trying to hide the problems we're currently having, or if he's just in a silo of his own where he doesn't hear about them. Crazy stuff.

Level 17
"someone has fallen down on the job and abdicated one of their core responsibilities"

Maybe that's the reason siloed IT departments don't like the monitoring team? At least they monitor it somehow, even if it isn't what the rest of us use.

Talk is cheap in the NOC if you only allow pings to your critical devices. With proper collaboration across silos, it's easy to make your environment sing more than red!

From the comments, it's apparent there is a great need for improved communication among an organization's multiple teams. How does one make that happen?

If such a directive came from on high, and folks were taught to be good listeners--even if something incorrect or inflammatory were said by anyone--and each group truly recorded and learned what other groups believe or feel or need, there'd be a list of items to begin working on. Each team needs to be able to present its grievances AND its recommendations--unhindered, without fear of retribution.

Then managers and ITIL people and HR and IT can take that laundry list and prioritize the major issues and misunderstandings, and begin a program towards team building and trust creation.

When the sticks all begin to fall into place, just watch the improved morale and performance and customer experiences!

SolarWinds' single pane of glass concept can be the tool that spans the silos, and if there were an early adopter of NPM or other Orion modules on every team--sort of a train-the-trainer environment--someone who learns and understands and explains and shares with the team, then you've got a recipe for success.

We have long advocated that the single Pane of Glass could unify the kingdom.

MVP

Ah... but is it tempered?

Level 11

A lot of that comes down to team-building. Everyone has their own mannerisms and eccentricities when it comes to communication, so the more you have your teams engaging and interacting with one another, the easier larger-scale projects are going to be. And you're right, a lot of that is going to fall on management's shoulders, as it's their responsibility to know their team and build those bridges of trust.

Level 13

One screen, one view.

Level 20

And next, VXLAN, NSX, and OpenStack are going to throw a huge new wrench in the gears!!! Who's responsible for what is changing... the old storage team, network team, and security team are all having the water muddied by the virtualization of everything!

Level 21

We broke down the silos years ago and consolidated all of the teams on one toolset under one manager and it made a huge difference.

One thing to realize is that at the end of the day, all anybody really cares about are the applications; everything else is just a means to delivering those applications. Because of this, everybody involved needs to come together to work toward that goal.

About the Author
"Father, Husband, Gamer, Geek" - First draft of my headstone! In all seriousness, I've been working in IT for around 20 years, but have embraced IT as a hobby for 30. It all started back in the day when my father bought me a Sinclair Spectrum 48K (the original one, with the rubber keyboard). There I tried my hand at coding and, with the help of INPUT magazine, wrote my first program! Now a seasoned (OK, OK, veteran, I am in my 40s now after all...) IT pro, and founder/principal consultant at my own IT consultancy business, I still do the odd bit of scripting, but nowadays I work exclusively with SolarWinds' products. I help my own clients, and end users alike, get the most out of their investment in this awesome set of products! I'm a self-professed IT Swiss Army knife, with deep knowledge in some fields, and enough to get by in most others. I have a thirst for knowledge and never turn away from a challenge. After all, we humans are all built to learn, right? =B']