How Monitoring Can Help Break Up the Silo Mentality

The Root Cause of Silo Thinking

I have faced the silo problem many times in the past. In IT, many organizations work in silos. For example, the server department only cares about problems concerning servers. When a ticket arrives, it often starts a process that I call "Ticket Ping Pong": instead of solving the problem, it is easier to forward the ticket to a different IT department and let them take care of it.

Some user help desks (UHDs) assign all tickets to the networking team because "it's always the network's fault." With that mindset, whoever receives a ticket from the UHD is put in the position of proving that their system is not responsible for causing the problem. That isn't the best approach in many cases. Problems could be solved more quickly and effectively if everybody worked together.

One Monitoring to Rule Them All

It is very common for every IT department to have its own monitoring in place, often a highly specialized system that comes directly from the hardware vendor: a shiny little box from the trusted vendor they have been using for ages. These systems have their benefits and often combine management and monitoring. For the server team, for example, there are no problems unless they show up in their monitoring system. For a specific problem that relates to only one system, that works. In the real world, however, you often face more complex problems that involve multiple systems. You need a monitoring solution that covers the full complexity of your infrastructure and can handle different device types and services. The highly specialized, vendor-specific monitoring can coexist with it, but all the IT departments have to break up the silos and start working together. The general thinking should be: we are all in the same boat. A monitoring project can build bridges and bring the different IT departments closer together.

The goal should be to have all systems in the environment on the same monitoring platform at the end of the day. That creates visibility and trust. When everybody is looking at the same monitoring, they share the same knowledge of what is going on when a ticket shows up. When the server admin sees in a single pane of glass that the firewall is running at 100% CPU utilization, he knows how to address the ticket, and that it may be a good idea to wait for feedback from the firewall team.

In times of virtualization and SDN, this is even more important. There are so many dependencies between the different parts of the infrastructure that the right entry point for troubleshooting is hard to figure out. Sometimes the problem is hiding behind the different layers of virtualization. It is a big effort to bring all the systems into centralized monitoring, but it is absolutely worth it. At the end of the day, all software-defined anything runs on hardware, and that hardware needs to be monitored.

  • I would like to, but I can't post a screenshot. Private network.

    We have maps that start off two portals: one for the network engineers, the other for the server and application teams. At the start of each portal, we mark the demarc where one team's responsibility starts and stops. We also dead-end the diagrams so the alerts don't cascade all the way up to a router. So when the server team's maps go red, they know it is theirs. We do allow each team to see the other's portals and the demarcs. Seeing the big picture is so cool for troubleshooting. A picture is worth a thousand words, more than a thousand.

    Visual engineering: this is the art we practice, along with alert management. We have over 700 maps, and our tiered teams use these maps as the O&M diagrams. We put a link on the maps to the actual network engineering diagram, but for most troubleshooting folks don't pull up the NE diagram. Too much information. We also lay out the maps as services, so the ticket assignment to a service is derived from the map. Works like a charm, but discipline is key: process engineering, people following the process, and discipline.

  • I have used logical maps to break down silos. Take the network-level devices, put them into a server group's view for reference, then add in application monitors from SAM, and you can hear the silo walls breaking. I've seen troubleshooting that took hours or days brought down to minutes.

    You get a much smarter Tier 1 when you go across the silos, or, as we like to say, flip over the maps. The services are down in those maps; you just have to work them up into a quality view.

  • I think most of us are on board with this monitoring thing... I used the original CiscoWorks on SunNet Manager before HP OpenView really even took off... it was terrible!

  • Because they can keep things proprietary and charge more money.
