My top three NOC goals, in chronological order, are:
- Network Management Visibility - You cannot manage what you cannot see.
- Network Fault Management - Dead devices trump slow devices.
- Today's topic!
Goal Three - Network Performance Management
How should we interpret this utterance: "Oh man! The network is so slow today!"? I think it roughly translates to "Something is making this really slow and I don't know what it is and everyone always blames the network so....". The truth is that the end user does not care at all what is causing the poor performance; they just want it fixed. The typical silo (AKA cylinders of excellence) approach is to give the issue to the network department. If they cannot find an issue, it gets tossed over to the server group. If they come up empty, then the finger pointing begins. There are obvious problems with approaching network performance monitoring this way, but I think all of them come from one systemic issue: the network is being viewed as a group of individual electron movers, when in fact it is a system for completing business transactions.
Rethinking Network Systems
Network devices should not be thought of as machines, but rather as interconnected links in a business process chain. Now your network systems become hundreds of these chains, relying on their links and other process chains as well.
Here is my take on properly managing performance:
- Define your network-enabled business processes and identify all the links.
- Prioritize each of the processes by the impact they have on your company's business line.
- Begin with the most critical process and create performance management groups containing the links. Each group should monitor machine-level performance (CPU, memory, etc.), link utilization, synthetic traffic performance over the involved links (IP SLA), and synthetic transactions that mimic the user experience.
- Set thresholds and alert triggers specific to the business process.
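The four steps above can be sketched as a small data model. This is only an illustration under stated assumptions, not any monitoring product's API: the class names, the metric names, the "order entry" process, the link names, and every threshold value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Link:
    """One link in a business process chain (server, circuit, or probe)."""
    name: str
    # Hypothetical metric name -> alert threshold, e.g. {"cpu_pct": 90.0}
    thresholds: Dict[str, float] = field(default_factory=dict)


@dataclass
class BusinessProcess:
    """A network-enabled business process and the links it depends on."""
    name: str
    priority: int  # 1 = most critical to the business line
    links: List[Link] = field(default_factory=list)

    def breaches(
        self, samples: Dict[str, Dict[str, float]]
    ) -> List[Tuple[str, str, float, float]]:
        """Return (link, metric, value, threshold) for each exceeded threshold."""
        found = []
        for link in self.links:
            for metric, limit in link.thresholds.items():
                value = samples.get(link.name, {}).get(metric)
                # Links with no sample for a metric are skipped, not alerted on.
                if value is not None and value > limit:
                    found.append((link.name, metric, value, limit))
        return found


# Steps 1-2: define the process, its links, and its priority (all made up).
order_entry = BusinessProcess(
    name="order entry",
    priority=1,
    links=[
        Link("sql-007", {"cpu_pct": 90.0}),               # machine-level
        Link("wan-core", {"utilization_pct": 80.0}),      # link utilization
        Link("ipsla-hq-dc", {"rtt_ms": 150.0}),           # synthetic traffic
        Link("synthetic-order", {"transaction_s": 5.0}),  # synthetic transaction
    ],
)

# Steps 3-4: compare the latest samples against process-specific thresholds.
samples = {
    "sql-007": {"cpu_pct": 100.0},
    "wan-core": {"utilization_pct": 42.0},
    "synthetic-order": {"transaction_s": 12.5},
}
found = order_entry.breaches(samples)
for link, metric, value, limit in found:
    print(f"{order_entry.name}: {link} {metric}={value} (threshold {limit})")
```

Keeping the thresholds on the process's own links, rather than in one global table, is what lets the same server carry a tighter limit in a critical chain than in a low-priority one.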
Now you will be managing the ability to operate your business. Instead of getting an alert that just says "SQL server 007 is at 100% CPU", the alert can also carry the actual impact on the business process (transaction time) and any other issues in the chain.
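To make the difference concrete, here is a rough sketch of what such an enriched alert might look like. The function name, the message layout, the "order entry" process, and the sample numbers are all assumptions for illustration; a real monitoring tool would have its own alert format.

```python
from typing import List


def enrich_alert(
    device_alert: str,
    process_name: str,
    transaction_s: float,
    baseline_s: float,
    other_issues: List[str],
) -> str:
    """Wrap a raw device alert with business-process context (hypothetical format)."""
    lines = [
        device_alert,
        f"Affected process: {process_name}",
        f"Transaction time: {transaction_s:.1f} s (baseline {baseline_s:.1f} s)",
    ]
    if other_issues:
        lines.append("Other issues in the chain: " + "; ".join(other_issues))
    else:
        lines.append("Other issues in the chain: none detected")
    return "\n".join(lines)


message = enrich_alert(
    "SQL server 007 is at 100% CPU",
    process_name="order entry",          # made-up process name
    transaction_s=12.5,                  # made-up synthetic transaction result
    baseline_s=2.0,                      # made-up normal transaction time
    other_issues=["WAN link at 95% utilization"],
)
print(message)
```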
While I have placed performance management as Goal 3 chronologically, I think it is the most important part of network management to show that you understand the business you are in and manage to business expectations.
For a quick look at the technologies I mentioned in action, see this site.