Over the next few posts, I'm going to explore some newer thinking in network monitoring. We start by designing centralized management stations and remote agents, but is this sufficient? We look at things from a device and interface perspective and establish baselines of how the network should operate. This works, but is something of a one-size-fits-all solution. Where do we look when we want more than this?
A Little Bit of SDN
Over the last few years, Software Defined Networking (SDN) has generated a lot of buzz. Ask 10 different people what it's all about and you'll likely get 10 different answers, none of which is entirely wrong. The term's definition has grown broader than its usefulness. Still, SDN maintains mind share because the components people tend to associate with it are desirable. These include, among others, centralized management and/or control, programmable configuration management, and application-aware networking. The last of these is of the most immediate interest for our current topic.
The network's performance, as it relates to the key applications running on it, is immediately relevant to the business. This isn't just the performance of an application across a single device; it's a look at the gestalt of the application's performance across the entire network. That lets us detect problems and measure performance where they matter most. Drilling down to the specific devices and interfaces involved can come later.
Recently, I ran across a new term, or at least it was a new term to me: Network Tomography. This describes the gathering of a network's characteristics from endpoint data. The "tomography" part of the term comes from medical technologies like Magnetic Resonance Imaging (MRI), where the internal characteristics of an object are derived from the outside. Network tomography isn't really tomography, but the term conveys the meaning fairly well. The basic idea is to detect loss or delay over a path by using active probes from the endpoints and recording the results.
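To make the idea concrete, here's a minimal sketch of a tomography-style active probe. It times TCP connects from an endpoint and infers loss and delay along the path from those observations alone; real implementations typically use ICMP or UDP probes with finer control, so treat the TCP approach, the function name, and its parameters as illustrative assumptions rather than any standard tooling.

```python
import socket
import statistics
import time

def probe_path(host, port, samples=5, timeout=1.0):
    """Actively probe a path by timing TCP connects to an endpoint.

    Returns (loss_rate, median_delay_ms). A failed or timed-out
    connect counts as loss -- the path's internal behavior is
    inferred purely from what the endpoint can observe.
    """
    delays = []
    failures = 0
    for _ in range(samples):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                delays.append((time.monotonic() - start) * 1000.0)
        except OSError:
            failures += 1
    loss_rate = failures / samples
    median_delay = statistics.median(delays) if delays else None
    return loss_rate, median_delay
```

Run periodically from many vantage points, even something this simple builds a picture of which paths are losing or delaying traffic, without touching the devices in the middle.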
Monitoring loss and delay over a path is a beginning. Most application performance issues are going to be covered by this approach. We can track performance from many locations in the network and still report the results centrally, giving us a more complete picture of how the business is using the network.
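As a sketch of that central reporting step, the snippet below rolls per-site probe records up into a per-application summary. The record shape and field names here are my own assumptions, not a defined format; the point is just that distributed measurements can be reduced to one business-level view.

```python
import statistics
from collections import defaultdict

def summarize_results(results):
    """Roll up (site, app, loss_rate, delay_ms) probe records into a
    per-application summary, so a central station can report how each
    application performs across the whole network."""
    by_app = defaultdict(lambda: {"loss": [], "delay": []})
    for site, app, loss, delay in results:
        by_app[app]["loss"].append(loss)
        if delay is not None:
            by_app[app]["delay"].append(delay)
    summary = {}
    for app, vals in by_app.items():
        summary[app] = {
            "worst_loss": max(vals["loss"]),
            "median_delay_ms": (statistics.median(vals["delay"])
                                if vals["delay"] else None),
            "sites_reporting": len(vals["loss"]),
        }
    return summary
```

A dashboard built on a summary like this answers "how is the CRM doing?" first, and leaves "which interface is at fault?" for the drill-down.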
If we're going to look at network performance from the applications' point of view, we'll need a fingerprint of what those applications do. Many pieces of software use a single network access method, such as Hypertext Transfer Protocol (HTTP), and can be tracked easily. Others use many different data access methods and will need a more complex definition. Either way, we need to monitor all of the components in order to recognize where the problems lie. Getting these details may be as simple as asking the vendor, or it may require some packet analysis. Regardless, if one aspect of an application's communications has problems but another doesn't, the experience is still sub-par and we may not know why.
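One way to capture such a fingerprint is as a simple data structure listing every protocol/port component an application uses, which can then be checked against what we actually monitor. The class name and the example applications below are hypothetical, assumed only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AppFingerprint:
    """A fingerprint of an application's network footprint: the set of
    (protocol, port) components it uses. Every component must be
    monitored before its end-to-end experience can be explained."""
    name: str
    components: set = field(default_factory=set)  # e.g. {("tcp", 443)}

    def unmonitored(self, monitored):
        """Components we aren't watching; problems here go unexplained."""
        return self.components - monitored

# A simple HTTP-only app versus one with several access methods.
web_app = AppFingerprint("intranet", {("tcp", 80)})
erp = AppFingerprint("erp", {("tcp", 443), ("tcp", 1433), ("udp", 53)})
```

The gap that `unmonitored()` reports is exactly the blind spot described above: the component whose trouble leaves users complaining while every monitored graph looks green.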
The Whisper in the Wires
We've all run into frustrated users claiming that "the network is slow" and have torn our hair out trying to pin down what that means. Ultimately, technical specifics aside, that is exactly what it means: the network, as their applications experience it, is slow. We're just not used to looking at it that way.
What do we need to put this into practice? Smarter agents? Containers or VMs that let us test application performance either locally or remotely? How do we automate this? I'll be giving my own perspective over the next few weeks, but would like to hear your thoughts in the meantime.