In part one of this series we looked at the pain that network variation causes. In this second and final post we’ll explore how the network begins to drift and how you can regain control. 


How does the network drift?

It’s very hard to provide a lasting solution to a problem without knowing how the problem occurred in the first place. Before we look at our defenses, we should examine the primary causes of highly variable networks.


  • Time: The number one reason for shortcuts is that it takes too long to do it the ‘right way’.
  • Budget: Sure, it’s an unmanaged switch. That means low maintenance, right?
  • Capacity: Sometimes you run out of switch ports at the correct layer, so new devices get connected to the wrong layer. It happens.
  • No design or standards: The time, budget and capacity problems are exacerbated by a lack of designs or standards.


Let’s walk through an example scenario. You have a de facto standard of using layer-2 access switches and a pair of L3 aggregation chassis switches. You’ve just found out there’s a new fifth-floor office expansion happening in two weeks, with 40 new GigE ports required.


You hadn’t noticed that your aggregation switch pair is out of ports, so you can’t easily add a new access switch. You try valiantly to defend your design standards, but you don’t yet have a design for an expanded aggregation layer, you have no budget for new chassis, and you’re out of time.


So, you reluctantly daisy-chain a single switch off an existing L2 access switch using a single 1Gbps uplink. You don’t need redundancy; it’s only temporary. Skip forward a few months: you’ve moved on to the next crisis and you’re getting complaints of the dreaded ‘slow internet’ from the users on the fifth floor. Erm...
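To put a rough number on that shortcut, here is a minimal sketch of the worst-case oversubscription on the new switch. The function name and parameters are made up for illustration; the only figures taken from the scenario are the 40 GigE ports and the single 1Gbps uplink. It ignores the ports already in use on the upstream access switch, which share the same path and only make things worse.

```python
def oversubscription_ratio(access_ports: int, port_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Worst-case ratio of access bandwidth to uplink bandwidth."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("at least one uplink with positive bandwidth is required")
    return (access_ports * port_gbps) / (uplinks * uplink_gbps)


# The fifth-floor switch: 40 x 1 Gbps access ports, 1 x 1 Gbps uplink.
ratio = oversubscription_ratio(access_ports=40, port_gbps=1.0,
                               uplinks=1, uplink_gbps=1.0)
print(f"{ratio:.0f}:1")  # prints 40:1
```

What counts as an acceptable ratio depends on your traffic, but 40:1 over a single non-redundant link is a reasonable place to start looking when the ‘slow internet’ tickets arrive.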


The defense against drift

Your first defense is knowing this situation will arise. It’s inevitable. Don’t waste your time trying to eliminate variation; your primary role is to manage the variation and limit the drift. Basic capacity planning can be really helpful in this regard.
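Basic capacity planning doesn’t need to be elaborate. As a rough sketch, assuming you can export port counts per switch from whatever inventory you already keep, something like the following would have flagged the full aggregation pair before the fifth-floor request landed. The switch names, port counts and the 80% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SwitchPorts:
    name: str
    total_ports: int
    used_ports: int

    @property
    def utilisation(self) -> float:
        """Fraction of ports in use, between 0.0 and 1.0."""
        return self.used_ports / self.total_ports


def needs_attention(switches, max_utilisation=0.8):
    """Return the names of switches at or above the utilisation threshold."""
    return [s.name for s in switches if s.utilisation >= max_utilisation]


# Hypothetical inventory: both aggregation switches are nearly or fully used.
inventory = [
    SwitchPorts("agg-1", total_ports=48, used_ports=48),
    SwitchPorts("agg-2", total_ports=48, used_ports=46),
    SwitchPorts("access-3f", total_ports=48, used_ports=20),
]
print(needs_attention(inventory))  # prints ['agg-1', 'agg-2']
```

Run on a schedule, a check like this turns ‘we’re out of ports’ from a surprise into a line item you can budget for.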


Another solution is to use ‘generations’ of designs. The network is in constant flux but you can control it by trying to migrate from one standard design to the next. You can use naming schemes to distinguish between the different architectures, and use t-shirt sizes for different sized sites: S, M, L, XL. 
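As an illustration of such a naming scheme, here is a small sketch that encodes the design generation and t-shirt size in a tag such as g2-m-lon01. The tag format, site codes and function names are invented for this example; use whatever convention fits your environment.

```python
import re
from typing import NamedTuple

# Hypothetical tag format: g<generation>-<size>-<site>, e.g. "g2-m-lon01".
TAG_PATTERN = re.compile(r"^g(?P<gen>\d+)-(?P<size>s|m|l|xl)-(?P<site>[a-z0-9]+)$")


class DesignTag(NamedTuple):
    generation: int
    size: str
    site: str


def parse_design_tag(tag: str) -> DesignTag:
    """Split a design tag into generation, t-shirt size and site code."""
    match = TAG_PATTERN.match(tag.lower())
    if match is None:
        raise ValueError(f"not a valid design tag: {tag!r}")
    return DesignTag(int(match["gen"]), match["size"].upper(), match["site"])


def legacy_sites(tags, current_generation):
    """List the sites still running a design older than the current one."""
    parsed = (parse_design_tag(t) for t in tags)
    return [t.site for t in parsed if t.generation < current_generation]


print(parse_design_tag("g2-XL-lon01"))
print(legacy_sites(["g1-s-nyc02", "g2-m-lon01"], current_generation=2))
```

With the generation baked into the name, a quick query tells you how many sites are still on the old design and how big each one is.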


At any given time, you would ideally have two architectures in place: legacy and next-gen. Of course, the ultimate challenge is to age out old designs, but capacity and end-of-life drivers can help you build the business case to justify the next-gen design.


But how do you regain control of that beast you created on the fifth floor? It’s useful to have documentation of negative user feedback, but if you can map this network and measure its performance to show the impact, then you’ve got a really solid business case.


A report from a network performance tool showing loss, latency and user pain, coupled with a solid network design, makes for a convincing argument and strong justification for an upgrade investment.
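If you don’t have such a tool to hand, even a crude summary of probe results helps. The sketch below assumes you have a list of round-trip times in milliseconds, with None marking a lost probe. The sample values are invented, and a real performance tool will give you far more than this.

```python
from statistics import median


def summarise(rtts_ms):
    """Summarise probe results; None entries are treated as lost probes."""
    if not rtts_ms:
        raise ValueError("no probe results to summarise")
    answered = sorted(r for r in rtts_ms if r is not None)
    loss_pct = 100 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    if not answered:
        return {"loss_pct": loss_pct, "median_ms": None, "worst_ms": None}
    return {
        "loss_pct": loss_pct,
        "median_ms": median(answered),
        "worst_ms": answered[-1],
    }


# Invented samples from a host on the fifth floor: one probe in four was lost.
print(summarise([10.0, 12.0, None, 50.0]))
# prints {'loss_pct': 25.0, 'median_ms': 12.0, 'worst_ms': 50.0}
```

Put the same three numbers for a well-connected floor next to the fifth floor’s and the case for the upgrade starts to make itself.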