Network configuration management could be described as “hours of boredom punctuated by moments of terror.” It’s certainly no fun to remote into a couple hundred routers or switches and manually enter config commands. And it’s absolutely terrifying to watch one of those teeny-tiny changes ripple into a full-blown network disaster! But it can happen all too easily, and unfortunately, getting fingered for having fat fingers doesn’t improve your career prospects either. Rest easy, though: this doesn’t have to happen to you.
This seven-part series will explore frequently overlooked, yet proven and highly effective, network configuration practices that will help keep your network humming, keep your users happy, and possibly make you the stuff of which career fast-track legends are made.
Today we’ll start by talking about why network configuration errors are the leading cause of network downtime. Next week we’ll explore what needs to be done, and in the remaining posts we’ll dive into the specific practices that too many network engineers and admins overlook at their own peril. So let’s begin.
It’s a well-publicized fact that the number one cause of network failure is human error, the kind of error that results in device misconfiguration and accounts for roughly 80% of network downtime. If this statistic makes one thing clear, it’s that we all need to work at eliminating human error.
You might be asking yourself, “What’s the big deal if the network is down for 5 minutes here and 10 minutes there?” Simply put, all of those many small outages add up to one huge expense.
If you’ve been working to improve network availability, then you know how difficult it can be to achieve 100% network uptime. While an annual uptime of 99.9% is good, it still represents about nine hours of network downtime a year. And with downtime costs ranging from $100K to $300K per hour, this represents $900,000 to $2.7 million a year in unnecessary expense. Not exactly packet change.
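The arithmetic behind those figures is worth seeing in one place. Here is a quick back-of-the-envelope sketch (in Python, using the $100K–$300K hourly cost range cited above; plug in your own numbers):

```python
# Back-of-the-envelope downtime cost calculator.
# The hourly cost figures are the cited $100K-$300K range, not measured data.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a (non-leap) year

def annual_downtime_hours(uptime_pct: float) -> float:
    """Hours of downtime per year at a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)

def annual_downtime_cost(uptime_pct: float, cost_per_hour: float) -> float:
    """Yearly cost of downtime at a given uptime and hourly outage cost."""
    return annual_downtime_hours(uptime_pct) * cost_per_hour

hours = annual_downtime_hours(99.9)           # ~8.76 hours/year
low = annual_downtime_cost(99.9, 100_000)     # ~$876,000/year
high = annual_downtime_cost(99.9, 300_000)    # ~$2,628,000/year
print(f"{hours:.2f} h/yr, ${low:,.0f} to ${high:,.0f}")
```

The “about nine hours” and “$900K to $2.7 million” figures in the text are these values rounded up.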
If network availability is so important, then why can’t more organizations achieve 100% uptime?
There are three primary reasons: lack of standardization, the sheer quantity and diversity of devices, and the complexity of the configuration command set.
Today’s networks are large, very complex, and may utilize thousands of network devices, including firewalls, routers, switches, and more. To make things even more complicated, network devices can come from a variety of vendors, and each vendor has its own unique rule set. Furthermore, many devices are configured through a remote command line interface (CLI), where each command must be entered separately. On top of that, many devices use hundreds of complex command statements. (Did you know there are roughly 17,000 Cisco® IOS commands?) Finally, there is no end-to-end view. Each device is administered separately, without any insight into how a change to a firewall can affect a downstream router or switch.
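To see why this gets tedious fast, consider what even a trivial change looks like at the CLI. Something as small as updating one interface description means typing a sequence of commands, one at a time, on every affected device (a generic Cisco IOS-style session; the hostname and interface here are hypothetical, and exact commands vary by platform and version):

```
Router> enable
Router# configure terminal
Router(config)# interface GigabitEthernet0/1
Router(config-if)# description Uplink to core switch
Router(config-if)# exit
Router(config)# end
Router# write memory
```

Now multiply that by hundreds of devices, each with its own vendor syntax, and the odds of a typo slipping through somewhere become very real.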
Like an intricate mosaic, your network has many tiny pieces that must all fit and work together perfectly. There is no margin for error: if everything doesn’t work just right, the network breaks, and when it does, so does the business.
What has been your experience? Leave a comment and add to the discussion. In our next post, we’ll talk about what the ideal solution is and introduce often overlooked best practices that can help improve network availability.
In the meantime, check out this video showing how a Cisco network engineer uses SolarWinds Network Configuration Manager (NCM) to make complex network changes easily and accurately, or download and evaluate a fully functioning 30-day version of NCM today.
You can also find and read past posts in this seven-part series here.