A recent survey commissioned by Avaya reveals that network vulnerabilities are causing more business impacts than most realize, resulting in revenue and job loss.
- 80% of companies lose revenue when the network goes down; on average, companies lost $140,000 USD as a result of network outages
- 1 in 5 companies fired an IT employee as a result of network downtime
- 82% of those surveyed experienced some type of network downtime caused by IT personnel making errors when configuring changes to the core of the network
- In fact, the survey found that one-fifth of all network downtime in 2013 was caused by errors in the network core
Cases of Device Misconfigurations Leading to Network Downtime
Real-world scenario 1: Company Websites Down, Reason Unknown
Soon after a software giant launched a big advertising campaign that was expected to drive major incoming Web traffic, its websites went down. Because the team could not trace the outage to a configuration change made earlier, the websites remained unreachable for a few hours. By the time the issue was identified and connectivity re-established, the organization had suffered huge revenue losses on a promotional campaign that had cost millions of dollars.
Troubleshooting: Given the symptoms, the first suspects were a core router failure or a DoS attack. All critical devices were checked and confirmed to be 'Up', so the next assumption was that the network was the victim of a DoS attack. But there was no traffic flood on the network, so the root cause had to be something else. After hours of troubleshooting and individually checking core and edge device configurations, the team found that the WAN router had a wrong configuration. The admin who made the change had meant to block access to a specific internal IP subnet on port 80, but ended up blocking port 80 for a wider subnet that also included the public Web servers. This completely cut off the Web servers from inbound traffic: a typo that cost the company millions!
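A pre-deployment check can catch this kind of overly wide rule before it reaches the router. The sketch below is only an illustration: the server addresses, the subnets, and the `servers_blocked_by` helper are hypothetical, not taken from the incident. It uses Python's standard `ipaddress` module to list which protected servers fall inside the subnet a deny rule covers.

```python
import ipaddress

# Hypothetical addresses of the public Web servers (illustration only).
PUBLIC_WEB_SERVERS = ["10.10.5.10", "10.10.5.11"]


def servers_blocked_by(deny_subnet, servers):
    """Return the servers that fall inside the subnet a deny rule covers."""
    network = ipaddress.ip_network(deny_subnet, strict=False)
    return [s for s in servers if ipaddress.ip_address(s) in network]


# Intended rule: block port 80 for one internal subnet only.
print(servers_blocked_by("10.10.20.0/24", PUBLIC_WEB_SERVERS))

# Mistyped rule: a wider subnet that also contains the Web servers.
print(servers_blocked_by("10.10.0.0/16", PUBLIC_WEB_SERVERS))
```

An empty list means the rule leaves the Web servers reachable; any address in the result is a reason to stop and review the change.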
Real-world scenario 2: Poor VoIP Performance, Hours of Deployment Efforts Wasted
A large trading company used voice and video for inter-branch and customer communication. To prioritize voice and video traffic and ensure quality at all times, QoS policies were configured across all edge devices over a weekend. However, following the change, the VoIP application began to experience very poor performance.
Troubleshooting: QoS monitoring suggested that VoIP and video had been given a lower priority than required. Instead of marking VoIP traffic as EF (Expedited Forwarding), the administrator had marked VoIP packets as DF (the default class), resulting in poor performance of VoIP and video traffic. Correcting the VoIP traffic setting to EF on all edge devices meant many more hours of poor performance and loss of business.
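A simple audit of the markings would have flagged the mistake on every device at once. The following sketch assumes the voice class configured on each edge device has already been collected into a dictionary; the device names and the `find_mismarked` helper are hypothetical. The DSCP values are the standard ones: 46 for EF and 0 for DF.

```python
# Standard DSCP code points for the two classes mentioned above.
DSCP_VALUES = {"DF": 0, "EF": 46}


def find_mismarked(device_markings, expected="EF"):
    """Return the devices whose voice class is not marked as expected."""
    wanted = DSCP_VALUES[expected]
    return sorted(
        device
        for device, marking in device_markings.items()
        if DSCP_VALUES[marking] != wanted
    )


# Hypothetical markings gathered from three edge devices.
markings = {"edge-1": "EF", "edge-2": "DF", "edge-3": "DF"}
print(find_mismarked(markings))
```

Running a check like this against the planned configuration, rather than after deployment, keeps the fix to a review comment instead of a second maintenance window.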
The network downtime in the above two cases could have been avoided via simple change notification and approval systems.
In the first case, notifying other stakeholders about the change would have helped the team correlate the outage with the recent change and identify it as a possible cause. Troubleshooting would have been faster, and normalcy could have been restored by quickly rolling back the erroneous change.
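At its simplest, a change notification is a diff of the device configuration before and after the change, sent to the people who may have to troubleshoot later. The sketch below uses Python's standard `difflib` module; the `config_diff` helper, the device name, and the IOS-style ACL lines are illustrative assumptions, and how the diff is delivered (email, chat, ticket) is left out.

```python
import difflib


def config_diff(before, after, device):
    """Return a unified diff of two configuration snapshots as a list of lines."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{device} (before)",
            tofile=f"{device} (after)",
            lineterm="",
        )
    )


# Illustrative snapshots: the deny rule was widened from a /24 to a /16.
before = (
    "access-list 110 deny tcp any 10.10.20.0 0.0.0.255 eq 80\n"
    "access-list 110 permit ip any any"
)
after = (
    "access-list 110 deny tcp any 10.10.0.0 0.0.255.255 eq 80\n"
    "access-list 110 permit ip any any"
)

for line in config_diff(before, after, "wan-router"):
    print(line)
```

With a record like this in hand, the first troubleshooting question becomes "what changed recently?" and the answer is one line long.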
In the second case, a huge change involving critical edge devices should have gone through an approval process. Having the configuration approved by a senior administrator before deployment can help identify and prevent errors that can bring the network down.
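The approval step can be modeled as a gate that a change must pass before deployment. The sketch below is a minimal illustration rather than how any particular tool implements it; the `ChangeRequest` class, its fields, and the names used are all hypothetical. The only rule it enforces is that someone other than the author must approve the change.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A hypothetical record of a planned configuration change."""
    author: str
    devices: list
    approvals: set = field(default_factory=set)

    def approve(self, reviewer):
        # The author cannot sign off on their own change.
        if reviewer == self.author:
            raise ValueError("author cannot approve their own change")
        self.approvals.add(reviewer)

    def can_deploy(self):
        # Deployment needs at least one independent approval.
        return len(self.approvals) >= 1


request = ChangeRequest(author="alice", devices=["edge-1", "edge-2"])
print(request.can_deploy())   # not yet approved
request.approve("bob")
print(request.can_deploy())   # approved by a second person
```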
Both cases reflect poorly on the administrators, even though bringing down the network was clearly not intentional.
Human errors are expected to occur in daily network administration. However, considering the impact a bad change can have on both the company and the person, it's imperative to put network configuration and change management (NCCM) processes in place. To reduce human errors and network downtime, use a tool that supports NCCM processes such as change notification and approvals.
Check out this paper for more tips on reducing configuration errors in your network.