The Network Is Down

The Scenario

Imagine yourself as a user. You get in to work, log in, and start preparing for the day. You open your browser to check the headlines, and bam - you get a network error.

Okay, no big deal, right? Maybe the bosses are cracking down on browsing at work and blocked the local news site. It's annoying, but livable.

People start to queue up outside; it looks like it's going to be a busy day. You have five minutes before the floodgates open. You open the lifeblood of your job - some intranet site full of internal applications that you absolutely need to do your job. You get a connection error.

You can feel the sudden tension in the room as each of your coworkers attempts to get to the intranet. You look out at the nascent mob of humanity, growing more restless by the moment.

Someone finally, tremulously, asks, "Can anyone connect to the intranet?"

Error - Could not connect

This is an (over-the-top) dramatization of what users face when the network goes down, as has recently happened to a certain government department in California. If the users are lucky, IT can quickly identify and fix the problem. If everyone is unlucky, the users, armed with whatever pre-network tools they have on hand, have to face the unhappy hordes of customers for as long as it takes IT to fix the problem.

Frankly, I'm not sure if I'd rather face the angry customers or the frenetic bosses angered by the loss of revenue and reputation. It's a tough call.

The Network View

I'm now going to grossly oversimplify the IT side of this scenario.

In the lucky scenario (and "lucky" is a relative term in these days of 99.9% uptime), you could quickly identify the problem device and get it working. If you were super lucky, you had a network monitoring tool or dedicated LAN monitoring software and were already resolving the problem when the call came in.

In the unlucky scenario, you were blindsided by the outage and had to scramble to find the issue. You had one person on the line with your ISP trying to see if the problem was on their side. You had a couple of other people crawling through the network to find the offending device. If you were super unlucky, your network was a disorganized mass of multiple technologies and topologies, inherited from many other network personages, that you were still trying to untangle.

Notice that in the overly simplified "lucky" example, the hypothetical network team was using a network monitoring tool.

The Case for Network Monitoring

Recent newsworthy network outages once again emphasize how important it is to monitor your network. From stock markets to governments to small businesses, network outages are painful and expensive, and they have an increasingly large economic impact.

If the network team in the "unlucky" example had used a monitoring tool, they'd have been able to quickly see:

  • which device is down
  • who is affected by the outage
  • what is affected by the outage

Network monitoring tools can do more, of course, but this short list is the part relevant to the scenario. You can take a look at SolarWinds' NPM if you want to find out more about the benefits of network monitoring.
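To make those three questions a little more concrete, here is a minimal Python sketch of how a tool might answer them once it knows which devices failed a poll. It is an illustration only, not how NPM or any particular product works: the inventory, the device names, and the summarize_outage function are all made up for this example, and a real tool would discover the topology and poll the devices itself.

```python
# Hypothetical inventory (an assumption for this sketch): each device lists
# the device it depends on, plus the departments and services behind it.
INVENTORY = {
    "core-router": {"upstream": None, "departments": [], "services": ["internet"]},
    "floor1-switch": {"upstream": "core-router", "departments": ["front-desk"], "services": ["intranet"]},
    "floor2-switch": {"upstream": "core-router", "departments": ["accounting"], "services": ["payroll"]},
}


def summarize_outage(inventory, down):
    """Answer: which device is down, who is affected, what is affected.

    `down` is the set of device names that failed their last poll.
    """
    # A down device is a likely root cause if its upstream is still up
    # (or if it has no upstream at all).
    roots = sorted(d for d in down if inventory[d]["upstream"] not in down)

    def is_impacted(name):
        # Walk up the dependency chain; a device is impacted if it, or
        # anything it depends on, is down.
        while name is not None:
            if name in down:
                return True
            name = inventory[name]["upstream"]
        return False

    impacted = [name for name in inventory if is_impacted(name)]
    who = sorted({dep for n in impacted for dep in inventory[n]["departments"]})
    what = sorted({svc for n in impacted for svc in inventory[n]["services"]})
    return {"down": roots, "who": who, "what": what}


if __name__ == "__main__":
    # The core router fails, so everything behind it stops answering too.
    print(summarize_outage(INVENTORY, {"core-router", "floor1-switch", "floor2-switch"}))
```

In this toy example, all three devices stop answering, but only the core router is reported as the device to go fix, while every department and service behind it shows up as affected. That is the difference between crawling through the network and knowing where to start.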

  • On my team, people are used to looking at NPM on a regular basis and are able to be proactive when things fail. They contact the appropriate support people knowing where the problem exists, and they can contact the affected departments and inform them before the departments realize that there is a problem. :) It is good to be in the "lucky" group.
