The Top 5 Network Issues You Didn’t Know You Had

(and how monitoring can solve them)

I spend a lot of time talking about the value that monitoring can bring an organization, and helping IT professionals make a compelling case for expanding or creating a monitoring environment. One of the traps I fall into is talking about the functions and features that monitoring tools provide while believing that the problems they solve are self-evident.

While this is often not true when speaking to non-technical decision makers, it can come as a surprise that it’s sometimes not obvious even to a technical audience!

So I have found it helpful to describe the problem first, so that the listener understands and buys into the fact that a challenge exists. Once that’s done, talking about solutions becomes much easier.

With that in mind, here are the top 5 issues I see in companies today, along with ways that sophisticated monitoring addresses them.

Wireless Networks

Issue #1

Ubiquitous wireless has directly influenced the decision to embrace BYOD programs, which has in turn created an explosion of devices on the network. It’s not uncommon for a single employee to have 3, 4, or even 5 devices.

This spike in device density has put an unanticipated strain on wireless networks. In addition to the sheer load, there are issues with the type of connections, mobility, and device proximity.

The need to know how many users are on each wireless AP, how much data they are pulling, and how devices move around the office has far outstripped the built-in options that come with the equipment.

Monitoring Can Help!

Wireless monitoring solutions tell you more than when an AP is down. They can alert you when an AP is over-subscribed, or when an individual device is consuming larger-than-expected amounts of data.

In addition, sophisticated monitoring tools now include wireless heat maps – which take the feedback from client devices and generate displays showing where signal strength is best (and worst) and the movement of devices in the environment.
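To make the over-subscription and heavy-user alerts concrete, here is a minimal Python sketch. It assumes the monitoring tool can hand you one record per client per polling window as (AP name, client MAC, bytes transferred). The function name, record layout, and both thresholds are hypothetical and chosen only for illustration; real limits depend on your hardware and policies.

```python
from collections import defaultdict

def find_wireless_issues(samples, max_clients=30, max_bytes=500 * 1024 * 1024):
    """Flag over-subscribed APs and unusually heavy clients.

    samples: iterable of (ap_name, client_mac, bytes_transferred) tuples
             collected during one polling window (hypothetical layout).
    Returns (oversubscribed_aps, heavy_clients), both sorted lists.
    """
    clients_per_ap = defaultdict(set)
    bytes_per_client = defaultdict(int)

    for ap_name, client_mac, nbytes in samples:
        clients_per_ap[ap_name].add(client_mac)   # count distinct devices per AP
        bytes_per_client[client_mac] += nbytes    # total traffic per device

    oversubscribed = sorted(
        ap for ap, macs in clients_per_ap.items() if len(macs) > max_clients
    )
    heavy = sorted(
        mac for mac, total in bytes_per_client.items() if total > max_bytes
    )
    return oversubscribed, heavy
```

A real product does far more than this (roaming, signal strength, heat maps), but the core alert is just a count and a sum compared against a limit.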

Capacity Planning

Issue #2

We work hard to provision systems appropriately, and to keep tabs on how those systems perform under load. But this remains a largely manual process. Even with monitoring tools in place, capacity planning (knowing how far into the future a resource such as CPU, RAM, disk, or bandwidth will last given current usage patterns) is something that humans do, often with a lot of guesswork. And all too often, resources still reach capacity without anyone noticing until it is far too late.

Monitoring Can Help!

This is a math problem, pure and simple. Sophisticated monitoring tools now have built-in logic that considers both the long-term trend and the day-by-day and week-by-week usage patterns to come up with a more accurate estimate of when a resource will run out. With this feature in place, alerts can be triggered early enough that staff can do the higher-level analysis and act before the resource is exhausted.
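As a rough illustration of the math involved, the sketch below fits a straight line to daily usage samples and projects when that line reaches capacity. It is a deliberately simplified stand-in: it handles only the trend and ignores the day-of-week patterns that real tools also weigh. The function name and inputs are assumptions for this example, not any product's API.

```python
def days_until_full(daily_usage, capacity):
    """Estimate days until usage reaches capacity using a least-squares line.

    daily_usage: one usage sample per day, oldest first (same unit as capacity).
    Returns the projected number of days after the last sample, or None when
    there are too few samples or usage is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        return None

    mean_x = (n - 1) / 2                      # mean of day indexes 0..n-1
    mean_y = sum(daily_usage) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    variance = sum((x - mean_x) ** 2 for x in range(n))

    slope = covariance / variance             # growth per day
    if slope <= 0:
        return None                           # no growth, so no projected exhaustion

    intercept = mean_y - slope * mean_x
    crossing_day = (capacity - intercept) / slope   # day index where the line hits capacity
    return max(0.0, crossing_day - (n - 1))         # measured from the last sample
```

For example, a disk that grows by 10 GB a day from 10 GB to 40 GB over four days, with 100 GB of capacity, projects to six more days. An alert on "fewer than 14 days remaining" would then fire well before the disk fills.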

Packet Monitoring

Issue #3

We’ve gotten very good at monitoring the bits on a network – how many bits per second in and out; the number of errored bits; the number of discarded bits. But knowing how much is only half the story. Where those bits are going and how fast they are traveling is now just as crucial. User experience is now as important as network provisioning. As the saying goes: “Slow is the new down.” In addition, knowing where those packets are going is the first step to catching data breaches before they hit the front page of your favourite Internet news site.

Monitoring Can Help!

A new breed of monitoring tools includes the ability to read data as it crosses the wire and track source, destination, and timing. Thus you can get a listing of internal systems and who they are connecting to (and how much data is being transferred) as well as whether slowness is caused by network congestion or an anaemic application server.
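Here is a small sketch of both ideas, assuming you already have flow records as (source, destination, bytes) and two timing measurements per transaction: network round-trip time and application response time. The function names, the record layout, and the 200 ms threshold are all hypothetical.

```python
from collections import Counter

def top_conversations(flows, limit=3):
    """Sum bytes per (source, destination) pair and return the largest ones.

    flows: iterable of (src, dst, nbytes) tuples (hypothetical layout).
    Returns a list of ((src, dst), total_bytes), largest first.
    """
    totals = Counter()
    for src, dst, nbytes in flows:
        totals[(src, dst)] += nbytes
    return totals.most_common(limit)

def classify_slowness(network_ms, application_ms, threshold_ms=200.0):
    """Say whether the network, the application, or both exceed a threshold."""
    network_slow = network_ms > threshold_ms
    application_slow = application_ms > threshold_ms
    if network_slow and application_slow:
        return "both"
    if network_slow:
        return "network"
    if application_slow:
        return "application"
    return "ok"
```

The first function answers "who is talking to whom, and how much?" The second separates a congested network from a slow server, which is exactly the argument you want to settle with data rather than opinion.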

Intelligent Alerts

Issue #4

“Slow is the new down”, but down is still down, too! The problem is that knowing something is down gets more complicated as systems evolve. Also, it would be nice to alert when a system is on its way down, so that the problem could be addressed before it impacts users.

Monitoring Can Help!

Monitoring tools have come a long way since the days of “ping failure” notifications. Alert logic can now take into account multiple elements simultaneously such as CPU, interface, and application metrics so that alerts are incredibly specific. Alert logic also now allows for de-duplication, delay based on time or number of occurrences, and more. Finally, the increased automation built into target systems allows monitoring tools to take action and then re-test at the next cycle to see if that automatic action fixed the situation.
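A minimal sketch of that kind of alert logic might look like the following. It combines two metrics into one condition, waits for the condition to hold for several consecutive polls, and fires only once per incident. The class, the metric names, and the thresholds are made up for this example and do not reflect any particular tool.

```python
class SustainedAlert:
    """Fire once when a condition has held for N consecutive polls."""

    def __init__(self, condition, required_polls=3):
        self.condition = condition            # callable taking a metrics dict
        self.required_polls = required_polls  # delay based on number of occurrences
        self.streak = 0
        self.active = False

    def evaluate(self, metrics):
        """Return True only on the poll where the alert first triggers."""
        if not self.condition(metrics):
            self.streak = 0
            self.active = False               # condition cleared; ready for the next incident
            return False
        self.streak += 1
        if self.streak >= self.required_polls and not self.active:
            self.active = True                # suppress duplicates while the incident lasts
            return True
        return False

def cpu_and_app_slow(metrics):
    """Compound condition: high CPU *and* slow application response."""
    return metrics["cpu_pct"] > 90 and metrics["app_response_ms"] > 500
```

With `required_polls=2`, a single bad poll produces nothing, the second consecutive bad poll produces one alert, and further bad polls stay quiet until the condition clears and recurs.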

Automatic Dependency Mapping

Issue #5

One device going down should not create 30 tickets. But it often does. This is because testing upstream/downstream devices requires knowing which devices those are, and how each depends on the other. This is either costly in terms of processing power, difficult given complex environments, time-consuming for staff to configure and maintain, or all three.

Monitoring Can Help!

Sophisticated monitoring tools now collect topology information using devices’ built-in commands, and then use that to build automatic dependency maps. These parent-child lists can be reviewed by staff and adjusted as needed, but they represent a huge leap ahead in terms of reducing “noise” alerts. And by reducing the noise, you increase the credibility of every remaining alert so that staff responds faster and with more trust in the system.
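Once the parent-child list exists, the suppression itself is simple. The sketch below assumes the map is a dictionary from each device to its upstream parent, and that the poller supplies the set of devices currently down. The function and device names are hypothetical.

```python
def root_cause_alerts(down, parent_of):
    """Return only the down devices that have no down device upstream.

    down:      set of device names that failed polling.
    parent_of: dict mapping each device to its upstream parent
               (devices at the top of the tree are simply absent).
    """
    def has_down_ancestor(device):
        seen = set()                          # guards against accidental loops in the map
        node = parent_of.get(device)
        while node is not None and node not in seen:
            if node in down:
                return True
            seen.add(node)
            node = parent_of.get(node)
        return False

    return sorted(d for d in down if not has_down_ancestor(d))
```

If a switch and the two servers behind it all stop responding, only the switch is reported. The servers are treated as unreachable rather than down, and two of the three tickets never get created.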

So, what are you waiting for?

At this point, the discussion doesn’t have to spiral around whether a particular feature is meaningful or not. It is enough that the audience agrees they don’t want to find out what happens when everyone piles into conference room 4, phones, pads, and laptops in tow; or when the “free” movie streaming site starts pulling data off their drives; or when the CEO learns that the customer site crashed because a disk finally filled up after steadily filling for weeks.

As long as everyone agrees that those are really problems, the discussion on features more or less runs itself.

  • Good stuff Leon...

    The capacity planning portion really needs to tie into a data warehouse for long term growth analytics and reporting.

    It should include detailed data points as well as some roll-ups for various types of reports.

    Please look at my idea for a data warehouse, its uses and benefits.

  • I'll cast a vote for another seemingly obvious, but often overlooked/ignored problem that tools can help fix:  Silos.

    IT departments can become separated from each other by simple commonalities and specialization.  They can also be silo'd--or isolated--through Management error or intent.

    When the DBAs all sit in one area, the Windows SysAdmins are cubed somewhere else, the Desktop Support staff are in the basement, the Network Analysts are in their own room, and the Security Analysts sit in a special quiet room with an always-closed soundproof door, while the Apps Analysts concentrate deeply in another part of the building--you've found an environment where the IT Department doesn't know what its various teams are doing.

    When you don't know what your peers are working on, you can't help them achieve their goals.  You don't know what your actions are doing that may be at odds with other groups' tasks.

    And the poor Help Desk, across town in another isolated building dedicated just to them, can only take incoming complaints from angry users and do their best to triage the issues and assign them to the various departments as best they can.

    The right suite of tools, like Orion, can help break through those vertical silos.  When I show the SysAdmins bandwidth utilized by their server's ports, they're mildly interested.  When I show them errors on their NIC's ports, they're starting to pay attention.  When I show them which applications are responsible for the bulk of their bandwidth (via NetFlow info) they are sitting up and analyzing that information to see if it makes sense with what they expect.

    Now I don't have Quality of Experience set up for their specific systems or users, but when I talk about what it can show, they're on board.

    Next I talk about the other Orion modules that can help set thresholds and send alerts when DB latches start taking much longer than necessary, how I/O info and timing is affecting users, how SAN performance has begun reducing available cycles for business . . .

    You get the idea.  And all of the above happens when the folks and departments comprising the IS Team are focused on their own worlds, not teaching their peers processes and technologies and what's important.  Orion can help break down those walls and start the communication flowing again.

    Rick S.

  • Automatic dependency mapping too! I like this release better all of the time. OK, I'm moving up my schedule for updating NPM.

  • Thanks Leon, great read.

    Could you please give a few references where I can learn more about #5, Automatic Dependency Mapping?
