Troubleshooting Issues – Administrators Play Network Gumshoe

Network admins definitely play the role of Network Gumshoe. Dealing with daily network issues like bandwidth hogs, IP conflicts, rogue users, and more, administrators spend a considerable amount of time investigating and resolving network issues. But are they really equipped for this kind of troubleshooting? Is there a specific troubleshooting process involved in finding problematic users/devices while ensuring minimal downtime?

In a network, employees come in with devices pre-configured with IP addresses from prior Internet connections (home or elsewhere). This could result in an IP conflict with a critical application server, which could interrupt services. In other cases, IP conflicts happen when a network admin accidentally assigns a duplicate IP address, or when a rogue DHCP server operating in the network hands out IP addresses at will. Bandwidth issues creep up in the presence of a YouTube hog, or when someone misuses company resources for unofficial purposes. Finally, rogue users who’ve somehow gained entry to the network may attempt to access confidential data or restricted networks. All these frequently occurring incidents threaten to upset any smoothly functioning network.

In any case, the primary goal of a network admin is to fix an issue with minimal downtime and take steps to ensure that it doesn’t happen again. For issues associated with problematic users/devices in a network, here are four simple steps to follow when troubleshooting:

  • Quickly identify and investigate the problematic user/device.
  • Locate the problematic user/device.
  • Immediately remediate the problematic user/device.
  • Take steps to prevent the same situation from happening again.

[Figure: Network troubleshooting process]

  1. To quickly detect problems in the network, it’s best to have a monitoring tool in place. Depending on which specific area of the network needs monitoring, admins can set up timely alerts and notifications. Specific monitoring tools are available to help, including those that let you see the up/down status of your devices, IP address space, user/device presence in the network, etc. Once the bandwidth hog, IP conflict, or rogue DHCP server is identified, the first step of the troubleshooting process is complete.
  2. The next critical step is determining whether the user/device in question actually caused the problem. You need to look at detailed data that reveals the amount of bandwidth used, who used it, and for what application. You should also look at details on devices in an IP conflict and determine what type of conflict it was, look for the presence of unauthorized devices in the network, and so on. This investigation should also provide data on the location of the user/device in the network, including details like switch port information, or the Wireless Access Point (WAP), if it’s a wireless device.
  3. The third step is remediation. Whatever caused the network interruption needs to be fixed. Knowing the location of the problem, as mentioned in the previous step, is very helpful in taking immediate action. Admins can either physically locate the device and unplug its network access, or use tools that enable the remote shutdown of devices. The remote option is especially helpful for admins working with networks spread over large areas or multiple locations. The critical point here is that network access needs to be revoked immediately.
  4. Finally, take steps to prevent the same problem from happening again. In the case of a problematic user/device, make sure you either block these systems or are notified when they enter the network. Create checkpoints and monitoring mechanisms so that you can take proactive measures and prevent unauthorized users from entering your network.
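
As a rough illustration of the identify and locate steps, here is a minimal Python sketch. It assumes you can already export two kinds of data from whatever monitoring tool you use: (IP, MAC, location) observations and per-user byte counts. The function names, the sample data, and the 50% traffic-share threshold are all hypothetical and not taken from any particular product.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted [(mac, location), ...]} for IPs claimed by more than one MAC.

    `observations` is an iterable of (ip, mac, location) tuples, where location
    is a hypothetical label such as a switch port or wireless access point name.
    """
    seen = defaultdict(set)
    for ip, mac, location in observations:
        # Normalize MAC case so the same device is not counted twice.
        seen[ip].add((mac.lower(), location))

    conflicts = {}
    for ip, claims in seen.items():
        macs = {mac for mac, _ in claims}
        if len(macs) > 1:
            conflicts[ip] = sorted(claims)
    return conflicts


def top_talkers(usage_bytes, threshold_share=0.5):
    """Return users whose share of total traffic is at least `threshold_share`."""
    total = sum(usage_bytes.values())
    if total == 0:
        return []
    return sorted(
        user for user, used in usage_bytes.items()
        if used / total >= threshold_share
    )


# Hypothetical sample data for illustration only.
observations = [
    ("10.0.0.5", "AA:BB:CC:00:00:01", "switch1:Gi0/1"),
    ("10.0.0.5", "AA:BB:CC:00:00:02", "ap-lobby"),
    ("10.0.0.9", "AA:BB:CC:00:00:03", "switch1:Gi0/7"),
]
usage = {"alice": 900, "bob": 50, "carol": 50}

if __name__ == "__main__":
    print(find_ip_conflicts(observations))
    print(top_talkers(usage))
```

Because each conflict entry carries the location label alongside the MAC address, the same result tells you both which devices are fighting over an address and where to go to unplug one of them.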

What troubleshooting processes do you follow in your organization? Feel free to share your experiences, which fellow network admins might find useful.

  • We pretty much follow the steps that you outlined.  We work diligently to eliminate tribal knowledge and to ensure multiple people are part of projects and that projects are communicated between team members and departments.  Fortunately, most of the tech team rely on NPM and the NOC views that have been set up to help with the big items.  Unfortunately, we still use the old spreadsheet and, like goodzhere, we have granular VLANs.  Evaluating IPAM to help remedy the spreadsheet.

  • We are working to implement IPAM.  We do currently have some things in place.  Port Security (sticky) is set on all ports to ensure only one specific device is plugged into a specific port.  We have a home-grown system to track all devices and IP addresses.  We also use spreadsheets (I know...this really sucks) to track used/available IPs.  We also have VLANs set up for specific purposes that are very granular.

  • IPAM from SolarWinds is the go-to tool we reach for first when suspecting an IP conflict or a DNS issue, or even when reserving an address in DHCP. Another favorite among our SolarWinds users where I work is looking at an IP's historical information to see what may have used the address previously. Many network issues can be prevented with proper design, such as putting critical servers on different subnets or at least excluding servers, printers, and other critical device IPs from DHCP scopes.

  • With a small team, individuals have personal preferences for the initial steps taken, or their own go-to places they are comfortable starting the trail from.

    Some folk see the alert and dive straight into looking at device console screens, while others just use Solarwinds and only at the end look at the device.

  • Having the right tools is vital and allows the proper cultivation of monitoring, condition setting and alerting. Without the proper alerting you may be left to just review your logs and traps.

    I find it easy to miss a visual notification/flashing red if I already have my nose in investigating or researching another situation. This is where the alerts come in, even for lesser events that you may just want to catalog and review or reference later.
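
Two ideas raised in the comments above, keeping critical device IPs out of DHCP scopes and tracking used/available IPs without a spreadsheet, can be sanity-checked with a short Python sketch. It uses only the standard `ipaddress` module. The function names, the scope boundaries, and the addresses are hypothetical examples, not values from any real network or IPAM product.

```python
import ipaddress


def statics_inside_dhcp_scope(scope_start, scope_end, static_ips):
    """Return the statically assigned IPs that fall inside a DHCP scope.

    Any address returned here could also be handed out by DHCP, which is
    one way duplicate assignments arise.
    """
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]


def free_addresses(subnet, used_ips):
    """Return the usable host addresses in `subnet` that are not in `used_ips`."""
    used = {ipaddress.ip_address(ip) for ip in used_ips}
    network = ipaddress.ip_network(subnet)
    return [str(host) for host in network.hosts() if host not in used]


if __name__ == "__main__":
    # Hypothetical scope and static assignments.
    print(statics_inside_dhcp_scope(
        "192.168.1.100", "192.168.1.200",
        ["192.168.1.10", "192.168.1.150"],
    ))
    print(free_addresses("192.168.1.0/29", ["192.168.1.1", "192.168.1.2"]))
```

This is only a sketch of the bookkeeping; a dedicated IPAM tool would also track history and reservations, which a script like this does not attempt.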
