Think about your network architecture. Maybe it's something older that needs more attention. Or perhaps you're lucky enough to have something shiny and new. In either case, the odds are very good that you have a few devices in your environment that you just can't live without. Maybe it's some kind of load balancer or application delivery controller. Maybe it's an IP Address Management (IPAM) device that was built ages ago but hasn't been updated in forever.
The truth of modern networks is that many of them rely on devices like these as linchpins to keep important services running. If those devices go down, so too do the services that you provide to your users. Gone are the days when a system could just be powered down until the screaming started. Users have come to rely on the complicated mix of products in their environment to tune their workflows and get the most work done in the shortest amount of time. So how can these problem devices be dealt with?
Know Your Enemy
First and foremost, you must know about these critical systems. You need to have some kind of monitoring system in place that can see when these devices are performing at peak efficiency or when they aren't doing so well. You need a solution that can look beyond simple SNMP polling and give you a bigger picture. What if the hard drive in your IPAM system is about to die? What about when the network interface on your VPN concentrator suddenly starts dropping 50% of the traffic headed toward it? These are things you need to know about ASAP so they can be fixed with a minimum of effort.
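As a rough illustration of looking at individual components rather than a single up/down flag, here is a minimal Python sketch. The metric names (`reallocated_sectors`, `rx_drop_ratio`), the device names, and the threshold values are all assumptions made up for this example; a real monitoring product would supply its own.

```python
from dataclasses import dataclass


@dataclass
class ComponentReading:
    device: str      # e.g. "ipam01" (hypothetical name)
    component: str   # e.g. "disk0", "eth1"
    metric: str      # e.g. "reallocated_sectors", "rx_drop_ratio"
    value: float


# Hypothetical warning thresholds; tune these to your own hardware.
THRESHOLDS = {
    "reallocated_sectors": 10.0,  # count of remapped disk sectors
    "rx_drop_ratio": 0.05,        # fraction of inbound packets dropped
}


def find_degraded(readings):
    """Return the readings whose value exceeds the threshold for their metric."""
    degraded = []
    for r in readings:
        limit = THRESHOLDS.get(r.metric)
        if limit is not None and r.value > limit:
            degraded.append(r)
    return degraded


readings = [
    ComponentReading("ipam01", "disk0", "reallocated_sectors", 42),
    ComponentReading("vpn01", "eth1", "rx_drop_ratio", 0.5),
    ComponentReading("vpn01", "eth0", "rx_drop_ratio", 0.001),
]

for r in find_degraded(readings):
    print(f"{r.device}/{r.component}: {r.metric}={r.value}")
```

Both devices in this sample would still answer a ping, which is exactly why a per-component check is worth having.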
Your monitoring solution should help you keep track of these devices while giving you plenty of options for alerts. A VPN concentrator isn't going to be a problem if it's offline during the workday. But if it goes down the night before the quarterly reports are due, the CFO is going to be calling and will need answers. Make sure you can configure your device profiles with alert patterns that give you a chance to fix things before they become problems. Also make sure that the alerts help you keep track of the individual pieces of the solution, not just the up or down status of the whole unit.
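One way to picture a time-aware alert pattern is a per-device profile that lowers the severity of an outage during hours when the device matters less. The sketch below is only that, a sketch: `AlertProfile`, `severity`, the device name, and the 9-to-5 weekday window are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class AlertProfile:
    device: str
    quiet_start: time          # start of the low-impact window
    quiet_end: time            # end of the low-impact window (exclusive)
    quiet_weekdays: frozenset  # weekdays the window applies to, 0 = Monday


def severity(profile, when):
    """Return 'warning' inside the low-impact window, 'critical' otherwise."""
    in_quiet_window = (
        when.weekday() in profile.quiet_weekdays
        and profile.quiet_start <= when.time() < profile.quiet_end
    )
    return "warning" if in_quiet_window else "critical"


# Assumption: the VPN concentrator matters least while staff are in the office.
vpn_profile = AlertProfile(
    device="vpn01",
    quiet_start=time(9, 0),
    quiet_end=time(17, 0),
    quiet_weekdays=frozenset({0, 1, 2, 3, 4}),
)

print(severity(vpn_profile, datetime(2024, 3, 27, 11, 0)))   # Wednesday morning
print(severity(vpn_profile, datetime(2024, 3, 27, 23, 30)))  # Wednesday night
```

The same shape extends to known crunch dates, such as the night before a reporting deadline, by adding a list of dates that always count as critical.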
Be Ready To Replace
The irony of being stuck with these "problem child" devices is that they are the ones you want to replace more than anything but can't seem to find a way to remove. So how can you advocate for the removal of something so critical?
The problem with these devices is not that the hardware itself is indispensable. It's that the service the hardware (or software) provides is critical. Services can be provided in many different ways. So long as you know what service is being provided, you can create an upgrade path to remove hardware before it gets to the "problem child" level of annoyance.
Most indispensable services and devices get that way because no one is keeping track of who is using them or how they are being used. Workflows created to accomplish a temporary goal often end up becoming a permanent fixture. It's important to keep a record of all the devices in your network and know how often they are being used. Regularly update that list to know what has been recently accessed and for how long. If the device is something that is scheduled to be replaced soon, a preemptive email about the service change will often find a few laggard users that didn't realize they were even using the device. That will help head off any calls after it has been decommissioned and retired to a junk pile.
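The record keeping described above can be sketched in a few lines of Python. Everything here is illustrative: the `UsageRecord` shape, the device and user names, and the 90-day and 180-day windows are assumptions, and in practice the access data would come from logs or flow records rather than a hand-written list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class UsageRecord:
    device: str
    user: str
    last_access: date


def stale_devices(records, today, max_idle_days=90):
    """Devices whose most recent access is older than max_idle_days."""
    latest = {}
    for r in records:
        if r.device not in latest or r.last_access > latest[r.device]:
            latest[r.device] = r.last_access
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(d for d, seen in latest.items() if seen < cutoff)


def users_to_notify(records, device, today, window_days=180):
    """Users who touched the device recently enough to warrant a heads-up email."""
    cutoff = today - timedelta(days=window_days)
    return sorted(
        {r.user for r in records if r.device == device and r.last_access >= cutoff}
    )


records = [
    UsageRecord("old-lb", "alice", date(2024, 5, 20)),
    UsageRecord("old-lb", "bob", date(2023, 1, 10)),
    UsageRecord("legacy-ipam", "carol", date(2023, 11, 1)),
]
today = date(2024, 6, 1)

print(stale_devices(records, today))              # candidates for retirement
print(users_to_notify(records, "old-lb", today))  # who to email before a change
```

Running the first check regularly surfaces devices nobody has touched in months, while the second gives you the mailing list for the preemptive service-change email.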
Every network has problem devices that are critical. The trick to keeping them from becoming real problems lies less in trying to do without them and more in knowing how they are performing and who is using them. With the right solutions in place to keep a wary eye on them and a plan in place to replicate and eventually replace the services they provide, you can sleep a bit better at night knowing that your problem children will be a little less problematic.