SNMP - It's Not a Trap!

The Simple Network Management Protocol (SNMP) has been a key part of managing network devices in the data centre for some time. It really is a pretty simple protocol to work with (hence the name), and I think it’s underrated as a key tool for monitoring unusual events. Unfortunately, SNMP has had some issues over time. One of these is that SNMP v1 and v2c send a lot of information, including the community string, over the network in clear text. SNMP v3 was developed to address this by adding authentication and encryption. Another issue has been that Joe Sysadmin doesn’t always take the time to configure custom community strings for the devices he’s trying to manage. Instead, the default “public” community string is left configured on the devices, often with more access than is required. This kind of behaviour drives information security folks nuts, and has operations staff questioning whether SNMP is worth the hassle.
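As a rough illustration of the kind of check that catches these habits, here is a minimal Python sketch that audits an inventory of SNMP settings. The `SnmpConfig` record, its field names, and the sample hosts are all hypothetical; in practice the data would come from whatever configuration management system you already use.

```python
from dataclasses import dataclass

# Well-known default community strings ("private" is a common write default).
DEFAULT_COMMUNITIES = {"public", "private"}


@dataclass
class SnmpConfig:
    """Hypothetical inventory record for one device's SNMP settings."""
    host: str
    version: str    # "1", "2c" or "3"
    community: str  # unused for SNMP v3, which authenticates users instead
    access: str     # "ro" (read-only) or "rw" (read-write)


def audit(configs):
    """Return (host, finding) pairs for settings worth a second look."""
    findings = []
    for cfg in configs:
        if cfg.version != "3":
            findings.append((cfg.host, "community-based SNMP in use, consider SNMP v3"))
            if cfg.community.lower() in DEFAULT_COMMUNITIES:
                findings.append((cfg.host, "default community string"))
        if cfg.access == "rw":
            findings.append((cfg.host, "write access enabled"))
    return findings


if __name__ == "__main__":
    # Made-up sample inventory for illustration only.
    inventory = [
        SnmpConfig("core-sw-01", "2c", "public", "rw"),
        SnmpConfig("fw-01", "3", "", "ro"),
    ]
    for host, finding in audit(inventory):
        print(f"{host}: {finding}")
```

Running a check like this on a schedule is one way to stop the default “public” string from quietly creeping back in.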

SNMP is an extremely flexible solution that provides a robust framework with which you can leverage things like vendor-specific management information base (MIB) files. You can use these to provide both read-only and write access to networked devices. The advantage to this approach is that you can feed information into your management system that provides useful insights, rather than simply showing whether the device is up or down.

Following on from this, alerting that aligns with your devices gives you a better chance of identifying unusual issues in your environment. You could, for example, set your devices to send a trap when local user credentials are used to log in to a device rather than directory credentials. This type of activity may indicate that someone’s up to no good in your environment.
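To make the local-login example a little more concrete, here is a minimal Python sketch of the filtering step. It assumes your trap receiver has already decoded each trap into a dictionary; the keys used here (`event`, `auth_source`, `device`, `user`) are hypothetical, since the real fields depend on each vendor's MIB.

```python
def is_local_login(trap):
    """True when a decoded login trap reports local (non-directory) credentials."""
    return trap.get("event") == "login" and trap.get("auth_source") == "local"


def alerts_for(traps):
    """Build one alert line per local login found in a batch of decoded traps."""
    return [
        f"Local login on {t.get('device', 'unknown')} by {t.get('user', 'unknown')}"
        for t in traps
        if is_local_login(t)
    ]


if __name__ == "__main__":
    # Made-up sample traps; the field names are assumptions, not a vendor format.
    sample = [
        {"event": "login", "auth_source": "local", "device": "sw1", "user": "admin"},
        {"event": "login", "auth_source": "ldap", "device": "sw1", "user": "jane"},
    ]
    for line in alerts_for(sample):
        print(line)
```

The point is that the alert keys off how the user authenticated, not merely that a login happened.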

Security in your environment isn’t just about people cracking device credentials, though. It’s also about having devices available to provide the appropriate services to applications and their users. Configuring devices to send meaningful information via SNMP as issues occur can be a great way to get to minor problems before they become major issues. If one of your two firewall devices has suffered a failure, your infrastructure is compromised and you need to address the problem. I’ve seen plenty of situations where internal systems failures go unnoticed for far too long, leading to reduced performance in the environment and angst for both the operations staff and end-users. But people don’t just become unhappy with the infrastructure. They start to use workarounds to get their work done, which can involve unsafe practices such as storing unsecured corporate data in personal mailboxes or on publicly accessible file sharing sites.
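The firewall pair is a good example of a failure that a simple up/down check on the service will hide, because the surviving member keeps traffic flowing. A minimal Python sketch of a check that reports the degraded state might look like this; the function name and the member names are hypothetical, and the up/down values could come from traps or from polling.

```python
def pair_status(member_up):
    """Summarise a redundant pair from a mapping of member name -> is it up?"""
    down = sorted(name for name, up in member_up.items() if not up)
    if not down:
        return "ok"
    if len(down) < len(member_up):
        # Service still works, but redundancy is gone: this is the case
        # that tends to go unnoticed.
        return "degraded: " + ", ".join(down) + " down"
    return "down: no members available"


if __name__ == "__main__":
    print(pair_status({"fw-a": True, "fw-b": False}))
```

Treating "degraded" as its own alert state, rather than folding it into "up", is what gets you to the minor problem before it becomes a major one.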

A lot of people would agree that data centre operations can be a difficult thing to do well, particularly at a large scale. There always seems to be some device or another that’s run out of capacity, has a failed component, or has simply stopped doing what it’s meant to do. That’s why tools such as SNMP and syslog can help tremendously with keeping things under control in the DC. There’s a wide range of management systems available in the marketplace that can be used to do some pretty cool stuff with SNMP. Most devices that can be deployed in a 19” rack can speak SNMP and syslog, so why not get as much information about what’s happening in your environment as you can? The investment in effort upfront can save you a lot of time and headaches down the track when things inevitably go awry.

  • Some vendors seem to be moving on from SNMP.  Microsoft seemed to give it half-hearted support at best.  Cisco Meraki seems to thumb its nose at SNMP.

  • I agree - it can be a pain to have everything talking sensibly. And often it feels like no matter how much effort you put into it, some vendor will invariably turn up with some weird MIBs and send a bunch of gibberish out there. Still, I think the effort is mostly worth it :)

  • I agree here.  Since traps and/or syslog typically use UDP, there's no guarantee you will ever get that message.  If you poll, you can at least hope that the next polling cycle will detect whatever was missed.  On the whole, monitoring usually ends up secondary to other work, with an "I'll get to that when I get a chance" attitude.  Proper monitoring is complicated and time consuming to set up, but worth the effort.  I agree that SNMP will not be going away for a long time, as nearly every device supports it.  Plus, the data that can be collected, aggregated and analyzed by a single system like NPM is hugely important, especially as the size of the network and the number of devices grows.

  • I don't think it's going anywhere soon, but your point about what to use is certainly valid. I think it's a combination of experience and guidance from the device vendors. Which sounds less exact than I'd like it to be.