Could SNMP Please Just Die Already?

I-heart-snmp.png

The Simple Network Management Protocol – SNMP – was originally proposed by way of RFC–1067 back in 1988. While that doesn’t hold a candle to TCP/IP which is approaching middle age, SNMP, at the grand old age of 27, seems to be having a bit of a mid-life crisis.

Reliability

One problem with SNMP is that it’s based on UDP, or as I like to call it, the “Unreliable Data Protocol”. This was advantageous once upon a time when memory and CPU were at a premium, because the low session overhead and lack of need to retransmit lost packets made it an ideal “best effort” management protocol. These days we don’t have the same constraints, so why do we continue to accept an unreliable protocol for our critical device management? If Telnet were based on UDP, how would you feel about it? I’m guessing you wouldn’t accept that, so why accept UDP for network management?

It’s kind of funny that we’re still using SNMP when you think about.

Inefficient Design

Querying a device using SNMP is slow, iterative and inefficient. Things were improved slightly with SNMPv2, but the basic mechanism behind SNMP continues to be clunky and slow. Worse, SNMPv3’s attempt at adding security to this clunkmeister remains laughably unused in most environments.

Alternatives

So if SNMP is the steaming pile of outdated monkey dung that I am suggesting it is, what should we use instead? I’m open to suggestions, because I know there must be something better than this.

SNMP Traps

For traps, rather than waiting for a UDP alert that may or may not get there, how about connecting over TCP and subscribing to an event stream? If you only want certain events, filter the stream (a smart filter would work on the server side, i.e. the device). This is pretty much what the Cisco UCS Manager does; the event stream reliably sends XML-formatted events to anybody that subscribes to them. This also means that you don’t get what I see so often in networks, which is a device wasting time sending traps to destinations that were decommissioned years ago. An event stream requires an active receiver to connect and subscribe, so events are only sent to current receivers.

SNMP Polling

The flavors of the month in the Software Defined Networking (SDN) world are things like NETCONF and REST APIs. These are TCP-based mechanisms by which to request and receive data formatted in JSON or XML, for example. You’ve spotted by now, I’m sure, that I’m network centric, but why not poll all devices this way? Rather than connecting each time a poll is requested, why not keep the connection alive and request data each time it’s needed? XML and JSON can seem rather inefficient as transfer mechanisms go, but if we’re running over HTTP, maybe we can use support for GZIP to keep things more efficient?

Parallel connections could be used to separate polling so that a high-frequency poll request for a few specific polls runs on one connection while the lower-frequency “all ports” poll runs on another.

Screen-Scraping

If we put aside fancy formatted data, have you ever wondered whether it would be easier to just SSH to a device and issue the appropriate show commands once a minute, or once every 5 minutes? SSH can also support compression, so throughput can be minimized easily too. The data is not structured though, which is a huge disadvantage, so unless you’re polling a Junos device where you can request to see the response in XML format, you’ve got some work to do in order to turn that data into something useful. In principal though, this is a plausible if rather unwieldy solution.

And You?

How do you feel about SNMP; am I wrong? Are you using any alternatives – even for a small part of your assets – that you can share? Are you willing to use a different protocol? And why has SNMP continued to be the standard protocol when it’s evidently so lame? The fact that SNMP is ubiquitous at this point and is the de facto network management standard means that supplanting it with an alternative is a huge step, and any new standard is going to take a long time to establish itself;just look at IPv6.

Parents Comment Children
No Data
Thwack - Symbolize TM, R, and C