Joys of NTA "Unmanaged Interfaces"

Question

Disclaimer: this has turned into a long post/rant... proceed with caution...

This is a general plea to SW (SolarWinds) and to the community at large for some reason and logic where NTA is concerned.

We've owned NTA for a couple of years now, and only with the recent changes in NTA 3.9 have we been able to use it at all.

But it still has some pain points that I think really need to be addressed. I doubt I'm alone here.

We have a lot of non-Cisco routers/switches/etc. Many of our devices do not allow setting the NetFlow/sFlow source interface the way Cisco devices do. Yes, this isn't rocket surgery, but the vendors don't seem to think it's important enough to warrant adding the feature, so it is what it is. Because of this, NTA was pretty much useless to us for two years: we manage our routers by a loopback IP, but they send sFlow data from the egress interface IP, so NTA would summarily and happily throw away all the sFlow data it was receiving. That was very helpful.

NTA 3.9 made big strides with the "Allow matching nodes by another IP Address" feature. Now NTA is smart enough to realize that my router might have 10 or 100 IPs, that they all belong to the same device, and that the device is a managed node within NPM. Hooray for that. The bigger question, really, is "why should it even matter?" But more on that later in the post.

The not-so-awesome part? Edge interfaces.

Let's say I have a data center switch with 500 interfaces on it. I don't really want to make all 500 of those interfaces "managed" in NPM, because if you multiply those 500 interfaces across, oh... a few dozen switches, suddenly your NPM server is polling 15,000 elements instead of 1,000 or so. Does that mean I don't care about traffic statistics on the interfaces that aren't managed in NPM? Of course I do! I want to know if Server A is blasting traffic at Server B within my data center, and why. Since both of those servers hang off edge ports on my DC switch, I don't necessarily want to monitor those interfaces in NPM, but I don't want NTA to throw away the sFlow stats for all those flows just because they're coming from "unmanaged interfaces" either.

Here's an example:

NetFlow Receiver Service [LEELA] is receiving NetFlow data from an unmanaged interface on cs-104-dc.network.local. The NetFlow data will be discarded. Use the "Edit this device" link or Orion node management to manage interface 'GigabitEthernet16/21' and process its NetFlow data.

Multiply that by a few hundred interfaces per switch and a few dozen switches, and suddenly you start to wonder why you're paying so much for the NTA add-in module when, and I want to emphasize this, many (most? all?) other NetFlow/sFlow/etc. collectors/analyzers out there just accept the data they receive and analyze it, report on it, etc.

Another case: consider a large network or campus with 700+ edge switches, and let's say they're all 48-port switches. If I manage my 700 switches and the important interfaces on them (uplinks, critical links, etc.), I might have around 2,000 elements to poll with NPM. Why don't I manage all the edge interfaces in NPM? First of all, I don't care if edge interfaces go up/down, change speeds, etc. That happens all the time. I do care, however, if two users (or a group of users) are blasting traffic at each other, and why. If they are served by the same switch, the traffic doesn't cross the uplink interfaces and thus isn't reported in the sFlow for those interfaces. I don't want 700 switches x 48 ports = 33,600 elements in NPM, because I don't have the finances of an oil conglomerate to pay for the physical servers and SW licensing needed to handle that kind of load. But I would still love to be able to see the sFlow reports across my edge interfaces.

I truly don't understand why SW continues to be so draconian with NTA.

It seems to me that it adds a non-trivial amount of overhead to check every single *Flow packet that comes in:

1. Did this come from a managed node (primary IP)?

2. Did this come from a managed node (secondary/additional IP)?

3. Did this come from a managed interface on a managed node?
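To make the overhead argument concrete, here's a rough Python (3.9+) sketch of what those three checks amount to for each flow record. This is a hypothetical model of the behavior described above, not SolarWinds' code; `FlowRecord`, `Inventory`, `keep_at_ingest`, and the example IPs are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowRecord:
    exporter_ip: str  # source IP of the export datagram (often an egress interface, not the loopback)
    if_index: int     # SNMP ifIndex of the interface the flow was observed on
    src: str
    dst: str
    octets: int


@dataclass
class Inventory:
    # Hypothetical snapshot of what NPM manages:
    #   managed_interfaces: node primary IP -> ifIndexes managed on that node
    #   secondary_ips: any other IP on a node -> that node's primary IP
    managed_interfaces: dict[str, set[int]] = field(default_factory=dict)
    secondary_ips: dict[str, str] = field(default_factory=dict)


def keep_at_ingest(rec: FlowRecord, inv: Inventory) -> bool:
    """Hypothetical model of checks 1-3: True means the record is kept."""
    # Check 1: did the record come from a managed node's primary IP?
    node = rec.exporter_ip if rec.exporter_ip in inv.managed_interfaces else None
    # Check 2: otherwise, is the exporter IP a known secondary IP of a managed node?
    if node is None:
        node = inv.secondary_ips.get(rec.exporter_ip)
    if node is None:
        return False  # unmanaged node: the record is discarded
    # Check 3: is the interface itself managed on that node?
    return rec.if_index in inv.managed_interfaces.get(node, set())


inv = Inventory(
    managed_interfaces={"10.0.0.1": {1, 2}},  # loopback-managed switch, uplinks only
    secondary_ips={"192.0.2.1": "10.0.0.1"},  # egress-interface IP used as the sFlow source
)
uplink = FlowRecord("192.0.2.1", 1, "10.1.1.10", "10.2.2.20", 1500)
edge = FlowRecord("192.0.2.1", 21, "10.1.1.10", "10.1.1.11", 9000)
print(keep_at_ingest(uplink, inv))  # True
print(keep_at_ingest(edge, inv))    # False: traffic seen on an edge port is dropped
```

Every record pays for up to three lookups before anything is analyzed, and the edge-port record is thrown away even though its exporter is a managed node.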

There are dozens of NetFlow/sFlow/whateverFlow analyzers out there, some paid, some free. I haven't run into a single one other than NTA that throws away packets based on unmanaged nodes or unmanaged interfaces. They take the data you send to them, analyze it, and report on it. Period.

The draw of NTA is that it integrates into Orion. Great, except when the rules are so strict that it becomes less useful than a standalone tool that doesn't impose these silly restrictions.

The NTA/NPM integration/correlation could still bring great value; just do it at the presentation layer (logically) rather than at the data-gathering layer. Let NTA simply accept and process the *Flow packets, and get rid of the strict rules about managed nodes, managed interfaces, etc. Without all those checks on every *Flow packet, the servers could likely handle a higher load, since they'd be doing a lot less work per packet. Bonus points for that. It could clean up the NTA code base, too. Simple is good; remember that from our programming 101 days? Why over-complicate something that doesn't need to be? When the data is pulled for reports or for display in Orion, that's when it can be correlated with NPM nodes. Flows coming to/from NPM-managed interfaces? Great, present them with the relevant interface info, links, etc. gathered by NPM. Flows going between managed devices? Great, enhance the report/display with the additional information NPM has about those nodes. Flows coming from UNmanaged interfaces or nodes? No biggie, just display the flow data without the enhanced data that would have come from linking them to managed elements in NPM. Having a node or interface managed in NPM should be an enhancement to how NTA displays data, not a prerequisite!
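Here's a rough Python (3.9+) sketch of the accept-everything, correlate-later approach described above. Again, this is hypothetical and not how NTA works internally; `ingest`, `report_row`, `InterfaceInfo`, the alias map, the example IPs, and the `uplink-1` interface name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowRecord:
    exporter_ip: str
    if_index: int
    src: str
    dst: str
    octets: int


@dataclass(frozen=True)
class InterfaceInfo:
    node_name: str  # node name as known to NPM
    if_name: str    # interface name as known to NPM


def ingest(store: list[FlowRecord], rec: FlowRecord) -> None:
    # Data-gathering layer: no managed-node or managed-interface checks; every record is kept.
    store.append(rec)


def report_row(
    rec: FlowRecord,
    interfaces: dict[tuple[str, int], InterfaceInfo],
    ip_aliases: dict[str, str],
) -> dict[str, object]:
    # Presentation layer: enrich with NPM data when it exists, otherwise show the raw flow.
    node_ip = ip_aliases.get(rec.exporter_ip, rec.exporter_ip)
    row: dict[str, object] = {
        "exporter": rec.exporter_ip,
        "ifIndex": rec.if_index,
        "src": rec.src,
        "dst": rec.dst,
        "octets": rec.octets,
    }
    info: Optional[InterfaceInfo] = interfaces.get((node_ip, rec.if_index))
    if info is not None:
        row["node"] = info.node_name
        row["interface"] = info.if_name
    return row


store: list[FlowRecord] = []
ingest(store, FlowRecord("192.0.2.1", 1, "10.1.1.10", "10.2.2.20", 1500))  # managed uplink
ingest(store, FlowRecord("192.0.2.1", 21, "10.1.1.10", "10.1.1.11", 9000))  # unmanaged edge port

managed = {("10.0.0.1", 1): InterfaceInfo("cs-104-dc.network.local", "uplink-1")}
aliases = {"192.0.2.1": "10.0.0.1"}  # exporter IP -> node primary IP
rows = [report_row(r, managed, aliases) for r in store]
print(len(rows))               # 2: nothing was discarded
print(rows[0]["interface"])    # uplink-1 (enriched with NPM data)
print("interface" in rows[1])  # False (raw flow, shown without NPM data)
```

The ingest path does no lookups at all; the NPM correlation only costs something when a report is actually built, and a missing match just means a less-decorated row instead of lost data.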

This has caused me to question the recurring licensing costs for NTA. If it can't (or won't be allowed to) stretch its legs and give me useful information in my environment, then what value is it bringing me? I might just switch back to one of the free or significantly less expensive solutions out there that happily accept data from any of my sources and give me lots of useful reports and information.

Am I alone on this? I'd love to hear about others' experiences and how you all feel about these issues. I'd love to see SW do something in the spirit of making things a whole lot better for its customers here.

keithr · Answer

@Ismo,

NetFlow is not the problem. Yes, Cisco supports NetFlow better than anyone; that's because Cisco designed NetFlow as yet another proprietary protocol. Other vendors support standard protocols such as sFlow, or their own flow-export variants such as Juniper's Jflow.

However, this isn't about NetFlow; it's about SW's licensing and lockdown of NTA's capabilities.

In a network with over 30,000 interfaces, it is impractical and nearly impossible to have all of those interfaces as monitored elements in Orion. The capacity of the polling servers, the licensing of polling engines, etc. add up to an astronomical cost. However, NTA will only accept flow data from interfaces that are monitored in Orion. As I mentioned before, every other *Flow collection/analysis tool I've used, whether paid or free, simply accepts and analyzes the flow statistics sent to it, without imposing draconian licensing restrictions in the process. I cannot have 30,000 monitored interfaces in Orion, but it's pretty important to be able to monitor flow statistics from all of those interfaces, for the reasons I outlined in my first post.

As I mentioned, dropping these restrictions would simplify the architecture of the NTA product, likely improving system performance, increasing capacity, giving customers a better return on their investment, and providing better service to them. This seems like a win/win to me.

I am quite curious why SW doesn't seem to have anything to say on this topic either. They're pushing this forum as the community's connection to SW, yet they are conspicuously absent from this conversation (and a few others).

Ismo · Answer

About the Cisco thing you mentioned at the beginning... Cisco supports NetFlow better than anyone else on the market. We have a pure Cisco environment; that's the reason I chose NTA and not some RMON hardware/software combination. So, the right tool for the right job. But that's not necessarily SW's fault. Shouldn't you complain to the other network hardware manufacturers? Why don't they want to develop NetFlow?

I'm using NetFlow only in the LAN (to replace my old Netscout), and when I bought NTA it was a cheap way to monitor traffic compared to RMON hardware etc. I saved about $15,000. NTA was enough for my use, and still is. So I haven't had any of the other problems you mentioned, because I'm monitoring only one core switch with NTA. But I'm not saying you're totally wrong. If things were as you hope, probably nothing would be taken away from me. So good luck with your wishes.

keithr · Answer

So... am I really alone on this? Is everyone else really happy with the way NTA does these things?