
Orion Advanced Traffic Analysis (aka DPI, NBAR, Flexible Netflow, Wireshark...)

Level 15

We have been keeping an eye on the discussions here asking for improvement in terms of network traffic analysis, beyond what MIB 2 (Interface traffic) and Flow technology (NetFlow, sFlow, jFlow...) have to offer.

And we recently saw that NBAR support was voted the #1 enhancement request for NTA - the Network Traffic Analyzer (see more about SolarWinds Network Traffic Analyzer here).
As we were thinking about this, we wondered whether the requirement was basically for NBAR - period - or whether this was the sign of a larger requirement for better tools to analyze your traffic.
For example, we did a Traffic Analysis survey asking if you used DPI-type solutions, and we discovered that a large proportion of you were using Wireshark (not really a surprise, but a good confirmation) or some other form of DPI product.
[Chart: DPI solutions used by SolarWinds customers]
Also, I recently talked to some of you and confirmed that a significant proportion (actually higher than what the above chart suggests) either had a Deep Packet Inspection (DPI) solution in place (Riverbed Cascade/Opnet, Lancope, ...) or had this in their budget for this year.
So all this fueled a lot of thoughts and raised some questions on our side, on which we'd really appreciate your comments and answers:
  • Does Wireshark meet most of your needs?
  • Do you need Advanced Traffic Analysis for environments other than Cisco, where solutions like NBAR may not be available? In other words, do you need a vendor-agnostic solution?
  • All of you who own a DPI-type solution mentioned its (prohibitive) cost as an issue. We are convinced that you don't need to be an enterprise able to spend $200K or more on these solutions to really need them. Small and medium-size businesses also encounter challenges with their traffic which require deep analysis.
    Is it time for SolarWinds to commoditize that market and offer 80% of those features for 20% of the cost?

Before I let you go express your thoughts and describe your experience, here is a bit more on how we think about this problem.

What are the needs for Advanced Traffic Analysis?

We see four main ones:

Break down my traffic by application and user, like Netflow does ... but better

As convenient as Netflow technology is, it actually does a pretty average job of identifying your applications (unless you deploy Flexible Netflow):
    • Limited to static ports. Any app using dynamic ports will be invisible to (Net)flow technology.
    • Ignores that many applications use port 80 to go through firewalls and are actually NOT HTTP / Web applications.
    • Is unable to reliably identify true Web applications, because (Net)flow does not inspect the HTTP header and does not do URL extraction. Also, content sites such as YouTube.com (owned by Google) are identified by the Content Distribution Network they use as opposed to the Web site they really are (e.g. 1e100.net for YouTube.com...).
      See this typical question we have, pointing to this great post explaining why (Net)flow is really not the ultimate weapon and why Flexible Netflow is not the panacea either.
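
To make the limitations listed above concrete, here is a minimal sketch in Python contrasting port-only classification with a look at the first payload bytes. The port table, function names and the two payload signatures are assumptions chosen for illustration; this is not how NTA, NBAR or any particular DPI engine is implemented.

```python
# Hypothetical static port table, the kind of mapping port-based reporting relies on.
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(dst_port):
    """Classify from the destination port only, as port-based flow reporting does."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


def classify_by_payload(dst_port, payload):
    """Classify from the first payload bytes, falling back to the port."""
    # A BitTorrent peer handshake starts with the byte 19 followed by this string.
    if payload.startswith(b"\x13BitTorrent protocol"):
        return "bittorrent"
    # An HTTP request starts with a method token; the Host header names the site.
    lines = payload.split(b"\r\n")
    if lines[0].split(b" ", 1)[0] in (b"GET", b"POST", b"HEAD", b"PUT"):
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"host":
                return "http:" + value.strip().decode("ascii", "replace")
        return "http"
    return classify_by_port(dst_port)


if __name__ == "__main__":
    torrent = b"\x13BitTorrent protocol" + bytes(8)
    request = b"GET /watch HTTP/1.1\r\nHost: www.youtube.com\r\n\r\n"
    print(classify_by_port(80))              # port view: every flow on 80 is "http"
    print(classify_by_payload(80, torrent))  # payload view of the same port
    print(classify_by_payload(80, request))
```

The port view labels both flows on port 80 as "http", while the payload view separates the peer-to-peer handshake from a real Web request and recovers the site name from the Host header rather than from the CDN address.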

I need an aggregated view of the Quality of Experience rendered by my applications to my users

In a perfect world, users experiencing slowness in their application will open a ticket or send an email, and IT will know about it. But the world is not perfect and how many IT engineers discovered the hard way, as they were thrown under the bus by an email from a user to the CIO, that the QoE offered by IT is actually not that great, despite what they thought?

Of course there are solutions to simulate users connecting across the network (e.g. SolarWinds VNQM, based on Cisco's IP SLA technology) as well as to simulate the details of their transactions on their mission-critical applications (e.g. SolarWinds WPM), but those are based on simulations and do not reflect the true experience of users.

Wouldn't it be great to have a dashboard looking at the traffic of all your users connecting to their applications, calculating the latency they REALLY experience, and reporting this in near real-time, or on a daily, weekly, monthly, or quarterly basis?
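
As a rough illustration of what such a calculation could look like, the sketch below derives a latency sample from observed TCP handshakes by timing the gap between a SYN and its SYN-ACK. The packet tuple format and the function names are assumptions for this example, and the SYN to SYN-ACK gap only approximates the round trip between the capture point and the server; it is not a full application response time.

```python
from statistics import median


def handshake_latencies(packets):
    """Return {(client, server): [seconds, ...]} measured from SYN to SYN-ACK.

    packets is an iterable of (timestamp, src, dst, flags) tuples, an assumed
    pre-parsed form where src/dst are "ip:port" strings and flags is "S" for
    a SYN or "SA" for a SYN-ACK. Other flag values are ignored.
    """
    pending = {}  # (client, server) -> timestamp of the outstanding SYN
    results = {}
    for ts, src, dst, flags in packets:
        if flags == "S":
            pending[(src, dst)] = ts
        elif flags == "SA" and (dst, src) in pending:
            started = pending.pop((dst, src))
            results.setdefault((dst, src), []).append(ts - started)
    return results


def summarize(latencies):
    """Collapse per-pair samples into a median, a figure a dashboard might plot."""
    return {pair: median(samples) for pair, samples in latencies.items()}


if __name__ == "__main__":
    capture = [
        (10.0, "10.0.0.5:50000", "10.0.1.9:443", "S"),
        (10.25, "10.0.1.9:443", "10.0.0.5:50000", "SA"),
        (10.5, "10.0.0.5:50000", "10.0.1.9:443", "A"),
    ]
    print(summarize(handshake_latencies(capture)))
```

Grouping the medians by hour, day or week would give the kind of trend view described above, computed from real user traffic rather than from simulated transactions.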

I need help troubleshooting slow access to applications

Are your users complaining about slowness when accessing some application (or is a QoE dashboard, as presented above, keeping you informed about that)?

The first thing people do is use the basic tools they have to try to troubleshoot:

    • Look at saturated interfaces that can explain slowness. Then go to (Net)flow information to understand the nature of this traffic and see what non-mission-critical traffic could be removed to avoid the saturation moving forward. The problem with this is that a) it's not always easy to identify all interfaces on the path from the impacted user(s) to the application and b) excess traffic is not always the cause of the slowness.
    • Look at the devices along the path - from the switch the user(s) is (are) connected to, up to the server this application resides on, via all WAN routers - and see if any are in poor health, which would explain the slowness (CPU, memory, IO...).

Decent start, but what if this does not answer the question?

What if it's a misconfiguration? What if it's one particular transaction of this application, out of dozens, that is slow? How do you figure out which one? What if the slow application is actually spread across multiple tiers (application server, database, storage...) and what you thought was a "simple" analysis of the path between the user and the application server happens to be more complicated and involves several back-end servers?

All these more complex but unfortunately real-life examples are almost impossible to troubleshoot with the basic tools.

Security is key for us and I need to know exactly what people do on my network

Who is accessing internal file shares? From the outside of my network, really?

How about browser-based file shares (e.g. Dropbox)? Is a vast amount of internal material being copied over to those?

Are your users downloading copyrighted content from P2P media and storing it on company-owned assets, e.g. their workstations?

Do you have unusual traffic from unexpected countries? What is this traffic about?

You already have an Advanced Traffic Analysis product (e.g. a DPI-based solution)? Tell us what you think about it...

  • What use case(s) does it meet for you (from list above or other)?
  • Do you consider it cost effective?
  • How important is it to have these products integrated into an NMS / IT infrastructure management platform such as Orion?
  • Would you consider a solution integrated into Orion that would address your most important need at a fraction of the cost? Or do you need the full power of those expensive solutions? Tell us what the minimum bar is!

How does encrypted traffic impact the effectiveness of your Advanced Traffic Analysis (e.g. DPI-based solution)?

If you currently run some form of Advanced Traffic Analysis product (e.g. a DPI-based solution), how does encrypted traffic impact it?

Did encrypted traffic dictate where you deployed your packet capture probes? Are there areas of your network carrying encrypted traffic that you are blind to because of that?

Sniffing traffic, ok, but which one?


Let us know about your most important traffic types, on which to perform Advanced Traffic Analysis:

  • A) LAN traffic on corporate / Data Center (internal)
  • B) LAN traffic on remote site
  • C) WAN traffic (general)
  • D) WAN optimized traffic (e.g. Cisco WAAS, Riverbed, Bluecoat)
  • E) VM to VM traffic
  • F) Load balanced traffic (e.g. Cisco ACE, F5)
  • G) Virtualized traffic (e.g. analyse traffic in/out of an application hosted by a Cloud SP)
  • H) DMZ traffic

Sniffing traffic, ok, but how?

Agents, SPAN, TAP, or RITE?

  • The agent-based technique is about running agents on the OS - virtual or not - that hosts your mission-critical applications. If your focus is the traffic that goes to your applications (as opposed to looking at all traffic, including the fully meshed one that goes across all your sites), then agents are a good solution because they make the 3 techniques below unnecessary. But they require an invasive action on your OS by adding a component to it: what CPU/memory do they consume on the OS? Are they really only looking at the traffic? How do we upgrade the hundreds of them that are deployed on your server farm?
  • The Port Spanning / Mirroring technique is basically about hijacking one port of your switch and dedicating it to mirroring all or part of the traffic of the switch. The management product then listens to packets from this port and performs analysis, storage...

Pretty simple, because most switches support this now: just a command to issue and a cable to connect, and you can start drinking from the fire hose. But it may have an impact on the switch and can't guarantee that 100% of packets are captured on very heavily loaded switches.
Note that spanning a port is possible on a HW switch but also on a vSwitch within a virtual environment.

  • Network Taps are basically pieces of hardware that you buy to replace the switch for this function (capturing packets). They are placed inline and have no influence on the switch and pretty much capture 100% of the traffic.
  • RITE - Router IP Traffic Export - is a Cisco technology like spanning a port on a switch, except that it's done on a router. Again, easy to deploy since it leverages an existing device, but it impacts it and won't work great at high speed. See this nice and short blog post.

Tell us about your experience, your preferences, what's allowed and not allowed on your networks, as far as capturing packets for Advanced Traffic Analysis.

14 Comments
Level 9

PLEASE MAKE IT VENDOR AGNOSTIC!  I know Cisco is dominant and the solution should definitely support that line of hardware, but there are a lot of non-Cisco shops out there.  We use Force10, which runs like a top, and sFlow is great on these.  You have a lot of HP and Juniper customers also, I would assume.

It would be awesome if you could incorporate some tools like Afterglow or Treemap.  For reference see Raffael Marty's book "Applied Security Visualization".  Scatter plots detailing port usage frequency can also be a great tool for detecting abnormal port traffic.  There should be an option to exclude certain ports such as 25, 53, 80, 443, etc. or to display ranges of ports and how much traffic is running on each one.  Another great book that is coming out is "The Practice of Network Security Monitoring" by Richard Bejtlich.  Either of these books gives great examples of useful tools that give us better insight as to what is running around our LANs.
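
A minimal sketch of the port-usage idea with an exclusion list, assuming flow records have already been reduced to (destination port, byte count) pairs. The function name and record shape are made up for illustration, and a real tool would feed the result into a scatter plot or treemap rather than print it.

```python
from collections import Counter


def port_usage(flows, exclude=(25, 53, 80, 443)):
    """Sum bytes per destination port, skipping the excluded ports.

    flows is an iterable of (dst_port, byte_count) pairs (assumed format).
    Returns (port, total_bytes) pairs, busiest port first.
    """
    totals = Counter()
    for dst_port, byte_count in flows:
        if dst_port not in exclude:
            totals[dst_port] += byte_count
    return totals.most_common()


if __name__ == "__main__":
    sample = [(80, 5000), (6881, 1200), (443, 900), (6881, 300), (31337, 40)]
    for port, total in port_usage(sample):
        print(port, total)
```

With the common ports filtered out, whatever remains at the top of the list is a reasonable starting point for spotting abnormal port traffic.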

I would initially focus on WAN (general) inbound and outbound mainly due to resources I have to monitor.  I'd like to watch everything, but right now my focus is more on what is going in/out and not what is local.  I hope to expand to increased local monitoring soon.  One set of eyes can only watch so much.  One thing I have always wished for but not taken the time to build is a global map that shows me where all of my traffic is originating from.  Of course geolocation can be subverted, but it is still a nice tool to have to quickly spot some obvious things that should be investigated.

Please no agents.  I have enough agents that I need to install and I think the world is moving away from those.  Span would be easy for me, but I would be willing to use a TAP even if I had to purchase one, as long as it didn't cost a fortune.  What about SYSLOG feeds from firewalls?  Similar to how Splunk does it.  If this is the case, your parser would need to be something that the user could map fields to.  Take an example SYSLOG string, map IN to one field, OUT to another, etc.  If you could build this, it would then remain vendor agnostic.
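
A small sketch of the user-mappable parser idea, assuming the firewall logs key=value pairs. The sample line, the field names in the map and the function name are all hypothetical; other log layouts would need a different extraction pattern.

```python
import re

# Matches key=value pairs; the value is either a quoted string or a run of non-spaces.
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_syslog(line, field_map):
    """Extract key=value pairs from a log line and rename them per field_map.

    field_map is the user-supplied mapping from the vendor's key names to
    normalized field names. Keys not listed in the map are dropped.
    """
    record = {}
    for key, value in KEY_VALUE.findall(line):
        if key in field_map:
            record[field_map[key]] = value.strip('"')
    return record


if __name__ == "__main__":
    # Hypothetical firewall log line and user-defined mapping.
    line = "<134>Jan 10 12:00:01 fw01 action=allow IN=eth0 OUT=eth1 SRC=10.0.0.5 DST=192.0.2.10 DPT=53"
    mapping = {"IN": "in_interface", "OUT": "out_interface",
               "SRC": "src_ip", "DST": "dst_ip", "DPT": "dst_port"}
    print(parse_syslog(line, mapping))
```

Because the mapping is data rather than code, supporting another vendor would only mean supplying a different dictionary.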

Encrypted traffic is a problem yes.  Many bots attempt to run over SSL so that is a problem.  I can break it with my firewall but then it screws up Gmail, etc.  It is a problem that needs to be resolved somehow.

Level 14

+1 for Juniper (sflow & jflow)

Level 16

Mustafa is right

Before running to find shiny new features, fix the old bugs.

1. sFlow from Juniper routers is something that works on all other NetFlow collectors on the market !!!

/SJA

Level 15

Thanks a lot for the thorough answer.

Yes, one of the key benefits of DPI is that it's inherently vendor agnostic.

See more above, about the Juniper comment of Mustafa and sja.

Level 15

We are ALSO working on these issues. What we see on Juniper and flow, in general is this:

  • JFlow is technically the same as NetFlow, so the work done to support sampled NetFlow also applies to JFlow. But Juniper devices don’t follow the specification fully and some models don't export the sampling rate, so what we have recently done on Sampled Netflow does not work. We are working on a fix (179957).
  • sFlow should be supported across all devices. We are going to look at issues specifically on Juniper. What do you see? Not working at all, or just ignoring the scaling factors, so you see stats but they are wrong?

Assuming we have these flow issues on Juniper fixed, do you have more needs on Traffic Analysis (e.g. DPI)?
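
To illustrate what the scaling factor means in practice, here is a minimal sketch of how sampled counters are typically turned into estimates: with 1-in-N sampling, each sampled packet stands for roughly N real ones. The function name, the fallback_rate parameter and the returned flag are assumptions for this example and do not describe the NTA implementation or the fix mentioned above.

```python
def estimate_totals(sampled_bytes, sampled_packets, sampling_rate=None, fallback_rate=None):
    """Estimate real byte and packet totals from 1-in-N sampled counters.

    sampling_rate is the rate exported by the device, if any. fallback_rate
    is a hypothetical manually configured rate used when the device does not
    export one. Returns (bytes, packets, scaled), where scaled is False when
    no usable rate was available and the raw sampled values are returned.
    """
    rate = sampling_rate or fallback_rate
    if rate is None or rate < 1:
        return sampled_bytes, sampled_packets, False
    return sampled_bytes * rate, sampled_packets * rate, True


if __name__ == "__main__":
    print(estimate_totals(1500, 1, sampling_rate=1000))  # rate exported by the device
    print(estimate_totals(1500, 1))                      # no rate: stats shown but too low
    print(estimate_totals(1500, 1, fallback_rate=512))   # operator-supplied rate
```

When the rate is ignored or missing, the collector still shows statistics, but they are understated by a factor of N, which matches the "stats but they are wrong" symptom.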

Level 14

About sflow. The scaling is wrong.


Level 15

That helps. Thanks (261258)

Level 16

Hi

Cisco is doing a great job in that area; NBAR is very handy.

I just don't think that the current NTA is right for that (DPI, NBAR, DDoS...).

All of the others that do that well run a dedicated appliance.

If SolarWinds "thinks" there is money here, they should buy and rebrand nDPI.

But never mind that.

Yes, you should sometimes know what is running on port 80 on that IP 🙂

And all kinds of DDoS are becoming very relevant to any Internet-based service.

Wireshark is handy too.

/SJA

Level 20

We also have OPnet but I've gotta be honest... for the huge expense it's not my favorite toolset.  I have seen NTA, integrated into our NPM, become better and better all the time... Keep adding features and functionality to NTA.

Level 15

Great interaction, thanks all, keep it coming.

If someone has an opinion/need about the level of protocol decoding, we'd be happy to hear it. Either describe it here, or let me know, I'll ping you off line to discuss.

Thanks again all for your time.

Level 8

I've tried Cisco's NBAR in the past in an attempt to throttle torrent traffic which was consuming too much of our Internet bandwidth.

I found that NBAR was not efficient at it as it was not identifying torrent flows accurately.

So the question here is that if it could not identify for policing, how accurate will the reporting be?

What we ended up doing was buying a DPI box which did the job.

Another question is, what about non-Cisco traffic?

So I don't believe we should be comparing NBAR to DPI as that would be misleading.

A software DPI engine would be a great add-on.

Now how would you accomplish this? You'll need to capture the traffic. Putting NTA inline is out of the question, and so the 2 remaining options are: TAP and SPAN.

Spanning traffic is known to cause packet drops on switch interfaces as it overflows the ASICs responsible for several ports at once. So if a user configured the destination port of the SPAN on the same ASIC responsible for a critical server port, he might get into performance issues, and more discarding ports on the TOP 10 NPM page 🙂

The only option left is TAPs. Perhaps a SolarWinds-branded smart TAP? 🙂 At that point why stop there? Integrate them with SAM, and we could have a multi-TAP application performance solution that would allow us to easily identify which hop is impacting application performance.... And perhaps the ability to integrate with Wireshark, handing off packets for analysis based on the filter criteria from NTA and/or SAM, which would be converted to a capture filter.
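
For what it's worth, the conversion to a capture filter could be as simple as the sketch below, which builds an expression in the pcap-filter (BPF) syntax that tcpdump and Wireshark accept for capture filters. The function name, its parameters and the idea that NTA or SAM would supply these three criteria are assumptions made for illustration.

```python
ALLOWED_PROTOCOLS = {"tcp", "udp", "icmp"}


def to_capture_filter(host=None, port=None, protocol=None):
    """Build a pcap-filter (BPF) expression from optional filter criteria.

    An empty string is returned when no criteria are given, which in
    pcap-filter syntax means "capture everything".
    """
    parts = []
    if protocol is not None:
        if protocol not in ALLOWED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {protocol}")
        parts.append(protocol)
    if host is not None:
        parts.append(f"host {host}")
    if port is not None:
        if not 0 < int(port) < 65536:
            raise ValueError(f"port out of range: {port}")
        parts.append(f"port {int(port)}")
    return " and ".join(parts)


if __name__ == "__main__":
    print(to_capture_filter(host="10.0.1.9", port=443, protocol="tcp"))
```

The resulting string could then be handed to a capture tool, for example as the argument of tshark's -f option.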

The possibilities are many ... brain overheating 🙂

Level 9

Has anyone had any luck getting DPI to run on a VM?

Level 20

DPI on physical interfaces takes some major horsepower... especially on 10G ports.  I suppose if the whole thing happened in the virtual world it might be possible.  I'm not familiar with any all-VM DPI???  I'm sure there are some.

Level 7

The idea of DPI through sniffing vs Netflow comes down to the ability to send Netflow from remote sites as well, vs sniffing sending ...

Netflow is for network management and statistics, whereas sniffing is a different product aimed at deeper analysis of the packets themselves.

Flexible Netflow (FNF, v9) is flexible and better suited to remote sites sending statistics.

Do you have FNF support for other fields like TTL, TCP flags, etc.?

About the Author
Francois joined the SW product management team in Dec 2010. He has been in the network management space for about 15 years, first in a startup company, then in one of the big 4 and back to a human-size company. Despite his bizarre accent, he is a decent guy to talk to.