Get with the flow… with NetFlow!

Level 11

IT professionals are admittedly a prideful bunch. It comes with the territory when you have to constantly defend yourself, your decisions, and your infrastructure against people who don’t truly understand what you do. This is especially true for network administrators. “It’s always the network.” Ever heard that one before? Heck, there’s even a blog out there with that expression, created by someone I respect, Colby Glass. My point is, as IT professionals, we have to be prepared at a moment’s notice to provide evidence that an issue is not related to the devices we manage. That's why it's imperative that we know our network inside and out.

With that being said, it should be no surprise that when I started my career in networking in 2010, I thought NMS platforms were pretty amazing. Pop some IP addresses in and you’re set.¹ The NMS goes about its duty, monitoring the kingdom and alerting you when things go awry. I could log in and verify it for myself if I wanted to be certain. I could even dig in at the interface level and give you traffic statistics like discards, errors, and utilization. I had instant credibility at my fingertips. I could prove the network was in great shape at a moment's notice. Want to know if that interface to your server was congested yesterday evening at 7pm? It sure wasn't, and I have the proof! Can’t get much better than that, right?

Until…

I saw NetFlow for the first time. NetFlow has a way of really opening your eyes. “How did I ever think I knew my network so well?” I thought. I had no visibility into the traffic patterns flowing through my network. Sure, I could fire up a packet capture pretty easily, but that approach is reactive and time-consuming depending on your setup. What if that interface really WAS congested yesterday evening at 7pm? I have no data to reference because I wasn't running a packet capture at that exact time or for that particular traffic flow. It’s helpful to tell someone that the interface was congested, but how about taking it a step further and telling them what was congesting it? What misbehaving application caused that link to be 90% utilized when traffic should have been relatively light at that time of day? The important thing to realize is that I’m not just an advocate for NetFlow, I’m also a user!² Here’s a quick recap of an instance where NetFlow saved my team and me.

I recently encountered a situation where having NetFlow data was instrumental. One day at work, we received multiple calls, e-mails, and tickets about slow networks at our remote offices. They seemed to be related, but we weren't sure at first. The slowness complaints were sporadic in nature, which made us scratch our heads even more. After looking at our instance of NPM, we definitely saw high interface utilization at some, but not all, of our remote sites. We couldn't think of any application or traffic pattern that would cause this. Was our network under attack? We thought it might be prudent to involve the security team in case it really was an attack, but before we sounded the alarm, we decided to check our NetFlow data first. What we saw next really baffled us.

Large amounts of traffic (think GBs/hour) were coming from our Symantec Endpoint Protection (SEP) servers to clients at the remote offices over TCP port 8014. For those of you who have worked with Symantec before, you probably already know that this is the port that the SEP manager uses to manage its clients (e.g. virus definition updates). At some point, communication between the manager and most of its clients (especially in remote offices) had failed and the virus definitions on the clients became outdated. After a period of time, the clients would no longer request the incremental definition update; they wanted the whole enchilada. That’s okay if it’s a few clients and the download succeeds the first time. This wasn't the case in our situation. There were hundreds of clients all trying to download this 400+MB file from one server over relatively small WAN links (avg. 10Mb/s). The result was constantly failing downloads, which triggered the process to start over again ad infinitum.

As a quick workaround, we decided to apply QoS to the traffic based on the port number until the issue with the clients was resolved. We then brought our findings to the security team to show them that their A/V system was not healthy. Armed with the information we gave them, they quickly identified several issues with the SEP manager and its clients and eventually resolved them, which included standing up a redundant SEP manager. Without NetFlow data, we would have had to set up SPAN ports on our switches and wait for a period of time before analyzing packet captures to determine what caused the congestion. With NetFlow, we could immediately look at specific times in the past to determine what was traversing our network when our users were complaining.
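To see why those downloads kept failing, a rough calculation helps. The sketch below uses the figures from the story (a 400 MB file and a 10 Mb/s link). The count of 20 simultaneous clients at one office is a made-up number for illustration, and the model is a best case that ignores protocol overhead, retries, and all other traffic on the link.

```python
def transfer_seconds(file_megabytes, link_megabits_per_s, concurrent_clients=1):
    """Best-case download time when clients split the link evenly."""
    file_megabits = file_megabytes * 8  # 8 bits per byte
    per_client_rate = link_megabits_per_s / concurrent_clients
    return file_megabits / per_client_rate


# Figures from the story: 400 MB definition file, 10 Mb/s WAN link.
single = transfer_seconds(400, 10)

# Hypothetical: 20 clients at one office downloading at the same time.
shared = transfer_seconds(400, 10, concurrent_clients=20)

print(f"1 client:   {single / 60:.1f} minutes")
print(f"20 clients: {shared / 3600:.1f} hours")
```

Even in this idealized model, a single client fills the link for more than five minutes, and 20 clients sharing it would each need well over an hour, which leaves plenty of time for a download to time out and restart.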

That’s just one problem NetFlow has solved for us. What if that port was TCP/6667 and it was coming from your CFO’s computer? Do you really think your CFO is on #packetpushers (irc.freenode.net) trying to learn more about networking? No, it’s more likely a bot checking in with its command-and-control server for instructions on how to make your life worse. From a security perspective, NetFlow is one more tool in the never-ending fight against malware. So what are you waiting for? Get with the flow… with NetFlow!
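As a minimal sketch of that kind of check, the snippet below totals bytes per source host for flows headed to a watched port. The record layout, the addresses, and the byte counts are all hypothetical; real NetFlow exports carry many more fields, and a collector would normally do this filtering for you.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, destination port, bytes).
FLOWS = [
    ("10.1.20.15", "203.0.113.50", 6667, 4_200),
    ("10.1.20.15", "203.0.113.50", 6667, 3_900),
    ("10.1.30.22", "198.51.100.7", 443, 88_000),
    ("10.1.40.9", "10.0.0.5", 8014, 410_000_000),
]

WATCHED_PORTS = {6667}  # IRC, the port used in the CFO example


def flag_watched(flows, watched_ports):
    """Return {(source IP, destination port): total bytes} for watched ports."""
    totals = defaultdict(int)
    for src, _dst, dport, nbytes in flows:
        if dport in watched_ports:
            totals[(src, dport)] += nbytes
    return dict(totals)


for (src, port), total in flag_watched(FLOWS, WATCHED_PORTS).items():
    print(f"{src} sent {total} bytes to port {port}")
```

Adding 8014 to `WATCHED_PORTS` would surface the SEP traffic from the earlier story in the same way.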

1. Of course it's never quite that easy. You'll have to configure SNMP on all of your devices that you want to manage and/or monitor.

2. Hair Club for Men marketing

20 Comments
Jfrazier
Level 18

I agree it is a good tool...I would also tend to think it is underutilized in most environments.

Thank you for posting this.

kmillerusaf
Level 11

No problem Jfrazier. Have any stories to share about how it has helped you?

Regards,

Keith

familyofcrowes
Level 16

For us netflow is used almost daily.  "Why is XYZ plant so slammed" is asked quite often and answered easily with a click or two off the top 5 netflow interfaces report.

Netflow really does save a crazy amount of time and makes an engineer look good when troubleshooting.

But what about when we are building new systems?  By looking at the type and quantity of traffic via graphs, we can determine how a new system will fit in, or how an upgraded system will perform.  How about Riverbeds or load balancing? This is where netflow can save you a lot of headaches in the future.

Great post...  when I have more time I'll elaborate on some stories...

kmillerusaf
Level 11

Same here, once the business knew we had that capability, it seemed like we had requests at least once a day to check on traffic to a remote office.

Our plants typically had higher bandwidth connections but our business offices where we took customer payments were always on 10Mb/s metro-E or T-1s.

Thanks for the reply and I look forward to reading some of your stories.

ecklerwr1
Level 19

And with these new probes timing the three way handshake on all of these conversations too!

superfly99
Level 17

Netflow is great! I used to manually look on the router to see who was hogging the bandwidth but it was reactive. Now with netflow, I can see who is doing what and when. I've set up an alert to let me know if any of my links are over 90% utilisation for a continuous 10 mins. If so, then I get an email with a hyperlink directed to the interface in question. I click on the link and straight away I can see the cause of the traffic.

Brilliant.
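The "over 90% for a continuous 10 mins" rule can be sketched roughly as follows. This is not how any particular NMS implements its alerting; the function name, the sample format, and the 60-second polling interval in the examples are assumptions made for illustration.

```python
def sustained_breach(samples, threshold=90.0, min_duration_s=600):
    """Return True if utilisation stays above threshold for min_duration_s.

    samples: list of (timestamp_seconds, utilisation_percent), sorted by time.
    """
    breach_start = None
    for ts, util in samples:
        if util > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None  # dipped below threshold, so reset
    return False


# Hypothetical polls every 60 seconds.
short_spike = [(i * 60, 95.0) for i in range(5)] + [(300, 40.0)]
long_breach = [(i * 60, 95.0) for i in range(11)]  # 0 s through 600 s

print(sustained_breach(short_spike))
print(sustained_breach(long_breach))
```

Resetting on any dip below the threshold is what keeps short bursts from triggering the alert.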

zackm
Level 15

HUGE NetFlow fanboy here! I absolutely love the technology and can sit in a room of whiteboards and discuss architecture and design ad nauseam.

<soap box>

That being said, one thing I keep seeing over and over at different clients is duplicated data. In most environments, there isn't a technical reason to not grab both ends of a point-to-point link (I mean, the devices can usually handle the increased load) but it's just not best practice and I wish it would stop. haha

</soap box>

Specifically in relation to NTA, I am a big believer in putting in the time/effort to create IP Address Groups based on both location and application as possible. I've seen some great dashboards created using IP Address Groups and the Flow Navigator panel's "save" feature. Really cool stuff.

prowessa
Level 12

Just tried it for a few months. It was nice.

kmillerusaf
Level 11

I agree with your thoughts zackm. At one point, we had netflow data configured on both our WAN edge routers and remote office routers. It didn't make sense to have duplicated data like you mentioned so we stuck with leaving it on the WAN edge routers only. They had more resources anyway.

jkump
Level 15

Having used it for years in other organizations, it is one of the first things I turned on at my present company.  It is valuable, underutilized, and can save hair-pulling at the same time.

dnerdahl
Level 10

Additionally, in a highly geographically diversified network with hundreds upon hundreds of network access circuits, keeping those circuits as small as you can without impacting application performance is crucial to minimizing your overall network cost. Netflow is a critical tool for identifying the network impact of business applications on a satellite office's available bandwidth, helping to direct effective QoS policy creation, and helping to identify opportunities for bandwidth savings by filtering or blocking types or sources of content.

Believe it or not, sometimes you don't have to just throw a bigger pipe at it.

lhoyle
Level 10

At my former employer, our main app was very latency sensitive. Rather than throwing bandwidth at the issue, we utilized NTA to find folks doing things they shouldn't be doing during peak periods, like YouTube (we could not block it due to our employer putting stuff out there) and the biggest culprit - SCCM. We ended up putting SCCM boxes at each location, so that traffic only traversed the WAN once. I guess that was a huge NTA success story.

kmillerusaf
Level 11

Amen! Throwing more bandwidth at the problem isn't always the answer. Understanding requirements and traffic patterns is key. Netflow helps you get there.

kmillerusaf
Level 11

Similar setup at my employer as well; regional SCCM servers to help keep traffic as localized as possible. And yes, we also had users using our network, specifically wireless, for all sorts of things like Youtube, Pandora, ESPN, etc.

It was nice to have the AVC functionality in our Cisco WLCs as well to catch applications that were sent over the WAN using CAPWAP tunnels.

naburleson
Level 9

Same here. We used Altiris for years and had nothing but problems; now we've switched to SCCM with site DPs (distribution points) and it has helped quite a bit.  We still encounter issues every once in a while when a new image is imported and pushed, because they set it as a major update so it will push inside of business hours.  It helps to have netflow so we can identify the traffic quickly and terminate the job.  Another way we've benefited from having a netflow tool in place is when we're looking to decommission old servers or remove old vlans/networks from sites: we check the netflow on the core switches and determine if anything is currently talking to the device/network.
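A minimal sketch of that decommission check, assuming flow records have already been exported as (source IP, destination IP) pairs. The addresses and the subnet are hypothetical, and a real check would cover a long enough window to catch weekly or monthly jobs.

```python
import ipaddress

# Hypothetical recent flow records: (source IP, destination IP).
RECENT_FLOWS = [
    ("10.1.20.15", "10.9.8.10"),
    ("10.1.30.22", "10.2.0.4"),
    ("10.1.40.9", "10.9.8.200"),
]


def flows_touching(flows, cidr):
    """Return the flows whose source or destination falls inside cidr."""
    net = ipaddress.ip_network(cidr)
    return [
        (src, dst)
        for src, dst in flows
        if ipaddress.ip_address(src) in net or ipaddress.ip_address(dst) in net
    ]


# Hypothetical subnet slated for removal.
still_active = flows_touching(RECENT_FLOWS, "10.9.8.0/24")
for src, dst in still_active:
    print(f"{src} -> {dst}")
```

An empty result is a hint that the network is quiet, not a guarantee; it only reflects the flows that were captured.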

kmillerusaf
Level 11

That's a perfect use case. I hadn't thought about using netflow to help with the decommission of old devices. Thanks!

mcam
Level 14

NTA has been an eyeopener for sure

Can't live without it now

jeremydr
Level 11

netflow has always proven its value, even with wan accelerators in place

rschroeder
Level 21

ASA's with FirePower (SourceFire) and ACI leverage full Netflow to identify flows and secure them without the traditional IP address / zone rules.  Now that Cisco's providing all 5 of the key Netflow items, we should see WLC's playing better with NTA and NPM.

I like the idea of Netflow being leveraged for "Retrospective Security Remediation".  As it sounds, you have to be compromised or infected to benefit from this, but once you are, Cisco says their latest solutions will identify the malware or intrusion, then track it back to every device it touches.  I've got 40,000+ active devices on the network, and knowing which ones were touched by malware or a hacker will be helpful for Security to identify the impact and loss.

Once the "good" flows are all identified and allowed, anything not tagged as "good" gets another tag, much like a VLAN tag, and then it gets filtered out or dropped or quarantined, and alerts go out to admins to chase down the offending malware & remove it.

Netflow will be more important than ever, per the Cisco Tech Day info I received this past week in Minneapolis.

kmillerusaf
Level 11

I agree rschroeder, netflow will continue to become a major player as more and more people deploy it and realize the wealth of information that it provides. I am looking to deploy the new ASA 5500-X series firewalls at our HQ and colo in the next few months so it'll be interesting to see what I find.

Thanks for the reply!

About the Author
Network Engineer with SCANA