Level 8

Netflow data not matching

On my home screen I see that one of my routers is pushing 357M through the G1/8 interface, but when I click on that interface I only see 300k worth of traffic. Why aren't these two screens in sync?



4 Replies
Level 8

This is on a Catalyst 6500 switch that was set up to export NetFlow v9. Once I changed it to v5, everything synced up again.


Good article, but it might warrant a bit of further discussion. NetFlow and interface stats are obtained in quite different ways.

Interface stats are polled via SNMP; they are a simple counter that increases. So, if you have your polling interval set to 5 minutes, for instance, the only thing Orion knows is that X amount of data passed through those interfaces in 5 minutes. Let's say 100Mbits of traffic came through in that 5-minute polling interval. You could have gotten all 100Mbits in the first 30 seconds, or you could have had a steady stream for those 5 minutes, but both would show up exactly the same as far as Orion is concerned.
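To make the polling point concrete, here is a rough Python sketch. The two traffic patterns and the function names are made up for illustration; this is not how Orion computes anything internally, just the arithmetic of dividing a counter delta by the polling interval.

```python
# Sketch: what an SNMP poller can and cannot see within one polling interval.
# The traffic patterns below are made-up illustrations, not measured data.

POLL_INTERVAL_S = 300  # 5-minute polling interval

# Each pattern is a list of (duration_seconds, bits_sent_in_that_span).
burst = [(30, 100_000_000), (270, 0)]   # all 100 Mbit in the first 30 seconds
steady = [(300, 100_000_000)]           # the same 100 Mbit spread over 5 minutes


def polled_average_bps(pattern, interval_s=POLL_INTERVAL_S):
    """Rate the poller reports: counter delta divided by the polling interval."""
    counter_delta_bits = sum(bits for _, bits in pattern)
    return counter_delta_bits / interval_s


def true_peak_bps(pattern):
    """Highest rate that actually occurred inside the interval."""
    return max(bits / seconds for seconds, bits in pattern)


for name, pattern in (("burst", burst), ("steady", steady)):
    print(f"{name:6s} polled avg = {polled_average_bps(pattern):12,.0f} bps, "
          f"true peak = {true_peak_bps(pattern):12,.0f} bps")
```

Both patterns report the same polled average, even though the burst ran roughly ten times hotter while it lasted.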

NetFlow stats are pushed by the router, and instead of raw traffic counters, it keeps track of flows of data. A flow is a "unidirectional sequence of packets that all share the following 7 values: Ingress interface, Source IP address, Destination IP address, IP Protocol, Source port, Destination Port, IP Type of service". That's the quick and dirty definition; a simpler explanation would be that it's a single conversation going on between a client and a server (one direction of it, strictly speaking). Now, some conversations are transient and very quick, like HTTP. Others can be much longer, like a non-web-based application such as a custom interactive app.

Why is this important? By default a Cisco router won't send its NetFlow information for a flow until that flow ends or the active timeout expires, and the default active timeout is long (30 minutes on classic IOS NetFlow). So, let's say a custom interactive app session goes on for 100 minutes and 15Gb of data is transferred in one direction during that time. With a long enough timeout, NetFlow could push a record at the end that says 15Gb of data just got transferred that very second. Not a good thing! So, as the article pointed out by Jesquitin states, it's important to define your timeouts for NetFlow and keep them short, preferably around a minute. That way you're always getting fresh data in NetFlow.
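Here is a small Python sketch of why the timeout matters. It assumes a perfectly steady flow and a simplified exporter that emits one record each time the active timeout expires plus one when the flow ends; `export_records` is a hypothetical helper, not anything from the router or from NTA.

```python
# Sketch: how the active timeout changes when a long flow's data shows up.
# Simplified model: one steady flow, records emitted on each active-timeout
# expiry and once at flow end. Times are in minutes, volume in Gb.

def export_records(flow_start, flow_end, total_volume, active_timeout):
    """Return a list of (export_time, volume_reported) for one steady flow."""
    duration = flow_end - flow_start
    records = []
    t = flow_start
    while t < flow_end:
        next_t = min(t + active_timeout, flow_end)
        # Volume accumulated since the previous export.
        records.append((next_t, total_volume * (next_t - t) / duration))
        t = next_t
    return records


# 100-minute session, 15 Gb in one direction (the example from the post).
for timeout in (1000, 30, 1):
    recs = export_records(0, 100, 15.0, timeout)
    biggest = max(volume for _, volume in recs)
    print(f"active timeout {timeout:4d} min: {len(recs):3d} records, "
          f"largest single record = {biggest:.2f} Gb")
```

With a timeout longer than the session, everything lands in one 15 Gb record at minute 100. With a 1-minute timeout the same data arrives as 100 small records, which is much closer to what actually happened on the wire.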

Now, let's say that you set up both NetFlow and SNMP to have the same timeouts or polling intervals. You still might not get the same amount of data shown, although it would probably be closer. Why? There are a few reasons, beginning with the fact that even if your timers are the same length, you aren't guaranteed they will poll or transmit the data at the same moments. Because of this they will probably >never< be synced up completely.

I did say there were several reasons though, and that is true. You also have to be careful how you set up your interfaces in NetFlow. Let's say you have one interface in and one interface out on your router. In this instance you'd be safe to pick one of the two interfaces and set up both "ip flow ingress" and "ip flow egress" on it to capture the flows in both directions. If you did this on both interfaces you'd see double the data, because an ingress flow on one interface would be an egress flow on the other one. However, picking a single interface isn't always the best way to do it. A safer approach for routers with multiple interfaces is to do "ip flow ingress" on all of the interfaces, and you can be reasonably sure you won't double up the bandwidth being seen.
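The double-counting is easy to see with a toy model. The sketch below is Python, not router config; the interface names, byte counts, and the `recorded_bytes` helper are all made up for illustration. A flow is counted once if ingress accounting is on where it enters, and once more if egress accounting is on where it leaves.

```python
# Sketch: how ingress/egress placement affects the total NetFlow reports.
# Flows are (in_interface, out_interface, bytes); all values are illustrative.

flows = [
    ("Gi0/0", "Gi0/1", 1000),  # traffic entering Gi0/0, leaving Gi0/1
    ("Gi0/1", "Gi0/0", 400),   # return traffic in the other direction
]
actual_total = sum(nbytes for _, _, nbytes in flows)


def recorded_bytes(flows, config):
    """Total bytes recorded, given {interface: {"ingress", "egress"}} settings."""
    total = 0
    for in_if, out_if, nbytes in flows:
        if "ingress" in config.get(in_if, set()):
            total += nbytes  # seen by ingress accounting where it enters
        if "egress" in config.get(out_if, set()):
            total += nbytes  # seen again by egress accounting where it leaves
    return total


setups = {
    "both directions on one interface": {"Gi0/0": {"ingress", "egress"}},
    "both directions on both interfaces": {
        "Gi0/0": {"ingress", "egress"},
        "Gi0/1": {"ingress", "egress"},
    },
    "ingress only on all interfaces": {"Gi0/0": {"ingress"}, "Gi0/1": {"ingress"}},
}

for name, config in setups.items():
    print(f"{name}: {recorded_bytes(flows, config)} bytes "
          f"(actual {actual_total})")
```

Only the "both directions on both interfaces" setup reports twice the real total; the other two match it.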

However, it does get tricky when you are doing things like split-tunneling, where the internal traffic goes across a tunnel interface and the Internet traffic goes out the physical interface without going through a tunnel. In order to see the traffic on the tunnel, you have to do "ip flow ingress" on the tunnel interface, but if you want to see the Internet traffic you also have to do it on the physical interface. In this instance you would see both the broken-down traffic inside the tunnel, PLUS the encrypted IPsec traffic of the tunnel itself as one big hunk.

Then there is also how you have NetFlow set up in NTA! By default there are a few settings that allow NTA to reduce its load and amount of storage by only keeping track of what it considers to be "important information". So, under NTA settings, there is one setting, "Enable data retention for traffic on unmonitored ports", that if left unchecked might mean NTA never stores information about some of your traffic. You can either check this box, or go into "Choose the applications and ports that you want to monitor" and make sure the applications you want to watch are there. However, if you don't check the "enable data retention" box, you will most likely be missing some data! There is also a "Monitored Protocols" section where you choose which protocols to monitor; these are things like ICMP, IPv4, IPv6, TCP. Depending on what's on your network, not monitoring all protocols could have an impact also.

And finally (I think) there is the fact that by default NTA only keeps track of the top 95% of all network traffic. This is known as "Top Talker Optimization" and is another setting in NTA settings. That means that by default it will ignore the bottom 5% of talkers. If you want to change this to 100%, read the help section first: it will increase your storage requirements by a factor of 40 to 100! Yes, that is up to 100 times the storage of keeping only 95% of the data.
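To get a feel for why that last 5% is so expensive, here is a rough Python sketch. It is only my guess at the general idea (keep the biggest talkers until 95% of the bytes are covered), not NTA's actual algorithm, and the talker addresses and byte counts are invented.

```python
# Sketch: keeping only the talkers that make up the top N% of traffic.
# NOT NTA's real implementation; an illustration with invented numbers.

def top_talkers(bytes_by_talker, keep_percent=95):
    """Keep the largest talkers until keep_percent of total bytes is covered."""
    total = sum(bytes_by_talker.values())
    kept = {}
    running = 0
    ranked = sorted(bytes_by_talker.items(), key=lambda kv: kv[1], reverse=True)
    for talker, nbytes in ranked:
        if running * 100 >= keep_percent * total:
            break  # target share already covered, drop the long tail
        kept[talker] = nbytes
        running += nbytes
    return kept


# Three heavy talkers plus a long tail of 40 tiny ones.
traffic = {"10.0.0.1": 6000, "10.0.0.2": 3000, "10.0.0.3": 600}
traffic.update({f"10.0.1.{i}": 10 for i in range(40)})

kept_95 = top_talkers(traffic, 95)
kept_100 = top_talkers(traffic, 100)
print(f"95%:  {len(kept_95)} talkers stored, {sum(kept_95.values())} bytes")
print(f"100%: {len(kept_100)} talkers stored, {sum(kept_100.values())} bytes")
```

In this made-up data set, 3 stored records cover 96% of the bytes, while covering 100% takes all 43. Real networks tend to have a far longer tail, which is where the big storage multiplier comes from.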

So, hopefully this points out the main reasons why NetFlow data will almost never equal SNMP data, and why both are important to a degree! And hopefully I pointed out a few things you can do to bring them a bit closer, at least...

You are correct on all your points. Very informative and detailed. The article that was sent only discusses possible reasons why the NetFlow data would be lower than the SNMP interface stats, that is, the SNMP bandwidth utilization shown in "Traffic In" and "Traffic Out". There is another article that covers why the data would be higher in NetFlow than in the SNMP stats.

Level 9

There are a few reasons that can cause this issue.  The article in the link below provides some things to check and verify.

Hope this helps