Hi,
Can you please tell us why we would suggest that anyone use NTA in place of NPM for bandwidth utilization? The main focus is monitoring bandwidth.
I like @troy.hrehirchuk 's reply. Allow me to add my two cents. First, you can't replace NPM with NTA. NTA *requires* NPM just to be installed and to function. Without the SNMP data that NPM provides, you do not, in most cases, have a complete picture of the node and interfaces involved in high bandwidth utilization, and in many cases you end up with a lot of "unknown" traffic, which is spotty at best when it comes to understanding and troubleshooting bandwidth issues in your environment.
What NTA does is analyze the traffic itself. NPM tells you that you have high bandwidth utilization, so it handles that high-level aspect of monitoring bandwidth well, but SNMP, by nature, doesn't tell you WHAT that traffic is, which is what you need to solve the issue. NTA gives you applications (application port mapping), conversations, source and destination endpoints, and a complete view of the actual traffic that is causing your high bandwidth utilization.
So, I wouldn't frame the conversation as "replacing" one with the other but rather using *both* to provide a complete picture of health, performance, and utilization for nodes and interfaces you are monitoring.
I'd disagree with the assertion that NPM is designed for real-time monitoring, though. One of the key differences between NPM and NTA is how quickly you get data.
NPM, by default, works off the default statistics polling interval of 9 minutes. So every 9 minutes your SolarWinds server polls the network device and asks for its interface counters. These counters tell you how much data went across the interface in the last 9 minutes, but there is no way to request detail on much of anything else: what kind of data it was, where it came from, where it went, etc. Just how much data crossed the interface. Perhaps more importantly, you don't know when it crossed the interface, only that it happened sometime in the last 9 minutes. So if you have a 1 Gb/s link that is 100% full for one minute, then drops off to nothing for the next 8 minutes, NPM would report the link at roughly 1/9th capacity for those 9 minutes. It would not see the peak, or know that the interface was potentially being overrun for a minute.
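The averaging effect above is easy to see with a little counter arithmetic. This is a minimal sketch, with made-up numbers matching the 1 Gb/s example: a link saturated for 1 minute and idle for the other 8 still averages out to about 11% utilization over a 9-minute poll.

```python
# Sketch: how SNMP counter polling averages away short bursts.
# Hypothetical values: a 1 Gb/s link 100% full for 1 minute, idle for 8.

link_speed_bps = 1_000_000_000            # ifSpeed: 1 Gb/s
poll_interval_s = 9 * 60                  # 9-minute polling cycle

# Octet counters at the two polls (ifHCInOctets-style values).
octets_start = 0
octets_end = 60 * link_speed_bps // 8     # 1 minute at line rate, then nothing

delta_bits = (octets_end - octets_start) * 8
avg_bps = delta_bits / poll_interval_s
utilization_pct = 100 * avg_bps / link_speed_bps

print(f"average rate: {avg_bps / 1e6:.1f} Mb/s")
print(f"reported utilization: {utilization_pct:.1f}%")  # ~11%, despite a full minute at 100%
```

The 1-minute burst is completely invisible in the counter delta; all the poller can report is the average.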
NTA data is not polled; it is generated by the end device and exported periodically. It's based on traffic flows, and by default a device sends a summary of a flow when the flow completes. So if you have a file transfer that runs for 3 minutes, some systems will, at the end of those 3 minutes, report that the flow just ended and that it contained X bytes. You will typically also get other info, such as the source IP, destination IP, source port, destination port, some QoS info, and the protocol used. Other fields may be exported as well, depending on how you set up the device. Now, that being said, it's not ideal to only send info when a flow ends, so most folks recommend telling your device to send updates on flows periodically. Usually you set one timer for how often you want it to export data on active flows, and another for how long it should wait before considering an idle flow completed. I'd set them both to a minute or under.
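On Cisco IOS devices running classic NetFlow, for example, those two timers look roughly like this (syntax varies by platform and OS version, and Flexible NetFlow uses a different command structure, so treat this as illustrative):

```
! Export data on long-running (active) flows every 1 minute
ip flow-cache timeout active 1
! Expire a flow after 15 seconds of inactivity
ip flow-cache timeout inactive 15
```

Note that the active timeout is in minutes and the inactive timeout is in seconds on this platform, which is an easy thing to trip over.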
So NPM data is, by default, polled by SolarWinds every 9 minutes, while with NTA you should get updates every minute. That's why I consider NTA data much more "real-time" than NPM's. However, it is a bit more difficult to alert on. You can shorten the NPM polling interval if you want, but that will definitely affect your polling rate and possibly your polling completion. I.e., if you make your server do more, it will be busier, and it's possible you'll make it so busy that it can't finish its polling on time. That can be much worse, because then you are missing potentially vital info. You can add more pollers, which lets you poll more in the same amount of time, but those are expensive!! $$$
That being said, NetFlow, on Cisco devices at least, is done in hardware, so no matter how quickly you tell the device to update the data it shouldn't affect the CPU, and you get VERY complete information. Some devices don't do full NetFlow, or don't do it in hardware. Usually you'll see this as the device doing what is called "sampled flow" or something similar. With those devices you have to set a sample rate, which might be something like 1:20000 packets, or as low as 1:100. The lower you set the sample rate, the more it may impact the device's CPU.
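One practical consequence of sampling: the byte counts in sampled-flow records have to be scaled back up by the sample rate to estimate the real traffic. A minimal sketch with hypothetical numbers:

```python
# Sketch: scaling sampled-flow byte counts back to an estimate of actual
# traffic. Numbers are made up; the real sample rate is whatever is
# configured on the device (e.g. 1:100 ... 1:20000 packets).

sample_rate = 1000            # 1:1000 packet sampling
sampled_bytes = 1_250_000     # bytes seen across the sampled flow records

# Each sampled packet stands in for roughly sample_rate real packets,
# so the simplest estimate is the reported bytes scaled up.
estimated_bytes = sampled_bytes * sample_rate
print(f"estimated traffic: {estimated_bytes / 1e9:.2f} GB")
```

This also illustrates the trade-off: a coarser sample rate (1:20000) is cheap on the CPU but makes small flows statistically invisible, while a fine rate (1:100) catches more but costs more.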
So I agree with everyone saying that you won't replace NPM with NTA; they are simply different technologies that give you different data. But it's good to know what you're looking at with each one. NPM data will give you a general idea of what is going on, while NTA gives you good detail on what traffic is passing. NPM data is easy to alert on, since it's one big batch of traffic per interface: you can just say alert if it goes over this value. NTA is not, since it's data on a bunch of different flows at points in time: you'd have to run a query over maybe thousands of flows to find out how much data is passing at a given moment.
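To make that last point concrete, here is a toy sketch of the aggregation step that flow-based alerting implies. Every record, field layout, and threshold here is made up for illustration; the point is simply that you must roll up many flow records per time bucket before you can compare anything to a threshold.

```python
# Sketch: alerting on flow data means aggregating many records per interval
# first, unlike a single SNMP utilization number per interface.
from collections import defaultdict

# (minute bucket, bytes) pairs, as might come out of exported flow records
flows = [
    (0, 40_000_000), (0, 25_000_000),
    (1, 6_500_000_000), (1, 500_000_000),
]

bytes_per_minute = defaultdict(int)
for minute, nbytes in flows:
    bytes_per_minute[minute] += nbytes

link_speed_bps = 1_000_000_000
threshold_bps = 0.8 * link_speed_bps      # alert at 80% utilization

alerts = []
for minute, total in sorted(bytes_per_minute.items()):
    bps = total * 8 / 60                  # bytes per minute -> bits per second
    if bps > threshold_bps:
        alerts.append(minute)
        print(f"minute {minute}: {bps / 1e6:.0f} Mb/s exceeds threshold")
```

With SNMP counters that whole loop collapses to one comparison; with flow data it's a query over however many flows landed in the interval.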
HTH!
100%. NPM (and really any of the Platform Modules) is not fundamentally designed to be a real-time solution. Even with very short polling intervals (NOT recommended), actions and alerting are triggered from data in the database, not in memory, so it is not a true real-time solution. We have real-time and near-real-time features such as detail widgets and specific PerfStack metrics, but as a day-to-day function, you are correct: NPM is not a real-time monitoring solution.
@cnorborg hits it on the head with the cost involved in getting near-real-time monitoring and alerting. @cheryl1 provides the final quick answer: NPM is not fundamentally a real-time solution.
The only other point is whether you can actually respond in real time. The monitoring and alerting will be delayed by minutes in most cases. If you really need that bandwidth alert when the link is 'full,' then when to alert is the bigger conversation. Unless you are actively sitting there, ready to troubleshoot and resolve in 'real time,' an alert threshold set at 70 or 80 percent may be what you need. This gives you time to troubleshoot, react if needed, and most importantly document the findings. Then when things go bad, you are already working on them, because the alert kicked off before the critical point.
This is a discussion I have had with many support teams: how quickly are you able to fix something if it is already critical? The answer often comes back: let's get a threshold and monitoring/alerting configured to give us time to fix it. With experience and documentation, that threshold and alerting configuration can be tuned over time to eliminate false alarms, or rather, issues that are temporary and will go away on their own without impacting performance.
Anomaly alerting is the next big boost in this arena, and I would like to hear how everyone's experience has been. If the system can detect the outliers and notify us when something does not appear to be normal, then YAY! That eliminates a lot of discussion and manual threshold-setting and custom alerting. It's why trap and syslog monitoring has been trending toward anomaly detection instead of static thresholds. The sheer amount of data is staggering, but trends can be picked up by the system.