I'm seeing the same thing, and I have a very robust SQL server on the back end. I'm not sure yet whether it's limited to ICMP, but I'm combing through the forums here to see if others have had this issue and what the fix might be.
It seems to be happening on all my devices, and it appears to be getting worse. I'm wondering if I'm polling too many devices?
Attached is a graph that shows what I'm seeing.
I recommend opening a support case at www.solarwinds.com/support; be sure to include the Orion diagnostics.
Translation: "We don't know what the hell is going on, but we are sure it's going to be your fault for trusting our recommended hardware specs."
Don't hold your breath. I have had this issue in the past and opened tickets. SolarWinds doesn't take this issue seriously because it's hard to fix. And they seem to have an inability to give a clear recommendation on how robust your polling engine needs to be.
1) The Orion diags will tell them nothing useful.
2) They will recommend polling less frequently.
3) They will blame the SQL server.
4) You will get frustrated and either purchase a beefier server for your polling engine or find a better NetFlow analyzer.
I actually did open a ticket and got a decent response. They gave me a document that helped troubleshoot many different things and helped guide you where to look.
Basically, you are right about the NetFlow piece. I had a table called "netflowsummary2" that had 846 million rows, and the table itself was 92 GB.
The Average Disk Queue Length on the SQL server (which the document told me to check as step one) was about 6; they indicate it should be below 2. I truncated this table and shut off NetFlow on the 35 devices that were sending traffic. The disk queue length is now 0.002, and the machine is much, MUCH faster. It no longer has gaps in data, and I've actually increased the retention for detailed information to 45 days.
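For anyone who wants to sanity-check their own numbers the same way, here is a minimal Python sketch. It only does the arithmetic: it compares Average Disk Queue Length readings against the "below 2" guideline from the support document and works out the average row size of a table. The function names are made up for illustration, the readings have to be collected separately (for example from Performance Monitor), and it assumes "92 GB" means 92 x 10^9 bytes.

```python
# Guideline quoted from the support document: Avg. Disk Queue Length should be below 2.
QUEUE_THRESHOLD = 2.0


def disk_queue_ok(samples, threshold=QUEUE_THRESHOLD):
    """Return (average, ok) for a list of Avg. Disk Queue Length readings.

    Hypothetical helper: the readings are supplied by the caller.
    """
    if not samples:
        raise ValueError("need at least one sample")
    avg = sum(samples) / len(samples)
    return avg, avg < threshold


def bytes_per_row(table_bytes, row_count):
    """Average storage per row, given table size in bytes and row count."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return table_bytes / row_count


if __name__ == "__main__":
    print(disk_queue_ok([6.0]))      # before the cleanup: over the guideline
    print(disk_queue_ok([0.002]))    # after the cleanup: well under it
    # 846 million rows in a 92 GB table (assuming 1 GB = 10**9 bytes)
    print(round(bytes_per_row(92 * 10**9, 846_000_000)))
```

With the figures above, the table works out to roughly 109 bytes per row, so the size came from the sheer row count rather than from wide rows.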
I admit, I was a little skeptical when I got this response as well, but the support was good. They followed up several times, asking how things were progressing and whether I needed any further help.
--Ron
3liter,
I'm sorry you're frustrated. We've admittedly had a tough time with recommending NetFlow hardware. For most customers, it's easy, but for heavy users, SQL Server is actually a major bottleneck because of the rate of incoming data. Tuning SQL Server to optimize read/write rate often helps.
The fundamental problem is that NetFlow is a high-volume protocol. One longer-term option we're considering is allowing users to choose to keep data at a lower level of granularity (what that means is TBD) in order to impact the server less. Would that be an acceptable alternative to SQL tuning or upgrading hardware? It's just a different tradeoff.
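To make the tradeoff concrete, here is one possible reading of "lower granularity", sketched in Python. This is only an illustration and not how the product works or will work: it assumes flow records are simple (timestamp, source, destination, bytes) tuples and rolls them up into coarser time buckets, so fewer rows need to be written. The record layout, the function name, and the 15-minute default are all assumptions.

```python
from collections import defaultdict


def roll_up(flows, bucket_seconds=900):
    """Aggregate flow records into fixed-size time buckets.

    flows: iterable of (timestamp_seconds, src, dst, byte_count) tuples.
    Returns a sorted list of (bucket_start, src, dst, total_bytes).
    Hypothetical sketch; the record layout is assumed for illustration.
    """
    totals = defaultdict(int)
    for ts, src, dst, nbytes in flows:
        # Snap each timestamp down to the start of its bucket.
        bucket = ts - (ts % bucket_seconds)
        totals[(bucket, src, dst)] += nbytes
    return [(b, s, d, n) for (b, s, d), n in sorted(totals.items())]


if __name__ == "__main__":
    sample = [
        (0, "10.0.0.1", "10.0.0.2", 100),
        (60, "10.0.0.1", "10.0.0.2", 50),
        (900, "10.0.0.1", "10.0.0.2", 10),
    ]
    for row in roll_up(sample):
        print(row)
```

In this example three detailed records collapse into two summary rows. The cost is that anything finer than the bucket size can no longer be reported, which is the tradeoff being asked about.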