8 Replies Latest reply on Apr 17, 2008 11:41 AM by Nux

    Missing Data

      I have seen a couple of threads on this but can't find a resolution.

      Whenever I turn on NetFlow collection and correlation, I start getting large gaps of missing data for my devices in Orion.  It looks like it is limited to SNMP polls, as the ICMP polls still seem to work.  Basically, when NetFlow is on, we see periods of missing data ranging from 30 minutes to multiple hours.

      Has anyone else run into this problem, and has anyone figured out how to fix it?  I have contacted support and they told me to get a better DB server, even though we meet all of the posted specifications.
        • Re: Missing Data
          TinyElvis

           I'm seeing the same thing, and I have a very robust SQL server on the back end.  I'm not sure if it's limited yet to ICMP, but I'm combing through the forums here to see if others have had this issue and what the fix might be.

           It seems to be happening to all my devices, and it appears to be getting worse.  I'm wondering if I'm polling too many devices.

           

          Attached is a graph that shows what I'm seeing
           

            • Re: Missing Data

              I recommend opening a support case at http://www.solarwinds.com/support and being sure to include the Orion diagnostics.

                • Re: Missing Data

                   Translation:  "We don't know what the hell is going on, but we are sure it's going to be your fault for trusting our recommended hardware specs."

                  Don't hold your breath.  I have had this issue in the past and opened tickets. SolarWinds doesn't take this issue seriously because it's hard to fix. And they seem to have an inability to give a clear recommendation on how robust your polling engine needs to be.


                  1) the Orion diags will tell them nothing useful.

                  2) they will recommend polling less frequently.

                  3) they will blame the SQL server.

                  4) you will get frustrated and either purchase a beefier server for your polling engine or find a better netflow analyzer.

                   


                    • Re: Missing Data
                      TinyElvis

                       I actually did open a ticket and got a decent response.  They gave me a document that helped troubleshoot many different things and helped guide you where to look.

                       Basically, you are right about the NetFlow piece.  I had a table called "netflowsummary2" that had 846 million rows, and the table itself was 92 gig.

                      The Average Disk Queue Length on the SQL server (which the document told me to look at as step one) was about 6 (they indicate it should be below 2) -- I truncated this table and shut off netflow from 35 devices that were sending traffic.  The disk queue length is now .002, and the machine is much MUCH faster.  It no longer has gaps in data, and I've actually increased the length that I hold detailed information to 45 days.
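For a rough sense of scale, here is a back-of-the-envelope sketch in Python using only the figures quoted above. It estimates the average storage per row in that table and compares the before and after disk queue lengths against the "below 2" guideline. Treating "92 gig" as 92 GiB is an assumption, and the function names are made up for illustration; this is not part of Orion or of the support document.

```python
# Figures quoted in the post; reading "92 gig" as 92 GiB is an assumption.
ROWS = 846_000_000
TABLE_BYTES = 92 * 1024**3

# Guideline from the support document, as relayed in the post.
QUEUE_THRESHOLD = 2.0


def bytes_per_row(table_bytes: int, rows: int) -> float:
    """Average on-disk footprint per row (data plus any index overhead)."""
    return table_bytes / rows


def queue_is_healthy(avg_queue_length: float,
                     threshold: float = QUEUE_THRESHOLD) -> bool:
    """True when the Average Disk Queue Length is below the guideline."""
    return avg_queue_length < threshold


if __name__ == "__main__":
    print(f"roughly {bytes_per_row(TABLE_BYTES, ROWS):.0f} bytes per row")
    for label, value in [("before truncate", 6.0), ("after truncate", 0.002)]:
        status = "OK" if queue_is_healthy(value) else "too high"
        print(f"{label}: queue length {value} -> {status}")
```

The per-row figure is small, which suggests the problem here was the sheer number of rows (and the write rate behind them) rather than unusually wide records.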

                       I admit, I was a little skeptical when I got this response as well, but the support was good, and had several follow-ups asking how things were progressing, and asked if I needed any further help, etc.

                       --Ron

                      • Re: Missing Data
                        denny.lecompte

                         3liter,

                        I'm sorry you're frustrated.  We've admittedly had a tough time with recommending NetFlow hardware.  For most customers, it's easy, but for heavy users, SQL Server is actually a major bottleneck because of the rate of incoming data.  Tuning SQL Server to optimize read/write rate often helps.

                        The fundamental problem is that NetFlow is a high-volume protocol.  One longer-term option we're considering is allowing users to choose to keep data at a lower level of granularity (what that means is TBD) in order to impact the server less.  Would that be an acceptable alternative to SQL tuning or upgrading hardware?  It's just a different tradeoff.
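
Since the post leaves "lower granularity" undefined, the following Python sketch is only one possible reading of it: rolling individual flow records up into coarser time buckets so that far fewer rows need to be stored. The `FlowRecord` shape, the field names, and the one-hour bucket size are all hypothetical and do not reflect Orion's actual schema or any planned feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Hypothetical flow record; not Orion's real schema."""
    timestamp: int      # epoch seconds
    source: str
    destination: str
    byte_count: int


BucketKey = Tuple[int, str, str]


def roll_up(records: Iterable[FlowRecord],
            bucket_seconds: int = 3600) -> Dict[BucketKey, int]:
    """Sum byte counts per (bucket start, source, destination)."""
    totals: Dict[BucketKey, int] = defaultdict(int)
    for record in records:
        # Align the timestamp to the start of its bucket.
        bucket_start = record.timestamp - (record.timestamp % bucket_seconds)
        totals[(bucket_start, record.source, record.destination)] += record.byte_count
    return dict(totals)


if __name__ == "__main__":
    sample = [
        FlowRecord(0, "10.0.0.1", "10.0.0.2", 500),
        FlowRecord(60, "10.0.0.1", "10.0.0.2", 700),
        FlowRecord(3600, "10.0.0.1", "10.0.0.2", 100),
    ]
    for key, total in sorted(roll_up(sample).items()):
        print(key, total)
```

The tradeoff is the one described above: coarser buckets mean fewer rows and less write pressure on SQL Server, at the cost of losing detail inside each bucket.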