10 Replies Latest reply on Jun 1, 2016 11:44 AM by byrona

    nDepth Searches very slow and time out

    byrona

      I am having consistent problems with slow nDepth searches that often time out on one of my LEM appliances.  I have called SW Support and thus far they haven't been able to find anything wrong with my system.  The system handles about 7 million events per day, which I have been told isn't a lot compared with what it's designed for.  My search looks for any successful logins from a group of known-bad IP addresses.  If I try to search more than a day's worth of events it fails, which is a problem because I often need to search a week or several weeks, and that doesn't seem like an unreasonable use case.

       

      I am curious if anybody else has experienced this type of problem and if so what, if anything, was done to resolve it.

       

      Thanks in advance for any feedback!

        • Re: nDepth Searches very slow and time out
          nicole pauls

          What does your search look like? How far back does it go, and how large is the result set when it finally completes?

           

          Depending on how you form the search query, it changes how hard LEM has to search for it. For example, the Refine Fields you see are faster than a bare text search.

            • Re: nDepth Searches very slow and time out
              byrona

              Well, it's not completing, so I can't say how large the result set is.  I know the appliance takes in about 7 million events per day, and my last failed search covered a 48-hour period, so it would be searching roughly 14 million events.
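As a rough sketch of that estimate, the arithmetic can be written out as below. The 7 million events/day figure comes from this post; the function name `events_in_window` and the assumption of a constant ingest rate are illustrative only.

```python
def events_in_window(events_per_day: int, days: float) -> int:
    """Estimate how many events a search must cover.

    Assumes a constant ingest rate, which real log volume rarely has.
    """
    return round(events_per_day * days)


print(events_in_window(7_000_000, 2))  # 48-hour search -> 14000000
print(events_in_window(7_000_000, 7))  # one-week search -> 49000000
```

By the same estimate, a one-week search covers roughly 49 million events.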

               

              In the specific case that came up today I am searching events from mid-to-late December of last year.  I am specifically looking for user logon events where the source machine is equal to Bad IP's.  Bad IP's is a user-defined group that contains a few dozen IPs.

               

              I guess the point of my frustration is that this doesn't seem like an unreasonable use case, and the specifics shouldn't matter too much.  I have logs and I need to search them; that shouldn't be as much of a problem as it seems to be.  I would be more understanding if I were trying to search a 6-month period from 3 years ago, but that isn't the case. What I am doing is pretty specific, and it covers a pretty short (shorter than I would like) period of time.

               

              I am looking into the possibility that a storage IOPS/latency issue is causing the problem; I haven't yet confirmed anything on that front.

                • Re: nDepth Searches very slow and time out
                  nicole pauls

                  Fundamentally, I agree - the data is there, and you should be able to reach it. The reality with search is that the more complex your search, the larger the results, and the farther back you want to go, the longer it is going to take. :/

                   

                  Theoretically, searching anything you see in the refine fields should be fast to pull back numbers, regardless of the time frame. There's also a relatively short period of time that's resident in "warm" storage that will be faster to pull back the actual result details, so searching this week (today/yesterday) will definitely be faster than last month. (There are some dials they can turn on the back end to tweak that, but the cost is always disk space; the warm storage isn't compressed so it's pretty fat.)

                   

                  I think there's some voodoo in how searches get optimized, too, that I used to understand really well, but as LEM evolved and the storage architecture changed it has become much more nuanced. In your case, I wonder if something like "IP Address = <list of IPs> AND AlertName = UserLogon" might be faster than "UserLogon.SourceMachine = <list of IPs>" - at least to fetch the initial result set, though I think both will suffer from the time to pull back the details. It would also be nice if LEM could do partial results better, so even if the search failed you could get to the results it did return.

                   

                  Do you see the histogram at the top get drawn with all the necessary data before your search times out?

                   

                  Have you increased your timeout beyond the default 5 minutes as well? (Yes, 5 minutes is still a long time.)

                    • Re: nDepth Searches very slow and time out
                      byrona

                      Yeah, I actually have my timeout set to 30 minutes and the histogram never gets to the point of drawing anything out.

                       

                      More and more I think my issue may be related to storage IOPS and latency.  What I wouldn't give to move this appliance to some SSD storage.

                       

                      With that being said, I certainly hope there are some improvements on the horizon for searches in LEM.

                • Re: nDepth Searches very slow and time out
                  elro73

                  I'm having the same issue.  The nDepth search is good if you're looking at quite current data (the last hour or two is very quick), but when I go further back or try to take a weekly view it grinds to a halt, even with a 30-minute timeout.

                   

                  It would be very nice if you got the results that it did find, but the timeout just resets everything.
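One possible workaround for the all-or-nothing timeout, offered here as an assumption rather than anything LEM provides, is to split a long time range into shorter windows and run the same search once per window, so that each completed window keeps its results. The helper `split_window` below is hypothetical and only generates the window boundaries; the dates are example values, and running the actual nDepth search for each window is left out.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_window(start: datetime, end: datetime,
                 step: timedelta = timedelta(days=1)
                 ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The last chunk is clipped so it never runs past the requested end.
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


# Example: one week in mid-December, searched one day at a time.
for chunk_start, chunk_end in split_window(datetime(2015, 12, 15),
                                           datetime(2015, 12, 22)):
    print(f"search from {chunk_start:%Y-%m-%d %H:%M} to {chunk_end:%Y-%m-%d %H:%M}")
```

If one window times out, only that window needs to be retried or narrowed further.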

                    • Re: nDepth Searches very slow and time out
                      byrona

                      The performance here is really a concern.  My deployment isn't even considered that large, based on what LEM is supposedly capable of and on what the SolarWinds support folks have told me.  If I am having problems with searches on my relatively small deployment, what should I expect with an even larger one?

                       

                      I get the feeling that there is some significant back-end work that may be needed to improve the searching capabilities.

                       

                      With that being said, I am still looking into options to improve the disk I/O and latency to see if that makes a difference.

                        • Re: nDepth Searches very slow and time out
                          silverbacksays

                          How many IP addresses are in the user defined group? Have you tried running the same search, but for just one of the listed IP addresses? If so, how is the performance? Does it actually complete and show the results?

                            • Re: nDepth Searches very slow and time out
                              byrona

                              After spending a LOT of time working with support on this case it came down to a few things as follows:

                               

                              1. Expectations
                                1. You need to have reasonable expectations about how long searches will take.  If the search needs to open a lot of different compressed files to complete, it may take a really long time and even time out if there are too many.
                              2. Storage Speed
                                1. The faster your storage is, the faster these searches can process; this assumes you aren't constrained by memory or CPU.
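To make the storage point concrete, here is a deliberately simplified model that is not taken from SolarWinds documentation: assume the search opens each compressed file once, pays a fixed access latency, and then reads the file sequentially. The function name and every number below are made-up placeholders, and the model ignores decompression CPU time, caching, and memory limits.

```python
def estimate_scan_seconds(files: int, avg_file_mb: float,
                          throughput_mb_s: float, latency_s: float) -> float:
    """Rough time to read `files` files: per-file latency plus sequential read."""
    per_file = latency_s + avg_file_mb / throughput_mb_s
    return files * per_file


# Placeholder figures for a slower and a faster storage tier (assumptions only).
slow = estimate_scan_seconds(files=100, avg_file_mb=50,
                             throughput_mb_s=100, latency_s=0.01)
fast = estimate_scan_seconds(files=100, avg_file_mb=50,
                             throughput_mb_s=500, latency_s=0.001)
print(f"slower storage: {slow:.1f} s, faster storage: {fast:.1f} s")
```

Under this model, scan time grows linearly with the number of files the search has to open, which matches the point about expectations above.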

                               

                              While this all makes sense to me, I would also agree with what dbegin indicates below: this should be considered a weak point in the product and something in need of considerable improvement.

                            • Re: nDepth Searches very slow and time out
                              dbegin

                              I'm in the same situation. Our LEM brings in about 5-7 million logs per day. We are running the virtual appliance and have doubled the recommended specs (4 CPUs and 16 GB RAM), and it STILL takes about 30 minutes to look up a small login report from about a week back!!! Other SIEMs can do this within seconds. I'm not sure where the bottleneck is, but it has been a major pain point with this product for a couple of years now.