12 Replies Latest reply on Jun 26, 2018 2:30 PM by jonathanswift

    Swis v3 Unstable.

    lcsw2013

      Hi,

       

      I'm having an issue in our environment and can't locate the underlying cause. Before installing version 12 we ran NPM 11.5.3. The issue was not present in any previous version and only became apparent in version 12. Swis v3 is using close to 2 GB of RAM, and its sustained CPU usage has reached 70% or more. On top of that, the IIS worker process consumes the rest of the CPU and more RAM. When this happens, the site either throws "Request timed out" errors or simply hangs and spins without responding.

       

      I've been having a hard time chasing down what causes this. I thought it could have been something custom, but if that were the case it would have shown up in previous versions as well. So I have a hard time believing it's anything custom; it looks more like a leak somewhere that makes swis v3 unstable.

       

      I've installed all patches and hotfixes. I've set all exceptions in McAfee and set it not to monitor SolarWinds files and directories. I've disabled the Windows firewalls, opened all required ports bidirectionally, and basically followed SolarWinds' written recommendations. I've followed all suggestions from support, and there is still no stability and the cause is still a mystery.

       

      Has anyone seen this before? Any suggestions, opinions, or comments? Everything is welcome at this point in my investigation.

       

      Thanks!

        • Re: Swis v3 Unstable.
          cjfranca

          Among the services you are monitoring, which one is consuming the most resources?

          • Re: Swis v3 Unstable.
            jmodjeski

            I'm not sure I have the exact answer, but I have observed the issue described. I discovered that I had agents that were not getting updated, and when I deleted those agents from the system the problem disappeared. That is only cause and effect as I observed it: when I added the agents back, the updated agent was installed, and I could not reproduce the problem.

            • Re: Swis v3 Unstable.
              lcsw2013

              I'm thinking this may be specific to my environment. For example, I removed many old maps from the atlas that were half completed or not working at all, and I haven't seen swis v3 consume as many resources since. I also started removing some old dashboards from the site, which has further reduced resource consumption. I'm still confused about what the issue really is, but I'm starting to lean towards it being components on the site that no longer work correctly.

               

              I'll update if I'm able to figure this out. Otherwise, I'll keep this open in case anyone else has suggestions or has seen this in their environment.

               

              Question: we have McAfee ePO installed, with all exceptions in place. Has anyone else had bad experiences with McAfee and SolarWinds?

              • Re: Swis v3 Unstable.
                ecklerwr1

                I'm not sure, but I'm interested in hearing what you find out. Unlike many here, I wait a while before upgrading. I'm still on the versions right before 12.x and the associated module upgrades.

                • Re: Swis v3 Unstable.
                  lcsw2013

                  I'm not exactly sure what this could be. Right now swis v3 is taking up 70% CPU and 1.5 GB of RAM. The logs show a query running that appears to be for groups, since it mentions containers. There are also some IPAM errors saying that IPAM is trying to pull from UDT, but we don't even have UDT in our environment. Really weird.

                   

                  Does anyone know whether IPAM pulls from UDT, and what type of data it would be looking for? If it does, I want to break that tie. As I mentioned, I don't have UDT installed. We tested it long ago on version 10.7 of NPM but never moved forward with it.

                  • Re: Swis v3 Unstable.
                    lcsw2013

                    Interesting thing: after a restart, the swis v3 component works perfectly for about a week. Then it slowly gets bogged down until the site spins with no response. I checked my site and found a lot of clutter that needed to be cleaned out. I cleaned out the clutter, optimized the rest, and made sure there weren't any further problems. The site is perfect by all accounts at this moment, but the problem still happens. So I find it really hard to believe that it's anything custom. At this point I believe I've shown that SolarWinds Support is wrong in saying it's custom.

                     

                    It's not port exhaustion either: I've put the registry setting in place and also added a setting so that connections are not kept open for longer than a certain amount of time. (All of this is from Microsoft and SolarWinds KBs.)

                     

                    The environment is all VMs in the same datacenter, and latency and speed are not an issue. I've even had my Windows team analyze it thoroughly, and they couldn't find any bottlenecks. The VMs and database were all deployed following SolarWinds recommendations.

                     

                    The interesting part is that it seems to affect only my environment, as I don't see any other posts showing the same thing. I'm stuck; this is a mystery that's proving difficult to resolve. I've even followed the advice given on this thread, and it didn't seem to help.

                    • Re: Swis v3 Unstable.
                      lcsw2013

                      I think I have figured it out. I noticed that every time swis v3 became unstable, what appears to be a dynamic query was running, one built well before I took over this environment. The query was built to get the devices for a specific group. It ran multiple times, and its execution time got longer and longer, which explains the "Request timed out" errors on the website. It was killing the database with multiple large requests sent over and over again. When I increased the refresh time for the group, swis v3 suddenly became more stable and quiet.
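
To put rough numbers on that pile-up: if a group query takes longer to run than its refresh interval, copies of it start to overlap. The sketch below is plain arithmetic, not Orion code. `overlapping_runs` and its parameters are hypothetical names, and it assumes a new run starts on every refresh whether or not the previous one has finished.

```python
import math


def overlapping_runs(query_seconds: float, refresh_seconds: float) -> int:
    """Peak number of copies of one query in flight at the same time.

    Assumes (hypothetically) that a new run starts every `refresh_seconds`
    and that each run takes a fixed `query_seconds` to complete.
    """
    if query_seconds <= 0 or refresh_seconds <= 0:
        raise ValueError("both durations must be positive")
    return math.ceil(query_seconds / refresh_seconds)


# Hypothetical figures: a 150-second query refreshed every 60 seconds.
print(overlapping_runs(150, 60))
```

With those made-up figures the peak is three copies in flight at once. In practice each extra copy probably slows the others down, so the real figure would only be higher, which would fit with the execution time getting longer and longer.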

                       

                      So dynamic queries for groups are something that SolarWinds does not seem to handle well, at least in my environment.

                       

                      Has anyone had issues with dynamic queries bogging down swis v3?

                      • Re: Swis v3 Unstable.
                        lcsw2013

                        I found the issue. We have 97 groups set up, some of them dynamic, and they were all refreshing together. This was overwhelming swis v3 with database queries, and that is what had been killing the site. Adjusting the group refresh has resolved the issue.
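
One way to stop groups from all refreshing at the same moment is to spread their start times evenly across the refresh interval. This is a minimal sketch of the arithmetic only. It does not touch Orion's settings, and `staggered_offsets`, the group names, and the 600-second interval are all assumptions for illustration.

```python
def staggered_offsets(group_names, interval_seconds):
    """Return (group, start offset in seconds) pairs spread over one interval."""
    names = list(group_names)
    if not names:
        return []
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Evenly spaced slots: the first group starts at 0, the last just
    # before the interval wraps around.
    step = interval_seconds / len(names)
    return [(name, round(i * step, 1)) for i, name in enumerate(names)]


# Hypothetical example: 97 groups on an assumed 10-minute refresh interval.
groups = [f"group-{n:02d}" for n in range(1, 98)]
schedule = staggered_offsets(groups, 600)
print(schedule[:3])
```

With 97 groups and a 600-second interval, each group would start roughly 6 seconds after the previous one instead of all 97 hitting the database together.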

                          • Re: Swis v3 Unstable.
                            Jason.Henson

                            Gents,

                             

                            It sounds like you were able to get this squared away. I'll try to contribute this bit of info too. Loop1 had a client who required the Orion product to use Groups and Dependencies shortly after the feature was released. We went round and round with a developer and support, doing our best to find a way to scale what the software could handle to the client's needs. There is a strategy for making groups scale beyond the ~100 groups you currently have configured. By being more specific with our query language, we were able to scale to ~300 dynamic groups. That is 3x where you are now, and if you make your dynamic group queries as specific as possible, you should be able to keep growing your dynamic group count.

                             

                            I hope this helps.

                             

                             

                            Thanks,

                            Jason Henson

                            Loop1 Systems

                            • Re: Swis v3 Unstable.
                              jonathanswift

                              I don't think I have any dynamic groups. I saw some before, removed them, and added everything manually.

                              But I can't be sure there aren't some still there. I've looked through all the attributes in Reporting but can't see anything that would identify a group as dynamic.

                               

                              If you had dynamic groups, how would you find them? We have 100 applications, all with their own groups, plus all the usual infrastructure: ESX, firewalls, and the other groups of equipment you would expect in a typical organisation. I don't think I can spend half a day clicking through every one of them.
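
One possible approach, offered as an unverified sketch: export each group's member definitions and look for the ones that are filters rather than references to a single object. The code below assumes the definitions are available as (group name, definition) pairs, for example from a SWQL query against `Orion.ContainerMemberDefinition`, and that dynamic definitions start with `filter:/` while manually added members start with `swis://`. Both are assumptions to check against your own Orion version. `find_dynamic_groups` and the sample rows are hypothetical.

```python
def find_dynamic_groups(rows):
    """Map group name -> list of definitions that look like dynamic queries.

    `rows` is an iterable of (group_name, definition) pairs. The `filter:/`
    prefix test is an assumption about how dynamic members are stored.
    """
    dynamic = {}
    for group_name, definition in rows:
        if definition.strip().lower().startswith("filter:/"):
            dynamic.setdefault(group_name, []).append(definition)
    return dynamic


# Hypothetical sample rows; real ones would come from your own export.
rows = [
    ("Core Firewalls", "swis://orion.example/Orion/Orion.Nodes/NodeID=12"),
    ("App 17", "filter:/Orion.Nodes[Vendor='Example']"),
]
for name, definitions in find_dynamic_groups(rows).items():
    print(name, definitions)
```

If the assumptions hold, only the groups that still contain a dynamic query are listed, which avoids clicking through every group by hand.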