20 Replies Latest reply on Aug 9, 2016 3:19 AM by fazl azeem

    SolarWinds Job Engine causing 100% cpu utilization

    jcooler

      I have NPM SL2000 9.5 SP3 with NCM 5.5.1 on my server. SQL is on a separate server. I don't have APM, NTA, VOIP or any of that.

      NPM has 872 elements. NCM has 499 nodes.

       

      I have been having a problem with my NPM/NCM server: the CPU stays at 100%. Up until a week ago, the CPU would usually stay at around 50-60% or below and everything was running fine. I don't know of any major changes to the system that might have caused the change in server performance.

      This is causing all sorts of problems with polling and web site performance and it is getting very frustrating.

      I have a case open (#113419), but tech support hasn't come up with much so far. I was told that I needed to reinstall the Information Service. I did that, rebooted, and nothing changed. The CPU still stays at 100%.

      The problem seems to be the SolarWinds Job Engine. If I stop that service, the CPU drops to about 10% and everything runs nice and smooth. The only thing that seems to be affected is my UnDP pollers (which I do use/need).

      As soon as I restart the Job Engine service, the CPU goes right back to 100%.

      I also noticed that the SWJobEngineWorker.exe processes are using a large amount of memory and CPU.

      Right now, there are 2 SWJobEngineWorker.exe processes running. They are using about 820 MB of memory, and the CPU usage of the processes constantly ranges from 25% up to 80%.
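
      For anyone who wants to track those worker figures over time, here is a minimal Python sketch, assuming English-locale output from the Windows `tasklist /FO CSV` command. The function name `worker_memory_kb` and the sample rows are hypothetical and made up for illustration; they are not real readings from this server.

```python
import csv
import io


def worker_memory_kb(tasklist_csv, image_name="SWJobEngineWorker.exe"):
    """Return (process_count, total_kb) for image_name in `tasklist /FO CSV` text."""
    count = 0
    total_kb = 0
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        if row["Image Name"].lower() != image_name.lower():
            continue
        # "Mem Usage" looks like "412,304 K"; keep only the digits.
        digits = "".join(ch for ch in row["Mem Usage"] if ch.isdigit())
        count += 1
        total_kb += int(digits) if digits else 0
    return count, total_kb


# Hypothetical sample text in the shape of `tasklist /FO CSV` output.
SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"SWJobEngineWorker.exe","4120","Services","0","412,304 K"
"SWJobEngineWorker.exe","5388","Services","0","409,112 K"
"svchost.exe","1500","Services","0","15,000 K"
'''

if __name__ == "__main__":
    n, kb = worker_memory_kb(SAMPLE)
    print(f"{n} worker processes using {kb / 1024:.0f} MB in total")
```

      On the server itself, the text could come from `subprocess.run(["tasklist", "/FO", "CSV"], capture_output=True, text=True).stdout`, logged every few minutes to see whether the memory keeps growing.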

       

      Anyone have any ideas what is causing this?

      I really need to get this resolved as quickly as possible. My daily reports don't even run properly due to the excessive CPU load.

      Thanks,

      Jeremy

        • Re: SolarWinds Job Engine causing 100% cpu utilization
          Karlo.Zatylny

          Hi Jeremy,

          We have seen this issue at a few customers.  Do you have a lot of wireless and/or EnergyWise-capable devices?  The Job Engine does the work of polling the wireless, EnergyWise, VMware, and APM (which you don't have).

          Some questions for you that will help us debug your issue:

          Are you seeing a lot of SNMP traffic from your server while the CPU is spiked?  You can use TCPView to look at which processes are using which ports and then Wireshark to sniff the network and see which SNMP packets are going back and forth to those ports.

          Does the CPU tail off after some time?

          Do you have a case number with support that I can look up?
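
          If TCPView is not at hand, the process-to-port lookup mentioned above can be roughly approximated from `netstat -ano` output. This is only a sketch, assuming the usual Windows column layout; the helper name `udp_ports_for_pid`, the PIDs, and the port numbers in the sample are hypothetical.

```python
def udp_ports_for_pid(netstat_text, pid):
    """Return the sorted local UDP ports owned by pid in `netstat -ano` text."""
    ports = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # UDP rows have four fields: Proto, Local Address, Foreign Address, PID.
        # TCP rows carry an extra State column and are skipped here.
        if len(fields) != 4 or fields[0] != "UDP":
            continue
        local, owner = fields[1], fields[3]
        if owner != str(pid):
            continue
        # rpartition also copes with IPv6 locals such as [::]:161.
        ports.add(int(local.rpartition(":")[2]))
    return sorted(ports)


# Hypothetical sample text in the shape of `netstat -ano` output.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  UDP    0.0.0.0:52011          *:*                                    4120
  UDP    0.0.0.0:52012          *:*                                    4120
  UDP    0.0.0.0:161            *:*                                    1300
"""

if __name__ == "__main__":
    print(udp_ports_for_pid(SAMPLE, 4120))
```

          The resulting ports can then go into a Wireshark display filter such as `snmp && udp.port == 52011` to see which SNMP packets belong to that process.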

          Thanks

            • Re: SolarWinds Job Engine causing 100% cpu utilization
              jcooler

              Thanks for the response Karlo.

              I do have a case open. Case # 113419.

              I'll go ahead and answer some of these questions now. I'll take a look at the SNMP traffic and get back to you on that one.

              I HAD 11 wireless controllers (about 35 thin APs) being monitored. I removed those today to see if they might be the culprit. This didn't seem to have any effect on the CPU load.

              I'm not sure how many EnergyWise-capable devices I have, but I am sure that we are not leveraging the NPM EnergyWise feature on any devices. Does it still have an effect on the Job Engine even though we are not using the EnergyWise feature?

              The CPU does not tail off after a while. It stays at 100% almost constantly, day after day. It is causing the website to crash, as well as numerous other issues.

               

              I'll install TCPView so I can find out what the SNMP traffic looks like.

              Thanks for the help thus far.

              Jeremy

            • Re: SolarWinds Job Engine causing 100% cpu utilization

              Hi jcooler,

              I have been having the same issue for a few months now (Case 104794), and SolarWinds has asked me to try several actions. Unfortunately, none of them has worked so far.

              By the way we use:
              NPM 9.5 SP3
              APM 3.0 SP1
              Approx. 110 wireless devices; the majority are "Cisco 1200 Access Point" models and a few (19) are "Cisco AIR AP 1240"

              This week I asked for an update and provided a new diagnostics file after installing SP3, but unfortunately there has been no response so far.

              This issue is also reported in the following posts:
              NPM 9.5 and Jobengine having multiple instances (My own post)

              Re: 9.5 SP2 install results (last few entries)

              • Re: SolarWinds Job Engine causing 100% cpu utilization
                kampmalm

                Hi jcooler

                I have had the same problem, and the cause turned out to be that there was not enough memory.
                So I moved the SQL database to another machine, and everything was OK.
                The memory utilization was about 80%, and obviously that is too much.

                Now we are down to about 60% and everything is working OK.
                So I would suggest that you add some more memory to the machine.

                Regards
                Paul

                  • Re: SolarWinds Job Engine causing 100% cpu utilization
                    jcooler

                    It is possible that adding more memory would help. The server only has 2 gigs. However, my SQL database is already on another server.

                    I'm going to speak to our VM guys and see if they can give my NPM VM another gig or 2.

                     

                    I still think there's something else going on here, though. Everything was running fine up until a week and a half ago, and I haven't really made any changes or added any large number of elements since then.