4 Replies Latest reply on Sep 4, 2017 8:45 AM by omar88

    NPM - CPU Maxed Constantly - False Latency Alerts

    techbender

      I've opened a case with support (Case #940956), but while I wait for a response on the diagnostic file I uploaded, I want to ask here as well. We're running NPM 11.5.2 on a virtual server, with the database on a separate dedicated SQL server. We have 8 products total and are using 8 CPUs at 2.00 GHz with 24 GB of RAM.

       

      On Feb 12th we began getting inundated with latency alerts from various nodes. Investigating showed that the nodes were not actually experiencing latency. We looked into Orion to determine the source and found that the CPU was completely pegged. Stopping all SolarWinds services drops it to almost no utilization; starting them again sends it right back to 100%. We contacted support and they had us reinstall CoreInstaller, JobEngine, Job Engine.v2, InformationService, and CollectorInstaller. Doing this fixes the problem for a couple of days, but then it comes back.

       

      Our troubleshooting indicates that SolarWinds.JobEngineWorker.v2 is the specific service that causes the load when it starts. We tried increasing the number of CPUs from 8 to 16, which worked for a couple of days, but now it's back to the same issue even with double the recommended number.

       

      http://www.solarwinds.com/netperfmon/solarwinds/wwhelp/wwhimpl/common/html/wwhelp.htm#context=SolarWinds&file=orionaghow…

      This was helpful for understanding how NPM works.

       

      Our problem seems similar to these other issues:

      Re: 11.5.2 Upgrade Issues (SQL Connections\High CPU) 
      Re: High CPU usage SWjobengineworker2.exe

       

      Here is the relevant info on how much we're polling:

      Polling Completion: 99.88
      Elements: 5325
      Network Node Elements: 1086
      Volume Elements: 2148
      Interface Elements: 2091
      SAM Application Polling Rate: 5% of its maximum rate
      Routing Polling Rate: 1% of its maximum rate
      UnDP Polling Rate: 0% of its maximum rate
      Polling Rate: 30% of its maximum rate
      VIM.VMware.Polling: 10
      IPAM.Dhcp.Polling: 0
      SAM Windows Scheduled Tasks Polling Rate: 2% of its maximum rate
      Hardware Health Polling Rate: 15% of its maximum rate
      Fibre Channel Polling Rate: 0% of its maximum rate
      Wireless Polling Rate: 0% of its maximum rate
      Wireless Heat Map Polling Rate: 0% of its maximum rate
      Total Job Weight: 1890
      Number of HW Health Monitors: 478
      Number of HW Health Sensors: 10757
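As a quick sanity check on those numbers, here is a rough Python sketch that adds up the three element types and works out the average number of hardware health sensors per monitor. The figures are copied from the summary above; the variable and function names are just illustrative, not anything from Orion.

```python
# Figures copied from the polling summary above.
elements = {
    "Network Node Elements": 1086,
    "Volume Elements": 2148,
    "Interface Elements": 2091,
}
reported_total = 5325   # the "Elements" line
hw_monitors = 478       # "Number of HW Health Monitors"
hw_sensors = 10757      # "Number of HW Health Sensors"


def element_total(counts):
    """Sum the per-type element counts."""
    return sum(counts.values())


def sensors_per_monitor(sensors, monitors):
    """Average number of hardware health sensors per monitor."""
    return sensors / monitors


total = element_total(elements)
print(f"Sum of element types: {total} (reported: {reported_total})")
print(f"Avg sensors per HW health monitor: "
      f"{sensors_per_monitor(hw_sensors, hw_monitors):.1f}")
```

The three element types account for the whole reported element count, so the hardware health sensors are tracked on top of that figure.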

       

      This is a capture of all the Job Engine workers. Is it normal to have this many Job Engine Workers?
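In case a number is more useful than a capture, below is a minimal Python sketch for counting the worker processes from the output of the Windows `tasklist /FO CSV /NH` command. The image name `SWJobEngineWorker2.exe` is an assumption taken from the title of the thread linked above, so adjust it if the process shows up under a different name on your server.

```python
import csv
import io
import subprocess

# Assumption: worker image name, inferred from the linked thread title.
WORKER_IMAGE = "SWJobEngineWorker2.exe"


def count_workers(tasklist_csv, image=WORKER_IMAGE):
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches.

    The first CSV column is the image name; comparison is case-insensitive.
    """
    reader = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in reader if row and row[0].lower() == image.lower())


def current_tasklist():
    """Windows only: return the raw CSV process list from tasklist."""
    return subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout


# On the Orion server itself you could run:
#   print(count_workers(current_tasklist()))
```

Running it every few minutes and logging the result would show whether the worker count climbs before the CPU pegs or stays flat.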