High CPU - IIS worker process

Does anyone have an issue with their SolarWinds application pool using all remaining CPU on their NPM (12.3) server?

We have a VM on ESXi 6.0 U2 running Server 2016, 500 license load, 5 vCPU, and 16 GB of RAM, and cannot stop the IIS worker process from running away with the CPU.

When the IIS worker process ISN'T out of control, the machine is at about 10-20% CPU load, with occasional spikes up to 30-40%.

Support and I have rebuilt the site several times, but there appears to be no fix.

The pool is set to reset every 29 hours. There are no obvious errors in the logs that either support or I can find.

Sometimes after reset, it spikes right back up and requires yet another reset.

Suggestions welcome.

  • Hey,

    I just got off the phone with SW Support and the fix we found was ending the IIS Process. After stopping it, it came back up on its own and started running normally again.

  • Yeah, that's nice, but get ready to do that at least once a week. I literally just restarted our server again. IIS was using all resources available.

    I've had several support calls with them and we've tried just about everything to resolve the issue, but no luck. 

    It'll come back, don't worry.

    It appears to be random too, so get ready to lose monitoring at random times.

    -Elliott

  • We are getting the same issues. We notice that the memory usage of the SolarWinds Orion application pool climbs from about 2am until it hits 3 GB of private bytes, then the CPU usage soars until the worker process is killed and restarted. Once the CPU usage begins to spike, we lose access to the website.

    Funny thing is no users are using SolarWinds until 6am.

  • Wow, I can't believe I'm not alone. Even the timing matches: our issue also happens around 2am. Likewise, I had several support calls and the best answer I got was "the next release/hotfix will address it", which proved not to be the case. I'd really appreciate it if someone who got any kind of resolution for this posted it here.

  • Need more CPUs for god of Solarwinds!

  • Haha, fixing performance problems or errors by throwing more resources at them is fundamentally wrong design. I appreciate the suggestion but I think the issue is with the application. We have 16 cores and it runs at 30% CPU on average, but on those occasions CPU spikes to 100% and crashes it. So, definitely not a resource problem.

  • This is certainly odd behavior and my apologies ahead of time if these questions have already been asked and answered previously with support.

    1. Do you have any reports scheduled to run off-hours? If so, does the behavior improve when/if you disable those reports from running?
    2. Is antivirus installed and running on the machine? If so, have you set all the appropriate exclusions? There are known cyclical issues that can occur, similar to the behavior you describe above when antivirus is locking files or delaying read/write operations.
    3. Have you reviewed the IIS logs for the Orion website during the time when the AppPool begins to consume all CPU? The IIS logs will tell you what is being accessed the most and by whom (IP).
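
For point 3, a rough Python sketch of pulling "what is being accessed the most and by whom" out of an IIS log for a given time window. It assumes the site logs in the W3C extended format with a `#Fields:` header that includes `time`, `cs-uri-stem` and `c-ip`; the function name, the 2am to 3am window and the sample lines are made up for illustration. W3C logs are normally stamped in UTC, so the window may need shifting to match local time.

```python
from collections import Counter
from datetime import datetime, time


def summarize_iis_log(lines, start=time(2, 0), end=time(3, 0), top=5):
    """Count the most requested URI stems and client IPs inside a time window.

    `lines` is any iterable of W3C-format log lines (an open file works).
    """
    fields = []
    uris, clients = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            stamp = datetime.strptime(row["time"], "%H:%M:%S").time()
        except (KeyError, ValueError):
            continue  # skip rows without a parsable time column
        if start <= stamp < end:
            uris[row.get("cs-uri-stem", "-")] += 1
            clients[row.get("c-ip", "-")] += 1
    return uris.most_common(top), clients.most_common(top)


# Invented sample data, only to show the expected shape of the input.
SAMPLE = """#Fields: date time c-ip cs-method cs-uri-stem sc-status time-taken
2018-09-01 02:14:59 10.0.0.5 GET /Orion/Login.aspx 200 120
2018-09-01 02:15:03 10.0.0.7 GET /Orion/SummaryView.aspx 200 4100
2018-09-01 02:15:09 10.0.0.7 GET /Orion/SummaryView.aspx 200 3900
2018-09-01 06:00:00 10.0.0.9 GET /Orion/Login.aspx 200 90
""".splitlines()

if __name__ == "__main__":
    top_uris, top_clients = summarize_iis_log(SAMPLE)
    print("Top URIs:", top_uris)
    print("Top clients:", top_clients)
```

Against a real log you would pass an open file instead of `SAMPLE`, e.g. `with open(path) as f: summarize_iis_log(f)`, where `path` is wherever your site writes its logs.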
  • Thanks for the suggestions aLTeReGo.

    1. The only daily reports run at 8am, nothing before this. There are some weekly reports that also run at 8-9am, but we are seeing issues on an almost daily basis.

    2. Antivirus exceptions have been put in place a long time ago to try and make the server a bit more responsive. Ended up also giving it more resources (8x CPU, 32GB RAM). Antivirus scan runs daily at 3:30am but we can see the IIS memory usage start climbing before the scan even starts.

    3. I've checked through the IIS logs and nothing stands out as an issue at 2:15am or at the point the worker process fails and restarts. I've also looked through Event Viewer and cannot see anything there. If anyone knows which SolarWinds logs might have some relevant information, I can check these too.

    Any other ideas are greatly appreciated.

  • How many Orion Modules are you using?

    Check this Best Practice below for any Orion Deployment:

    -Orion NPM alone requires the following:

    a. 4GB of RAM and 4 Cores of CPU

    b. Add 2GB of RAM and 2 Cores of CPU for Every Orion Module added
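
As a quick worked example of the arithmetic in (a) and (b), here is a small Python sketch. It only encodes the rule of thumb as quoted in this post, not an official sizing formula, and the function name is made up.

```python
def orion_sizing(extra_modules):
    """Return (ram_gb, cpu_cores) for NPM plus `extra_modules` other modules.

    Base: 4 GB RAM and 4 cores for NPM alone.
    Each additional module: +2 GB RAM and +2 cores.
    """
    if extra_modules < 0:
        raise ValueError("extra_modules must be zero or more")
    ram_gb = 4 + 2 * extra_modules
    cpu_cores = 4 + 2 * extra_modules
    return ram_gb, cpu_cores


if __name__ == "__main__":
    for n in range(4):
        ram, cores = orion_sizing(n)
        print(f"NPM + {n} module(s): {ram} GB RAM, {cores} cores")
```

By this rule, NPM with two extra modules would want 8 GB of RAM and 8 cores.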

    If the problem persists, gather Orion Diagnostics and send them to Support or send me a copy. I will review them and give you my recommendation. Also, if you have more than 30 users accessing the Orion web console at a given time, then you might need an Additional Website to help out the Primary Poller's website.