Level 9

SolarWinds Agent using 2-3+ GB RAM

We're noticing that the SolarWinds agent is consuming a significant amount of RAM on our servers.

In this instance, it's using over 3GB.

pastedImage_0.png

Restarting the agent temporarily alleviates the issue. I'm thinking of running a scheduled task on all my servers to restart the agent daily, but that would only mask the problem.

Does anyone have any ideas on how to resolve this?

72 Replies
Level 9

I applied 2018.4 this morning and it doesn't look any better.  The leak seems slower but it's still present.

Oddly enough, it seems to only be really affecting my servers in AWS that are hosted in Europe.

Level 9

Following up on what I've tried on my side.

I suspect this had something to do with the SolarWinds Cortex service, as it also had a massive memory footprint. I traced this back to disk queuing issues on the SolarWinds SQL service, and it appears this cascaded its way back to the individual agents, if that's even possible.

I'm still monitoring, but over the last 24 hours or so the memory usage on the remaining agents has stabilized significantly since I addressed the disk queuing issues on the SQL side. Memory consumption was growing by roughly 100MB a day, so I should know by tomorrow if this was the issue.

YMMV, but in my case this looks to be the result of me under-provisioning the SQL server like a rookie.

Product Manager

While identifying and fixing the memory leak that was addressed in the Buddy Drop, we found a new one. It's a slower, smaller leak, which is how it was masked by the first. We are working to address the cause of that leak now. If you have a support case open on this issue, you will be the first to be notified when a fix is available.

Level 8

Thank you for your help on this!

Level 9

I'm not convinced the hot fix they gave us actually resolved the issue, so I'm betting no. 

Level 9

Update: I have an open case with SW on this. It's currently being escalated to the developers.

I'm running NPM 12.3 Hotfix 6, and we're still dealing with the issue.

Level 8

I have also been having the same issue. We have had about 100 servers all show SolarWinds.ServiceHost.Process.exe running at over 3GB of memory usage. I also have a support ticket open. I might escalate this, as it is causing a lot of headaches.

Product Manager

We have a buddy drop available for this issue. cpeheotel and bourlis, I have notified your support engineers and provided them access to the fix. They should be notifying you shortly with a link to download the file. Please update this thread if it does address your issue.

Level 12

Alright, we're on pins & needles here.

We'll install the fix as quickly as we can get it passed through our change management process.

Level 9

I just applied the fix in our environment, still waiting for it to get pushed out to the servers.

*fingers crossed*

Level 8

Has anyone had any success? We applied the buddy drop this morning and are now monitoring. What is the typical memory usage for that process? I am guessing it's not supposed to be over 1GB.

Level 12

We applied the buddy fix late Friday as an emergency change. We definitely saw a drop in CPU and memory usage, but it's not looking good so far.

From the screenshot below you can clearly see when the buddy patch was installed and the memory usage dropped. It then steadily climbed over the weekend and suddenly jumped this morning.

Capture.JPG

Now, to be 100% honest, this is the same across the board for all SolarWinds.ServiceHost.Process.exe process monitors. Some dropped and stayed down, others dropped and then steadily climbed back up, and yet others dropped, rose, dropped again, and then rose again.

I would like to let this sit for the entire week before I can definitely say whether this resolved our issue or not, but early results aren't looking good.
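
For anyone wanting to judge a trend like this over a full week without relying on screenshots, a rough sketch of a sampler is below. It assumes the process name mentioned in this thread; the function name `Get-AgentMemorySample` and the CSV path are made up for illustration and are not part of any SolarWinds tooling.

```powershell
# Hypothetical helper: emit one timestamped sample per matching process.
function Get-AgentMemorySample {
    param([string]$ProcessName = 'SolarWinds.ServiceHost.Process')

    Get-Process -Name $ProcessName -ErrorAction SilentlyContinue |
        ForEach-Object {
            [pscustomobject]@{
                Timestamp    = (Get-Date).ToString('s')
                Computer     = $env:COMPUTERNAME
                ProcessId    = $_.Id
                WorkingSetMB = [math]::Round($_.WorkingSet64 / 1MB, 1)
            }
        }
}

# Append the current sample to a CSV. The path is an assumption; adjust as needed.
Get-AgentMemorySample |
    Export-Csv -Path 'C:\Temp\sw-agent-memory.csv' -Append -NoTypeInformation
```

Run on a schedule (for example hourly), this produces a file that can be charted to show whether the working set levels off or keeps climbing after a fix is applied.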

Level 9

I applied it and it's not looking good.

Level 9

We had to write a PowerShell script to restart the agent after it got over 300MB. They say a patch is coming, but we needed to do something in the meantime.

Level 13

Can you share the script with the community? It would be great to wrap this up in a SAM template.

I am glad they have found the issue. We would love to see logic embedded into the agent that watches for this type of consumption and self-heals with a restart if necessary. Whatever measures are necessary to make sure the agent is as dependable as possible. Nothing looks worse than your monitoring agent causing issues on a prod server. We all know it can happen regardless of vendor when changes are made; we just need to be diligent in reducing the possibility of impact as much as possible.

Level 9

Here's a down and dirty script. I don't have it in a SAM template, just scheduled on the servers.

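# Restart the agent service if any ServiceHost process exceeds the threshold.
$process = Get-Process -Name SolarWinds.ServiceHost.Process -ErrorAction SilentlyContinue

# 300MB = 314572800 bytes
$threshold = 300MB

# WorkingSet64 is the property behind the "ws" alias; Where-Object copes with
# more than one matching process.
if ($process | Where-Object { $_.WorkingSet64 -gt $threshold }) {
    Restart-Service -Name SolarWindsAgent64 -Force
}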
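
Since the script above is described as "just scheduled on the servers", here is one possible way to register it with the built-in ScheduledTasks cmdlets. This is only a sketch: the script path, task name, and 15-minute interval are assumptions, not details from this thread.

```powershell
# Assumed location; save the restart script to this path first.
$scriptPath = 'C:\Scripts\Restart-SolarWindsAgent.ps1'

$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$scriptPath`""

# Check every 15 minutes, starting now (the interval is an arbitrary choice).
# Older Windows versions may also require -RepetitionDuration.
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
    -RepetitionInterval (New-TimeSpan -Minutes 15)

# Run as SYSTEM so the service can be restarted without a logged-on user.
Register-ScheduledTask -TaskName 'Restart SolarWinds Agent on high memory' `
    -Action $action -Trigger $trigger -User 'SYSTEM' -RunLevel Highest
```

This needs an elevated session and only works around the symptom; it does nothing about the leak itself.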

Level 12

We're having the same issue as well. We opened a case with support [Case # 00155294]; the only solution we received was to change the fetching method from WMI to RPC.

We made the changes and that did not resolve the issue.

I'm about to open our second case for this issue, and refer to this Thwack thread and offer a memory dump.

************************************************UPDATE************************************************

According to support, they are aware of this and it's being addressed in Orion 2018.2 Hotfix 6.

************************************************UPDATE************************************************

Level 7

We are running HF 6 and experienced this today, when the SolarWinds agent attempted to strangle 50 production servers. Opened a case: #00213488.

Product Manager

cpeheotel, please also open a support case if you are seeing this same issue. It usually helps when multiple customers report the same issue, as it lets us identify common threads. Obviously this isn't happening for all customers, nor are we able to reproduce it internally, so we need to find out what's in common across the customers that are experiencing this issue.

Level 8

We are encountering the same issue. I've filed support case #00221887.

Level 7

We opened support case: #00213488. The only somewhat meaningful thing I can think of is that the SCCM agent also queries WMI periodically.
