Has anyone else experienced a memory leak on the main poller?
I am having to restart the service every few hours because it consumes all the memory on the main poller, climbing to over 10 GB.
Hello,
Even though support told me the memory usage was normal, Orion Platform 2017.1 Hotfix 2 resolved all my memory issues, but not before I spent over $5,000.00 upgrading memory that I now don't need.
Now I have a new issue after the hotfix. My HA nodes are failing over once or twice a day.
>>Frustrated<<
I have not seen a true memory leak, but I did disable hardware polling in my environment because it kept consuming memory on my APEs. I have moved all polling to my APEs so the main poller can run the application more efficiently.
What products/versions are you running? Is your main poller doing any polling? If it is, I would look at how much it is polling.
Sorry, I read through that too fast and overlooked the Recommendations part. I ended up disabling recommendations in my environment because I was also having problems with it. I have not tried it again since upgrading to 6.4.
Daniel
I'll reach out to you to gather diagnostic data. I want to dig into this deeper with the engineering team to address your issue.
Case # 1143886 Created: Memory leak.
What was the result of your case?
We upgraded 1 week ago and so far this is our only real problem. The RecommendationService consistently climbs to 9.5 GB fairly quickly after I restart it. I tried turning off policies that had a lot of items in them, but I'm considering turning off the entire feature if I can't figure this out...
I am seeing the same high memory issue as well.
The result of the case was that 8GB is the low side of the memory requirement for the recommendation engine in our environment. Our environment consists of about 120 ESXi hosts and around 1,200 servers. I find this memory utilization unacceptable. vROPS requires much less to do the same thing. I am about to put in another 32GB of RAM and we will see what happens.
I also upgraded to 12.1 and have a memory leak in SolarWinds Information Service V3. Total memory usage used to run 8-9 GB but now climbs to 15 GB. I've had a ticket in with development, but no resolution yet. I need to restart this service every day or it will cause problems for the entire system.
I'm facing the same issue after upgrading to 12.1. I have to restart the poller every few days to work around it.
I did not have this issue before the upgrade.
Orion Platform 2017.1-HF2 *DOES NOT* fix the memory leak, or perhaps it created a new one.
I've got a ticket open with support; they said that some customers experience it and others do not. My Information Service V3 will climb and climb, and I can watch the queues in RabbitMQ get longer and longer until the whole thing starts exhibiting various problems.
We have a scheduled task to restart InfoServices and the Module Engine every 15 mins to keep the thing afloat.
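For anyone who wants to do the same, here's a rough sketch of that restart task as a small Python script. The Windows service names below are assumptions; double-check the exact names in services.msc on your poller before scheduling anything, since they can differ by product and version.

import subprocess
import time

# Service names are assumptions based on our install --
# verify yours in services.msc before scheduling anything.
SERVICES = [
    "SolarWinds Information Service V3",
    "SolarWinds Orion Module Engine",
]

def restart(service):
    # 'net stop' / 'net start' block until the service finishes transitioning,
    # which is enough for a simple keep-alive workaround like this.
    subprocess.run(["net", "stop", service], check=False)
    time.sleep(10)
    subprocess.run(["net", "start", service], check=False)

if __name__ == "__main__":
    for svc in SERVICES:
        restart(svc)

We just point a Task Scheduler job at this script (run as administrator) on a 15-minute trigger.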
I was facing the same Information Service memory leak after the 12.1 upgrade. I was given a buddy drop that contained an updated version of the Information Service. After installing that I still experienced the issue, and I couldn't even install the new Hot Fix 2 because it would hang in the Configuration Wizard on Importing Sample Map. So today I uninstalled the new Information Service and reinstalled the original Information Service (version 2017.1.00192). Then I installed Hot Fix 2 and it completed with no issues. My Information Service hasn't gone above 500 MB of memory since then. Before I did all this, the service would climb in memory until I had to restart it.
As for the Solarwinds.Recommendations.Service, I was told that this service will climb in memory and then drop back down, which I have confirmed. In my case it climbs to almost 6 GB and then drops to 110 MB. I was told that's normal.
I did the uninstall and reinstall on the main poller and all additional pollers.
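If you want to confirm the climb-and-drop behavior yourself rather than take support's word for it, a small memory logger like this is enough. It assumes psutil is installed, and the process-name filter is a guess; check Task Manager for the exact executable name of the Recommendations service on your box.

import time
import psutil  # pip install psutil

# Substring of the process name to watch -- an assumption, adjust to match
# the actual executable name on your poller.
TARGET = "Recommendations"

def log_once():
    for proc in psutil.process_iter(["name", "memory_info"]):
        name = proc.info["name"] or ""
        mem = proc.info["memory_info"]
        if mem is None or TARGET.lower() not in name.lower():
            continue
        rss_mb = mem.rss / (1024 * 1024)
        print(f"{time.strftime('%H:%M:%S')}  {name}  {rss_mb:.0f} MB")

if __name__ == "__main__":
    while True:          # sample once a minute; Ctrl+C to stop
        log_once()
        time.sleep(60)

Leaving that running for a day gives you a timeline you can attach to a support case instead of arguing about what "normal" looks like.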
What do your output queues look like after applying your fix?
Mine will run fast for about 10 minutes, with BusinessLayer and PubSub going great; then suddenly the BusinessLayer drops to 20-30 messages per second, which is not nearly enough to keep up with the inbound rate. The whole thing clogs up and timeouts ensue.
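If you want hard numbers behind the "queues get longer and longer" symptom, you can poll RabbitMQ's management API while the problem builds. This is only a sketch: it assumes the management plugin is reachable on the default port 15672 and that you have working credentials for it, both of which will vary per install.

import time
import requests  # pip install requests

# Endpoint and credentials are assumptions -- adjust for the RabbitMQ
# instance that ships with your Orion install.
BASE = "http://localhost:15672/api/queues"
AUTH = ("guest", "guest")

def snapshot():
    resp = requests.get(BASE, auth=AUTH, timeout=10)
    resp.raise_for_status()
    for q in resp.json():
        # 'messages' is the total backlog in the queue; a count that only
        # ever grows matches the clogging behavior described above.
        print(f"{q['name']:<50} {q.get('messages', 0):>8}")

if __name__ == "__main__":
    while True:
        print("---- " + time.strftime("%H:%M:%S"))
        snapshot()
        time.sleep(30)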
Another piece of my particular version of this puzzle: the SolarWinds dev team had us replace the .sdf files for the Collector and JobEngineV2 on all pollers. Since then, the website seems to be staying up, so that's progress.
I received a buddy drop yesterday that seems to have fixed my issues - it's a SAM hotfix.
There are still some smaller issues with it; it sounds like the developers are polishing it right now. Meanwhile, my primary poller and APEs are mostly functional again.