Last week I upgraded all of my SolarWinds products to the latest versions, and now every two days it stops working to the point where I have to reboot the server.
Alerts still work, but you can't view anything on the main front page. I have run the configuration on the web front end several times and it still doesn't work. I have included an any/any exception so that nothing will be blocked, though it never had this problem prior to the upgrade.
Does anyone else have this issue? Or something similar?
Alright, we’ve gotten enough feedback to officially declare that the fix for this issue is to change the TransferMode property in the Information Service settings from Streamed to Buffered. To recap, you can do this in the centralized settings section of your website:
1) Go to http://YOURHOSTNAME/orion/Admin/AdvancedConfiguration/Global.aspx
2) Find "SolarWinds.Orion.InformationServiceClient" and change TransferMode to "Buffered"
3) Restart Orion services on all SolarWinds servers to apply the change.
You are not losing anything by making this change – Streamed was introduced in 12.3/2018.2 to help manage SWIS memory consumption. Buffered is basically how SWIS has worked in every other version.
We will soon be introducing an official fix that will make this change. We’ll also be revisiting Streamed mode on our end to see if we can make it work without causing port exhaustion, but that will be at some point down the line.
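If you want to see whether a server is drifting toward port exhaustion before the website stops responding, a minimal Python sketch like the one below can tally TCP connection states from Windows `netstat -an` output. The function names are hypothetical helpers, not anything SolarWinds ships, and a large, growing count in one state is only a hint that you are hitting this issue, not proof.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from Windows `netstat -an` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Windows TCP rows look like: TCP  <local>  <remote>  <STATE>
        # UDP rows have no state column, so they are skipped.
        if len(parts) >= 4 and parts[0].upper() == "TCP":
            states[parts[3]] += 1
    return states


def live_tcp_states() -> Counter:
    """Run `netstat -an` on this machine and tally the result."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    return count_tcp_states(result.stdout)
```

Calling `live_tcp_states().most_common()` on the Orion server every few hours and comparing the numbers should show whether connections are piling up between restarts.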
I'm having issues with the latest version as well. It started with Hardware Health throwing a strop, but then Friday night (just after we had left, of course) everything just died a death.
A reboot brings it back up, but then the Collector service chews up all the CPU after a while and you have to reboot again.
I also have a case open, waiting to hear back.
My server was fully patched before upgrading.
Are your MS patches up to date? We, and a few others via the Help Desk, have had issues:
To make it clear, since Microsoft always makes it a little difficult:
You need https://support.microsoft.com/en-ie/help/4095875/description-of-the-security-and-quality-rollup-for-net-framework-3-5-f
Which means your 2012 R2 servers require:
KB4103473 - https://support.microsoft.com/en-ie/help/4103473
Download it here; please choose the Windows Server 2012 R2 one!
Please let me know if this helps or if you require further assistance.
A Help Desk Engineer
After patch applied, no issues.
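To confirm whether the KB mentioned above is actually present on a server, a rough Python sketch along these lines may help. It assumes PowerShell's `Get-HotFix` cmdlet is available, and the helper names are hypothetical. `Get-HotFix` does not necessarily list every kind of update, so a missing entry is a prompt to check the update history by hand rather than a definitive answer.

```python
import subprocess


def has_hotfix(hotfix_ids, kb):
    """Return True if kb (e.g. 'KB4103473') appears in an iterable of hotfix IDs."""
    wanted = kb.strip().upper()
    return any(h.strip().upper() == wanted for h in hotfix_ids)


def installed_hotfix_ids():
    """List hotfix IDs reported by PowerShell's Get-HotFix (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

On the server itself, `has_hotfix(installed_hotfix_ids(), "KB4103473")` gives a quick yes or no.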
I started having the same issue after the latest set of upgrades. Without fail, every two days the server crashes and I have to shut down services and restart, and then everything is fine for another two days. I also have a support ticket in and have sent diagnostics and log files. Tech support has looked things over twice and the issue has not been resolved yet. Hopefully this can be resolved soon.
I've had a similar issue, and I've been working closely with an AE on diagnosing it. I haven't heard back in a while; usually that means they've got the data they need and something's cooking.
My Orion Module Engine was crashing and restarting so often that it caused multiple instances of the Windows logging process and ran my CPUs up to 99%. This caused page load errors, database timeouts, and even kept me from being able to RDP into the server. Restarting all the services or rebooting the server provided only temporary relief, as the problem always returned.
Last weekend, I went ballistic on the server and performed the following:
1. Shut down all services.
2. Copied the c:\ProgramData\Solarwinds\JobEnginev2 folder to a backup location.
3. Went to Control Panel and uninstalled SolarWinds Job Engine v2.13.1337.
4. Deleted the Job Engine v2 folder located at c:\ProgramData\Solarwinds\JobEnginev2
5. Went to Control Panel and ran a repair on the Visual C++ 2013 redistributable, which forced a reboot (probably not needed).
6. After the reboot, re-installed Job Engine v2 by running C:\ProgramData\Solarwinds\Installers\JobEnginev2.msi
7. Ran the SolarWinds Configuration Wizard (selected services).
After performing the above, my server has been stable, with no Orion Module Engine service crashes or any crashes for that matter.
However, if the problem returns, I will be opening a case.
I have not applied either of those items, as I am just finding out about them today; I will keep them in mind if any problems return and make sure they are applied before opening a case. However, my Information Service seems to be fine, at least for now. It was the Orion Module Engine that was causing my issues; it would crash and cause the NetFlow and Collector services to crash as well.
My server has been fine since I re-installed the job engine, and I will probably wait for the dust to settle before tinkering with it any further, unless of course issues return.
This looks like the .NET Framework cache on the server that is hosting the website is broken.
Please try the following:
1. Stop IIS.
2. Stop all SolarWinds services on that device.
3. Delete the cache folder inside each of the following folders:
- "C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root" (in my case the folder inside is named "53bc9024")
- "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root" (in my case the folder inside is named "e22c2559")
4. Start IIS and the SolarWinds services.
Please report back if that fixes the issue. I created a scheduled task that does what I described, and it runs every day at 1 AM, because it frequently happens in my environment too.
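For anyone who wants to script the folder cleanup, here is a minimal Python sketch of step 3 only. It is an illustration, not the actual scheduled task described above; the function name is hypothetical. It assumes IIS and the SolarWinds services have already been stopped (for example with `iisreset /stop` for IIS), since the services to stop vary by installation.

```python
import shutil
from pathlib import Path

# Paths copied from the steps above; adjust if your .NET version differs.
CACHE_ROOTS = [
    Path(r"C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root"),
    Path(r"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root"),
]


def clear_cache_roots(roots):
    """Delete every subfolder inside each cache root and return the paths removed."""
    removed = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # this root does not exist on the machine; skip it
        for child in root.iterdir():
            if child.is_dir():
                shutil.rmtree(child)  # removes the hashed cache folder and its contents
                removed.append(child)
    return removed
```

Running `clear_cache_roots(CACHE_ROOTS)` from an elevated prompt between the stop and start steps removes the hashed folders and leaves the `root` folders themselves in place.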
Hope this helps!
The .NET Framework cache does not really get "full"; it just gets corrupted or something.
It happens to me very irregularly, which is why I created the scheduled task to run daily at a time when nobody needs the Orion web interface.
Better to be safe than sorry!
I have two questions for you though:
1. Where exactly do you get this error? I have never seen an error message like that coming from SolarWinds.
2. Did you try the suggested troubleshooting steps? Did you configure the antivirus exceptions?
When you restart all the services on a schedule, how do you keep sensitive alerts from firing when it comes back up?
My network colleagues are tired of being the first to detect Orion website "unexpected errors" when they try to do their before-the-users-arrive tasks.
But my Storage and DBA colleagues will not tolerate bogus PagerDuty alerts when their sensitive monitors trigger as the services come back up, and the agents reconnect.
Every time SolarWinds plays up, I stop all services, start them, then try the site. Nothing.
Then, after running the config wizard for database/services/website (either in some combination or all together), that window pops up.
I reboot the device, leave it some time and then it's all ok.