
SolarWinds Needs Reboot Every 3 Days

Since upgrading to NPM 12.3 a couple of weeks ago, our SolarWinds system now seems to hang every couple of days and we need to reboot the server in order to resolve the issue.

When trying to access the Server via the webpage we get the error below. I installed the latest hotfix bundle earlier in the week but the problem persists.

I'm seeing on another thread that other people are also experiencing this, but it's not clear whether a fix was found?

[Screenshot of the error: Capture.PNG]

  • Try this change out

    Alright, we’ve gotten enough feedback to officially declare that the fix for this issue is to change the TransferMode property in the Information Service settings from Streamed to Buffered. To recap, you can do this in the centralized settings section of your website:

    1) Go to http://YOURHOSTNAME/orion/Admin/AdvancedConfiguration/Global.aspx
    2) Find "SolarWinds.Orion.InformationServiceClient" and change TransferMode to "Buffered"
    3) To apply changes please restart Orion services on all SolarWinds Servers.


    You are not losing anything by making this change – Streamed was introduced in 12.3/2018.2 to help manage SWIS Memory consumption.  Moving to buffered is basically how SWIS has worked in every other version.

     
    We will soon be introducing an official fix that will make this change. We’ll also be revisiting Streamed mode on our end to see if we can make it work without causing port exhaustion, but that will be at some point down the line.

    Solution from the thread "Solarwinds is now horribly unstable."
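Since the reply above attributes the hangs to port exhaustion, one way to watch for it between reboots is to tally TCP connection states on the Orion server. The following is only a rough sketch, not a SolarWinds tool: it assumes the Windows `netstat -an` row layout (protocol, local address, remote address, state), and the function names `count_tcp_states` and `snapshot` are made up for illustration.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from `netstat -an` style text.

    Assumes Windows-style rows: TCP  local_addr  remote_addr  STATE
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which carry no state).
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3].upper()] += 1
    return states


def snapshot() -> Counter:
    """Run netstat locally and return the current tally (hypothetical helper)."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)


# Example use on the server itself:
#     for state, count in snapshot().most_common():
#         print(f"{state:<15}{count}")
```

If port exhaustion really is the cause, a count of TIME_WAIT or ESTABLISHED connections that keeps climbing while TransferMode is Streamed, and stays flat after switching to Buffered, would be consistent with it. What count is "too high" depends on the server's dynamic port range, which this sketch does not check.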

  • Thanks Steven - really helpful, I’ll give this a try and see if it improves stability. I should have a good idea within 4 days.

  • Hi,

    I would run the SolarWinds Configuration Wizard again from the server where SolarWinds is installed. Whenever I have had an issue in the past, this has usually fixed it. Just accept all the default values (existing settings). It will go through and correct any settings that are wrong, and it will also optimize (pre-compile) the web pages.

    An extra step that you can take, if you don't mind the additional downtime, is to verify that your Windows server is fully up to date and reboot it before invoking the Configuration Wizard. I did have an issue once where a .NET or similar update was not correctly applied to the server, and it was causing issues for the SolarWinds server.

  • I’d actually tried all of that, without any success, before I posted the original message.

    So far the workaround that Steven mentioned above seems to have resolved the reliability problem. The server has been reliable for just over 3 days now which is the longest period of time it’s lasted since being upgraded. Watch this space.....

  • I have been battling this issue for over 6 weeks (very early adoption of 12.3 :( won't make that mistake again) and was experiencing more severe symptoms. I actually had my app server on a schedule to reboot every 6 hours, and even then it was barely functional. I applied this fix today and should know by morning if it resolved the issue. Stay tuned....

  • Ditto.....I think I'll be leaving the next upgrade for a few weeks and check the feedback....To be fair I've usually upgraded within a week or so of each release and not had any problems, but this issue has caused us some real operational problems...

  • It is difficult to schedule changes in our environment so we like to wait until there is at least one hotfix released and then ensure that at least three weeks go by between any additional hotfixes to ensure things are as stable as possible before upgrading.

    A great resource to watch for this is:

    Orion hotfix release notes - SolarWinds Worldwide, LLC. Help and Support

  • Sound advice. Thanks for the link as well.

  • I wonder why this is an issue for some and not for others - I checked my settings and found that "SolarWinds.Orion.InformationServiceClient" was set to streamed. It could be that the issue only happens when you have a certain number of objects and I am below that threshold with 107 nodes and 86 interfaces.

    I am also using Windows Server 2016 Datacenter as the NPM server and updated after SolarWinds had released 1 or 2 hot fixes.

    I am also only using NPM and SAM.

    There are too many variables for us, the end users, to pinpoint the cause - that is up to Orion to do. Besides, it appears that that setting has fixed the issue for you.

    Good luck!

  • I think you could be correct with that....I think the more nodes/interfaces you monitor, the quicker the system will hang. We're monitoring around 3,000 interfaces across circa 700 nodes and the system needed a restart every 70 hours... Since I changed "SolarWinds.Orion.InformationServiceClient" to Buffered, the system has been stable for around 98 hours and counting...

    I'm fairly confident the fix recommended by Steve has resolved the issue for us.