4 Replies Latest reply on Aug 2, 2018 4:26 AM by sierra1011

    Windows Server 2016 black screen/hang while SW still useable - RPC blamed

    sierra1011

      Hi all, sorry for asking for help, but I'm trying to gather as much info as I can at the moment. We're on Orion 2017.2 (held back by the DB OS version, almost ready to upgrade).

       

      Story so far:

      So, we've got an APE on Windows Server 2016, and it sometimes reports an "unexpected shutdown" when I log into it. The logs tenuously pointed towards VMM restarting it, so we moved it off VMM and onto a dedicated plain ol' Hyper-V (2016) host. Now the host reports "heartbeat lost", and every now and then the server itself just hangs at a black screen when you try to log in. The only "fix" is a hard reset from the host, after which it's all fine when it comes back up. We investigated a few known issues, such as the TermServer service issue that was fixed in previous Windows patches, but none of those leads produced anything. Long story short, we opened a call with Microsoft, gave them a couple of full dumps, and they have come back with...

       

      "Uninstall SWJobEngineWorker"

       

      So you see why I'm here.

      The claim is that RPCSS is breaking, and taking the server with it. Yet polling carries on when the server 'breaks', and we can still get up-to-date info, edit nodes, etc. RPC is hitting 20,000 threads or so. We have an MPE and another 2 APEs with similar load and a spread of OSes, and no issues (not like that, at least).
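
For anyone wanting to check the same thing on their own pollers, here's a rough sketch of how runaway thread counts could be spotted from a process snapshot. It assumes a CSV exported with PowerShell, e.g. `Get-Process | Select-Object Name, Id, @{Name='Threads';Expression={$_.Threads.Count}} | Export-Csv -NoTypeInformation threads.csv`. The function name, the column names and the 10,000 default threshold are my own choices for illustration, not anything from SolarWinds or Microsoft.

```python
import csv
import io


def flag_high_thread_counts(csv_text, threshold=10000):
    """Return (name, pid, threads) for processes above threshold, busiest first.

    Assumes the CSV has "Name", "Id" and "Threads" columns (hypothetical
    layout matching the PowerShell export described above).
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            name = row["Name"]
            pid = int(row["Id"])
            threads = int(row["Threads"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that are missing a column or not numeric
        if threads > threshold:
            flagged.append((name, pid, threads))
    # Sort so the worst offender is listed first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Calling `flag_high_thread_counts(open("threads.csv").read())` on snapshots taken at intervals would show whether one process keeps climbing towards the 20,000 mark before the hang.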

       

      I'm putting in a support call with the SolarWinds team when I get in on Monday, but I wanted to ask the community in case anybody else has encountered this. I'd love to know where other people have gone to fix it.

       

      Thanks all!