July .NET Patches and SolarWinds/SolarWinds Agents

After installing and then uninstalling the July Microsoft patches for .NET Framework, we have been dealing with serious instability in our environment. If you aren't familiar with the patches, they're documented here:

Advisory on July 2018 .NET Framework Updates · Issue #74 · dotnet/announcements · GitHub

Microsoft released these patches and we installed them. Microsoft then pulled them and released another patch to fix the issues that were found, but said it did not think the new patch fixed everything on 2008 R2 servers. We have two 2008 R2 servers in our environment, one of them the core Orion server, along with nine 2012 R2 servers. We have since uninstalled all of the patches from our environment, but still experience the issues.

The issue we are seeing is that the BusinessLayerHost process crashes very often on our pollers, and a large number of apps (mostly the ones monitored on our agent-based machines) go into an unknown state continuously throughout the day: about 1,000 out of the 8,000 total. The event log errors we are seeing are at the bottom of this post. My question is: are you aware of these patches causing instability with SolarWinds? What about on the agent side? I know the agent relies on .NET Framework, as it installs it during the installation process if it isn't already there. Given the way we are seeing the issues on our pollers, I suspect we are having trouble communicating with the agents, which would cause the unknown app count to bounce around all day as the pollers fail to get the data in time. I believe all of our agent-managed machines still have these patches installed, even though they are all 2012 R2 and up.

For reference, here are the versions we are running:

[Image: pastedImage_2.png]

Errors:

Application: SolarWinds.BusinessLayerHost.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Exception Info: System.InvalidOperationException

   at SolarWinds.BusinessLayerHost.BusinessLayerHostService+<>c__DisplayClass25_0.<CheckPlugins>b__0(System.Object)

   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)

   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()

   at System.Threading.ThreadPoolWorkQueue.Dispatch()

   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Faulting application name: SolarWinds.BusinessLayerHost.exe, version: 2017.1.5300.1698, time stamp: 0x58ac4615

Faulting module name: KERNELBASE.dll, version: 6.3.9600.18938, time stamp: 0x5a7dd8a7

Exception code: 0xe0434352

Fault offset: 0x00015ef8

Faulting process id: 0x1d0c

Faulting application start time: 0x01d42e91b9959d28

Faulting application path: C:\Program Files (x86)\SolarWinds\Orion\SolarWinds.BusinessLayerHost.exe

Faulting module path: C:\WINDOWS\SYSTEM32\KERNELBASE.dll

Report Id: 00ec59ef-9a86-11e8-80fd-e4115bafdd78

Faulting package full name:

Faulting package-relative application ID:
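
For anyone trying to line up crashes like the one above with patch install and uninstall times: the "Faulting application start time" value in the event log entry is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) printed in hex. Below is a minimal Python sketch for decoding it. The function name filetime_hex_to_utc is only an illustration, not part of any SolarWinds or Microsoft tool.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond ticks from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_hex_to_utc(value: str) -> datetime:
    """Convert a hex FILETIME string (e.g. from an event log entry) to a UTC datetime.

    Hypothetical helper for illustration; sub-microsecond precision is dropped.
    """
    ticks = int(value, 16)  # int() accepts the leading "0x" when base is 16
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    # Value copied from "Faulting application start time" in the log above.
    print(filetime_hex_to_utc("0x01d42e91b9959d28").isoformat())
```

The result is in UTC, so convert to the server's local time zone before comparing it with patch history.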

  • Yes, I have seen these issues before on this version of NPM. It has to do with an incompatibility between the MD5 certs and TLS 1.2. Log a call with SolarWinds to get the SolarWinds MD5 repair toolkit. I have not been able to find a resource online to download it from.

    Endless issues with this, I'm afraid, but they have been resolved since 12.2.

    .NET compatibility

  • Thanks for the info.  We checked with the support engineer assigned to the case, and he sent us a link to it: Web Console not loading because CoreBusinessLayer failed to start - SolarWinds Worldwide, LLC. Help and Support

    However, the link that the article references for the Fix It tool is dead.

    I'll let you know if we are able to get a working one.

  • I'd like to know what your outcome is as well.
    I've had a case open with SW Support for about six weeks now on the issue.

    We haven't gotten much further than validating the .NET environment and doing a repair install of the C++ redistributable. My crashes are in SWJobEngineSvc2.exe on agents, so application polling is dead on a large number of agents.

    Application: SWJobEngineSvc2.exe

    Framework Version: v4.0.30319

    Description: The process was terminated due to an unhandled exception.

    Please update us if you do get yours resolved.

  • I've had this issue for 3 months now. It went to SW dev, who said it was my .NET. Microsoft said it's the application. I've done everything possible, even building new servers, and still have the same issue.

  • I was dealing with similar crashes and tried all sorts of things to fix the issue. The one that made the biggest difference in our environment was to modify "C:\Program Files (x86)\SolarWinds\Orion\SWNetPerfMon.DB" and change the various "Timeout" related settings to something much higher, like "600" or "1200". I'm not sure if it's due to the size of our environment or our SQL server in general (it's shared, but no other applications on it have issues), but it helped. Or maybe it's just the number of modules installed on the poller.

    Not sure if this will help in your case. I was seeing that same exception in our logs, but was also seeing SQL timeout exceptions in other SolarWinds logs.

  • While my change above definitely improved my situation, it did not completely fix it. I mentioned this issue to support (with the exact .NET exception message) and they claimed it was a bug fixed in 12.4. Can anyone who has experienced this issue confirm that it was actually fixed after upgrading to 12.4? Not that doing that won't introduce completely new problems... but hey, why not.
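
For anyone who wants to try the "Timeout" change described in the replies above, it may help to see which settings the file contains before editing anything. The following is a rough Python sketch that lists every line mentioning "timeout". It assumes SWNetPerfMon.DB is readable as plain text, which the thread suggests but does not confirm, and the helper name find_timeout_lines is made up for illustration. It only reads the file; take a backup before changing any values by hand.

```python
from pathlib import Path

# Path quoted in the reply above; adjust if Orion is installed elsewhere.
DB_FILE = Path(r"C:\Program Files (x86)\SolarWinds\Orion\SWNetPerfMon.DB")


def find_timeout_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line mentioning 'timeout'.

    Hypothetical helper: the match is a case-insensitive substring search,
    so it makes no assumption about the file's exact key/value syntax.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if "timeout" in line.lower()
    ]


if __name__ == "__main__":
    if DB_FILE.exists():
        # errors="replace" avoids a crash if the file is not clean text.
        content = DB_FILE.read_text(errors="replace")
        for number, line in find_timeout_lines(content):
            print(f"{number}: {line}")
    else:
        print(f"File not found: {DB_FILE}")
```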