Orion HA installation error - "Can’t receive data from SolarWinds Administration Services"

I have run up a new instance of SolarWinds with a bunch of modules (below) and am trying to set up HA clusters for the main poller and an additional polling engine. I've run the Orion installer on the HA server and it can connect to the main Orion server, but it errors out while downloading the files from the main server. The error is "Can’t receive data from SolarWinds Administration Services", with a link to a related KB article (don't have a screenshot handy, sorry). However, I don't think the KB article applies: the server definitely has the ports open, and the copy is not taking longer than 30 minutes (it errors within a minute).

I tried to contact support but they cannot help as it is not a licensed module, but at the moment the module won't be purchased if it can't be installed. The case number is #1178593 and they're going to pass it over to a sales engineer but I'm not sure they will be able to assist if it needs escalation.

  • This is a new installation of SolarWinds (modules installed 3 weeks ago) and all latest hot fixes installed as of last week
  • Modules installed (full module versions in the attached log file):
    • NPM 12.1
    • SAM 6.4.0
    • NTA 4.2.2
    • NCM 7.6
    • VNQM 4.4.0
    • IPAM 4.3.2
    • SRM 6.4.0
    • VMAN 7.1 / VIM 7.1.0
    • Additional Polling Engine
  • I have verified that each server can reach the other servers via hostname and IP address and I can telnet to the ports required from the HA servers
  • The same issue occurs on both new servers that will become the HA servers (main and additional servers)
  • Not all modules are licensed – some are still in the evaluation period (still active, time remaining - NTA, SRM, VMAN).
  • I can confirm I can manually copy all the installer files between the servers quickly (takes less than a minute to copy)
  • The servers are running the same OS (Windows Server 2012 R2) and have same specs (CPU, memory, disk).
  • I tried disabling the antivirus – didn’t fix the issue
  • No production HA license installed however the 30 day evaluation was enabled today
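
For anyone wanting to repeat the port check above without telnet, here is a minimal sketch that tries a TCP connection to each port from the HA server. The function name `check_ports` is made up for illustration, and the host and port values in the comment are placeholders only; substitute the ports listed in the SolarWinds port requirements for your versions.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool} saying whether a TCP connection to host:port opened.

    This only proves that something accepts connections on the port. It does
    not prove that the service behind it is healthy.
    """
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host name and connects with a timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[port] = False
    return results


# Example call (placeholder host name and port numbers, not taken from this thread):
# print(check_ports("main-orion.example.local", [1234, 5678]))
```

Running it once with the hostname and once with the IP address mirrors the two checks described in the list.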

So I now have the following questions for HA:

  1. Any thoughts on the above problem so we can install/enable HA?
  2. Do we have to reconfigure the NTA FSDB server to use the HA VIP? Does the NTA FSDB initiate communications with the main server, or does it only respond to requests from the pollers/website?
  3. Do we have to reconfigure VMAN integration to use the HA VIP? How does VMAN handle the active/passive server swap?
  4. Is it possible to have HA change multiple VIPs on the server (i.e. a second NIC with another VIP)? These servers have two network interfaces and roughly half of the objects are polled via the second NIC. Polling should still be OK from the HA server NICs (and we'll confirm the IPs and HA VIPs are in the firewall rules), but what about receiving data? A workaround for now, I guess, would be to have syslog, traps, and NetFlow sent to both the main poller (or additional poller) IP and the HA IP. The passive server will have the services disabled, so it won't record traffic. It will mean duplicating network traffic, but it would ensure the data is still received when a failover occurs.

I've attached the installer log file as well. It errors out at the same spot each time so I suspect it might have something to do with the NetworkAtlas hot fix because that file doesn't exist on the main Orion server in the C:\ProgramData\SolarWinds\Installers directory.

2017-06-07 14:54:26,476 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - Downloading CollectorInstaller.msi

2017-06-07 14:54:26,663 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - Downloading InformationService.msi

2017-06-07 14:54:26,851 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - Downloading InformationService-HotFix2.msp

2017-06-07 14:54:26,945 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - Downloading SolarWinds-Orion-NetworkAtlas.msi

2017-06-07 14:54:27,773 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - Downloading SolarWinds-Orion-NetworkAtlas-v1.16-HotFix1.msp

2017-06-07 14:54:27,820 [3] ERROR SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - FileTransferProxyClient.GetFile failed, ex: System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state.

Server stack trace:

   at System.ServiceModel.Channels.CommunicationObject.Close(TimeSpan timeout)

Exception rethrown at [0]:

   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)

   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)

   at System.ServiceModel.ICommunicationObject.Close(TimeSpan timeout)

   at System.ServiceModel.ClientBase`1.System.ServiceModel.ICommunicationObject.Close(TimeSpan timeout)

   at System.ServiceModel.ClientBase`1.Close()

   at System.ServiceModel.ClientBase`1.System.IDisposable.Dispose()

   at OrionInstallerLib.FileTransfer.FileTransferProxyClient.FetchFile(String filename, Stream fileStream, Action`1 reportProgress)

   at OrionInstallerLib.FileTransfer.FileTransferProxyClient.GetFile(FileInfo fileInfo, Action`1 reportProgress)

2017-06-07 14:54:27,820 [3] ERROR SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage - System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state.

Server stack trace:

   at System.ServiceModel.Channels.CommunicationObject.Close(TimeSpan timeout)

Exception rethrown at [0]:

   at OrionInstallerLib.FileTransfer.FileTransferProxyClient.GetFile(FileInfo fileInfo, Action`1 reportProgress)

   at SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage.DownloadFile(String fullPath, Action`1 setDownloadedBytes)

   at SolarWinds.Orion.CompatibilityPreInstaller.Forms.DownloadProgressPage.backgroundWorker_DoWork(Object sender, DoWorkEventArgs e)

2017-06-07 14:54:27,820 [3] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.WorkFlow.Workflow - set result:CantReceiveDataFromSWA

2017-06-07 14:54:27,835 [1] DEBUG SolarWinds.Orion.CompatibilityPreInstaller.WorkFlow.Workflow - set result:CantReceiveDataFromSWA
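
To find the failing file quickly in a longer installer log, a small script can track the most recent "Downloading ..." entry and report it when the first ERROR line appears. This is only a sketch: the regular expressions are assumptions based on the line layout in the excerpt above, and `last_download_before_error` is a made-up helper name, not part of any SolarWinds tool.

```python
import re

# Matches e.g. "... DEBUG <logger name> - Downloading CollectorInstaller.msi"
DOWNLOAD_RE = re.compile(r"DEBUG\s+\S+\s+-\s+Downloading\s+(?P<name>\S+)\s*$")
# Matches the start of a line such as "2017-06-07 14:54:27,820 [3] ERROR ..."
ERROR_RE = re.compile(r"^\d{4}-\d{2}-\d{2} [\d:,]+ \[\d+\] ERROR ")


def last_download_before_error(lines):
    """Return the file named in the last 'Downloading' line before the first ERROR.

    Returns None if the log has no ERROR line, or if the first ERROR line
    comes before any 'Downloading' line.
    """
    last = None
    for line in lines:
        match = DOWNLOAD_RE.search(line)
        if match:
            last = match.group("name")
        elif ERROR_RE.match(line):
            return last
    return None
```

Fed the lines above, it should point at the same NetworkAtlas hot fix file that the log shows just before the exception. To run it against a saved log, open the file and pass the file object, which iterates line by line.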


  • I've had a chat with some others and they suggest it's to do with the installer manifest not being correctly updated to know about hot fixes. The general resolution is to uninstall the hot fixes, deploy HA (or APE), and then redeploy the hot fixes on everything again. I came across a similar issue with the combined APE installer about 3-4 years ago (had to uninstall a SAM hot fix for the installer to run) and raised this as an issue back then.

    I'm presuming I could uninstall the hot fixes and try the HA installer again. If so, I think this process should be investigated because it's not an efficient solution. At the absolute minimum, the installer should give a more informative error message.

    I'll try uninstalling the hot fixes tomorrow and update the thread if that fixes it.

  • There should be no requirement to uninstall anything. Simply install/apply the hotfix on your APE/AWS/HA backups the same as you did on your primary/main Orion server.

  • This is a new HA server - the installer won't even install the initial set of applications, so there is nothing to apply a hot fix to.

  • I recently rebuilt Orion (npm, sam, ncm, nta, ipam, srm) with the latest versions and hotfixes -- took quite some time. I ended up getting the main standby HA server up and running by uninstalling the hotfixes it pulled over and manually installing all the hotfixes. Then I ran the config wizard and all was well. Turns out the installer wasn't pulling over all the hotfixes.

    I was told Dev was made aware of it and was working on something.

    Out of curiosity how is your HA installer communicating to your primary server to download the installs? Via IP directly or via hostname?

  • Perhaps unrelated, but when I was standing up an APE on this same environment, I had to manually install everything and then everything was peachy. The SolarWinds Orion installer was pulling over the .msi for IPAM v4.5 RC1, which caused it to fail every time. Diags were taken at that time and sent to Support, and they recommended manually installing all the modules to get it up and running. Not sure if that's recommended on a main standby server though. Could work?

  • The problem I have is the installer isn't installing anything that I can apply a hot fix to. I've tried both IP and host names.

  • Another update:

    I uninstalled all of the hot fixes from the main poller and the HA installer still errors out when trying to copy the files over. Looking through the log files, it looks like it is still trying to download files that don't exist on the server.

  • All MSIs and MSPs installed by Orion are located in C:\ProgramData\SolarWinds\Installers\ on the main poller, and they are served from this location when installing an additional poller, additional web server, or HA server.

    This folder also contains small module zip files that describe what is being installed and where.

    The issue here is caused by the file SolarWinds-Orion-NetworkAtlas-v1.16-HotFix1.msp missing from this location (it should be deployed by the Core hotfix) even though it is referenced in CORE_2017.1.5300.zip.

    We're trying to figure out how this could have happened. In the meantime, you can try to reinstall the Core/NPM hotfix on the main server, and if that doesn't help, this particular file can be downloaded here:

    http://downloads.solarwinds.com/solarwinds/OrionInstaller/SolarWinds/OrionPlatform/2017.1/SolarWinds-Orion-NetworkAtlas-…
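
A rough way to look for other files in the same situation is to scan the module zip files for package names and compare them with what is actually in the Installers folder. The sketch below makes a strong assumption: the internal format of the zip files is not described in this thread, so it simply treats every entry as text and searches for names ending in .msi or .msp. `find_missing_packages` is a made-up helper name, and entries in other encodings (UTF-16, for example) would not be picked up.

```python
import re
import zipfile
from pathlib import Path

# Assumption: package names appear as plain text inside the zip entries.
PACKAGE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*\.(?:msi|msp)", re.IGNORECASE)


def find_missing_packages(installers_dir):
    """Map each module zip to the .msi/.msp names it mentions that are not in the folder."""
    root = Path(installers_dir)
    # Windows file names are case-insensitive, so compare in lower case.
    present = {p.name.lower() for p in root.iterdir() if p.is_file()}
    missing = {}
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                text = zf.read(member).decode("utf-8", errors="ignore")
                for name in PACKAGE_RE.findall(text):
                    if name.lower() not in present:
                        missing.setdefault(archive.name, set()).add(name)
    return missing


# Example call on the main poller:
# print(find_missing_packages(r"C:\ProgramData\SolarWinds\Installers"))
```

An empty result only means the heuristic found nothing. It does not prove that the folder is complete.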

  • Thanks for the info, Jan. I hoped that uninstalling the hotfixes would stop the installer looking for that particular file, but it continued to look for it even after all hot fixes were uninstalled.

    I've just finished reinstalling the hot fix (OrionPlatform-v2017.1.5300-Hotfix2) on the server and it didn't add the file to the installers directory, so it could be an issue with that hot fix. I copied over the msp file you linked and the installer finished copying over the files.

    I'm going to reapply the other hot fixes and will update this once done.

  • Confirming that with the above file being copied to the server, the HA installer runs successfully and I've got the HA pool up and running. I'll run some tests on it tomorrow as well as run up HA for the additional poller.

    Thanks!