Socket Transfer timed out after XXXXX. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout.

Hi All,

We are on the latest SolarWinds version, 2020.2.4, with all network modules installed. A couple of days ago, one of our APE polling services crashed, and neither a repair nor a clean uninstall worked.

We ended up uninstalling everything manually. When we try to reinstall, we get the error quoted in the title as soon as we enter the IP address and credentials of the main poller.

We have opened a support case, but so far the cause has not been identified. Can anyone offer some suggestions?

  • What's the latency like between that poller and the main poller?

  • It is around 120 ms, and the poller had been working for some time.

    We recently installed 2020.2.4 and this happened 2 weeks later.

  • Have you tried restarting the Administration Service on the main poller? Are there firewalls or WAN accelerators that might be interfering with the traffic?

  • Yes, the Administration Service has been restarted on both. The firewall ports have been checked and no packets are being denied. Telnet connections succeed on the required ports.

    Not sure about the WAN accelerator part. I can get that checked.

  • I've seen a WAN accelerator or some sort of traffic shaping cause these kinds of problems. What kind of firewalls are between them? Also, have you reviewed the installer log to see if it provides any additional information?

  • I believe it's a Palo Alto firewall.

    Found these logs.


    SWIS v3
    (Inner Exception #0) System.TimeoutException: The open operation did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The socket transfer timed out after 00:00:59.8437356. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
       at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)

    BusinessLayer (Module Engine)
    2021-03-15 16:40:21,305 [7] WARN  SolarWinds.Orion.Core.Common.ServiceHelper - Waiting for SWIS v3.0 start. Attempt 7, exception message: "The open operation did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout."

  • I strongly suspect your Palo Alto firewall rules are not written correctly, specifically the application portion of those rules. This is very difficult to track down: from a consumer perspective everything looks like it should go through, because conventional tests like telnet succeed, but when the application itself starts to traverse the firewall, the Palo Alto starts dropping packets. I would review the Palo Alto traffic logs for anything between these two servers reporting "invalid-state". If you're seeing that kind of activity, it is more than likely your problem, and modifying the application portion of the rule should clear it up (from memory, I believe you have to set something to default, but I can't remember exactly what).

  • Thank you. Let me review this with the firewall team, and I'll report back on how it goes.
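
On the point that a telnet test can pass while the application still times out: a telnet check only proves that the TCP handshake completes. The following is a minimal Python sketch, not a SolarWinds tool, that separates "connected" from "data actually exchanged" so the two can be compared against the firewall's traffic log. The function name `check_tcp` and all of its parameters are hypothetical, and the payload is arbitrary bytes rather than a valid SWIS request, so an empty reply from the real service is expected and only a timeout or reset is informative.

```python
import socket
import time


def check_tcp(host, port, connect_timeout=5.0, hold_seconds=0.0, payload=b""):
    """Report how far a TCP exchange with host:port gets.

    Hypothetical helper: 'connected' is what a telnet test proves;
    'sent' and 'received' show whether data crossed the path afterwards.
    """
    result = {"connected": False, "sent": False, "received": None, "error": None}
    try:
        with socket.create_connection((host, port), timeout=connect_timeout) as sock:
            result["connected"] = True  # handshake completed (telnet-equivalent)
            if hold_seconds:
                # Keep the session idle for a while, as a slow installer step might.
                time.sleep(hold_seconds)
            if payload:
                sock.sendall(payload)
                result["sent"] = True
                sock.settimeout(connect_timeout)
                # b"" means the peer closed the connection; a timeout raises.
                result["received"] = sock.recv(4096)
    except OSError as exc:  # covers timeouts, resets, refusals, DNS failures
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

For example, `check_tcp(main_poller_ip, required_port, hold_seconds=30, payload=b"\x00")`, where both arguments are placeholders for your own environment, would show whether the session survives past the handshake. If `connected` is true but the call ends in a timeout error, that is consistent with, though not proof of, the firewall dropping traffic after the session is established.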
