Is there any way I can fix this? An entire poller is down. We are unable to pinpoint the issue and support has given up on this.
Hi Mark, here is the redacted RCA I have prepared for this issue. I will verify this, but the issue is mostly resolved outside of the SolarWinds environment.
When we got the error, we reinstalled the SolarWinds software on the poller, but we encountered the same error afterwards.
The next step, which gave us our first clue, was to use Wireshark to look at the network traffic being exchanged. This showed us that jumbo packets were being sent by the database but not received by the poller.
We then ran traces on the SDWAN routers using the Cisco vManage application to further isolate the source of the issue. We found that fragmented packets with an MTU of 1419 were being retransmitted on the SDWAN router.
We also ran a trace in Cisco ThousandEyes, and it showed a very important detail of this issue: the traffic between the two sites is asymmetric, which also confirmed the issue with the MTU setting.
We then looked at the MTU configured on the tunnel between the SDWAN routers and found that the value was smaller than the size of our packet fragments.
The conclusion we drew from the data we were seeing was that when the connection between the SolarWinds poller and the database was initiated, the MSS value negotiated was too high due to the asymmetric nature of the routing between the two servers. This led to packets being fragmented at a larger size than what was allowed by the SDWAN router tunnel.
*MSS (Maximum Segment Size) is a value that defines the largest segment of data that can be sent in a single TCP packet. It is typically derived from the MTU value minus the TCP/IP headers.
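The arithmetic in the footnote above can be sketched in Python. This is only an illustration: it assumes IPv4 and TCP headers of 20 bytes each with no options, uses the common Ethernet MTU of 1500 as an example input, and the function names are made up for this post.

```python
IPV4_HEADER = 20  # bytes, assuming IPv4 with no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_from_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def packet_size_from_mss(mss: int) -> int:
    """Size of a full-sized packet on the wire for a given MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


print(mss_from_mtu(1500))          # 1460
print(packet_size_from_mss(1300))  # 1340
```

With these assumptions, an MSS of 1300 produces packets of at most 1340 bytes, which is how a lower MSS keeps traffic under a smaller tunnel MTU.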
** MSS negotiation between two servers occurs during the TCP handshake, when both servers exchange their preferred MSS values, indicating the maximum amount of data they can handle in a single segment. The final MSS is typically set to the lower of the two values to ensure both servers can transmit data efficiently without exceeding the network's MTU (Maximum Transmission Unit) limits.
Resolution
Configure the SDWAN router interfaces with an MSS of 1300. This forces the negotiated segment size, and with it the resulting packet size, to be smaller than the maximum size allowed by the tunnels.
As stated previously, end devices negotiate the MSS during the TCP handshake, meaning they inform each other of the maximum segment size they can handle based on their own MTU settings. Adding ip tcp adjust-mss 1300 to the router's interface configuration tells the router to modify the MSS value in TCP SYN packets passing through that interface to 1300 bytes. Without it, the MSS is negotiated purely by the end devices based on their own MTU settings.
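To make the effect of the clamp concrete, here is a small Python model of the handshake described above. It is a simplified sketch, not router code: it assumes the router only lowers MSS values that are above the configured clamp, that both the SYN and the SYN-ACK cross a clamped interface, and that the connection ends up using the lower of the two values. The function names and the 1460 example value are hypothetical.

```python
from typing import Optional


def clamp_mss(advertised_mss: int, clamp: int = 1300) -> int:
    """Model of a router rewriting the MSS option in a passing SYN.

    Assumption: values above the clamp are lowered, smaller ones are left alone.
    """
    return min(advertised_mss, clamp)


def effective_mss(mss_a: int, mss_b: int, clamp: Optional[int] = None) -> int:
    """MSS the connection ends up with, with or without a clamp in the path."""
    if clamp is not None:
        mss_a = clamp_mss(mss_a, clamp)
        mss_b = clamp_mss(mss_b, clamp)
    # Simplification: treat the lower of the two advertised values as final.
    return min(mss_a, mss_b)


# Both servers advertise 1460 (derived from their own 1500-byte MTU).
print(effective_mss(1460, 1460))              # 1460, nothing in the path adjusts it
print(effective_mss(1460, 1460, clamp=1300))  # 1300, the router rewrote the SYNs
```

The point of the model is that the end devices never learn about the smaller tunnel MTU on their own; the router has to step in during the handshake.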
Have you tried to build another poller? Probably quicker than trying to troubleshoot the old one.
I have thought about it, but there are too many devices on it, so I have kept that as a last option.
Fair enough. I'm assuming you have tested SQL ports from APE to SQL instance and/or other port requirements?
https://documentation.solarwinds.com/en/success_center/orionplatform/content/core-solarwinds-port-requirements.htm#link3
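For anyone who wants to script that check, a minimal Python sketch of a TCP port test run from the poller might look like this. The function name is made up, and the host and port you pass in are your own values; 1433 is only the SQL Server default and a named instance may listen elsewhere.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection performs the full TCP handshake, then we close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("your-sql-host", 1433) from the APE. Keep in mind this only proves the handshake completes. As the RCA above shows, small packets can get through while full-sized ones are dropped, so a passing port test does not rule out an MTU problem.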
Yeah, it works. That's the first thing I did. I am baffled.