When a Pingdom health check fails, it provides no context about the call, even after the timeout occurs. This makes it hard to identify the root cause.
Pingdom health check is set up and running fine
Pingdom health check begins to fail
The Results Log and Test Log show no fine-grained details on why
The only result shown is "Socket timeout" or the infamous "tlsv1" response
This does not help root-cause an issue. Even with a 30 second timeout, a Pingdom health check should mark the check as failed but keep waiting for the response, say up to a configurable global timeout of 5 minutes. This would let users see the response headers, which may contain unique IDs that help track down individual requests.
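To make the request concrete, here is a rough sketch in Python of the behavior I have in mind. This is not how Pingdom works today and it uses no Pingdom API; `run_check`, `CheckResult`, `soft_timeout`, `hard_timeout` and the `opener` parameter are all names I made up for illustration. The check is marked as failed once it crosses the 30 second threshold, but the request keeps going until the longer limit so the status and headers can still be recorded.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckResult:
    ok: bool                      # True only if a good response arrived within soft_timeout
    elapsed: float                # seconds until headers arrived (or until we gave up)
    status: Optional[int] = None  # HTTP status, if any response came back
    headers: dict = field(default_factory=dict)
    error: Optional[str] = None   # transport-level error text, if no response came back


def run_check(url, soft_timeout=30.0, hard_timeout=300.0,
              opener=urllib.request.urlopen):
    """Hypothetical check: fail after soft_timeout, but keep waiting up to hard_timeout."""
    start = time.monotonic()
    try:
        # Note: urlopen's timeout applies to each blocking socket operation,
        # so it only approximates a true global deadline.
        with opener(url, timeout=hard_timeout) as resp:
            elapsed = time.monotonic() - start
            status, headers = resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry headers worth recording.
        elapsed = time.monotonic() - start
        status, headers = exc.code, dict(exc.headers.items())
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses: nothing came back.
        return CheckResult(ok=False, elapsed=time.monotonic() - start, error=str(exc))

    # The check counts as failed if it was too slow, but the late
    # response details are kept for troubleshooting.
    ok = elapsed <= soft_timeout and status < 400
    return CheckResult(ok=ok, elapsed=elapsed, status=status, headers=headers)
```

With something like this, a check that answers after 45 seconds would still be reported as down, but the result would include the status code and headers (for example a request ID header, if the backend sets one) instead of only "Socket timeout".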
We used a competitor to gather this information, because they let health checks finish even after they crossed the 30 second threshold. Using them helped us track down a potential root cause much more quickly than with Pingdom alone.
It would be a powerful capability to introduce to the platform.