When a Pingdom health check fails, it provides no context on the call, even after the timeout occurs. This makes it hard to identify the root cause.
Scenario:
- Pingdom health check is set up and running fine
- Pingdom health check begins to fail
- Results Log and Test Log show no fine-grained details on why
- The only result shown may be "Socket timeout" or the infamous "tlsv1" response
This does not help root-cause an issue. Even with a 30-second failure threshold, Pingdom health checks should keep waiting for the response, up to a configurable global timeout of, say, 5 minutes. This would let users see the response headers, which may contain unique IDs or other indicators that help track down the affected requests.
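To make the request concrete, here is a minimal sketch of the behavior in Python. It is not Pingdom's implementation or API: `probe`, `ProbeResult`, `fail_after`, and `hard_timeout` are hypothetical names chosen for illustration. The check is marked as failed once it exceeds the 30-second threshold, but the request keeps running under a longer timeout so that any headers that eventually arrive are still recorded.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool                      # True only if a good response arrived within fail_after
    elapsed: float                # seconds until headers arrived or the request gave up
    status: Optional[int] = None  # HTTP status, if any response was received
    headers: dict = field(default_factory=dict)
    error: Optional[str] = None   # transport-level error text, if no response arrived


def probe(url: str, fail_after: float = 30.0, hard_timeout: float = 300.0) -> ProbeResult:
    """Hypothetical health check: fail at `fail_after`, keep waiting up to `hard_timeout`.

    Note: urllib's `timeout` applies to each blocking socket operation,
    so it only approximates a true global deadline.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=hard_timeout) as resp:
            status, headers = resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry headers worth recording.
        status, headers = exc.code, dict(exc.headers.items())
    except (urllib.error.URLError, OSError) as exc:
        # No response at all (connection refused, socket timeout, TLS failure).
        return ProbeResult(False, time.monotonic() - start, error=str(exc))

    elapsed = time.monotonic() - start
    # The check passes only if it was both fast enough and successful.
    ok = elapsed <= fail_after and 200 <= status < 400
    return ProbeResult(ok, elapsed, status, headers)
```

With this shape, a call such as `probe("https://example.com/health")` that takes 45 seconds would come back with `ok` set to false but with `status` and `headers` populated, so a request ID header (if the service emits one) could be matched against server-side logs.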
We used a competitor to gather this information, because it still allowed health checks to finish even after they crossed the 30-second threshold. Using it helped us track down a potential root cause much more quickly than with Pingdom alone.
It would be a powerful capability to introduce to the platform.