Our WPM monitors use the default data collection interval of 'every 5 minutes'.
We know that this configuration gives us one raw data point in WPM's availability chart every 5 minutes.
Therefore, when a WPM monitor fails on any transaction step or action within the monitored website, the data point for that collection interval shows a DOWN status in SolarWinds (SW) WPM.
However, if it is possible, I propose the following:
Under the proposed data collection strategy, SolarWinds would replay the web transaction recording several times before concluding that the monitor should be marked with a "Down" status.
For instance, if an attempt to play the web transaction recording against the target website at 10 AM fails, the retry logic will kick in 1 minute later. SW will check the website again, and if that check fails as well, it will try again after another minute.
To sum up, SW will keep checking until there are 4 or 5 failed attempts in total, with a 1-minute gap between attempts.
If playback of the web transaction recording still fails on the final attempt, the monitor finally gets a "Down" status in SolarWinds. However, if playback succeeds in any of the first 3 or 4 attempts, the monitor gets an "Up" status in SolarWinds.
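The retry behavior described above could be sketched roughly as follows. This is only an illustration in Python, not anything SolarWinds exposes: `check_with_retries`, `play_transaction`, `max_attempts`, and `retry_gap_seconds` are hypothetical names, and the playback itself is passed in as a function that returns True on success.

```python
import time
from typing import Callable


def check_with_retries(
    play_transaction: Callable[[], bool],
    max_attempts: int = 5,            # assumed: "4 or 5 attempts all in all"
    retry_gap_seconds: float = 60.0,  # assumed: 1-minute gap between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Replay the transaction until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if play_transaction():
            # Any successful playback marks the monitor as Up.
            return "Up"
        if attempt < max_attempts:
            # Wait before the next attempt; no wait after the final one.
            sleep(retry_gap_seconds)
    # Every attempt failed, so only now is the monitor marked Down.
    return "Down"
```

With this shape, a single transient failure costs one extra minute before the monitor is confirmed "Up", while a real outage is confirmed "Down" after about 4 minutes of retries when `max_attempts` is 5.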
In addition, the final attempt should run additional checks alongside the playback of the web transaction recording.
The additional checks that run concurrently with the last playback attempt could be:
- Check route via traceroute
- Resolve the DNS name via nslookup or another DNS resolution method
- NetPath or PathPing
- A check of the CPU load, memory usage, and free disk space of the web server that serves the website contents
- A check of the critical performance counters and the state of critical services and processes that are supposed to be running on that web server
- A check for whether the web server returns an initial HTTP 200 response
- Save all of this data in the SolarWinds DB and ensure that the results of the 6 checks above are listed in the final error message displayed along with the DOWN status assigned to the monitor.
- (Again, the hope is that all 6 checks above will run CONCURRENTLY with the last attempt at playing back the recording.)
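
To illustrate what "concurrently" means for the list above, here is a minimal Python sketch that starts the final playback and all diagnostic checks at the same time, then combines the results into one error message. Everything here is an assumption for illustration: `final_attempt_with_diagnostics` and the `diagnostics` mapping are hypothetical, each check is just a function returning a text result, and saving to the SolarWinds DB is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def final_attempt_with_diagnostics(
    play_transaction: Callable[[], bool],
    diagnostics: Dict[str, Callable[[], str]],
) -> Tuple[str, str]:
    """Run the last playback and every diagnostic check in parallel."""
    with ThreadPoolExecutor(max_workers=len(diagnostics) + 1) as pool:
        # Submit everything up front so the checks overlap with the playback.
        playback = pool.submit(play_transaction)
        pending = {name: pool.submit(check) for name, check in diagnostics.items()}

        succeeded = playback.result()
        findings = {}
        for name, future in pending.items():
            try:
                findings[name] = future.result()
            except Exception as exc:
                # A diagnostic that errors out is still worth reporting.
                findings[name] = f"check failed: {exc}"

    if succeeded:
        return "Up", ""
    # Attach every diagnostic result to the message shown with the Down status.
    message = "; ".join(f"{name}: {result}" for name, result in findings.items())
    return "Down", message
```

For example, passing `{"dns": resolve_dns, "traceroute": run_traceroute}` (both hypothetical functions) would yield a "Down" message that lists the DNS and traceroute results captured at the moment of the failure.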