WPM Playback Intermittently Down

Hey All,

So here are the issues we're seeing with a couple of our transactions:

  • The transactions fail intermittently during playback. Looking at the details, the steps appear to get stuck loading the next page, and the next page never shows up, so the playback is reported as down. The site itself is not down or having any issues (we have tested and verified this many times), so I believe the problem is with WPM rather than the sites. This happens for several of our transactions at varying frequencies, and some sites, like O365, seem to fail more often than others. The failures occur completely at random, which makes them very difficult to diagnose. We tried running the player on a different server, and even set up a dedicated test server, but the issue persists, so I'm convinced the server is not the cause. The player runs on Windows Server 2012 R2.
  • Recording works fine: we have tested the transactions in the recorder many times, and they have never failed once.
  • I found this old discussion, https://thwack.solarwinds.com/thread/100415, which seems to describe similar issues. I tried setting maxPlaybacksPerWorker to 1 as suggested in that thread; it seems to improve the situation slightly, but it does not solve the issue. I have also tried XY recordings, but that did not solve the issue either.
  • I have also ruled out antivirus: we ran the playbacks on a test server with no antivirus installed, and it made no difference.

I have an open case with support, but I'm wondering whether anybody else has had similar issues, and if so, did you find a solution?

Thank you

  • We've been running this gauntlet for almost 2 years now, which is why we require 2 failures (20 minutes of downtime) before alerting. It cuts down on false alerts, but it takes 20+ minutes to detect a real issue. It is product-related, because our simpler sites are solid in WPM and never or rarely fail. It wouldn't be so bad if we could immediately retest after a failure, but WPM is limited in what it can do at runtime, so we have to wait an entire run interval to verify. My team has spent hundreds of hours banging our heads against our desks trying to crack this egg, and the only hope is a better product. AngularJS.

  • thwack212: one way we get around this is to run the same URL from 2 different servers. We split ours geographically across 2 different data centers, but you wouldn't need to do that.

    Then we put those monitors in a group, set the group to show mixed status, and alert off the group's status. This way, if a monitor runs every 10 minutes, we don't have to wait 20 minutes for it to run twice. If one monitor goes down, the group status goes into warning; if both monitors go down, the group status shows as down and triggers the alert. This way we only have to wait 10 minutes for an alert. We have found it to be very reliable, even with crappy websites.
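To make the difference between the two workarounds concrete, here is a minimal sketch in Python that models both alerting rules described in this thread: a single monitor that alerts only after 2 consecutive failed runs, and a "mixed status" group of monitors running the same URL from different servers. This is not SolarWinds code or an Orion/WPM API; it assumes each playback result can be reduced to pass/fail, and the names `Status`, `mixed_group_status`, and `alert_after_consecutive_failures` are hypothetical, chosen only for illustration.

```python
from enum import Enum
from typing import Sequence


class Status(Enum):
    UP = "up"
    WARNING = "warning"
    DOWN = "down"


def mixed_group_status(results: Sequence[bool]) -> Status:
    """Roll up the latest result of each member monitor (True = playback passed).

    Hypothetical model of the 'mixed status' rule described above:
    every member up -> UP, some members down -> WARNING, every member down -> DOWN.
    """
    if not results:
        raise ValueError("group has no member monitors")
    failures = sum(1 for ok in results if not ok)
    if failures == 0:
        return Status.UP
    if failures == len(results):
        return Status.DOWN
    return Status.WARNING


def alert_after_consecutive_failures(history: Sequence[bool], required: int = 2) -> bool:
    """Single-monitor alternative: alert only once the last `required` runs all failed."""
    if required < 1:
        raise ValueError("required must be at least 1")
    if len(history) < required:
        return False
    return not any(history[-required:])


if __name__ == "__main__":
    # One location fails, the other passes: warning only, no alert.
    print(mixed_group_status([False, True]))   # Status.WARNING
    # Both locations fail in the same interval: the group goes down and the alert fires.
    print(mixed_group_status([False, False]))  # Status.DOWN
    # Single monitor: one failed run is not enough; two in a row is.
    print(alert_after_consecutive_failures([True, False]))   # False
    print(alert_after_consecutive_failures([False, False]))  # True
```

The design point is that the group rule gets its second opinion from a parallel monitor within the same run interval, whereas the consecutive-failure rule gets it from the next run of the same monitor. At a 10-minute interval, that is roughly 10 minutes versus 20 minutes to alert.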
