Is the player load percentage based on # of CPUs? So if I have 2 CPUs, then I can safely go up to 200%?
Asking because around 7pm the other night, for some unknown reason at this time, all 4 of my current players doubled in load according to WPM.
All of them are now reporting well over 100% load and none of them even have 25 transactional tests on them yet.
Oddly, this occurred on two different WPM installations, one in PRD and one in DEV. Yet, the load on all players on each WPM installation doubled at the same time.
To learn more about this widget in the latest WPM documentation, see:
Per the Scalability Engine Guidelines for WPM, SolarWinds recommends limiting the number of monitored transactions assigned to a WPM Player to 12 or less. However, note that many factors can impact the load on transaction locations that host WPM Players, including:
I am having an issue with player load on a player that currently has a few tests in an unknown state. The player has 29 tests on it and the load is normally around 80%. Right now, there are 7 tests on it being reported as in an 'UNKNOWN' state and the player load is reported as 348%.
If I unmanage these tests that are in an UNKNOWN state, the load goes back down to normal levels. Does this mean that my other transactions on this player are suffering (not getting run very often) because of this?
Why does having a few transactions in UNKNOWN cause this problem?
A transaction goes to unknown in two cases: if the player is down, or if the transaction has not been played for a long time (based on its frequency). Because your player is up, the second case applies to you. For some reason, those unknown transactions are being played for much longer than they should be. This causes other "healthy" transactions to wait in the queue for playback.
I would try to open some of those unknown transactions in Recorder on player machine and try to play them from it to see if there are any issues with playback.
There is a current issue with a new version of the app that was rolled out that causes IE10 to crash. Basically, a user logs into our app and then selects a specific link, and IE10 immediately crashes. Not sure why this would cause the transaction to take longer than normal though, unless it's just waiting a long time after the crash to actually finish? I can disable the tests again; I was just leaving them on so the developers had something to look at while they worked on the issue.
BTW, the playback of the test in the recorder causes the recorder to completely crash also.
If a transaction crashes the whole process, we retry playback. In your case those crashing transactions are being played over and over and never finish because of that crash. You should unmanage them until the issue is fixed, or use a player with a different version of IE that does not crash on it.
The player load is not related to the number of CPUs; it is computed by the following formula (simplified):
player_load = number_of_running_playbacks/total_number_of_playback_workers*100 + transactions_waiting_for_playback
The transactions_waiting_for_playback value is based on the sum of the wait times of the transactions on the player before they are played back. So basically, the longer the transactions wait for playback on the player, the higher this value gets. Based on this formula you will, for example, see 100% load when all playback workers are currently playing back transactions. So a current load around 100% (even slightly above) is completely natural. But if the load in the player load chart is constantly over 100%, you should consider moving some of the transactions to a different player.
In your case, if the load went over 100% on two different installations at the same time, something probably changed on the monitored side so that playback now takes longer (which causes more transactions to accumulate in the waiting queue, which in turn causes the load to rise). This can also happen when some of the transactions start failing, because by default the player in some cases replays the transaction to make sure we don't report a false alert.
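To make the simplified formula above concrete, here is a minimal Python sketch. It is only an illustration of the formula as written in this thread, not the product's actual code. The function name and the idea of passing the waiting component in as a ready-made number are assumptions; the thread only says that component is "based on" the sum of wait times, so it is treated as an opaque input here.

```python
def player_load(running_playbacks, total_workers, waiting_component):
    """Simplified player load from the formula quoted in this thread.

    waiting_component stands in for transactions_waiting_for_playback.
    How the product derives it from wait times is not described here,
    so it is passed in as an already-computed number (an assumption).
    """
    if total_workers <= 0:
        raise ValueError("total_workers must be positive")
    return running_playbacks / total_workers * 100 + waiting_component


# All workers busy and nothing waiting gives exactly 100.
busy_no_queue = player_load(4, 4, 0)

# All workers busy plus a growing queue pushes the value past 100.
busy_with_queue = player_load(4, 4, 35)
```

With every worker busy the first term is already 100, so anything above that comes from transactions waiting in the queue, which is why the value does not track CPU count.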
Thanks, that was helpful. Was always curious why the player load percentage didn't seem to really line up with the actual load on the box.
There was a Trendmicro OfficeScan rollout to all of my boxes around that same time. So looking into that now to see if it is causing tests to take longer.
The total time for a transaction doesn't seem to have changed, but I noticed that it also seems to be running fewer tests than it did before.
For example, in some cases, a test scheduled to run every 3 mins seems to only really be showing data for one test in a 10 min poll in the graphs (no min or max bars).
It's important to note that Trendmicro's OfficeScan installs a transparent proxy and routes all browser traffic through it for purposes of picking up phishing sites and malware through the use of web reputation services. It's very probable that the removal of OfficeScan from the computer where the WPM agent resides will improve overall transaction playback performance.
Thanks. I had them completely remove it on one of my players but it didn't seem to help at all. In the recorder, a 9 step test that reports taking 24 seconds to run actually takes over 5 minutes to complete. Between every step I'm seeing a 30 second pause at the end of the step and before the next step starts. So for example, it highlights "Click on image" at the end of a step and then just sits for 30 seconds before the next step starts. I see the actual page of the next step load in the recorder, but then it just hangs there. If this is happening in the player, it would explain why each 3 minute poll test seems to really only be running once every 10 minutes.
Since it seems to be exactly 30 seconds, I'm guessing it's one of these and will start changing them one at a time to see if I can narrow it down.
This is the one that helped greatly. Any ideas as to why this would be? In the recorder, my 9 step test was taking 5 minutes and 10 seconds on average to run. After changing this one setting to 5 seconds from 30, the test now only takes 1 minute and 22 seconds to run. After changing the player config to this, my player load % has dropped down from being well over 100% to well under 50%.
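As a rough sanity check on the numbers in this post, the sketch below estimates total playback time as the reported duration plus one fixed wait per step. This is a back-of-the-envelope model, not how the player actually schedules waits; the function name and the one-wait-per-step assumption are mine.

```python
def estimated_playback_seconds(reported_seconds, steps, per_step_wait_seconds):
    """Reported duration plus one fixed wait after each step (assumed model)."""
    return reported_seconds + steps * per_step_wait_seconds


# 9-step test reported at 24 seconds, with the setting at 30 seconds.
with_30s_wait = estimated_playback_seconds(24, 9, 30)  # 294 seconds

# Same test with the setting lowered to 5 seconds.
with_5s_wait = estimated_playback_seconds(24, 9, 5)  # 69 seconds
```

The estimates (294 and 69 seconds) land close to the observed 5 minutes 10 seconds and 1 minute 22 seconds, which is consistent with the per-step wait accounting for most of the extra time.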
I don't see any issues in the recorder at all, I can pull up pages and they seem to complete loading very quickly.
I forgot to mention something: if you are able to pinpoint the request that is causing this issue, let me know and I'll send you a message with information on how to set up the player configuration to ignore that one specific request. This approach should be significantly safer than lowering thresholds.
Did you have a specific reason to change that setting? Lowering it to 5 seconds can cause incorrect page load durations to be reported in WPM if your page takes longer than 5 seconds to load. It can also cause playback failure if some element on the page does not show up within those 5 seconds.
My Player Load Percentage is over 100% with only 22 transactions on the player. Only one transaction has a duration of more than 15 seconds; the rest are within 5 seconds. Please suggest what steps I can take here. Is there a way to check which transactions are currently running, or to change the schedule so that one transaction runs at a given time and then every 10 minutes (in other words, can the start time be changed)?
Solarwinds: the WPM Agent service component is running in critical, so I suppose the issue must be a lot of transactions in the queue.
The duration that you see in the web console is only the measured duration (page load). Actual playback usually takes more time (finding elements on the page, clicking, ...). So transactions can run much longer than reported.
You can find the actual playback duration of a transaction if you enable DEBUG logging for the WPM Business Layer through Log Adjuster (Start -> All Programs -> SolarWinds Orion -> Documentation and Support -> Log Adjuster). Once you set "WPM Business Layer" to DEBUG and apply the changes, playback statistics will be logged to c:\ProgramData\Solarwinds\Logs\SEUM\SEUM.Performance.log each time a playback finishes.
File lines have format:
Transaction <Transaction ID> <Transaction Name> (rec: ... agent: <Agent ID>): | measured duration as timespan | playback duration as timespan | measured durations as total seconds | playback duration as total seconds
You are interested in the "playback duration" columns. You can import this file into, for example, Excel with "|" as the separator to get a better overview. If you find transactions whose playback durations are very long compared to their measured durations, there may be an issue with them that makes them run too long. Finding the cause would then need closer investigation.
Would appreciate that info on blocking specific domains when you get a chance. I've been going through the config files but not finding it. I could (?) update the hosts file to point them to localhost but hoping there is a better method?
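If Excel is not handy, the same comparison can be sketched in a few lines of Python. This assumes each line follows the "|"-separated layout described above, with the two total-seconds values as the last two columns and "." as the decimal separator. The function names, the 3x ratio, and the sample lines are all hypothetical and only there for illustration; real log lines may carry extra prefixes that this does not handle.

```python
def parse_performance_line(line):
    """Split one log line into (header, measured_seconds, playback_seconds).

    Returns None for lines that do not have the four trailing "|" columns
    or whose last two columns are not numbers.
    """
    fields = [f.strip() for f in line.rsplit("|", 4)]
    if len(fields) != 5:
        return None
    header, _measured_ts, _playback_ts, measured_s, playback_s = fields
    try:
        return header.rstrip(":").strip(), float(measured_s), float(playback_s)
    except ValueError:
        return None


def slow_transactions(lines, ratio=3.0):
    """Return parsed rows whose playback time exceeds ratio x measured time."""
    slow = []
    for line in lines:
        parsed = parse_performance_line(line)
        if parsed is None:
            continue
        _header, measured, playback = parsed
        if measured > 0 and playback > ratio * measured:
            slow.append(parsed)
    # Longest playback first, so the worst offenders are at the top.
    return sorted(slow, key=lambda row: row[2], reverse=True)


# Hypothetical sample lines in the layout described above.
SAMPLE = [
    "Transaction 12 Login test (rec: ... agent: 1): | 00:00:24 | 00:05:10 | 24 | 310",
    "Transaction 13 Home page (rec: ... agent: 1): | 00:00:05 | 00:00:08 | 5 | 8",
]
suspects = slow_transactions(SAMPLE)
```

To run it against a real file, pass an open file object instead of SAMPLE, for example slow_transactions(open(path)), where path points at SEUM.Performance.log.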
Thanks, I appreciate the input...
Still scratching my head over this one. I just now went and created a new local user on the box, switched to that user and loaded up the same URLs in IE with the developer tools and just don't see any issue at all.
The pages all load almost as soon as I click on the links. Logging into our app takes less than a second. IE shows it waiting on nothing at all.
I created a local account just in case something was getting configured under my corp account that needs to be setup on these local accounts that wpm uses.
If I do figure out that it is a specific request that can be safely ignored, I'll get back with you... Thanks again.
(think i'm just going to totally remove the player and users from a node and re-install to see if maybe some local security routine ran monday night and changed/broke something with the wpm user accounts)
Eureka, guess I should have used the debug logs earlier and saved a couple of hours... Seeing this in the debug logs for every step of every test:
2013-03-14 14:26:05,080 [SolarWinds.SEUM.Agent.Worker.exe][Browser Thread] DEBUG SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Waiting for pending requests
2013-03-14 14:26:06,094 [SolarWinds.SEUM.Agent.Worker.exe][Browser Thread] DEBUG SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Waiting for pending requests
2013-03-14 14:26:06,094 [SolarWinds.SEUM.Agent.Worker.exe][Browser Thread] WARN SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Browser was stuck with pending requests for more than 30000ms.
2013-03-14 14:26:06,094 [SolarWinds.SEUM.Agent.Worker.exe][Browser Thread] WARN SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Remaining pending request: Begin: +1.622 s, Blocked: 0ms, DNS: 0 ms, Connection: 0 ms, Send: 0 ms, TTFB: 0 ms, Download: 0 ms, Size: 0 Mime: text/html Status: 0 URL: https://***.tcliveus.com/i?siteID=.........
So this tcliveus.com call seems to be causing my issues and started Monday night. Learning how to ignore/skip requests to specific domains would definitely be appreciated.
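For anyone trying to find the same kind of culprit in their own debug logs, here is a small Python sketch that counts which hosts show up in "Remaining pending request" warnings like the ones quoted above. It relies only on the wording of those quoted lines (the "Remaining pending request:" text and the "URL:" field); the function name and the second sample line are hypothetical, and other WPM versions may log this differently.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the WARN line quoted above and captures the value after "URL:".
PENDING = re.compile(r"Remaining pending request:.*?\bURL:\s*(\S+)")


def stuck_hosts(lines):
    """Count hosts that appear in 'Remaining pending request' warnings."""
    counts = Counter()
    for line in lines:
        match = PENDING.search(line)
        if not match:
            continue
        url = match.group(1)
        # Fall back to the raw URL text if no hostname can be extracted.
        host = urlsplit(url).hostname or url
        counts[host] += 1
    return counts


# First line is shortened from the log quoted above; second is hypothetical.
SAMPLE = [
    "WARN SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Remaining pending request: "
    "Begin: +1.622 s, Blocked: 0ms, Size: 0 Mime: text/html Status: 0 "
    "URL: https://***.tcliveus.com/i?siteID=.........",
    "DEBUG SolarWinds.SEUM.Player.WatiN.WatiNPlayer - Waiting for pending requests",
]
hosts = stuck_hosts(SAMPLE)
```

Calling hosts.most_common() then lists the hosts that most often kept the browser waiting, which is a quick way to spot a single third-party call that stalls every step.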
We are seeing something similar to this except obviously a different URL. How did you come to the conclusion that this was the problem? Was it that the site wasn't responding quickly enough when browsing to it or something else? I'm seeing player load of over 100% and I'm looking for an explanation as to why and how this is calculated.
This timeout is the wait applied to each pending request. Simply put, if you have a long-running request on the page (long-poll requests, keep-alive requests), we wait up to the specified amount of time for that request to finish. You can see whether your page has this kind of request in, for example, the Internet Explorer developer console. Generally it is not recommended to lower this value if there are no issues with playback, because a value that is too low may cause the player not to wait for the page to fully load, and the measured step duration would then be lower than it actually is. But if you are not seeing any issues, I guess it should be OK in your case.