4 Replies Latest reply: May 1, 2012 5:01 PM by cdenuyl

SEUM playback stalls but status shows green (up)

cdenuyl

We've had numerous instances in SEUM 1.5 where playbacks stall but the status never changes to Unknown.  All our recordings play back from 3 separate US locations, every 5 minutes.  Periodically, certain playbacks "stall" and stop playing back entirely until manually restarted with the Play Now button.  The GUI shows the last date/time the playback ran, but even if that was days or weeks ago, the status is still Green, since that was the last known status before the playback went dark.  This is a serious issue for us, because our alerting looks for agreement among all 3 playbacks for either Red (down) or Gray (unknown).  When all 3 playbacks are in either of those two states, we start paging engineers.  If any one of them is Green (up), we do not.  Stalled transactions create a critical blind spot.  Previously, SolarWinds provided a SQL script to detect and alert on stalled events (>15 minutes without updates).  However, it's still a manual process to log in and address the stalled playback.  During that time, we are blind.  We are scaling up our infrastructure to meet the needs of the app.  In the meantime, my questions are:

 

1) Shouldn't the "Unknown" status be written to the database when monitored objects fail to update?  Is this a bug or do I misunderstand the status taxonomy?

2) Can we mitigate this with a SQL job that sets the playback status to "Unknown" when the following alert logic is tripped?

 

Trigger Query = SEUM: Transaction

JOIN SEUM_Transactions
    ON SEUM_TransactionsAlertsData.TransactionId = SEUM_Transactions.TransactionId
JOIN SEUM_Agents
    ON SEUM_Agents.AgentId = SEUM_Transactions.AgentId
WHERE ( SEUM_TransactionsAlertsData.IsEnabled = 1
    AND DATEPART(year,SEUM_Transactions.LastDateTimeUtc) > '1900'
    AND DATEADD(second, (3 * SEUM_Transactions.Frequency), SEUM_Transactions.LastDateTimeUtc) < GETUTCDATE()
    AND SEUM_Agents.ConnectionStatus = 1
)
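
For readers who want to reason about the trigger logic outside the alert engine, here is a minimal Python sketch of the same staleness test. It rests on assumptions read off the query rather than on any SEUM documentation: `Frequency` is taken to be in seconds, a `LastDateTimeUtc` in or before the year 1900 is taken to mean "never played", and `ConnectionStatus = 1` is taken to mean the agent is connected. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def is_stalled(last_played_utc, frequency_seconds, is_enabled,
               agent_connection_status, now_utc, missed_intervals=3):
    """Mirror the WHERE clause of the alert query (names are hypothetical)."""
    if not is_enabled:
        return False  # IsEnabled = 1
    if last_played_utc.year <= 1900:
        return False  # DATEPART(year, LastDateTimeUtc) > 1900
    if agent_connection_status != 1:
        return False  # ConnectionStatus = 1 (assumed to mean "connected")
    # DATEADD(second, 3 * Frequency, LastDateTimeUtc) < GETUTCDATE()
    deadline = last_played_utc + timedelta(
        seconds=missed_intervals * frequency_seconds)
    return deadline < now_utc


# A 5-minute (300 s) playback that last ran at 11:14:31 counts as stalled
# once 15 minutes have passed without a new result.
last_played = datetime(2012, 4, 3, 11, 14, 31)
print(is_stalled(last_played, 300, True, 1, datetime(2012, 4, 3, 11, 29, 0)))
print(is_stalled(last_played, 300, True, 1, datetime(2012, 4, 3, 11, 30, 0)))
```

The first call falls 31 seconds short of the 15-minute threshold and the second call is past it, which matches the ">15 minutes without updates" rule described above.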

 
  • Re: SEUM playback stalls but status shows green (up)
    aLTeReGo

    The issue you describe is usually limited to situations where the player is overloaded: the scheduled job never gets an opportunity to play because it expires in the queue and is moved to the back of the line. Pressing the "Play Now" button forces the transaction to the front of the line, where it is played back immediately. A simple remedy to improve reliability might be to decrease the playback frequency (that is, lengthen the playback interval) of the transactions assigned to these players so there's more idle time for the worker threads to squeeze in all of the assigned transactions. Another alternative would be to stand up another player and split the transactions currently assigned to this player across multiple player instances. If neither of these options appeals to you, as a last resort you can contact support, who will be able to provide you with instructions on how to increase the number of worker threads on the player. I don't usually recommend this option since it depends heavily on the server where the player is installed having available CPU and memory resources to handle additional threads. If the server is already taxed (high CPU or memory usage), this could actually have a net negative effect by causing all transaction playbacks to take longer to complete, and in some extremely rare circumstances customers have found that fewer threads improved overall playback performance.
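
As a rough illustration of why playing back less often or adding a player relieves the queue, the sketch below estimates how busy a player's worker threads are. It uses a simplified model that is an assumption, not something taken from SEUM: every transaction plays once per interval, occupies one worker thread for its whole duration, and work spreads evenly across threads. The function name and all numbers are hypothetical.

```python
def worker_utilization(playback_durations_s, interval_s, worker_threads):
    """Fraction of available thread time used per interval.

    A value above 1.0 means the assigned transactions cannot all finish
    within one interval under this simplified model.
    """
    if interval_s <= 0 or worker_threads <= 0:
        raise ValueError("interval_s and worker_threads must be positive")
    return sum(playback_durations_s) / (interval_s * worker_threads)


# Hypothetical player: 40 transactions of 45 s each, 5 worker threads.
durations = [45] * 40

overloaded = worker_utilization(durations, interval_s=300, worker_threads=5)
longer_interval = worker_utilization(durations, interval_s=600, worker_threads=5)
split_in_two = worker_utilization(durations[:20], interval_s=300, worker_threads=5)

print(overloaded, longer_interval, split_in_two)
```

In this made-up case the player needs more thread time than a 5-minute interval offers, while either doubling the interval or moving half of the transactions to a second player brings the load well under capacity.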

    • Re: SEUM playback stalls but status shows green (up)
      cdenuyl

      Our new playback hardware has been ordered and will be deployed in the next couple of months (globally).  In the interim, we're making the best of what's available.  My concern is this: The application has an Unknown status, yet it's not being leveraged when playback data gets stale.  The data lives in the database, and we know that anything >15 minutes old has already missed 3 polling periods.  Even if the playback server goes into a tailspin, we should be able to detect the stall and update DB records accordingly.  I've worked with other systems (Big Brother, etc.) which set items as "Unknown" once their data was older than X polling cycles.  Since we can detect this stalled condition with the query provided, it seems like an easy workaround to execute a script/trigger that flips the playback status from Green to Unknown.  I'm already monitoring for Down/Unknown status, so that would eliminate our blind spot.  I can open a ticket if needed, but I think the focus should be on detection/flagging of stale data.
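
To make the blind spot concrete, here is a small Python sketch of the paging rule described in this thread (page only when all three locations are Down or Unknown), with and without a staleness override. It is only an illustration: the status strings, function names, and the idea of computing an "effective" status outside the product are assumptions, not SEUM behavior.

```python
from datetime import datetime, timedelta

UP, DOWN, UNKNOWN = "Up", "Down", "Unknown"


def effective_status(stored_status, last_played_utc, frequency_seconds,
                     now_utc, missed_intervals=3):
    """Treat a result older than `missed_intervals` frequencies as Unknown."""
    age = now_utc - last_played_utc
    if age > timedelta(seconds=missed_intervals * frequency_seconds):
        return UNKNOWN
    return stored_status


def should_page(statuses):
    """Page only when every location reports Down or Unknown."""
    return bool(statuses) and all(s in (DOWN, UNKNOWN) for s in statuses)


now = datetime(2012, 4, 10, 12, 0, 0)
# (stored status, last played) for three hypothetical locations;
# the third stalled a week ago while it was still Green.
playbacks = [
    (DOWN, datetime(2012, 4, 10, 11, 58, 0)),
    (DOWN, datetime(2012, 4, 10, 11, 57, 0)),
    (UP, datetime(2012, 4, 3, 11, 14, 31)),
]

stored = [status for status, _ in playbacks]
adjusted = [effective_status(status, played, 300, now)
            for status, played in playbacks]

print(should_page(stored))    # False: the stale Green masks the outage
print(should_page(adjusted))  # True: the stale result counts as Unknown
```

With the stored statuses alone, the week-old Green result blocks the page even though both live locations report Down. Once stale results are treated as Unknown, the existing "all Down or Unknown" rule fires without any other change to the alerting.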

      • Re: SEUM playback stalls but status shows green (up)
        aLTeReGo

        It's a fairly trivial task to determine if the data is older than a certain period of time or number of polling/playback intervals. It is, however, something entirely different to interact with the job scheduler to force a manual playback on a remote player in response to these results. Remember, the job is being scheduled normally and the player is responding "ok, job received, I've got it from here" and placing it in its queue. When the SEUM server goes back to collect the results of any jobs that have run, there are no results for this particular job because another instance of the same job was scheduled to run before it had an opportunity to execute, placing it at the back of the queue. Manual playbacks interact directly with the job scheduler and do not interact with the database, other than to save the results of the playback. We're working on some changes to help prevent this situation from occurring that should be available in a future release, but in the meantime the simplest remedies are listed in my previous post.

        • Re: SEUM playback stalls but status shows green (up)
          cdenuyl

          I'm not looking to manually trigger the playback.  I'm looking to flip the value in the DB from Green to Gray.  Clearly we have playback contention - but I need to defend against blind spots regardless.  When a monitoring app tells me Green (good), but my data is 7 days stale, that's very, very bad.  I can provide a screenshot from today - it was Green in the GUI, but the playback interval was every 5 minutes, and the Last Played value was 4/3/2012 11:14:31 AM.  Next Playback says "Now", but after 3300+ missed intervals, that's of little comfort.  The status of that playback should be shifted to Gray (Unknown) by the SEUM server or DB until the playback server phones home with current data.