This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Custom Alert - Average Blocking Time per Session

Description

This alert will calculate the average blocking time per session and helps you understand the severity of the issue. It is similar to the Total Wait Time for Blocking - Repository alert in that it uses the data already in the repository, but instead of returning a total amount of blocking time, it divides that by the number of sessions being blocked. For example, there could be 200 sessions being blocked for 2-3 seconds, and you would get 600 seconds of blocking and be notified with the other alert mentioned. However, the problem is not too bad because each individual session was not blocked very long. However, if the scenario was instead 20 sessions blocked for 600 seconds, this alert would give you 30 seconds on average which may be a different severity.

Also see the Alternative section below if you want to retrieve the session with the maximum blocking time, i.e. the worst case scenario.

Alert Definition

To create the alert, click on Alerts > Manage Alerts tab and create a Custom Alert of type Custom SQL Alert - Single Numeric Return. Configure the alert with values similar to these:

Alert Name: Blocking Time per Session

Execution Interval: <= 10 Minutes down to 1 Minute

Notification Text: This instance just had a blocking problem where the average blocking time per session is below.

Active Database Instances: Add the Ignite Repository DB only

SQL Statement (Pick the proper statement below depending on if your Repository is on SQL Server or Oracle)


SQL Server Repository Statement:

SELECT COALESCE(AVG(timesecs), 0) avg_blocking_timesecs

FROM (

   SELECT sw.vdsi, ISNULL(SUM(sw.qp)/100,0) timesecs

   FROM consw_#DBID# sw

   INNER JOIN conev_#DBID# ev ON sw.keeq = ev.id

   WHERE sw.d >= DATEADD(n, -#FREQUENCY#, CURRENT_TIMESTAMP)

   AND ev.name LIKE 'LCK%'

   GROUP BY sw.vdsi) sw

Oracle Repository Statement:

SELECT NVL(AVG(timesecs), 0) avg_blocking_timesecs

FROM (

   SELECT sw.vdsi, NVL(SUM(sw.qp)/100,0) timesecs

   FROM consw_#DBID# sw

   INNER JOIN conev_#DBID# ev ON sw.keeq = ev.id

   WHERE sw.d >= CURRENT_TIMESTAMP - (#FREQUENCY# / 1440)

   AND ev.name LIKE 'enq%'

   GROUP BY sw.vdsi) sw


Execute Against: Repository

Units: Average Blocking Time (secs)

High Threshold: Min 30, Max empty

Medium Threshold: Min 15, Max 30

Alternative

Instead of summing the time with:

SELECT COALESCE(SUM(timesecs) / COUNT(1), 0) avg_blocking_timesecs

you could also use a MAX function to get the worst case blocking scenario with:

SELECT COALESCE(MAX(timesecs),0) max_blocking_timesecs

  • Every night, SSIS runs a procedure called SSISDB.internal.cleanup_server_retention_window, which blocks executions of SSISDB.internal.append_event_message for several minutes. I would like to add a WHERE clause to this custom alert query to filter out these nightly events. Which table and column name would I need to check for that procedure name?

    Thanks,

    Mark

  • Mark, is your repo on SQL or Oracle?  If you can grab the SQL hash value that we assign in the tool, then we can exclude that hash by referencing the consw_id.izho column.  So something like where sw.izho <> hash_number

  • Thanks for the quick response! I'm on SQL Server.

    I can add "AND sw.izho <> 6041431186" to the WHERE clause, for example, but if the hash changes due to a restart or memory flush, etc., this won't help. Also, the hash will be different on each server. Is there a join I can add that will get me the procedure name? I don't know the Ignite repository.

  • You are correct that if the statement changes, it will get a different hash.  Regarding joining for the procedure name, try joining to the const_id table using the consw.izho and const.h columns.  In const, you should be able to pull the pname column for the SP.

  • That didn't seem to work, but I put in a hack that does. I added "AND CONVERT(TIME, GETDATE()) > CONVERT(TIME, '00:10')" to the query to suppress the alert for this situation, which always occurs between midnight and 12:10am. I'd prefer something more targeted, but this will get me by until I can install the SP2 for SQL Server 2012, which should fix the root cause of the problem.

    References:

  • I followed the instructions when setting up the alert, but it's not working at all -- I set the threshold to 1 second to test, and no alerts. I cut and pasted the Oracle code because that's our repository platform. I tried running the sql provided at the command line and sqlplus doesn't like the "#" symbols -- are we supposed to substitute what's in between the pound symbols with values?  Let me know, thanks.

  • I think the issue is that the script had like 'LCK%' as the wait and with Oracle, that should have been 'enq%'.

    I've edited the alert on Thwack.  Should be like below.  Let me know if this still errors.

    If you want to test, you can run a "select id, name from cond" and substitute the id number for #DBID#, example:  conev_1

    SELECT NVL(AVG(timesecs), 0) avg_blocking_timesecs

    FROM (

       SELECT sw.vdsi, NVL(SUM(sw.qp)/100,0) timesecs

       FROM consw_#DBID# sw

       INNER JOIN conev_#DBID# ev ON sw.keeq = ev.id

       WHERE sw.d >= CURRENT_TIMESTAMP - (#FREQUENCY# / 1440)

       AND ev.name LIKE 'enq%'

       GROUP BY sw.vdsi) sw

  • Thanks mandevil -- that did  the trick! emoticons_happy.png

  • Here is an example that alerts on any blocked session greater than 60 seconds for a given day between 6am and 8pm and then alerts on any blocked session above 300 seconds from 8pm to 6am.  If the return value is true the alert will return a 1 so you'll need to make sure your Min high value is set to 1.  Just another example of how you can turn your DPA alerts to be multi-conditional.

    DECLARE

    @MAX_BLK_TIME INT,

    @HOUR INT,

    @THRESHOLD INT

    BEGIN

    SELECT @HOUR = DATEPART(HOUR,CURRENT_TIMESTAMP);

    SELECT @MAX_BLK_TIME = COALESCE(MAX(TIMESECS), 0)

    FROM (

    SELECT SW.VDSI, ISNULL(SUM(SW.QP)/100,0) TIMESECS

    FROM CONSW_#DBID# SW

    INNER JOIN CONEV_#DBID# EV ON SW.KEEQ = EV.ID

    WHERE CAST(SW.D AS DATE) >= CAST(GETDATE() AS DATE)

    AND EV.NAME LIKE 'LCK%'

    GROUP BY SW.VDSI) SW;

    -- SET THE THRESHOLD TO 60 SECONDS BETWEEN 6AM TO 8PM ELSE SET IT TO 300

    IF (@HOUR >= 6 AND @HOUR <= 20)

                                    SELECT @THRESHOLD = 60

    ELSE

                                    SELECT @THRESHOLD = 300;

    IF @MAX_BLK_TIME >= @THRESHOLD

                                    SELECT 1 --RETURN A 1 IF BLOCKING IS GREATER THEN 60 OR 300

                    ELSE

    SELECT 0

    END

  • ashrecksiedler This is interesting actually I had seen an example where something like this was done with the built-in alert "Total Blocking Wait Time." Different values were needed for some times of the day so two copies of the alert were made with blackouts for the alert done on whatever one had the incorrect amount of blocking for a period.

    I like this solution much better.