We have been seeing an issue where alerts are delayed; it's as if the alerting engine itself is backed up.
Other observations and changes we have made so far:
(1) Polling Completion Rate is 100% across 10 pollers.
(2) Database syncs take under 30 seconds, worst case.
(3) Element count is under 8,800 across all pollers.
(4) Changed the max pool size from 1000 to 3000 in NetPerfMon.db.
(5) Cleaned up the rows with NULL values in the AlertObjects table.
(6) Disabled all out-of-the-box (canned) alerts; a few stragglers were still enabled.
(7) Ran the usual cleanup that we have ALL done:
UPDATE [Limitations] SET WhereClause = REPLACE(REPLACE(REPLACE(CAST(WhereClause AS varchar(max)), '( (', ' ( ( '), '((', ' ( ( '),'))',' ) ) ')
DELETE FROM [Limitations] WHERE WhereClause = '1=1'
DELETE FROM [LimitationSnapShots]
DELETE FROM [ContainerMemberSnapshots]
DELETE FROM [PendingNotifications]
DELETE FROM [SubscriptionTags]
DELETE FROM [Subscriptions] WHERE EndpointAddress NOT LIKE 'http%'
(8) Changed the AlertEngine-OverloadCounter threshold:
SELECT TOP 1000 * FROM [dbo].[Settings]
WHERE SettingID IN ('AlertEngine-OverloadCounterTimeWindowSeconds',
'AlertEngine-OverloadCounterThreshold')
Changed AlertEngine-OverloadCounterThreshold from 120 to 150.
(9) Disabled or optimized all of the AppInsight applications.
(10) Per development's recommendation, modified the External Component Critical alert:
* Changed "External - Component Status (Critical)" / Trigger Actions / Log Alert / File Size from 0 (unlimited) to 1 MB.
(11) We also found that one of our large core applications was not muting its alerts during its nightly maintenance window (this was killing us); we fixed this.
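For anyone repeating item (8): the SELECT there only reads the two overload-counter settings. Below is a minimal T-SQL sketch of the change itself. It assumes the active value is stored in a column named CurrentValue in [dbo].[Settings] (an assumption, so check your own schema and take a backup first), and it rolls back unless exactly one row is updated.

```sql
-- Sketch only: assumes [dbo].[Settings] keeps the active value in a column
-- named CurrentValue (an assumption; verify against your own schema).
BEGIN TRANSACTION;

UPDATE [dbo].[Settings]
SET CurrentValue = 150
WHERE SettingID = 'AlertEngine-OverloadCounterThreshold';

-- @@ROWCOUNT still refers to the UPDATE here, so commit only if exactly
-- one setting row was changed; otherwise undo the change.
IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;

-- Re-read both overload-counter settings to confirm the result.
SELECT SettingID, CurrentValue
FROM [dbo].[Settings]
WHERE SettingID IN ('AlertEngine-OverloadCounterTimeWindowSeconds',
                    'AlertEngine-OverloadCounterThreshold');
```

Checking @@ROWCOUNT right after the UPDATE guards against a mistyped SettingID silently changing nothing (or more than intended).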
Development has been able to reproduce the issue and has confirmed it as a bug that is expected to be addressed in 2024.1 (we hope).
My question: is anyone else seeing this behavior, and if so, what did you do?
As always, thanks for any input and/or advice.