Alerting Broken After 2023.2 Upgrade

We upgraded our instance from 2023.1.1 to 2023.2. Since the upgrade, 30% of our alerts are broken and no longer fire, mostly our component-based alerts. The Alerting.Service log shows a "conversion failed" SQL exception for the alert triggers:

2023-04-28 15:04:43,276 [35] WARN SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - Condition evaluation failed : RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
2023-04-28 15:04:43,276 [35] ERROR SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition 'AlertId: 326, AlertLastEdit: 4/5/2023 1:32:36 PM, ConditionIndex: 0, Type: Trigger' Evaluator failed - Condition evaluation failed for query = (SELECT E0.[Uri], E0.[DisplayName]
FROM Orion.APM.Component AS E0
WHERE ( ( ( E0.[Application].[Node].[Status] = @p0*1 ) AND ( E0.[Status] != @p1*1 ) AND ( E0.[Status] != @p2*1 ) AND ( E0.[Status] != @p3*1 ) AND ( E0.[Status] != @p4*1 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Alert_Node] = @p5*1 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p6 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p7 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Non_Crt_Node] = @p8*1 ) AND ( E0.[Application].[ApplicationAlert].[ApplicationName] LIKE @p9 ) AND ( ( E0.[Application].[Node].[Vendor] = @p1*10 ) OR ( E0.[Application].[Node].[Vendor] = @p1*11 ) OR ( E0.[Application].[Node].[Vendor] = @p1*12 ) ) ) AND ( ( E0.[Status] != @p1*13 ) ) )), condition = (AlertConditionDynamic: scope=(
([Orion.Nodes|Status|Application.Node] = '1')
AND ([Orion.APM.Component|Status] != '27')
AND ([Orion.APM.Component|Status] != '9')
AND ([Orion.APM.Component|Status] != '3')
AND ([Orion.APM.Component|Status] != '0')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Alert_Node|Application.Node.CustomProperties] = '1')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'NonCritcal:')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'Serious:')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Non_Crt_Node|Application.Node.CustomProperties] = '0')
AND ([Orion.APM.ApplicationAlert|ApplicationName|Application.ApplicationAlert] CONTAINS 'OPS Telnet - EDI Proxy Ports')
AND (
([Orion.Nodes|Vendor|Application.Node] = 'net-snmp')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Sun Microsystems')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Unknown')
)
): (OR ([Orion.APM.Component|Status] != '1'))) - System.ServiceModel.FaultException`1[SolarWinds.InformationService.Contract2.InfoServiceFaultContract]: RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int. (Fault Detail is equal to InfoServiceFaultContract [ System.Data.SqlClient.SqlException (0x80131904): Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__8.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.Provider...). 
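
To get a rough list of which alerts are affected, a small script along these lines can pull the AlertId out of each "Evaluator failed" entry. This is only a sketch: the regular expression is inferred from the log excerpt above, and the function name failed_alert_ids is made up for illustration, not part of any SolarWinds tooling.

```python
import re
from collections import Counter

# Pattern inferred from the ERROR line above (an assumption, not a documented format):
# "... Condition 'AlertId: 326, AlertLastEdit: ..., Type: Trigger' Evaluator failed ..."
FAILED_CONDITION = re.compile(r"Condition 'AlertId: (\d+),[^']*' Evaluator failed")


def failed_alert_ids(log_text: str) -> Counter:
    """Count evaluator failures per AlertId found in the given log text."""
    return Counter(int(m.group(1)) for m in FAILED_CONDITION.finditer(log_text))


# Sample taken from the log excerpt in this post.
sample = (
    "2023-04-28 15:04:43,276 [35] ERROR "
    "SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - "
    "Condition 'AlertId: 326, AlertLastEdit: 4/5/2023 1:32:36 PM, "
    "ConditionIndex: 0, Type: Trigger' Evaluator failed - "
    "Condition evaluation failed for query = (SELECT E0.[Uri], E0.[DisplayName]\n"
)

for alert_id, count in failed_alert_ids(sample).items():
    print(f"AlertId {alert_id}: {count} failure(s)")
```

To run it against a real log, read the file contents into a string and pass that to the function; the log location depends on the installation.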

Debug-level entries in the Alerting log show missing entities when alerts are fired:

2023-04-28 14:08:57,850 [48] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting

2023-04-28 14:08:57,850 [36] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Common.ChannelProxy`1 - Invoking <Query>b__0 finished

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - } Start exited

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition Evaluator OnNext (AlertId: 244, AlertLastEdit: 7/12/2019 6:30:24 PM, ConditionIndex: 0, Type: Trigger)

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceApplication

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceClientApplication
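
Since the same "Missing entity" message repeats many times, it can help to collapse the debug output into the distinct navigation pairs. The following is a minimal sketch that assumes the line layout seen in the excerpt above; the function name missing_navigations is hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the DEBUG lines above:
# "Missing entity from navigation <search item type> (<source entity>) -> <target entity>"
MISSING_ENTITY = re.compile(
    r"Missing entity from navigation \S+ \(([\w.]+)\) -> ([\w.]+)"
)


def missing_navigations(log_text: str) -> Counter:
    """Count each distinct (source entity, missing target entity) pair."""
    return Counter(MISSING_ENTITY.findall(log_text))


# Two sample lines taken from the debug excerpt in this post.
prefix = (
    "2023-04-28 14:08:57,910 [46] DEBUG "
    "SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - "
    "Missing entity from navigation "
    "SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider"
    "+RelationsSearchItem (Orion.APM.Application) -> "
)
sample = (
    prefix + "Orion.DPA.DatabaseInstance\n"
    + prefix + "Orion.DPA.DatabaseInstance\n"
    + prefix + "Orion.DPA.DatabaseInstanceApplication\n"
)

for (source, target), count in missing_navigations(sample).items():
    print(f"{source} -> {target}: seen {count} time(s)")
```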

When we create new component-based alerts that mirror our old ones, we can no longer select Application as a trigger condition. The error is: "Missing field ApplicationName in Orion.APM.ApplicationAlert"

The only fix we have found is to re-create the alerts using Application instead of Component and rewrite the email alerts. Everything was running great on 2023.1.1 and we were pleased with the product. We have an open case with SolarWinds to look at this issue, but support seems to be stumped at the moment.

Also, after upgrading to 2023.2, our WPM monitors started to flap, we lost the worker configuration on our players, and the module has become very noisy. Re-recording transactions and adjusting wait times, resolution, image match settings, etc. does not correct the issue. We have an open ticket for this issue as well.

We thought this 2023.2 upgrade was going to be the same as the 2023.1 and 2023.1.1 upgrades, which completed successfully without issue. The only reason we wanted to get to 2023.2 was to address the UTC bug for last reboot that end users were complaining about. Of course, that led to the system being down with alerting broken. Because of these issues, we have decided to wait at least 6 months before performing platform upgrades.
