Alerting Broken After 2023.2 Upgrade

We upgraded our instance from 2023.1.1 to 2023.2. After the upgrade 30% of our alerts are broken and not firing, mostly with our Component based alerts. The Alerting.Service Log shows a conversion failed SQL exception for the alert triggers: 

2023-04-28 15:04:43,276 [35] WARN SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - Condition evaluation failed : RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
2023-04-28 15:04:43,276 [35] ERROR SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition 'AlertId: 326, AlertLastEdit: 4/5/2023 1:32:36 PM, ConditionIndex: 0, Type: Trigger' Evaluator failed - Condition evaluation failed for query = (SELECT E0.[Uri], E0.[DisplayName]
FROM Orion.APM.Component AS E0
WHERE ( ( ( E0.[Application].[Node].[Status] = @p0*1 ) AND ( E0.[Status] != @p1*1 ) AND ( E0.[Status] != @p2*1 ) AND ( E0.[Status] != @p3*1 ) AND ( E0.[Status] != @p4*1 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Alert_Node] = @p5*1 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p6 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p7 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Non_Crt_Node] = @p8*1 ) AND ( E0.[Application].[ApplicationAlert].[ApplicationName] LIKE @p9 ) AND ( ( E0.[Application].[Node].[Vendor] = @p1*10 ) OR ( E0.[Application].[Node].[Vendor] = @p1*11 ) OR ( E0.[Application].[Node].[Vendor] = @p1*12 ) ) ) AND ( ( E0.[Status] != @p1*13 ) ) )), condition = (AlertConditionDynamic: scope=(
([Orion.Nodes|Status|Application.Node] = '1')
AND ([Orion.APM.Component|Status] != '27')
AND ([Orion.APM.Component|Status] != '9')
AND ([Orion.APM.Component|Status] != '3')
AND ([Orion.APM.Component|Status] != '0')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Alert_Node|Application.Node.CustomProperties] = '1')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'NonCritcal:')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'Serious:')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Non_Crt_Node|Application.Node.CustomProperties] = '0')
AND ([Orion.APM.ApplicationAlert|ApplicationName|Application.ApplicationAlert] CONTAINS 'OPS Telnet - EDI Proxy Ports')
AND (
([Orion.Nodes|Vendor|Application.Node] = 'net-snmp')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Sun Microsystems')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Unknown')
)
): (OR ([Orion.APM.Component|Status] != '1'))) - System.ServiceModel.FaultException`1[SolarWinds.InformationService.Contract2.InfoServiceFaultContract]: RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int. (Fault Detail is equal to InfoServiceFaultContract [ System.Data.SqlClient.SqlException (0x80131904): Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__8.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.Provider...). 

Debug for the Alerting Log shows missing entities when alerts are fired: 

2023-04-28 14:08:57,850 [48] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting

2023-04-28 14:08:57,850 [36] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Common.ChannelProxy`1 - Invoking <Query>b__0 finished

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - } Start exited

2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition Evaluator OnNext (AlertId: 244, AlertLastEdit: 7/12/2019 6:30:24 PM, ConditionIndex: 0, Type: Trigger)

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceApplication

2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceClientApplication

When creating new component based alerts and mirroring our old alerts, we can no longer select Application as a trigger condition, Error: "Missing field ApplicationName in Orion.APM.ApplicationAlert"

The only fix is to re-create alerts using Application instead of Component and re-writing the email alerts. Everything was running great on 2023.1.1 and we were pleased with the product. We have an open case with SolarWinds to look at this issue, support seems to be stumped at the moment. 

Also, after upgrading to 2023.2 WPM monitors started to flap, we lost our worker configuration on our players, and that module has become very noisy. Re-recording transactions, adding wait times, resolution, image match adjustments, etc. does not correct the issue. We have an open ticket for this issue as well. 

We thought this 2023.2 upgrade was going to be the same as the 2023.1 and 2023.1.1 upgrades that completed successfully without issue. The only reason we wanted to get to 2023.2 is to address the UTC Bug for last reboot that end users were complaining about, of course that led to the system being down with alerting broken. We have made a decision to wait to preform platform upgrades for at least 6 months due to these issues we are seeing. 

  • Try this - On the Scope or Trigger condition for a broken one, there's a drop down arrow on the right. Press that, export the SWQL

    Paste into notepad or whatever, and add line-breaks until it's easy to read
    Compare it to the UI carefully. One of the conditions has probably been translated from UI into code badly. It'll look mostly unrelated. Mine was:

            AND ( E0.[Caption] NOT  LIKE '10'*10 )

    Might be fine to switch to SWQL alerts after killing whatever fault's popped up

  • Thats epic, I was able to address the custom property that was complaining, but this is exactly what was happening when reviewing my alerts. Many kudos!

  • This was my Monday/Tuesday!
    Could you pass me your refs and what the offending line looked like so I can convince my support tech to bump it far up the chain?

  • Does anyone have a Case number? I want to track that this is fixed in the next release before we upgrade. There should be an easier mechanism to track these issues for the community.

  • 01349560 - Mine, now I know more it's not got the best title, it'll do, troubleshooting info on it is solid

  • We opened up case 01354893, since we were able to resolve our issue, they archived it. 

    Main thing we were getting errors about was our drop down custom property "tier" which had the values (1,2,3,4, and tierless) we replaced "tierless" with "5" and that was enough to fix all of our alerts.

    SWQL the gui generated:

    SELECT E0.[Uri], E0.[DisplayName]
    FROM Orion.Nodes AS E0
    WHERE ( ( ( E0.[CustomProperties].[Alerts_Suppressed] = 'False' ) AND ( E0.[CustomProperties].[Tier] = '3'*1 ) ) AND ( ( E0.[Status] = '2'*1 ) ) )

    error message from the alertengine v2 logs:

    2023-05-09 05:35:51,017 [43] ERROR SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition 'AlertId: 734, AlertLastEdit: 4/28/2023 5:50:25 PM, ConditionIndex: 0, Type: Trigger' Evaluator failed - Condition evaluation failed for query = (SELECT E0.[Uri], E0.[DisplayName]
    FROM Orion.Nodes AS E0 
    WHERE ( ( ( E0.[CustomProperties].[Alerts_Suppressed] = @p0*1 ) AND ( E0.[CustomProperties].[Tier] = @p1*1 ) ) AND ( ( E0.[Status] = @p2*1 ) ) )), condition = (AlertConditionDynamic: scope=(
      ([Orion.NodesCustomProperties|Alerts_Suppressed|CustomProperties] = '0')
      AND ([Orion.NodesCustomProperties|Tier|CustomProperties] = '3')
    ): (AND ([Orion.Nodes|Status] = '2'))) - System.ServiceModel.FaultException`1[SolarWinds.InformationService.Contract2.InfoServiceFaultContract]: Query failed, check fault information.
    Conversion failed when converting the nvarchar value 'Tierless' to data type int. (Fault Detail is equal to InfoServiceFaultContract [ System.Data.SqlClient.SqlException (0x80131904): Conversion failed when converting the nvarchar value 'Tierless' to data type int.
       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
       at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
       at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
       at System.Data.SqlClient.SqlDataReader.Read()
       at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__8.MoveNext()
       at SolarWinds.Data.Query.PhysicalQueryPlan.Provider...).

  • amazing - found this 2 hours before we were to roll back. Have a fantastic weekend Smiley

  • Though the case has ben closed it has still been sent to development for further investigation to see if its a trending issue

  • We are experiencing the WPM issue as well.   Interesting that they told you (2 weeks ago based on the timestamp of your posting) there was a known bug, but so far they have said nothing to us.   We don't have issues with alerting - other than the flapping of the WPM transactions.  Are you able so share the ticket #?

    Thanks