Alerting Broken After 2023.2 Upgrade

We upgraded our instance from 2023.1.1 to 2023.2. Since the upgrade, 30% of our alerts are broken and not firing, mostly our component-based alerts. The Alerting.Service log shows a "Conversion failed" SQL exception for the alert triggers:

2023-04-28 15:04:43,276 [35] WARN SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - Condition evaluation failed : RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
2023-04-28 15:04:43,276 [35] ERROR SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition 'AlertId: 326, AlertLastEdit: 4/5/2023 1:32:36 PM, ConditionIndex: 0, Type: Trigger' Evaluator failed - Condition evaluation failed for query = (SELECT E0.[Uri], E0.[DisplayName]
FROM Orion.APM.Component AS E0
WHERE ( ( ( E0.[Application].[Node].[Status] = @p0*1 ) AND ( E0.[Status] != @p1*1 ) AND ( E0.[Status] != @p2*1 ) AND ( E0.[Status] != @p3*1 ) AND ( E0.[Status] != @p4*1 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Alert_Node] = @p5*1 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p6 ) AND ( E0.[ComponentAlert].[UserNotes] NOT LIKE @p7 ) AND ( E0.[Application].[Node].[CustomProperties].[OPS_Targeted_Non_Crt_Node] = @p8*1 ) AND ( E0.[Application].[ApplicationAlert].[ApplicationName] LIKE @p9 ) AND ( ( E0.[Application].[Node].[Vendor] = @p1*10 ) OR ( E0.[Application].[Node].[Vendor] = @p1*11 ) OR ( E0.[Application].[Node].[Vendor] = @p1*12 ) ) ) AND ( ( E0.[Status] != @p1*13 ) ) )), condition = (AlertConditionDynamic: scope=(
([Orion.Nodes|Status|Application.Node] = '1')
AND ([Orion.APM.Component|Status] != '27')
AND ([Orion.APM.Component|Status] != '9')
AND ([Orion.APM.Component|Status] != '3')
AND ([Orion.APM.Component|Status] != '0')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Alert_Node|Application.Node.CustomProperties] = '1')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'NonCritcal:')
AND ([Orion.APM.ComponentAlert|UserNotes|ComponentAlert] NOTCONTAINS 'Serious:')
AND ([Orion.NodesCustomProperties|OPS_Targeted_Non_Crt_Node|Application.Node.CustomProperties] = '0')
AND ([Orion.APM.ApplicationAlert|ApplicationName|Application.ApplicationAlert] CONTAINS 'OPS Telnet - EDI Proxy Ports')
AND (
([Orion.Nodes|Vendor|Application.Node] = 'net-snmp')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Sun Microsystems')
OR ([Orion.Nodes|Vendor|Application.Node] = 'Unknown')
)
): (OR ([Orion.APM.Component|Status] != '1'))) - System.ServiceModel.FaultException`1[SolarWinds.InformationService.Contract2.InfoServiceFaultContract]: RunQuery failed, check fault information.
Conversion failed when converting the nvarchar value 'net-snmp' to data type int. (Fault Detail is equal to InfoServiceFaultContract [ System.Data.SqlClient.SqlException (0x80131904): Conversion failed when converting the nvarchar value 'net-snmp' to data type int.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__8.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.Provider...). 
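To see which alerts hit this error, and which values fail the int conversion, the log can be scanned for the ConditionsStateEvaluator ERROR entries. This is a minimal Python sketch, assuming the log layout matches the excerpt above: each entry starts with a timestamp, and the query text and stack trace follow on continuation lines. The function name `failing_alerts` is hypothetical, not part of any SolarWinds tooling.

```python
import re

# A new log entry starts with a timestamp such as "2023-04-28 15:04:43,276 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")

# The ERROR entry ConditionsStateEvaluator writes when an alert trigger fails.
FAILED_CONDITION = re.compile(
    r"ERROR SolarWinds\.Orion\.Core\.Alerting\.Service\.ConditionsStateEvaluator"
    r" - Condition 'AlertId: (\d+),"
)

# The SQL Server conversion error, capturing the value that failed to convert.
CONVERSION_ERROR = re.compile(
    r"Conversion failed when converting the nvarchar value '([^']*)' to data type int"
)


def failing_alerts(lines):
    """Map each failing AlertId to the nvarchar values that failed int conversion.

    Continuation lines (query text, stack trace) belong to the most recent
    timestamped entry; only failed-condition ERROR entries are tracked. An alert
    that failed for some other reason maps to an empty set.
    """
    result = {}
    current_alert = None
    for line in lines:
        if TIMESTAMP.match(line):
            match = FAILED_CONDITION.search(line)
            current_alert = int(match.group(1)) if match else None
            if current_alert is not None:
                result.setdefault(current_alert, set())
        if current_alert is not None:
            result[current_alert].update(CONVERSION_ERROR.findall(line))
    return result
```

Usage, with the path to a copy of the Alerting Service log supplied by you: `failing_alerts(open(path, encoding="utf-8", errors="replace"))`. Tying continuation lines to the preceding timestamped entry avoids counting the conversion text that the WARN entry prints just before the ERROR entry.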

With debug logging enabled, the Alerting Service log shows missing entities when the alerts are evaluated:

2023-04-28 14:08:57,850 [48] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting
2023-04-28 14:08:57,850 [36] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - EvaluateScheduled: nothing to evaluate, exiting
2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Common.ChannelProxy`1 - Invoking <Query>b__0 finished
2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Plugins.Conditions.Swql.ConditionEvaluatorSwql - } Start exited
2023-04-28 14:08:57,909 [46] DEBUG SolarWinds.Orion.Core.Alerting.Service.ConditionsStateEvaluator - Condition Evaluator OnNext (AlertId: 244, AlertLastEdit: 7/12/2019 6:30:24 PM, ConditionIndex: 0, Type: Trigger)
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstance
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceApplication
2023-04-28 14:08:57,910 [46] DEBUG SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider - Missing entity from navigation SolarWinds.Orion.Core.Common.InformationService.SwisSchemaProvider+RelationsSearchItem (Orion.APM.Application) -> Orion.DPA.DatabaseInstanceClientApplication
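The same "Missing entity" entries repeat many times, so counting each distinct source/target navigation makes it easier to see which relations are affected (in the excerpt above, all of them go from Orion.APM.Application to Orion.DPA.* entities). This is a minimal Python sketch, assuming each debug entry sits on a single line in the format shown above; `missing_navigations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures "(Source.Entity) -> Target.Entity" from SwisSchemaProvider debug lines.
MISSING_ENTITY = re.compile(
    r"Missing entity from navigation .*?\((?P<source>[\w.]+)\) -> (?P<target>[\w.]+)"
)


def missing_navigations(lines):
    """Count each distinct (source, target) pair reported as a missing entity."""
    counts = Counter()
    for line in lines:
        match = MISSING_ENTITY.search(line)
        if match:
            counts[(match.group("source"), match.group("target"))] += 1
    return counts
```

Usage: `missing_navigations(open(path, encoding="utf-8", errors="replace")).most_common()` lists the pairs from most to least frequent, with the log path supplied by you.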

When we create new component-based alerts that mirror our old ones, we can no longer select Application as a trigger condition. We get the error: "Missing field ApplicationName in Orion.APM.ApplicationAlert"

The only workaround we have found is to re-create the alerts using Application instead of Component and rewrite the email alerts. Everything was running great on 2023.1.1 and we were pleased with the product. We have an open case with SolarWinds on this issue, and support seems to be stumped at the moment.

Also, after upgrading to 2023.2, our WPM monitors started to flap, we lost the worker configuration on our players, and the module has become very noisy. Re-recording transactions, adding wait times, and adjusting resolution and image matching have not corrected the issue. We have an open ticket for this issue as well.

We expected the 2023.2 upgrade to go the same way as the 2023.1 and 2023.1.1 upgrades, which completed successfully without issue. The only reason we wanted to move to 2023.2 was to address the UTC bug for last reboot that end users were complaining about; instead, the upgrade left the system down with alerting broken. We have decided to hold off on platform upgrades for at least 6 months because of the issues we are seeing.

  • We had a similar issue: many of our most critical alerts stopped working. It was related to this part of the error:

    Conversion failed when converting the nvarchar value 'net-snmp' to data type int. (Fault Detail is equal to InfoServiceFaultContract [ System.Data.SqlClient.SqlException (0x80131904): Conversion failed when converting the nvarchar value 'net-snmp' to data type int.

    In our case, we were able to replace the custom property that was causing the error with one whose values parse as integers rather than strings. I'd like to think we were lucky to be able to do that. Until we adjusted the custom property, nothing showed up after clicking "Only following set of objects (Show List)".

    They definitely broke something in this release. 

  • After we reverted our system to 2023.1.1 on Friday (VM restores and a database restore), our WPM transactions stopped flapping and alerting worked correctly again. Of course, it took us countless hours over the past couple of days to re-apply all of the changes made since we upgraded to 2023.2 (15 days ago). Support was not much help in correcting the issue; we got only the typical links to outdated support and troubleshooting documents.

    We will not be upgrading the system to any future releases, and we are deciding as an organization whether to continue with the product. Our company has used SolarWinds for the past 10 years, and over the past 3 years product development and support have been declining rapidly. We were promised this year that SolarWinds would make major improvements to the product, such as more cloud monitoring and UI enhancements, but this is not looking very promising.
