So....
We use advanced alerting for our network equipment for Node Up/Down and LastBoot value changed. Been working fine for well over a year.
We had an outage yesterday... and heard it first from our customer rather than Orion. Turns out Orion never sent an alert.
Looking at the Solarwinds Event Viewer, saw that it looks like the Alerting Manager had an issue connecting to the Orion Database:
[AlertCheckingThread] WARN Error - Error in UpdateRowsThatAreTriggered() - GetNetObjects
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
at AlertingEngine.CheckAlert.UpdateRowsThatAreTriggered()
Our SQL Server was up and running the whole time, happily serving databases to many, many other applications.
Not sure where to start with this. Any ideas?
I bounced the NPM server and upon reboot got my lastboot alerts I would have expected... yesterday.