NPM upgrade 11.5.2 experiences...

Not as happy as I would like.

It seems the information service is quite fragile; in particular, it does not seem to recover well if a transaction takes too long to complete or if an exception occurs.

Custom limitations built using the [Windows-based] account limitation builder are broken: anything based on custom properties is not handled properly.

NetObjectDowntime tracking: the primary key includes a smalldatetime field, which is only accurate to the minute, so two updates in the same minute will cause a primary key violation (e.g. if a node goes into warning and then into 'up' due to a delayed ping).

I'm not seeing an event logged in this case, so it's hard to work out what is going wrong.
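
To make the collision concrete, here is a minimal Python sketch of what I think is happening. It is not SolarWinds code. It assumes the key is (timestamp, entity id, entity type), which I am inferring from the duplicate key value in the log below, and it imitates the documented smalldatetime rounding (29.998 seconds or less rounds down, 29.999 or more rounds up). The function and class names are made up for illustration.

```python
from datetime import datetime, timedelta


def to_smalldatetime(ts):
    """Round a timestamp to the nearest minute, imitating smalldatetime."""
    floor = ts.replace(second=0, microsecond=0)
    micros = ts.second * 1_000_000 + ts.microsecond
    # 29.998 s or less rounds down; 29.999 s or more rounds up.
    if micros >= 29_999_000:
        return floor + timedelta(minutes=1)
    return floor


class DuplicateKey(Exception):
    """Stand-in for SQL Server error 2627 (primary key violation)."""


def insert(table, ts, entity_id, entity_type):
    # Assumed key layout: (rounded time, entity id, entity type).
    key = (to_smalldatetime(ts), entity_id, entity_type)
    if key in table:
        raise DuplicateKey(key)
    table.add(key)


table = set()
# Hypothetical pair of state changes for one node, 27 seconds apart.
warning_at = datetime(2015, 10, 14, 18, 50, 43, 761000)
up_at = datetime(2015, 10, 14, 18, 51, 10)

insert(table, warning_at, 9504, "Orion.Nodes")
try:
    insert(table, up_at, 9504, "Orion.Nodes")
except DuplicateKey as exc:
    print("duplicate key:", exc)
```

Both timestamps round to 18:51, so the second insert is rejected even though the two events happened in different wall-clock minutes.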

2015-10-14 11:50:43,761 [20] ERROR SolarWinds.Orion.Core.BusinessLayer.DowntimeMonitoring.DowntimeMonitoringNotificationSubscriber - Exception occured when processing incoming indication of type "System.InstanceModified"

System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 14 2015  6:51PM, 9504, Orion.Nodes).

The statement has been terminated.

   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)

   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)

   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)

   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)

   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()

   at SolarWinds.Orion.Common.SqlHelper.ExecuteNonQuery(SqlCommand command, SqlConnection connection, SqlTransaction transaction)

   at SolarWinds.Orion.Core.Common.DALs.NetObjectDowntimeDAL.Insert(NetObjectDowntime item)

   at SolarWinds.Orion.Core.BusinessLayer.DowntimeMonitoring.DowntimeMonitoringNotificationSubscriber.OnIndication(String subscriptionId, String indicationType, PropertyBag indicationProperties, PropertyBag sourceInstanceProperties)

ClientConnectionId:0ba46c82-f129-4b44-9cc1-d248a2193d00

Error Number:2627,State:1,Class:14

  • I'm still holding off on this upgrade.......

    Hope you get everything sorted.

  • I'm getting this error every night at the same time; it causes a 10-minute gap in the polling.

    2015-10-30 03:21:12,468 [58] ERROR SolarWinds.Orion.Core.BusinessLayer.DowntimeMonitoring.DowntimeMonitoringNotificationSubscriber - Exception occured when processing incoming indication of type "System.InstanceModified"

    System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 30 2015  7:21AM, 1486, Orion.Volumes).

    The statement has been terminated.

       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

       at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

       at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

       at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)

       at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)

       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)

       at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)

       at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()

       at SolarWinds.Orion.Common.SqlHelper.ExecuteNonQuery(SqlCommand command, SqlConnection connection, SqlTransaction transaction)

       at SolarWinds.Orion.Core.Common.DALs.NetObjectDowntimeDAL.Insert(NetObjectDowntime item)

       at SolarWinds.Orion.Core.BusinessLayer.DowntimeMonitoring.DowntimeMonitoringNotificationSubscriber.OnIndication(String subscriptionId, String indicationType, PropertyBag indicationProperties, PropertyBag sourceInstanceProperties)

    ClientConnectionId:c0501b21-92dd-42f2-8483-9abfa4fb6271

    Error Number:2627,State:1,Class:14

    Case# 890728.

  • Hmmmm, may have to hold off on the upgrade.

  • Based on the size of your environment (if my memory serves me correctly), you should reach out to support for a Buddy Drop related to HF5. SWISv3 has a memory leak that is only evident in very large environments. It is fixed, but that might help.

    Look, I'm a Solarwinds fanboy. Have been since the first day I used it 8+ years ago, but the upgrade to Orion 2015 (NPM 11.5.x/SAM 6.2.x/VMAN, etc.) was far from painless. I know it was a monster leap for them to get the framework rejiggered for future growth, but it has meant many long nights over the past 6 weeks for me.

    PS: nickzourdos, I get the same entry in the logs. We also have an open incident for it.

  • Same problems.

    Additionally, we had crashes due to automatic report creation.

    They just released a HotFix6 for Orion Platform.

    I'm going to apply this soon.

  • HF6 implements those memory leak fixes. Definitely worth getting it applied, even if you applied the buddy drop for HF5.

  • nickzourdos and jbiggley, I know this is almost a year later, but do you remember if you were able to clean this up?

  • Yes!

    Error/Issue #1

    2015-10-30 03:33:16,624 [Scheduler] ERROR SolarWinds.Orion.Core.Common.EngineHelper - System.Data.SqlClient.SqlException (0x80131904): The transaction log for database 'OrionNPM' is full due to 'ACTIVE_TRANSACTION'.

    Action Plan:

    SQL Server is outside the scope of our product support; if you have a DBA, he or she can easily fix this problem. I am not a DBA, but I did a little searching on the topic and here is what I found:

    https://social.technet.microsoft.com/Forums/sqlserver/en-US/2087eb9c-7e7a-4688-8e4b-be9ae4fa13f9/transaction-log-is-full-due-to-activetransaction?forum=sqlintegrationservices

    http://stackoverflow.com/questions/17674973/the-transaction-log-for-the-database-is-full

    Please work with your DBA to completely resolve the "transaction log for database 'OrionNPM' is full due to 'ACTIVE_TRANSACTION'" error.
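
For anyone who wants something to hand to their DBA, here is a rough Python sketch that asks SQL Server why the log cannot be truncated. It only reads the `sys.databases` catalog view (`recovery_model_desc` and `log_reuse_wait_desc` are real columns there). Everything else is an assumption: the helper name is made up, the `?` placeholder assumes a driver with qmark-style parameters such as pyodbc, and the fake cursor with its sample row exists only so the sketch runs without a database.

```python
# T-SQL that reports what is currently preventing log truncation.
LOG_REUSE_SQL = (
    "SELECT name, recovery_model_desc, log_reuse_wait_desc "
    "FROM sys.databases WHERE name = ?"
)


def log_reuse_wait(cursor, database="OrionNPM"):
    """Return recovery model and log reuse wait for one database, or None."""
    cursor.execute(LOG_REUSE_SQL, (database,))
    row = cursor.fetchone()
    if row is None:
        return None
    name, recovery_model, wait = row
    return {
        "database": name,
        "recovery_model": recovery_model,
        "log_reuse_wait": wait,
    }


class FakeCursor:
    """Stand-in for a real DB-API cursor; the row below is made-up sample data."""

    def execute(self, sql, params):
        self.sql, self.params = sql, params

    def fetchone(self):
        return ("OrionNPM", "FULL", "ACTIVE_TRANSACTION")


print(log_reuse_wait(FakeCursor()))
```

A wait of ACTIVE_TRANSACTION means a long-running open transaction is holding the log, which matches the error text above. What to do about it is still a DBA decision.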

    Error/Issue #2

    Problem/From-Orion_Logs:

    ERROR SolarWinds.Orion.Core.BusinessLayer.DowntimeMonitoring.DowntimeMonitoringNotificationSubscriber - Exception occurred when processing incoming indication of type "System.InstanceModified"

    System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 30 2015  7:43AM, 1488, Orion.Volumes).

    Action Plan:

    This section of the log is telling us that the table 'dbo.NetObjectDowntime' in your database rejected an insert because of a duplicate key value: 1488, for Orion.Volumes. The solution here is to:

    1. Log in to your Orion server with the local Administrator account (NOT a Domain Admin account) to avoid permission issues such as GPOs and/or other policies.
    2. Stop all Orion services (use the Orion Service Manager).
    3. Open the Orion Database Manager, find the table 'dbo.NetObjectDowntime', search for the duplicate key (AKA EntityId 1488), and delete it.
    4. If the log shows the same message with other key values, those rows have to be deleted as well.
    5. Run the Configuration Wizard (select all 3 options if prompted) and just click Next, Next, etc. Once the Configuration Wizard ends, go to step 6.
    6. Start all Orion services (use the Orion Service Manager).
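
For step 4 above, a small Python sketch like the following can list every key the log complains about, so none are missed. It is not a SolarWinds tool. The regular expression is written against the message format shown in the log excerpts in this thread, and the sample text reuses those excerpts. The function name is made up.

```python
import re

# Matches the "duplicate key value" sentence as it appears in the logs above.
DUP_KEY = re.compile(
    r"Cannot insert duplicate key in object 'dbo\.NetObjectDowntime'\. "
    r"The duplicate key value is "
    r"\((?P<when>[^,]+), (?P<entity_id>\d+), (?P<entity_type>[^)]+)\)"
)


def duplicate_keys(log_text):
    """Return the distinct (EntityId, entity type) pairs reported as duplicates."""
    return {
        (int(m.group("entity_id")), m.group("entity_type"))
        for m in DUP_KEY.finditer(log_text)
    }


sample = """\
Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 30 2015  7:21AM, 1486, Orion.Volumes).
Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 30 2015  7:43AM, 1488, Orion.Volumes).
Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'. Cannot insert duplicate key in object 'dbo.NetObjectDowntime'. The duplicate key value is (Oct 30 2015  7:43AM, 1488, Orion.Volumes).
"""

for entity_id, entity_type in sorted(duplicate_keys(sample)):
    print(entity_id, entity_type)
```

Repeated complaints about the same key collapse into one entry, so the output is the list of distinct entities to look up in the Database Manager.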
  • I'm still on 11.5.2, upgrading to 12.0.1 this Sunday. Guess I won't bother looking into this one until after the upgrade. My log shows the same error: "Violation of PRIMARY KEY constraint 'PK_ObjectDownTime'".

  • I upgraded this week, NPM to 12.0 (from 11.5.3), SAM to 6.2.4 (6.2.3) and NCM to 7.5.0 (7.4.1).

    Seeing this post, I just went and checked the custom views and limitations I have assigned to customers, and all looks good.

    I did have an issue with NCM and my scheduled job to pull configs off my nodes (switches). It caused a specific model of switch, the LE31x, to reboot when Solarwinds tried to log in. I had a couple of switches tank and need a manual reboot. Since these are EOL anyway, it is just lighting a fire for us to get them upgraded. My fix in the meantime was to exclude that model via OID from the scheduled job.

    Since things seem to be running okay, or at least I've identified issues, I'll look at doing the final push to the latest version next week.