This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.

SAM 6.4 Upgrade Problems

Upgraded SAM to 6.4 on March 11th.  I have been having various problems and have been working with support to get things fixed; I think everything is back to normal now except that my database is having problems.  See below for a couple of screenshots from DPA.  You can see where my wait times jumped when I initially upgraded.  As issues were resolved, the wait times came back down.  However, I'm still having problems with my PLE (page life expectancy).  The PLE hit seems to be tied to a stored proc that is getting executed over 80K times per hour.  Wondering if anyone else is having the same problem.

stevenwhunt

[Screenshots: DPA wait-time charts showing the jump at upgrade and the subsequent drop]
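For anyone wanting to confirm an execution rate like that independently of DPA, SQL Server's procedure stats DMV reports per-procedure counts since the plan was cached (this is a standard DMV, not SAM-specific; counts reset when the plan leaves cache):

```sql
-- Execution counts for cached procedures in the SolarWindsOrion database.
-- sys.dm_exec_procedure_stats accumulates since each plan was cached.
SELECT OBJECT_NAME(object_id, database_id) AS proc_name,
       execution_count,
       cached_time,
       total_worker_time / 1000 AS total_cpu_ms
FROM sys.dm_exec_procedure_stats
WHERE database_id = DB_ID('SolarWindsOrion')
ORDER BY execution_count DESC;
```

Dividing execution_count by the hours since cached_time gives a rough executions-per-hour figure to compare against DPA's number.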

  • I'll look into this internally and see what I can find.

  • We're getting ready to upgrade to SAM 6.4. Would you mind sharing with me the problems that you encountered after the upgrade? Thanks.

  • Hi mprobus​,

    The PLE hit seems to be tied to a stored proc that is getting executed over 80K times per hour.

    Can you share the name of the stored procedure, please?

    Thanks,

    Robert

  • See below for the exact info provided by DPA.

    /* (inserted by DPA)
    Procedure: SolarWindsOrion.dbo.swsp_MapOrionServersToNodes
    Character Range: 196 to 385
    Waiting on statement:

    INSERT INTO @ServerNodesPairs
    SELECT n.[NodeID], os.[OrionServerID]
    FROM [Nodes] n WITH (NOLOCK)
    JOIN [OrionServers] os
       ON n.[DNS] = os.[HostName]
       OR n.[DNS] LIKE os.[HostName] + '.%'
    */
    CREATE PROCEDURE [dbo].[swsp_MapOrionServersToNodes]
    AS
    BEGIN
       SET NOCOUNT ON

       DECLARE @ServerNodesPairs TABLE (NodeID INT, OrionServerID INT)

       -- find pairs (OrionServer->Node)
    /* BEGIN ACTIVE SECTION (inserted by DPA) */
       INSERT INTO @ServerNodesPairs
       SELECT n.[NodeID], os.[OrionServerID]
       FROM [Nodes] n WITH (NOLOCK)
       JOIN [OrionServers] os
          ON n.[DNS] = os.[HostName]
          OR n.[DNS] LIKE os.[HostName] + '.%'
    /* END ACTIVE SECTION (inserted by DPA) */

       -- update OrionServers, set reference to Node
       UPDATE os
       SET os.[NodeID] = pairs.[NodeID]
       FROM [OrionServers] os
       JOIN @ServerNodesPairs pairs
          ON os.[OrionServerID] = pairs.[OrionServerID]
       WHERE os.[NodeID] IS NULL
          OR os.[NodeID] <> pairs.[NodeID]

       -- update OrionServers, remove reference to Node
       UPDATE os
       SET os.[NodeID] = NULL,
           os.[AgentAutoDeploy] = 0
       FROM [OrionServers] os
       WHERE os.[NodeID] IS NOT NULL
         AND os.[OrionServerID] NOT IN
             (SELECT [OrionServerID] FROM @ServerNodesPairs)
    END
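A note for anyone digging into why that active statement is expensive: the join predicate mixes an equality with an OR'd LIKE, which generally prevents an index seek on either table and tends to force scans on every one of the 80K+ hourly executions. A common rewrite (a sketch only, not the shipped SolarWinds code, and dependent on your indexing) is to split the OR into two index-friendly passes:

```sql
-- Sketch: split the OR'd join predicate into two passes so each branch
-- can use an index on OrionServers(HostName) / Nodes(DNS) where one exists.
-- UNION ALL is used because the exact and prefix matches cannot overlap
-- for the same (NodeID, OrionServerID) pair.
INSERT INTO @ServerNodesPairs
SELECT n.[NodeID], os.[OrionServerID]
FROM [Nodes] n WITH (NOLOCK)
JOIN [OrionServers] os
   ON n.[DNS] = os.[HostName]            -- exact match: sargable
UNION ALL
SELECT n.[NodeID], os.[OrionServerID]
FROM [Nodes] n WITH (NOLOCK)
JOIN [OrionServers] os
   ON n.[DNS] LIKE os.[HostName] + '.%'  -- FQDN prefix match
```

Even with a better plan, running this mapping 80K times an hour is the bigger problem; at that rate the buffer pool churn alone would explain a PLE collapse.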

  • I had no problems upgrading my test environment.  But it is much smaller than Prod, so it seemed to be a scaling problem for me.

    The first big problem we had, other than the database, was missing data.  I found that 3 of my pollers were not pulling in CPU and memory information for my WMI nodes.  They were reporting 100% polling success because they were still pulling response time.  The only way I found to fix this was to remove every single node from one of the pollers and then slowly add them back.  I could only move 100 at a time; if I tried more, it would break them again.  I would move 100, wait about 10 minutes for them to poll, then move another 100.  So, I spent about 2 full days playing musical nodes, since I had to move around about 9000 nodes.

    The 2nd issue I found was that my VMAN data was not being pulled in.  This turned out to be an easy fix: I disabled the integration, waited a bit, and re-enabled it.  Initially after the upgrade my VMAN data was fine, but in the process of fixing the issue above, the integration broke somehow.

    The last issue was that my main poller had a buildup of messages in the MSMQ queues, specifically the information service SSL one.  We tried running the Configuration Wizard with repair packages, but this didn't help.  The fix ended up being to remove RabbitMQ completely and reinstall it.  I referenced the following link for the removal part.

    https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/Rabbit_MQ
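On the missing CPU/memory data described above: a quick way to spot WMI-managed nodes in a similar state is to query the Orion Nodes table directly. The column names below (ObjectSubType, CPULoad, MemoryUsed) match common Orion schemas but should be treated as an assumption and verified against your own database:

```sql
-- Hypothetical check: WMI-managed nodes whose CPU/memory stats look unpolled.
-- Column names are assumptions based on typical Orion schemas; verify first.
-- Orion often records -2 or NULL for stats that have never been collected.
SELECT NodeID, Caption, CPULoad, MemoryUsed
FROM [Nodes] WITH (NOLOCK)
WHERE ObjectSubType = 'WMI'
  AND (CPULoad IS NULL OR CPULoad < 0);
```

Running this before and after moving a batch of nodes would confirm whether the re-add actually restored CPU/memory polling, rather than relying on the 100%-success figure.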

  • If I'm reading this query correctly, it appears to be the Database Performance Analyzer integration into Orion itself that is causing this issue.

  • I disabled the integration for the past hour and it didn't seem to make a difference.  The machines listed in DPA are my pollers, so it appears to be SAM-specific.

  • When you disabled the integration, did you restart the services?  Or at least flush the subscription table?  I'm pretty sure that integration is still there.  Just a thought.

  • I did not.  I will have to wait until after hours to take anything down.  I'll give it a shot tonight and see what happens.

    On a side note, I noticed the DPA integration module is at 11, but our DPA server is at 10.2.  Does anyone know if this will cause a conflict?  We plan on upgrading to 11 next weekend.

  • DPA integration is still disabled.

    I stopped all services last night and flushed the subscriptions (although the flush said zero rows affected).  I then waited a few minutes and let the PLE climb, started back half of my pollers, let them run, and then started the rest.  The PLE once again tanked.

    So, I stopped all services and rebooted the SQL box, then the main poller.  I let the PLE climb some, then rebooted one APE.  I let it run for about 5 minutes, then another.  I continued the process and the PLE got to about 2500.  After adding in my 6th APE, the PLE tanked again.

    [Screenshot: PLE chart dropping again after the 6th APE came online]

    I have a meeting with support this afternoon.  Hopefully they will be able to give me some additional information.
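For anyone else watching PLE through an episode like this, the counter DPA graphs can be read straight out of SQL Server (standard DMV; on NUMA hardware each memory node also reports its own PLE under the Buffer Node object, and the Buffer Manager value is effectively an aggregate):

```sql
-- Current Page Life Expectancy, in seconds, per performance object.
-- Values under 300 are the classic (if dated) warning threshold.
SELECT [object_name], counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';
```

Sampling this on a schedule while bringing APEs back online one at a time would pinpoint exactly which poller's workload knocks the PLE over, independent of the DPA charts.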