18 Replies. Latest reply: Feb 23, 2012 9:20 AM by rhart-ka

Issue with SAM 5.0 RC installation

rhart-ka

Last Friday I finally found the time to upgrade our production APM server to SAM 5.0 RC1.  I've been very excited to get the hardware health monitoring capability.  I had no showstopping issues with the upgrade and Orion is functional, but I did run into some glitches:

  • I performed an upgrade as I usually would with any other APM update.  The install process correctly identified the existing SolarWindsOrion database.  However, after the installation was complete, I found it had installed SQLExpress on the system and started all its services with the SOLARWINDS_ORION instance name.  It did not create a new SolarWindsOrion database there either -- it upgraded the already existing one which is located on another server.  After I noticed this, I disabled all the SQLExpress services and everything seems to be fine.  Is this a known problem?
  • SAM reports being in Evaluation with 50 days left, and I did activate my license during the upgrade.  I'm guessing this is expected as part of being a Release Candidate.  Am I correct in this assumption?
  • I'm able to use the Realtime Process Explorer through the web interface of SAM, but alerts generated using the new "High xxx Utilization with Top 10 Processes" templates fail to include the process list, usually showing a time-out error or it might just be blank.

 

Anybody else experience these? 

Thanks for all the effort that's gone into this new update.  The hardware monitoring capabilities are a huge benefit to us!

 
  • Re: Issue with SAM 5.0 RC installation
    aLTeReGo

    I performed an upgrade as I usually would with any other APM update.  The install process correctly identified the existing SolarWindsOrion database.  However, after the installation was complete, I found it had installed SQLExpress on the system and started all its services with the SOLARWINDS_ORION instance name.  It did not create a new SolarWindsOrion database there either -- it upgraded the already existing one which is located on another server.  After I noticed this, I disabled all the SQLExpress services and everything seems to be fine.  Is this a known problem?

    I can't explain this behavior. If the file you downloaded came through the customer portal and did not include the word "eval" in its name, then SQL Express wasn't bundled with the installer. Only files that contain "eval" in the name include SQL Express. It sounds likely that at some point in the history of this machine an evaluation was installed on this host.

    SAM reports being in Evaluation with 50 days left, and I did activate my license during the upgrade.  I'm guessing this is expected as part of being a Release Candidate.  Am I correct in this assumption?

    This is normal RC behavior. RC keys are essentially extended evaluation keys. Once SAM 5.0 GAs, a commercial license will be available through your customer portal. RC keys are designed to extend through and beyond the scheduled RC period, so there should be no concern about the license unexpectedly expiring before you receive your commercial license key.

    I'm able to use the Realtime Process Explorer through the web interface of SAM, but alerts generated using the new "High xxx Utilization with Top 10 Processes" templates fail to include the process list, usually showing a time-out error or it might just be blank.

    For Windows SNMP nodes, there is a known limitation with the RTPE. Microsoft only updates SNMP statistics every two minutes, and CPU can only be calculated after two counter updates. This means that you may need to increase the default wait in the alert to 4-5 minutes for Windows nodes being managed/monitored via SNMP. The easiest/best way to rectify this is to change these Windows nodes to WMI. Alternatively, you can ensure you have at least one working/up WMI managed component monitor assigned to the host. Due to this Microsoft SNMP limitation, SAM 5.0 will use WMI whenever possible, even when the node is managed via SNMP. The RTPE will search for any working/up WMI component monitors assigned to the host and use those credentials to connect and collect the necessary data.
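To illustrate why two counter updates are needed, here is a rough sketch. It assumes per-process CPU time is exposed as a cumulative counter in centiseconds (which is how hrSWRunPerfCPU is defined in HOST-RESOURCES-MIB) and uses the two-minute refresh interval mentioned above. The function names and sample values are hypothetical and are not taken from SAM itself.

```python
from typing import Optional

REFRESH_SECONDS = 120  # Windows SNMP statistics refresh interval, per the post above


def cpu_percent(prev_cs: int, curr_cs: int, elapsed_s: float, cores: int = 1) -> Optional[float]:
    """Percent CPU used between two samples of a cumulative centisecond counter.

    Returns None when the two samples cannot yield a rate (no time elapsed,
    or the counter went backwards).
    """
    if elapsed_s <= 0 or curr_cs < prev_cs:
        return None
    used_seconds = (curr_cs - prev_cs) / 100.0  # centiseconds -> seconds
    return 100.0 * used_seconds / (elapsed_s * cores)


def min_wait_seconds(refresh_s: int = REFRESH_SECONDS, updates_needed: int = 2) -> int:
    """Time spanned by the given number of refresh intervals."""
    return refresh_s * updates_needed


if __name__ == "__main__":
    # Hypothetical samples taken one refresh interval apart on a single core.
    print(cpu_percent(1000, 7000, REFRESH_SECONDS))
    print(min_wait_seconds() / 60, "minutes")
```

A single sample only gives a running total, so a rate needs two of them. Two refresh intervals come to four minutes, which is in line with the 4-5 minute wait suggested above.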

    • Re: Issue with SAM 5.0 RC installation
      rhart-ka

      To reply to each point:

      • Thanks.  This host did have an evaluation installed in the past.  I can't rule out that it had SQLExpress all along.
      • Thanks, this is helpful info.
      • I have Windows machines initially discovered and monitored through SNMP, but I also have the "Windows 2003-2008 Services and Counters" template assigned to each, which uses WMI for all (most?) components.  I still, however, get a blank process list from these alerts with these servers.  Are you saying that this shouldn't be the case?

      Thanks again!

      • Re: Issue with SAM 5.0 RC installation
        martin.susil


        To reply to each point:

        • I have Windows machines initially discovered and monitored through SNMP, but I also have the "Windows 2003-2008 Services and Counters" template assigned to each, which uses WMI for all (most?) components.  I still, however, get a blank process list from these alerts with these servers.  Are you saying that this shouldn't be the case?

         



        Are the Windows Service Monitors in the "Windows 2003-2008 Services and Counters" template in an Up/Warning/Critical state? If so, then their credentials should be used for the Real-Time Process Explorer.

        Could you please open a support ticket for this issue and reference this thread to make sure that it gets escalated to development quickly? 

        Thank you

        • Re: Issue with SAM 5.0 RC installation
          rhart-ka

          To answer your question, yes.  They are currently up.  I just saw that RC2 came out today -- should I hold off on the update until I've created this support ticket?  Could the update possibly fix the issue?

           

          Thanks.

          • Re: Issue with SAM 5.0 RC installation
            aLTeReGo

            If at all possible, please upgrade to SAM 5.0 RC2 and open a support ticket. We don't have reason to believe RC2 will resolve the issue but it would be best if you were on the latest build.

            If you don't have an opportunity to upgrade to RC2 tomorrow then open a support ticket. We're quickly approaching GA so if this is a systemic issue we'd like to identify and resolve it quickly while we're still in the release candidate phase.

            • Re: Issue with SAM 5.0 RC installation
              rhart-ka

              Okay, I will install the upgrade this afternoon and then open a ticket if the problem still exists.  Thanks!

              • Re: Issue with SAM 5.0 RC installation
                rhart-ka

                Opened case 313312.

                • Re: Issue with SAM 5.0 RC installation
                  rhart-ka

                  I just saw something a little different.  One of my alert emails that just came through said this:

                   

                   

                  The Physical Memory on <hostname> is currently running at 91 %. The top 10 processes running at the time of this poll are listed below:

                   

                        Unable to get list of processes - Value was either too large or too small for a UInt64.

                   

                        For more information click the link below.

                        http://<orionservername>:80/Orion/View.aspx?NetObject=N:19

                  • Re: Issue with SAM 5.0 RC installation
                    aLTeReGo

                    Out of curiosity, do you receive this same error when using the Real-Time Process Explorer on this node from within the WebUI?

                  • Re: Issue with SAM 5.0 RC installation
                    martin.susil


                    I just saw something a little different.  One of my alert emails that just came through said this:

                     

                    The Physical Memory on <hostname> is currently running at 91 %. The top 10 processes running at the time of this poll are listed below:

                     

                          Unable to get list of processes - Value was either too large or too small for a UInt64.

                     

                          For more information click the link below.

                          http://<orionservername>:80/Orion/View.aspx?NetObject=N:19

                     



                    Could you please confirm whether this is a Windows machine monitored as an SNMP node? If so, could you please let me know the uptime of this box and its number of CPUs?

                    This could be a known issue: the Real-Time Process Explorer can have problems with Windows boxes that have been running for a very long time.

                    Thank you

                    • Re: Issue with SAM 5.0 RC installation
                      rhart-ka

                      One of the machines that gave me the "Unable to get list of processes - Value was either too large or too small for a UInt64." error is a Dell R910 with 40 CPU cores, running Windows 2008 R2, monitored with SNMP, and using the Windows 2003-2008 WMI template.  It was last rebooted on 2/12/2012.  Another one that gave me the same error is a Hyper-V VM, Win2008R2, SNMP/Windows WMI template, 4 vCPUs, and up since 12/4/2011.

                      I am not just getting these errors in email alerts.  I can see them in the Alerts tab of SAM's web interface as well.

                       

                      I can use the Real-time Process Explorer through the SAM web interface without any issue.

                      • Re: Issue with SAM 5.0 RC installation
                        Petr Vilem

                        It is a combination of both: idle uptime and number of cores. On a 40-core machine, the period after which it gets into this state can be as short as approximately 6.5 days; after another 6.5 days it switches back to the correct state. This period may get longer with increased CPU load on the machine.

                        The difference in behavior between alerts and the RTPE GUI can be caused by an inconsistency in the polling method used (alerts may use SNMP, while RTPE can use WMI).

                        Can you try unmanaging all applications assigned to those nodes for a while, to confirm that you receive the same error in RTPE as well (when it is forced to use SNMP)?
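For anyone wondering where a figure like 6.5 days could come from, here is a back-of-the-envelope sketch. It rests on assumptions the thread does not confirm: that the value involved is a per-process CPU time counter in centiseconds held in a signed 32-bit SNMP Integer32 (as hrSWRunPerfCPU is defined in HOST-RESOURCES-MIB), that idle time is summed across all cores, and that the value wraps to negative once it passes 2^31 - 1. All function names are hypothetical.

```python
INT32_MAX = 2**31 - 1            # upper bound of an SNMP Integer32
CS_PER_DAY = 100 * 60 * 60 * 24  # centiseconds in one day


def days_until_wrap(cores: int, idle_fraction: float = 1.0) -> float:
    """Days for summed idle time (centiseconds, all cores) to pass INT32_MAX."""
    if cores <= 0 or not 0 < idle_fraction <= 1:
        raise ValueError("cores must be positive and idle_fraction in (0, 1]")
    return (INT32_MAX + 1) / (CS_PER_DAY * cores * idle_fraction)


def as_signed32(raw: int) -> int:
    """Interpret the low 32 bits of raw as a two's-complement signed integer."""
    raw &= 0xFFFFFFFF
    return raw - 2**32 if raw >= 2**31 else raw


def to_uint64(value: int) -> int:
    """Mimic a checked conversion to an unsigned 64-bit integer."""
    if not 0 <= value <= 2**64 - 1:
        raise OverflowError("Value was either too large or too small for a UInt64.")
    return value


if __name__ == "__main__":
    print(round(days_until_wrap(40), 2))        # 40 cores, fully idle
    print(round(days_until_wrap(40, 0.95), 2))  # 40 cores, 95% idle
    wrapped = as_signed32(INT32_MAX + 1)        # first value after the wrap
    try:
        to_uint64(wrapped)
    except OverflowError as exc:
        print(exc)
```

Under these assumptions a fully idle 40-core box reaches the limit in a little over six days, and a slightly busier one takes longer, which is at least consistent with the period described above. A value that has wrapped to negative cannot be converted to an unsigned 64-bit integer, and it turns positive again after roughly the same interval.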