8 Replies Latest reply on Feb 10, 2011 11:32 AM by netlogix

    Upgrading from NPM Ver 10 to 10.1.1

    Ciag

      Hi Folks,

      just a quick shout-out to anyone who is going to be upgrading from NPM v10 to v10.1.1. We carried out the upgrade a few days ago and everything is now working great. I'm loving the new features, but there are a couple of things about the upgrade I would like to share with anyone who hasn't taken the plunge yet.

      1. The release notes say that .NET Framework 3.5 is required. We were already running 3.5 before the upgrade, but during the upgrade process I was instructed to update .NET Framework to 3.5 SP1. This requires a server reboot, and you will have to apply it on your primary poller and on any additional pollers. If you work in an enterprise environment with SLAs and change tickets, as we do, this makes a big difference to how you prepare and execute the upgrade, so be aware of it.

      2. The new collector services require the MSMQ (Microsoft Message Queuing) service to be running on all pollers. In our environment, a group policy prevented the MSMQ service from staying active, and as a result the following services could not start on our primary poller:

      - Collector Polling Controller
      - Collector Data Processor
      - SolarWinds Information Service

      Because the Information Service could not be started on the primary poller, the upgrade on our second poller could not be completed.
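      Both prerequisites above can be checked on each poller before the change window. The following is a minimal Python sketch, not a SolarWinds tool: it reads the `Install` and `SP` DWORD values Microsoft documents under the `NDP\v3.5` registry key, and parses the `STATE` line of `sc query MSMQ` (assuming English-language output). It assumes a Python 3.7+ interpreter is available, which a Windows Server 2003 box may not support, so treat it as a description of what to check as much as a script. The function names are hypothetical. Since a group policy stopped MSMQ from staying active in our case, it is worth re-running the MSMQ check after a reboot or policy refresh.

```python
import subprocess
import sys

# Registry key Microsoft documents for detecting .NET Framework 3.5 and its SP level.
NDP35_KEY = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v3.5"


def meets_35_sp1(install, sp):
    """True if the registry values indicate .NET 3.5 with SP1 or later."""
    return install == 1 and sp is not None and sp >= 1


def read_ndp35_values():
    """Return (Install, SP) from the registry, or (None, None) if absent or not on Windows."""
    if sys.platform != "win32":
        return None, None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NDP35_KEY) as key:
            install, _ = winreg.QueryValueEx(key, "Install")
            sp, _ = winreg.QueryValueEx(key, "SP")
            return install, sp
    except OSError:  # key or value missing
        return None, None


def parse_sc_state(sc_output):
    """Extract the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output."""
    for line in sc_output.splitlines():
        # English-language output looks like: "        STATE              : 4  RUNNING"
        if line.strip().startswith("STATE"):
            parts = line.partition(":")[2].split()
            if len(parts) >= 2:
                return parts[1]
    return None


def msmq_state():
    """Query the Message Queuing service (service name MSMQ) with the built-in sc tool."""
    if sys.platform != "win32":
        return None
    result = subprocess.run(["sc", "query", "MSMQ"], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    install, sp = read_ndp35_values()
    print(".NET 3.5 SP1:", "OK" if meets_35_sp1(install, sp) else f"missing (Install={install}, SP={sp})")
    state = msmq_state()
    print("MSMQ:", "OK" if state == "RUNNING" else f"not running (state={state})")
```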

      Hopefully this will save someone a few hours of head-scratching. Apart from that, everything went well: the installation itself was smooth, quick and painless. It was our group policy that caused the problems, and they could have been resolved far more quickly if we had known before the upgrade about the required .NET Framework 3.5 SP1 update and the Message Queuing service restriction.

      So far I am very impressed with the new features in NPM and am eager to get stuck into the groups and alert dependencies features. Our pollers are also running much more smoothly now; we had been running them at near maximum capacity, so we are delighted with the performance increase.

      Have fun

        • Re: Upgrading from NPM Ver 10 to 10.1.1
          MFA

          Thanks for the info. I am one of those who have been waiting for a forum post like this. What server specs are you running NPM on, and how many elements are you polling?

            • Re: Upgrading from NPM Ver 10 to 10.1.1
              Ciag

              I think I spoke too soon on this one, lol. We're now seeing an issue where the database is no longer keeping historical statistics for response time and packet loss. The response time and packet loss gauges are working, but no historical data is being written to the database. Node status is unaffected and operating normally.

              This started around the time of last night's database maintenance and only affects devices on our second poller. All other historical statistics for these devices are present in the graphs and the database. It's also worth noting that CPU utilisation on the second poller has trebled since the upgrade, from a 30% average to over 90%, with frequent peaks hitting 100%.

              I currently have a case open with SolarWinds support. I'm sure a simple run of the Configuration Wizard will fix this, but I would like to find the root cause so that it doesn't happen again. I will keep you posted on our progress.

              Server specs: a three-server environment (database server, primary poller, secondary poller).

              Primary poller: physical box, Windows Server 2003 Enterprise, 3 GHz CPU, 3 GB memory, 3289 elements total

              Secondary poller: virtual machine, Windows Server 2003 Enterprise, 3 GHz CPU, 2 GB memory, 4303 elements total

              DB server: physical box, Windows Server 2003 Enterprise, 3 GHz CPU, 3 GB memory

              Network elements: 7592
              Nodes: 2319
              Interfaces: 5186 (5 min interval for statistics)
              Volumes: 87
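              The figures are internally consistent: the per-poller element counts and the per-type counts both add up to the network elements total. A quick Python check using only the numbers posted above, which also shows that the virtual, 2 GB secondary poller carries the larger share of elements (no claim is made here about whether that caused the CPU problem):

```python
# Element counts exactly as posted in the thread.
per_poller = {"primary": 3289, "secondary": 4303}
by_type = {"nodes": 2319, "interfaces": 5186, "volumes": 87}

total_by_poller = sum(per_poller.values())
total_by_type = sum(by_type.values())

# Both breakdowns should match the "Network elements" figure of 7592.
print(total_by_poller, total_by_type)  # 7592 7592

# Share of all elements polled by the secondary (virtual, 2 GB) poller.
secondary_share = per_poller["secondary"] / total_by_poller
print(f"{secondary_share:.0%}")  # 57%
```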

               

              HTH

                • Re: Upgrading from NPM Ver 10 to 10.1.1

                  Ciag,

                  Can you keep us posted on the outcome? Also, support will likely have you upgrade to NPM 10.1.1 SP1. Have you already?

                  Thx,

                  M

                  • Re: Upgrading from NPM Ver 10 to 10.1.1
                    MFA


                    Your NPM environment is almost identical to mine. Looking forward to future updates in this thread.

                      • Re: Upgrading from NPM Ver 10 to 10.1.1
                        Ciag

                        OK folks,

                        so the problem is now resolved. We only got to the bottom of it yesterday, which is why there were no updates before now. This was a tricky one without a black-and-white explanation of the cause, which makes it difficult to put together a comprehensive preparation plan.

                        We had errors with Orion8NetPerfmonEngine.dll (found in the install directory on all pollers) on both the primary and secondary pollers. These files were removed, the Core services were repaired, and the 10.1.1 SP1 hotfix was re-applied (you can download it directly from your customer portal under Services & Hotfixes).

                        The job engine queues had errors indicating some form of corruption. The exact nature of the corruption remains unknown, and the root cause is still a mystery, although a handful of nodes seem to be involved: when they were moved to another poller, they caused the same problem on that poller too. Removing these nodes did not fix the problem. The cure was to create a new copy of each of the following files:

                        - JobEngine35.sdf (for Jobengine)
                        - JobEngine35.sdf (for Jobengine.v2)
                        - Jobtracker.sdf
                        - PollingController.sdf

                        The MSMQ queues must also be cleared.
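                        The exact procedure support used wasn't posted, so the following is not that procedure. If you end up in the same situation, it may be worth keeping copies of the .sdf files listed above before they are replaced, in case support wants them for root-cause analysis. A minimal Python sketch, assuming the relevant services are already stopped (the files may be locked otherwise) and that you point it at the folder where the files live on your poller; the thread doesn't give that path, and `backup_sdf_files` is a hypothetical helper.

```python
import shutil
import time
from pathlib import Path

# File names from the thread, matched case-insensitively. Jobengine and
# Jobengine.v2 each have their own JobEngine35.sdf, so relative paths are kept.
SDF_NAMES = {"jobengine35.sdf", "jobtracker.sdf", "pollingcontroller.sdf"}


def backup_sdf_files(root, backup_root):
    """Copy every matching .sdf under `root` into a timestamped backup folder."""
    root = Path(root)
    dest = Path(backup_root) / time.strftime("sdf-backup-%Y%m%d-%H%M%S")
    copied = []
    # sorted() collects the matches first, so newly written copies are never re-scanned.
    for path in sorted(root.rglob("*.sdf")):
        if path.name.lower() in SDF_NAMES:
            target = dest / path.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


if __name__ == "__main__":
    import sys

    # Usage: python backup_sdf.py <folder containing the .sdf files> <backup folder>
    for item in backup_sdf_files(sys.argv[1], sys.argv[2]):
        print("backed up", item)
```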

                        This is only a brief description of the work that had to be carried out. If you experience any combination of the symptoms listed below, I would recommend contacting support and referencing case number 215390.

                        - No historical data for any ICMP-based statistics on one or more pollers, while historical SNMP statistics are still recorded on the same poller. A packet sniffer confirmed that ICMP polls were being sent, but the results weren't being processed, resulting in blank graphs.

                        - Node status on a poller does not change, or changes very slowly (4+ hours). Down nodes appear in the Nodes with Problems resource as 'UP', with interface status 'Unknown' (ICMP was not working, but SNMP was).

                        - CPU on the offending poller was three times what it was before the upgrade. CPU usage does increase with the upgrade, but while our poller was malfunctioning, CPU went from 30% to 90-100%. Since the repair, CPU has been averaging around 45-50%.
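                        When deciding whether a poller matches this pattern, the three symptoms above can be written down as a simple checklist. A sketch in Python: the observations are filled in by hand (from the graphs, the database and the CPU figures), and `PollerObservation`, `matching_symptoms` and the 2.5x CPU threshold are illustrative assumptions, not anything SolarWinds provides. The threshold sits between the roughly 3x seen on the broken poller and the roughly 1.5x to 1.7x seen after the repair.

```python
from dataclasses import dataclass


@dataclass
class PollerObservation:
    icmp_history_present: bool  # response time / packet loss history being written
    snmp_history_present: bool  # SNMP-based history being written
    slow_status_changes: bool   # node status not changing, or taking hours to change
    cpu_before_pct: float       # average CPU before the upgrade
    cpu_now_pct: float          # average CPU now


def matching_symptoms(obs, cpu_ratio_threshold=2.5):
    """List which of the symptoms described in the thread a poller shows.

    The 2.5x CPU threshold is an assumption, chosen between the ~3x seen on
    the broken poller and the ~1.5x-1.7x seen after the repair.
    """
    found = []
    if obs.snmp_history_present and not obs.icmp_history_present:
        found.append("ICMP history missing while SNMP history is recorded")
    if obs.slow_status_changes:
        found.append("node status changes missing or very slow")
    if obs.cpu_before_pct > 0 and obs.cpu_now_pct / obs.cpu_before_pct >= cpu_ratio_threshold:
        found.append(f"CPU at least {cpu_ratio_threshold}x its pre-upgrade average")
    return found


if __name__ == "__main__":
    # Figures from the thread: the malfunctioning poller vs. the same poller after repair.
    broken = PollerObservation(False, True, True, 30, 90)
    repaired = PollerObservation(True, True, False, 30, 47.5)
    print(matching_symptoms(broken))
    print(matching_symptoms(repaired))  # []
```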

                  • Re: Upgrading from NPM Ver 10 to 10.1.1
                    dclick

                    Has anyone run into an issue where the 10.1.1 MSI installer is denied permission to launch because of a "software restriction policy"? I get errors in our event log showing that the installer only allows the installation of unrestricted items.

                    I've not seen this before with ANY SolarWinds product, so I don't know whether this is new to SolarWinds or just some sysadmin getting a bit frisky with the group policies.

                    I opened a case about it, found the log, and thought it was all on my side, but thinking about it, most of our server-side policies were in place long before we started using Orion, so I am going to reopen the case and have them check with Development.

                      • Re: Upgrading from NPM Ver 10 to 10.1.1
                        netlogix

                        I sometimes get funky issues like that, which have something to do with the IE Enhanced Security Configuration (in Add/Remove) and the zone settings inside IE. (I'm not even using IE, I'm using Windows Explorer; you've got to love MS and how deeply they ingrained IE into their OS.)

                        The way I get around it is to open a cmd window, drag the exe into it (this just pastes in the file's full path), and hit Enter.
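                        For the MSI case, the same idea (launching the installer by its full path from a command prompt rather than from Explorer) can be expressed with `msiexec /i`, and adding `/l*v <logfile>` writes a verbose Windows Installer log that may be useful to hand to support. A Python sketch that only builds and runs that command line; `installer_command` and the file paths are hypothetical, and whether this gets around a software restriction policy depends entirely on how the policy is written.

```python
import subprocess
import sys
from pathlib import Path


def installer_command(installer, log_file=None):
    """Build the command line to launch an installer by its full path, as typed into cmd.exe."""
    path = Path(installer).resolve()
    if path.suffix.lower() == ".msi":
        cmd = ["msiexec", "/i", str(path)]
        if log_file:
            cmd += ["/l*v", str(log_file)]  # verbose Windows Installer log
        return cmd
    # An .exe is simply run by its full path, like dragging it into a cmd window.
    return [str(path)]


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical file names; substitute the installer you downloaded.
    subprocess.run(installer_command(r"C:\Temp\NPM-installer.msi", r"C:\Temp\npm-install.log"))
```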