    NPM 11.5 / SAM 6.2 Fun


      Do you like...


      Having the ability to browse to all of your nodes? (Ticket#771812)


      Having the ability to delete interfaces from the web interface? (Ticket#772609)


      Receiving email alerts with variables that always work? (Ticket#769906)


      If so, consider NOT upgrading to NPM 11.5 / SAM 6.2.

          So many issues in this release.  My MSMQs fill up in about 4 hours and everything stops.  Support has no clue.  I have been basically dead for 3-4 weeks.  We are seriously considering moving to something else.  6 pollers, NPM, NCM, SAM, UDT, NTA.

            I'm in the same boat. Alerts are going crazy with information that no one can understand. I have applied the NPM 11.5 Hotfix 2, SAM 6.2 Hotfix 1 and NTA 4.1 Hotfix 1 and still have problems. I've been asked to look elsewhere too.

                One of my customers is still on an ancient build of SCOM 2007 since 2009. It runs perfectly fine.


                It does things Orion in 2015 will never do out of the box.


                • Alerts for AlwaysOn SQL, log shipping - automated discovery of all SQL objects.
                • Alerts for Clusters out of the box with no need to do the time-consuming configuration that Orion requires.
                • Dynamic management of disks/volumes - Orion has burned me for years with disks that someone added and that remain unchecked in List Resources - no, I'm not going to do the discovery, import automatically BS.
                • 2 stage CPU alerts (% + Queue) without crazy customization
                • The ability to report on a group of servers that are dynamically managed based on a registry entry - Orion wants you to do custom properties - does anyone realize how stupid this is?
                • The ability to quickly create summary performance reports on a group of servers without the idiotic "custom property" nonsense that Orion uses. Come to think of it, there is no summary server report for Orion, you have to create it yourself. Fun times when a PM wants a report that I can do with SCOM in literally 30 seconds.
                • Over 500 monitors and rules for SQL out of the box - All alerting is automatic, no need to build hundreds of triggers. With Orion you get 2 or 3 SQL alerts out of the box. I feel sorry for people that get sucked into Orion without any clue how deficient AppInsight is from an alerting perspective when you have hundreds and hundreds of instances.
                • If an admin in a remote site adds a server to SCOM there is nothing else he needs to do. With Orion you have to manually assign everything you want monitored. Assign triggers. Manage custom properties. Manage reports by hand. SCOM is all automagic once you have your groups configured.
                • Custom Availability reports with extremely granular views of any object under the sun. Producing the same in Orion would take hours; with SCOM I can do it in 30 seconds.
                • The ability to override a monitor based on a group or class - Orion gives you very few customization opportunities - see AppInsight for SQL - good luck managing that if you have more than 3 SQL servers. Someone's response to this in another thread was that you need to learn 4 or 5 languages to manage SolarWinds. No thanks, that is ridiculous, SCOM does it for me automatically.


                I will never forget the day our Admin enabled the component failure emails in Orion and we woke up to 1900 emails. He turned them off and we haven't turned them back on since. The uncontrollable nature of Orion is its worst feature. 90% of our servers are RED due to idiotic monitors that you cannot control easily. If you have more than 10 servers, good luck managing all of the worthless alerts and events that SolarWinds spits out every 3 seconds for components.


                Of course SCOM has its weak points as well. TONS OF THEM. But, if you had to give me a choice between Orion and SCOM, and alerting is your number one requirement - I'd choose SCOM any day hands down.


                I believe that SolarWinds gives people a horribly bad false sense of security for alerting. I would love to get rid of SCOM and replace it with Orion, but SolarWinds is years away from dynamic management of objects based on classes with easy access to overriding based on dynamic groups of servers.


                SCOM is so much better in so many ways, but it's easy to hate on Microsoft just for being Microsoft. I have identical large environments - one with SolarWinds and one with SCOM. SCOM gives me an advantage in so many ways, but these are things SolarWinds sales people do not comprehend, and they are things my managers do not comprehend - again - the false sense of security is epic. I will root for SolarWinds to get better, but there is no way I can recommend it for a large SQL-centric shop with 1000+ servers. It's too unwieldy to control effectively, and the inability to create reports and alerts based on dynamic groups of servers, and having to use the custom property nonsense instead, is just not for me.

                    I don't think this is a fair assessment of the NPM product. Many of the areas that you dismiss are indeed the strong areas of Orion. It is just that you are more used to the idiosyncrasies of one product while abhorring the other. Not to mention, you are only focused on server monitoring while Orion covers much bigger ground than that.

                  More fun.


                  SQL server CPU jumps to 100% and stays there until you reboot.  SolarWinds web interface and email alerts wonky meanwhile.  3 times in the past week.  Never happened prior to NPM 11.5 / SAM 6.2.  Ticket #777141


                  Unable to remove nodes from "discovery ignore list".  Ticket #777967


                  I noticed hotfix 3 was released on the 20th.  Patch notes indicate it addresses inability to delete interfaces, one of my current issues mentioned in original post.  Installed HF3 on my 3 SolarWinds environments.  All 3 of them are no longer accessible.  Blank white page.  Launching interface directly on server results in this error.  On hold right now waiting for support before backing out of patch.



                    We are getting a few tickets piling up too...


                    MSMQ backing up due to SWIS - Ticket # 771114

                    Widgets on views that are slow or fail to load - Ticket # 776355

                    Exporting SAM templates fails - Ticket # 777974

                    View UDT Job Status throws UI error - Ticket # 765127

                    Add...'Manage Transaction Monitors' loads extremely slowly with transaction dependencies defined - Ticket # 778319


                    All of these appeared post NPM 11.5 / SAM 6.2 upgrade.

                        We have half a dozen tickets open and pretty much spent three days solid on the phone with support trying to get our deployment operational after the 11.5 upgrade. The systems are still slow as heck: it takes minutes to get to the Edit Node or List Resources submenu, UDT is dead, not all devices are being polled, the load on our polling engines increased by 30%, pushing several of them over 100%, and Network Atlas is dead. Comicbook.jpg

                            We are seeing the same at our location with regard to slow performance and increased poller utilization. This has been attributed to a 'long running query' issue that appeared in the 10.X version and was repaired by a hotfix. The fix didn't make it into 11 or 11.5, and we have been waiting for a fix from development for about a week now.

                          HF3 killed my system over the weekend!!! I know, never do updates on a Friday, but I had so many issues with 11.5.


                          Still having problems with...

                          ‘Database Details’ admin page not loading

                          ‘Views By Application Type’ admin page not loading

                          ‘View UDT Job Status’ page not loading

                          ‘Device Tracker Advanced Settings’ page on the Additional Web server, not showing the ‘Monitored Port Types’

                          ‘Virtualization Summary’ page not applying the ‘Account Limitations’

                          I spent most of the day going through every page and option testing what is working and what is not, so I can report them.

                          My test lab worked so well with NPM 11.5 and all my other modules. Shame the upgrade didn't...

                            I am actually getting frustrated myself and have reached out to Loop1.


                            My NPM and VM integration has been up and down - I am constantly losing chart data, or my network dials are showing a ridiculous amount of network traffic in KB. I am breaking the integration and restarting the appliance, and it's a 50/50 shot. - ticket open again


                            Charting is not working correctly - can't switch zoom between 1 hr, 12 hr and 24 hr. Once you pick one you can't pick another without refreshing the page.

                              Thanks for the heads-up, we were considering the upgrade. Will hold off for now.

                                Most of the issues that we have had have been pretty minor compared to others mentioned in this thread but here is a summary of some of the issues that I have seen in our instance (2 Pollers and additional web server, NPM, SAM and NCM).


                                Universal Device Poller App - Unspecified Error - Ticket# 772548 - hopefully fixed by changing the DB server from name to IPv4 address, since the name was resolving to the IPv6 address, and by reinstalling a few of the processes, which also seems to have made a decent performance difference.

                                Manage Sensors not working properly - Ticket# 765173
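If anyone wants to check whether their own DB server name resolves to an IPv6 address first, as in Ticket# 772548 above, here is a minimal Python sketch using the standard library's `socket.getaddrinfo`. The hostname `sql01.example.local` and port 1433 are placeholders I made up for illustration, and this only shows what the operating system resolver returns; it says nothing about how Orion itself picks an address.

```python
import socket


def summarize(infos):
    """Group getaddrinfo() results into IPv4 and IPv6 address lists."""
    result = {"ipv4": [], "ipv6": []}
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore any other address family
        address = sockaddr[0]
        if address not in result[key]:
            result[key].append(address)
    return result


def first_family(infos):
    """Return 'ipv4' or 'ipv6' for the first entry the resolver returned."""
    if not infos:
        return None
    return "ipv6" if infos[0][0] == socket.AF_INET6 else "ipv4"


def check_host(hostname, port=1433):
    """Resolve hostname and report which address families come back.

    Port 1433 is only an assumed default here; use whatever port your
    database server actually listens on.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return first_family(infos), summarize(infos)


if __name__ == "__main__":
    # Placeholder hostname: replace with your own DB server name.
    try:
        print(check_host("sql01.example.local"))
    except socket.gaierror as exc:
        print("lookup failed:", exc)
```

If the first family comes back as `ipv6` and the database is only reachable over IPv4, pointing the configuration at the IPv4 address (as described above) sidesteps the lookup.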


                                We have had several other minor tickets solved by support that seemed to be more unique to our instance.


                                That all being said the web based alerting is a large step in the right direction and has allowed us to both clean up how the alerts were configured along with providing the right information in the right place for our techs.

                                  Having the unique perspective of not being a part of the SolarWinds company but working with approximately 100 different Orion environments a year, I can say that a majority of the issues I've been seeing have been small things.  Given the track record of SolarWinds updates (taking the sheer number of products and the relative frequency with which they update them into account), it is definitely unusual to have this many issues on an upgrade.  What is also unusual, and should be considered when looking at this release from a big-picture perspective, is the sheer amount of amazingly cool features this release has.  It has as many new and great features in this one release as they typically have in their products in an entire year (or two maybe), so you can expect more bugs than a normal release.


                                  Because of the large amount of additional features, perhaps the product should have been in beta for a while longer, but they (and the Thwack Community) were so excited to play with these features in production that it may have been pushed a little early (pure speculation of course, since I have no affiliation with SolarWinds in any way, shape or form.  It could have been something entirely different for all I know).  Most issues that remain after Hotfix 3 are minor and not product breaking, and it should be pretty stable (obviously there will be those out there that may still have some major issues outstanding, but those are the exception and not the norm).  But as stated, this is not usual for SolarWinds and I am giving them the benefit of the doubt based on their wonderful track record.  I've performed approximately 7 or 8 updates/new installs of NPM 11.5/SAM 6.2/VMAN 6.2/SRM 6.0/WPM 2.2 in the last month and have run into very few major issues.  My co-workers have each done a couple more or fewer than I have and their results are the same.  Of course you are going to see more complaints on the forums, since for the most part people who get normal working results aren't inclined to rush to a forum and state that nothing unusual happened, whereas people who have issues are doing everything they can to communicate the fact.

                                      We upgraded our development servers and have had very few issues, if any.  We do not have any additional pollers in dev, but the system seems stable.  We plan to upgrade production with 4 pollers, NPM, SAM, UDT, NCM, IPAM, VNQM, WPM and Toolset on April 7th.

                                      I'll do some more in-depth testing on dev and post my results here.  The same with production. 

                                          familyofcrowes - suggest you tack onto this thread too, as we've uncovered things and disclosed them here... 11.5.1 on the way? Just not brave enough to do that 11.5.0.
                                          Putting in a greenfield solution first right now. Our other instance will then get upgraded, but after yours.

                                          Issues may be similar though.

                                              Well, we are done upgrading NPM 11.0 to 11.5, SAM 6.1 to 6.2, IPAM to 4.3 and WPM to 2.2, along with upgrading to the UDT 3.2.1 RC, on a main poller and 3 additional pollers.  We installed every HF we could find.

                                              So far there are some minor alert issues from the conversion and an annoying UDT banner message, but we are looking OK.

                                              CPU and memory are a little bit higher but nothing I didn't expect.

                                              The upgrade was very slow though.  The wizard takes a long time to migrate alerts (over 500) but if you're patient it eventually finishes.

                                                  I just purchased a new poller, and now when I go to run the additional poller installers I get a "sorry, this is older than what you have" message. I assume it's because of all the hotfixes and patches we have applied.

                                                      If you have the SAM 6.2 Hotfix 1 installed on the server, it will need to be uninstalled (Control Panel -> Programs and Features -> View installed updates) - the SAM Additional Poller won't install because its version is older than the hotfix. Once the additional poller is installed, you can install the hotfix on both servers.


                                                      I haven't installed the Orion Hotfix 4 or NPM 11.5 Hotfix 1 on the servers yet, so I don't know if these cause the same issue.


                                                      Although I am impressed with the new features in this version, I am reluctant to advise customers to upgrade at this point in time.

                                                  We too tested NPM with all the modules and it showed no problems.  All the issues surfaced once we rolled these into production.  Also, just like you, we do not have additional polling engines in our lab, which may be the reason.  That makes me think that we may need to purchase a polling engine for our lab as well.  Our managers are not going to like this very much since we've already brought them several "out of budget" requests...


                                                  By the way, we installed HF4 and HF1, but these did not fix problems between Universal Device Pollers and the web alerts.  Our case has been escalated to the developers. I hope to hear from them sometime next week.

                                                Another high-profile bug for us: all of the links to SCCM pages that we set up as external web site objects on the main tool bar are broken. They will not render in 11.5 and return a Reporting Service error. But if I open the same URL in another browser tab, it opens perfectly fine. This was everyone's favorite way of getting to the Windows updates status reports very quickly. It doesn't work if I specify opening in a new frame either. This is truly heartbreaking and a huge loss of a great UI component for us.

                                                  Update for Ticket#771812 mentioned in my original post.


                                                  If you see this error...


                                                  SWIS Error.PNG

                                                  Try removing resources from the view (settings-->manage views-->node details, volume details, etc).  I started by removing the new forecast / app stack resources that are automatically added to some views.  Removing some resources will help reduce the frequency of this error.  Removing a bunch of resources helps even more.  It doesn't appear to be one resource in particular.  It appears to be related to the number of resources in a view.  The higher the number, the more often you see this error.

                                                    God, I wish I had seen this before upgrading... I don't even have SAM, and I have many, many issues with 11.5

                                                      Wow, I wish I could contribute more, but I'm just happy to see so many people have the same issue as me: MSMQ keeps filling up after the NPM 11.5 and SAM 6.2 upgrade, and my support case has been open for over a month!


                                                      And isn't there a better way to manage the installation of hotfixes? It is really frustrating to copy the files everywhere they need to be replaced... it is supposed to be simple!

                                                          I had the same issue and finally got a dev to work with me.  It seems they added a new undocumented feature that tracks the down time of interfaces and volumes.

                                                          If you have any Cisco 5K's in your environment, they are very problematic to poll via SNMP.  Lots of things to poll, and with a lot of 2K's attached you can easily have over 1,000 interfaces.

                                                          What was happening was that the 5K's would successfully complete a poll and mark the thousands of interfaces as up. The next poll was incomplete, so they were unknown and marked as a state change.

                                                          This was way too much information for the pollers and the database to keep up with.  SW was marking thousands of interfaces and volumes as changed every polling cycle.

                                                          There is no way to turn off this new "feature".

                                                          My solution was to set the global SNMP timeout a little higher and add an additional retry.


                                                          No more MSMQs filling up.
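To give a feel for why one extra retry can calm things down so much, here is a rough Python simulation of the behaviour described above. It is only a toy model under my own assumptions, not how the SolarWinds poller is actually implemented: each polling cycle either completes (all interfaces "up") or is incomplete (all interfaces "unknown"), every attempt fails independently with probability `loss_rate`, and a longer timeout is treated simply as a lower `loss_rate`. All names and numbers are made up for illustration.

```python
import random


def poll_succeeds(loss_rate, attempts, rng):
    """A cycle completes if at least one of `attempts` tries gets through."""
    return any(rng.random() >= loss_rate for _ in range(attempts))


def count_state_changes(cycles, interfaces, loss_rate, retries, seed=0):
    """Count interface state-change events written over a number of cycles.

    Toy assumption: a complete poll marks every interface 'up', an
    incomplete poll marks every interface 'unknown', and each flip between
    the two generates one state-change event per interface.
    """
    rng = random.Random(seed)
    state = "up"
    changes = 0
    for _ in range(cycles):
        ok = poll_succeeds(loss_rate, 1 + retries, rng)
        new_state = "up" if ok else "unknown"
        if new_state != state:
            changes += interfaces  # all interfaces on the device flip together
            state = new_state
    return changes


if __name__ == "__main__":
    # Hypothetical device with 1,000 interfaces and 30% of attempts incomplete.
    for retries in (0, 1, 2, 3):
        events = count_state_changes(
            cycles=1000, interfaces=1000, loss_rate=0.3, retries=retries
        )
        print(f"retries={retries}: {events} state-change events")
```

In this model the chance of an incomplete cycle falls from `loss_rate` to `loss_rate ** (1 + retries)`, so each added retry cuts the number of up/unknown flips, and therefore the volume of state-change events, sharply.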


                                                          Still lots of bugs, but at least it is running now.


                                                          Again, this shows that they apparently do not test these things out on larger real world environments, and especially with a mix of modules.  UDT seems to always be the stepchild.  If you have UDT, forget about a new release for a few months.

                                                          I'm not jumping into any hotfixes until NPM and SAM put out their next release (SAM 6.2.1, etc). Some things are broken but nothing is terrible, as a result.

                                                            Sounds like I shouldn't upgrade to NPM 11.5 / SAM 6.2. I'm running NPM 11.0.1 / SAM 6.1.1.


                                                            Running the latest isn't always the best idea.