I've never really been one to use network monitoring software all that much (that's what NOCs are for :P), but I have tinkered with PRTG and a few linux tools in the past (mainly just for experimentation), so when I was asked if I'd like to review the latest version of Solarwinds NPM for the GNS3 community, I decided to jump at the chance. After all, it's always fun learning something new, and you never know when that knowledge will come in handy, right?
Something I was really impressed by, right off the bat, was just how simple it was to get NPM and NTA up and running. I literally looked at the documentation a total of ONE time, and that was just to confirm that the 11.x version of the F5 LTM VM wouldn't support iControl REST-API polling. Otherwise, everything else was self-explanatory.
Initially, I wrote this review over at the GNS3 community forums, with those users in mind, so I've changed it up a little bit, since I'm sure many of you are already experienced with either NPM 12 or one of its previous versions.
Since this is Thwack, I'll wager that you're already registered with Solarwinds, but just in case you aren't:
1) you can register HERE
2) you can download the NPM12 trial from HERE.
I installed NPM 12 (and NTA 4.2) in a VMware VM using Win2K8R2, and assigned it 2 cpu cores, and 8GB RAM. After the Win2K8 server was finished installing, I immediately installed the .Net Framework 4.5.2, since that is required. While you could let the NPM12 installer do it for you, it just seemed to me that it went a little faster, since I had already downloaded the offline installer directly from Microsoft.
If you decide that you also want to run the NetFlow Traffic Analyzer trial on the same server as NPM12 (which I did), it'll state that it really should be run on a machine with a minimum of 16GB RAM and 4 cores, but you can select the checkbox to acknowledge the message, and then proceed as normal, since this is just for virtual lab purposes.
I would like to go ahead and give you a word of warning: installing NPM12 in a virtual machine does have one major hiccup. Changing the vmnet that its eth0 interface has been assigned to, changing its IP address, or even just shutting down and restarting the VM will hose up the NPM web console interface. You'll receive errors that the correct credentials weren't supplied, or that the target machine actively refused the connection attempt. Once that happens, you're left with two choices:
1) Open up control panel, go to Uninstall program, find and right-click the “Solarwinds Orion Network Performance Monitor” service, and choose repair. If you're lucky, the web console will start working again.
2) If you get the dreaded SQL error, you may as well just reinstall the server VM from scratch. I never did figure out a way to resolve that particular error.
If anyone reading this happens to know how to prevent this issue from occurring in both Workstation 12.1 Pro and the latest Virtualbox, I'll edit this part accordingly, and give you proper credit for it.
At this point, I highly suggest installing and running NPM12 on a physical server, since you won't run into the above issue. I tried a variety of methods using both VMware Workstation 12 Pro, and Virtualbox, but had the same problem every single time. I even searched the Thwack forums, long before I registered here, in the hopes of finding an answer on how to prevent this issue, but came up empty.
With all that out of the way, I actually do in fact like NPM12. I hadn't tried it before being asked to review it, so I made it a point to install the last 11.x version first, so I could compare them, to see what's different between the two. Obviously, the new GUI is MUCH cleaner, and seems to be far more customizable than the 11.x version I tried, plus it also includes extra sensors, as well as two new features: NetPath and NetInsight.
I was unable to review the QoE and StackWise health monitors, due to the lack of extra hardware in my lab (and none of the switch VMs I have possess the required features needed to use these).
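If you do hit this in a VM, it can help to first tell the "actively refused" case apart from a credentials problem by checking whether anything is listening on the web console port at all. Here's a minimal sketch using only the Python standard library. The host address and port in the usage comment are placeholders of mine, not values from my lab, so substitute whatever your install actually uses:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A refused connection, a timeout, or an unreachable host all
    raise OSError (or a subclass), which is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage (address and port are assumptions, adjust to your setup):
# print(port_is_open("192.168.18.10", 8787))
```

If this returns False, the web server side isn't listening, and no amount of retyping credentials is going to help.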
Quality of Experience (QoE) relies on the NPM12 server being connected to either a SPAN, Mirror, or Tap Aggregation port on a switch, and it then polls agents that have been installed on either Windows or Linux Servers. Using this feature will help you determine if a bottleneck is the application you're running on the server, or if the issue is the network. Just bear in mind that using these sensors to poll the agents will increase the amount of RAM, HDD, and the number of CPU cores needed.
What prevented me from testing this, was that NONE of the switch VMs I have support SPAN, Port Mirroring, or Tap aggregation. Since I was creating this for the GNS3 crowd, I wanted everything to stay as virtualized as possible, as many of them lack the hardware.
Something I would like to see changed on the infographic within NPM 12 regarding QoE is that instead of only showing Microsoft servers, I think they should also include Linux servers, since the documentation does state that the agents will run as daemons on RH Enterprise, CentOS, SUSE Enterprise, Ubuntu 14 (Trusty), and Amazon AMI servers.
StackWise health monitoring sounds like it will perform exactly what it says it will. It monitors and will create alerts for both the health of the stack ring, and each member switch of the ring. I don't have any StackWise capable switches in my home lab, so I unfortunately had to leave this untested.
Two other things I really wished I could've tested out were the VSAN and AppStack Environment features, but I lack the right hardware to set up a SAN in my home office, and I also don't have a spare machine to devote to ESXi or Hyper-V at the moment. AppStack in general looks really promising, since you can track entire groups, virtual clusters, transactions/second, virtual servers, the works. I can easily see that coming in handy in a production DC environment.
Before I cover NetPath and NetInsight, I'd like to go over the changes Solarwinds made to the NPM12 GUI. Once the NPM12 installation has finished and you open up the web GUI, you'll be greeted with a login prompt. By default, there's no password (you can always set one in the admin page later), so just click login, and you'll be taken to the network discovery wizard.
At this point, you can add IP ranges, subnets (in CIDR notation), specific IP addresses, or even have it poll a Windows AD Domain Controller. There is an option under subnets to have it pull subnet routes from a seed router, but I never really got that working properly, even though the 7200 IOS image I had directly connected to the NPM12 server contained a properly populated routing table, and I ensured that all the other nodes' MAC addresses were in its ARP cache:
After using whichever method you choose to define the devices you want NPM12 to discover, click next and you'll be taken to screens that will prompt you for agent polling, VMware ESXi polling, SNMP credentials or WMI credentials (for Windows servers that don't support SNMP). Since I didn't use agents, ESXi, or WMI setups in my test topologies, I'm not going to go over that, but I did use SNMP v2c for simplicity's sake, as well as NetFlow v9 and sFlow.
By default, it has v1 and v2c public/private credentials set up, but you can always edit those to non-default values, or better yet, click "Add New Credential" to set up SNMP v3. In a virtual lab environment, using just the default v2c public community is fine. Just be aware that some VMs are set to use SNMP v3 by default, and obviously you'd want to use that if you're going to have the NPM12 server deployed in a physical lab or production environment.
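If a node doesn't show up after discovery, one thing worth ruling out is whether it answers the community string at all. Here's a rough sketch that wraps the Net-SNMP `snmpget` command line tool from Python. It assumes Net-SNMP is installed on the machine you run it from, which is my assumption and has nothing to do with NPM itself. The helper names are made up for illustration, and the OID is sysDescr.0 from the standard SNMPv2-MIB:

```python
import shutil
import subprocess

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # sysDescr.0 (SNMPv2-MIB)


def snmpget_command(host, community="public", oid=SYS_DESCR_OID):
    """Build the argument list for an SNMP v2c GET using Net-SNMP's snmpget."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


def device_answers_snmp(host, community="public", timeout=5.0):
    """Return True if snmpget exits cleanly for the given host and community.

    Raises RuntimeError if snmpget is not on the PATH.
    """
    if shutil.which("snmpget") is None:
        raise RuntimeError("snmpget (Net-SNMP) is not installed")
    try:
        result = subprocess.run(
            snmpget_command(host, community),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Hypothetical usage (the address is a placeholder):
# print(device_answers_snmp("10.0.0.1"))
```

This only covers v2c. For v3 you'd need the user, auth, and privacy options instead of a community string.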
Since this was just for a lab demo, I left them mostly at the default values (I did up the Hop Count, though), and proceeded to the next to last screen. You can set the frequency of the discovery scan, and whether or not to run it immediately. While it's running, you can watch the status bars, if you want, but there's also an option to have this run in the background, which I liked, since it let me do some other things, instead of just twiddling my thumbs.
Once NPM12 has finished scanning, it will import all the devices, and the number of ports each one has.
If you need to run the discovery process again, you can highlight the username of whichever user created the scan, and then click "Discover Now". It doesn't override the devices/interfaces that were previously scanned, it just imports the new ones.
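Going back to the subnet option in the discovery wizard for a second: if you're not sure how many addresses a CIDR block will make it scan, Python's standard `ipaddress` module can expand it for you. This is just a sketch to illustrate CIDR notation, not anything NPM does internally, and the function name is my own:

```python
import ipaddress


def hosts_in_subnet(cidr: str) -> list[str]:
    """Return the usable host addresses in a CIDR block as strings.

    strict=False lets you pass a host address with a mask
    (e.g. "10.0.0.5/30") and still get the enclosing network.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


# A /30 has two usable hosts, a /24 has 254.
print(hosts_in_subnet("10.0.0.0/30"))
print(len(hosts_in_subnet("192.168.18.0/24")))
```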
Now I'd like to talk about one of the big things SolarWinds has been talking about, in regards to NPM12: Customizability.
Pretty much anything and everything can be customized in the Web Console, so you're no longer faced with having to deal with seeing a bunch of stuff you don't want to see. Case in point, here's the default view in My Dashboards:
That's all well and good, but what if I have zero desire to have thwack Community or Training in there, and would prefer to just have the options I'm more likely to use on a daily basis? Well, just click Configure down at the bottom, and you'll be taken to this screen:
The Admin bar is a misnomer, since that edits the Home section of My Dashboards.
If you want to edit Network or NTA, you need to edit the Network_TabMenu and NTA_TabMenu.
Now, let's edit that “Admin” bar to get rid of a few things, and add a few things I'd rather have on there. Click Edit, and you'll be taken to this screen:
What appears in the Home section of the Dashboard is on the right, and all the possible entries we can add are on the left. To make changes, just drag and drop between the two lists, in order to add/remove entries. Here's an example I did, where I got rid of thwack, training, and virtualization, and then added Network Discovery, All Nodes, Conversations, and All Interfaces:
Just click Submit, and you're done. Here's what the Home section of My Dashboards looks like now:
You can do the same thing with all the other dashboards, as well. You can also customize each individual page, so you can see only the info you want, without having to scroll past a bunch of junk to get to it. If you click Customize Page on the far right of each page, you get taken to a customization screen, like this:
You can add/remove items, change column widths, or even add additional columns (up to a total of 6). It all makes it so much easier to find the info you're after that much faster.
Solarwinds didn't remove anything you may be used to, they just added more stuff to it. To give you an example, they've added more entries in the section you create reports with, added some new device-specific hardware health sensors, as well as support for Cisco's EnergyWise initiative.
We can still see top talkers, problem nodes, interfaces, top protocols, etc… Everything you're used to from older versions of NPM is still there, it's just better and there's more of it.
Now, two of the new features I do want to talk about are NetPath and NetInsight. Initially, I tried NetPath using a topology I had built in the GNS3-VM, and also included an exit node using the Internet Appliance, but got some erroneous results.
For those of you unaware of what this means, I'll give you a brief overview:
GNS3 is a simulation program that lets students, or even folks like me, build test topologies, and just in general try out scenarios without needing to touch any of the physical hardware. More than a few network equipment vendors do release virtual machines for at least some of their products, which can help get someone up to speed on a system they're otherwise unfamiliar with. A small sample of what you can run in GNS3 includes Arista vEOS images, older vMX/vSRX images from Juniper, Virtual EXOS from Extreme Networks, CumulusVX from Cumulus Networks, IOS images off older EOL Cisco routers, as well as exported images available to their VIRL customers (like ASAv, IOSv, IOS-L2v, XR-v, etc...), and even the old IOS on Unix/IOS on Linux images that leaked into the wild. I've even used it to create subsets of data center topologies, and also to create and use server images I've made, to learn about Ansible, Chef, SaltStack, OpenFlow switches, and more.
The sticking point is that a lot of these virtual machines rely on KVM, which is completely unavailable to Windows or OS X users, and only Linux users can run those natively. To get around this, the GNS3 devs created an Ubuntu-based Virtual Machine that can be run in VMware Workstation/ESXi, or in Virtualbox. That allows users to run those images in the GNS3-VM, regardless of what their host OS may be. They can also bridge in other, external VMs, which is what I'd done for this test. I ran NPM12 in a separate VMware VM from the GNS3-VM, and just told GNS3-VM "Hey, you see this node over here? Yeah, it's another VMware VM, so use vmnet 2 to talk to it". The regular "local server" way of running GNS3 would allow you to use a "cloud object" to bridge your topologies to your host OS, your LAN, or get them on the internet. When you run everything inside the GNS3-VM, however, you use this premade Linux VM called the "Internet Appliance". It uses several NAT'd interfaces, so it can cause some odd results with NetPath.
NetPath worked just fine when showing me the path it took from my NPM12 server VM to another endpoint contained within the GNS3-VM. In this screenshot, I created a probe on the NPM server VM that was set to poll ASW1 on a specific port, and it correctly traced the optimal path through my topology to ASW1:
I failed to screencap the topology itself, but it was a decent setup using eBGP. I artificially decreased the speed on a few of the links, and NetPath showed the best path it was taking to probe ASW1.
On the other hand, here's what happened when I created a NetPath probe to GNS3.com using the Internet Appliance as an exit point:
NetPath followed the LAN nodes just fine, but once it hit the Internet Appliance (the 172.16.x.x and 192.168.18.x addresses), the very next hops were the gns3.com servers themselves. I fully expected to see devices from my ISP, as well as transit networks, but nada.
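The 172.16.x.x and 192.168.18.x hops are RFC 1918 private addresses, so a quick way to see where the "inside" of a path ends is to split the hop list into private and public addresses. The sketch below uses Python's standard `ipaddress` module with made-up hop addresses (they are NOT the real hops from my trace). It doesn't explain why the ISP hops went missing, it just makes the gap easier to spot:

```python
import ipaddress


def split_hops(hops):
    """Split a list of hop IP strings into (private, public) lists, preserving order."""
    private, public = [], []
    for hop in hops:
        if ipaddress.ip_address(hop).is_private:
            private.append(hop)
        else:
            public.append(hop)
    return private, public


# Hypothetical hop list: two NAT'd lab hops, then a public address.
example_hops = ["172.16.0.1", "192.168.18.2", "8.8.8.8"]
private_hops, public_hops = split_hops(example_hops)
print("private:", private_hops)
print("public:", public_hops)
```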
What I did next was to just take GNS3 completely out of the equation, and bridged my NPM12 server VM into my home lab. This time, I saw the types of results I was expecting:
I created a probe to Github, and I started seeing nodes not only for my ISP (which I blanked out), but I also could see nodes belonging to Level-3. The longer I ran the probe, I'd notice nodes pop in and out, but still, it was really cool! If you hover over a specific node, it will also highlight all nodes belonging to that provider! Pretty neat!
If you click on a specific node, you can see the contact info, prefixes, and AS number, like I was expecting:
I really like this tool! I still think it would really be useful in a larger network (like a CDN), but I can imagine some uses for it, for smaller shops.
The other new feature I'd like to talk about is NetInsight. This is used for polling F5 LTM load balancers. There's one major caveat you need to be aware of. The free 90 day trial of the 11.x LTM found on F5's website does NOT support iControl REST-API polling, and as such, you cannot use that version to test NetInsight with. You'll either need to get the “low-cost” lab version, or sweet talk one of their sales reps into granting you 2 licenses for a 45-day full-featured demo. That way you can check out HA. I ran these as external VMware VMs, as well, and just bridged them into the GNS3-VM topology I was running, as well as a LAMP server VM.
Enabling NetInsight polling in NPM 12 is actually incredibly easy. You just click on Manage Nodes, select your F5 load-balancer, and check “Poll for F5 iControl”. Add your administrative credentials for that node, and click Test.
By default, the F5 VMs include a self-signed certificate, so you WILL get an error. Just choose to accept the certificate, click Test again, and it will pass the test. Repeat for the second load-balancer, and you'll be ready to check this out. In order for NPM to monitor the health of the F5 balancers, you'll need to have some virtual servers and pools configured. I ended up using a LAMP server built on Xubuntu, that had multiple apache2 websites that purely responded to port 80 and 443 requests (I also had to create several sub-interfaces on that LAMP server, for reachability).
Since those websites were very limited, there was hardly any traffic to see in the “Concurrent connections” section of the load-balancing page.
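If you want to confirm that an LTM actually answers iControl REST before you enable polling in NPM, you can hit the REST endpoint yourself. Here's a minimal sketch using only the Python standard library. It assumes HTTP basic auth against F5's `/mgmt/tm/ltm/pool` endpoint, and it deliberately skips certificate verification because of the self-signed certificate mentioned above, so keep it to the lab. The function names, address, and credentials are placeholders of mine:

```python
import base64
import json
import ssl
import urllib.request


def lab_ssl_context():
    """SSL context that accepts self-signed certificates. Lab use only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_pool_request(host, user, password):
    """Build a GET request for the LTM pool list with a basic auth header."""
    url = f"https://{host}/mgmt/tm/ltm/pool"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_pools(host, user, password, timeout=10.0):
    """Send the request and return the parsed JSON response."""
    request = build_pool_request(host, user, password)
    with urllib.request.urlopen(
        request, context=lab_ssl_context(), timeout=timeout
    ) as response:
        return json.load(response)


# Hypothetical usage (address and credentials are placeholders):
# print(fetch_pools("10.0.0.10", "admin", "admin"))
```

If the LTM image doesn't support iControl REST (like the 11.x trial), expect this to fail with an HTTP error or a refused connection rather than returning JSON.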
To check out the health of the F5 devices, click load balancing from the dashboard, and you'll see this:
I happened to have my cursor hovering over one of the LTMs; as you can see, it'll display its role, status, status reason, IP address, FQDN, and its HA status. If you right-click on either of the LTM nodes, you can either see detailed information on that node, or it will display all the virtual servers, pools, and pool members associated with that LTM.
The below image shows which devices are associated with the selected LTM:
By clicking “show details”, you can see more specific device information:
Ignore that one pool member being down. I'm not sure why LTM2 thinks it's down, while LTM1 thinks it's up. Weird….
This page will also list the software release on the monitored nodes, as well as any other relevant info you might want to see (it also does this for all nodes).
Like I said towards the beginning of this review, I actually like NPM12 and I do like NTA v4.2 (for which I'll be doing an integration guide with GNS3). I just wish NTA was part of NPM, instead of a separate purchase, and I really wish NPM would play nice in VMware/Virtualbox, since I know that a lot of GNS3 users like myself use those hypervisors often.
In conclusion, if you're in a position to recommend or purchase monitoring software for your organization, it would be well worth your time to try out NPM12. I tried it out with Arista, F5 LTM, Extreme Networks, and Cumulus devices, as well as Cisco images exported from VIRL, and NPM had no problems recognizing them.