
NPM 12.4 Is Now Generally Available!

Level 15

NPM 12.4 is available today, December 4, on the Customer Portal! The release notes are a great place to get a broad overview of everything in the release. Here, I'd like to go into greater depth on the brand-new Cisco ACI support. Let’s talk a bit about how software-defined networks are different from traditional networks, what that means for monitoring, and how to get the most out of the new ACI monitoring feature.

What is SDN?

The first time I heard the term Software Defined Network, I thought it was stupid. All networks are defined by software. Software moves packets and frames, or programs the hardware that does it. Software is used to manually configure networks via CLI. Software is used to automatically configure networks with protocols like OSPF, STP, and LLDP. Networks were already software-defined!

Whether SDN is a good name or not, it is an important concept. There are a lot of people trying to define SDN, usually with some ulterior motive of placing themselves in a favorable position. For a slightly less biased view, check out the Wikipedia definition. The thing that stands out to me is:

SDN suggests to centralize network intelligence in one network component by disassociating the forwarding process of network packets (data plane) from the routing process (control plane).

This is a big change. In an SDN environment, network devices like routers and switches become simple devices that just move traffic at a high rate. All the intelligence is in a separate device called the controller. The controller learns how everything is connected, what connectivity applications need, and writes instructions to all of the network devices so they know how to forward traffic.
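To make the controller idea concrete, here's a toy sketch in Python. It is not how ACI or any real controller works internally; it just assumes a controller that knows every link in a small leaf/spine fabric and computes a next-hop table for each switch with a breadth-first search. The switch names and the `compute_forwarding_tables` function are made up for illustration.

```python
from collections import deque


def compute_forwarding_tables(links):
    """Return {switch: {destination: next_hop}} using shortest paths (BFS)."""
    # Build an undirected adjacency map from the list of links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    tables = {node: {} for node in neighbors}
    # Run one BFS per destination. The node we discovered a switch from
    # is that switch's next hop toward the destination.
    for dest in neighbors:
        visited = {dest}
        queue = deque([dest])
        while queue:
            current = queue.popleft()
            # sorted() keeps the result deterministic when paths tie.
            for nxt in sorted(neighbors[current]):
                if nxt not in visited:
                    visited.add(nxt)
                    tables[nxt][dest] = current
                    queue.append(nxt)
    return tables


# Hypothetical two-leaf, two-spine fabric.
links = [
    ("leaf1", "spine1"),
    ("leaf2", "spine1"),
    ("leaf1", "spine2"),
    ("leaf2", "spine2"),
]
tables = compute_forwarding_tables(links)
print(tables["leaf1"]["leaf2"])  # next hop from leaf1 toward leaf2
```

The point is the division of labor: the switches only need the finished table, while the one component with the full picture does all the thinking.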

There are a ton of SDN solutions available today. The two most popular commercial solutions seem to be Cisco ACI and VMware NSX. Cisco ACI is more commonly requested by our customers, so we’ve built support for it first.

How Do I Monitor SDN?

An SDN fabric consists of a data plane and a control plane. The data plane is made up of physical devices (Nexus switches, in the case of Cisco ACI) and cabling. The control plane is made up of many logical components that fit together to define what endpoints are allowed to send network traffic to each other. The modular nature of the configuration reminds me of Cisco’s MQC. To make sure your SDN environment is running well, you need to monitor both layers.

Data Plane (aka Underlay aka Infrastructure Layer)

AKA the boring stuff. This is not the glamorous part of SDN. It’s the stuff you’ve been doing for years: power supplies, fans, temperatures, CPU, RAM, and interface stats. The fact of the matter is, these things all need to function properly for your SDN environment to be performant and reliable.

The data plane for Cisco ACI environments is made up of the Cisco Nexus model line. Fortunately, NPM 12.3, the release before this one, introduced Network Insight for Nexus. This gave NPM better-than-ever support for this hardware.

It’s easy to set up. Navigate over to Settings (top menu bar) -> Manage Nodes -> Add Node. Add your spine switches and leaf switches as SNMP nodes. On the last step, make sure to check this box:

Picture1.png

If you already have your switches in NPM, you can find the same checkbox when you edit a node.

You’ll be prompted for your CLI credentials. CLI is the only way some of this very important data is available, so that’s how NPM gets it. This will cover the basics like power supplies, fans, temperature sensors, CPU, RAM, and interface statistics, plus the advanced stuff like vPC. Those of you with NCM can also get access list version control and analysis. Those of you with NTA will get flow analysis. You can check all of that out on our demo site here.

Okay, let’s get to the new interesting stuff.

Control Plane (aka Overlay aka Control Layer)

In an SDN environment, the controller has all the intelligence. This has a big impact on monitoring. Instead of polling dozens or hundreds of devices that each have their own very narrow view of the network, we can poll the controller directly. It has to know where everything is or it couldn’t control it. This means we can learn a lot from monitoring it.

This part is also easy to set up. Navigate again to Settings (top menu bar) -> Manage Nodes -> Add Node. In a Cisco ACI environment, the controller is called an APIC. Add your controllers as SNMP nodes. At the bottom of the first screen you’ll see this checkbox:

Picture2.png

Check it! If you’ve already got your APIC added, edit the node and you can find the same box to check.

Cisco strongly recommends each ACI fabric have three APICs. Since each APIC must be able to control the entire network if necessary, each APIC has a complete view of the network. Polling them all results in a lot of duplication of work and potentially duplicate alerts. You have a choice in how you approach monitoring of these devices:

  1. Add all three APICs to monitoring but enable API-based ACI polling (the checkbox) for only one controller.
    a. Pros: Efficient for the APICs and efficient for NPM.
    b. Cons: If the controller you’re doing API-based polling on goes down, you’ll see the APIC is down, but you’ll lose visibility into the control plane until you fix it or enable API-based polling for another controller.
  2. Add all three APICs to monitoring and enable API-based ACI polling for all three controllers.
    a. Pros: Control plane monitoring works, even if one or two of the three APICs go down.
    b. Cons: NPM has to poll the same data three times. APICs have to provide the same data three times. You will get duplicate alerts and reporting data unless you’re careful to write your alerts in consideration of the duplicate data. More on this in a future post.

Our recommendation is to do #1, but either way will work.
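If it helps to see the trade-off in code, here's a rough Python sketch of both approaches. This is not NPM's implementation. The function names, the record layout, and the `Tenant4` entry are assumptions for illustration only. `pick_poller` models option 1 (poll one controller, fall over to the next when it's down), and `merge_polls` models the cleanup you'd want with option 2 (collapse the copies into one record per distinguished name).

```python
def pick_poller(apics, is_reachable):
    """Option 1: return the first reachable APIC in preference order, or None."""
    for apic in apics:
        if is_reachable(apic):
            return apic
    return None


def merge_polls(polls_by_apic):
    """Option 2: collapse records from several APICs into one per distinguished name."""
    merged = {}
    for apic in sorted(polls_by_apic):
        for record in polls_by_apic[apic]:
            # The first APIC to report a DN wins; later copies are duplicates.
            merged.setdefault(record["dn"], {**record, "source": apic})
    return merged


# Hypothetical poll results: both controllers report Tenant3.
polls = {
    "apic1": [{"dn": "uni/tn-Tenant3", "health": 95}],
    "apic2": [
        {"dn": "uni/tn-Tenant3", "health": 95},
        {"dn": "uni/tn-Tenant4", "health": 80},
    ],
}
merged = merge_polls(polls)
print(len(merged))  # two unique objects, not three records

# Pretend apic1 is down: the next controller in the list takes over.
active = pick_poller(["apic1", "apic2", "apic3"], lambda name: name != "apic1")
print(active)
```

Either way the idea is the same: one logical object should produce one alert, no matter how many controllers can describe it.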

The API-based polling runs over TLS. If you have a valid cert on your controllers, everything will add fine and you’ll be good to go. If you have a self-signed cert, you will receive a warning about it and you’ll have to accept the risk or replace it with a properly signed cert before proceeding. You do have a real cert on your APIC, right?
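For the curious, here's roughly what that choice looks like to a TLS client, sketched with Python's standard `ssl` module. This says nothing about how NPM does it internally. The function names and the `accept_untrusted_cert` flag are mine, and the host you pass in would be your own APIC.

```python
import socket
import ssl


def build_tls_context(accept_untrusted_cert=False):
    """Build a client TLS context; optionally skip certificate validation."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    if accept_untrusted_cert:
        # "Accepting the risk": hostname checking must be turned off
        # before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def certificate_is_trusted(host, port=443, timeout=5.0):
    """Return True if the server's certificate validates against the system trust store.

    Connection problems (refused, timeout) are not caught and will raise.
    """
    context = build_tls_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False


strict = build_tls_context()
relaxed = build_tls_context(accept_untrusted_cert=True)
print(strict.verify_mode == ssl.CERT_REQUIRED)
print(relaxed.verify_mode == ssl.CERT_NONE)
```

A self-signed cert would make `certificate_is_trusted` return False with the strict context, which is the moment you'd see the warning.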

Once you complete the add node wizard, navigate on over to Node Details for one of your APICs with API-based polling enabled. You can click along with me right now on the Online Demo.  On the left side, you’ll see two new views: Members and Map. Let’s look at Members first.

Picture3.png

The Members view shows all of the logical components we have discovered. This includes Tenants, Application Profiles, and EndPoint Groups. It also includes the APIC’s view of the physical components: leaf switches and spine switches.

Picture4.png

This uses the framework’s List View, which is a polished way to deal with large lists. You can do multilevel filtering on the left, as well as sort and search. The list contains the name of the component (example: Tenant3), the type of component (example: Tenant), and the distinguished name (example: uni/tn-Tenant3). On the right, we see the health score. Let’s talk about that.
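Distinguished names are worth a closer look because they encode where an object sits in the ACI model. Here's a small Python sketch that splits one into (prefix, name) pairs. The only DN taken from this post is uni/tn-Tenant3. The longer example, including the `ap-` and `epg-` prefixes and the `App1` and `Web` names, is my assumption about how nested objects look, so treat it as illustrative.

```python
def split_dn(dn):
    """Split a distinguished name on '/' while ignoring slashes inside [brackets]."""
    parts, current, depth = [], [], 0
    for ch in dn:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_dn(dn):
    """Return a list of (prefix, name) pairs, e.g. ('tn', 'Tenant3')."""
    pairs = []
    for part in split_dn(dn):
        # Split on the first hyphen only, so names may contain hyphens.
        prefix, _, name = part.partition("-")
        pairs.append((prefix, name))
    return pairs


print(parse_dn("uni/tn-Tenant3"))
print(parse_dn("uni/tn-Tenant3/ap-App1/epg-Web"))  # hypothetical nested DN
```

Splitting the DN this way is handy for grouping or filtering components by the tenant they belong to.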

Since the controller has visibility into all components and their relationships, for the first time, part of the network infrastructure is in a position to accurately assess its health. Cisco ACI does this by assigning a health score. The health score is an integer from 0 to 100, where 100 is perfectly healthy and less than 100... isn’t. The health score takes into consideration both parents and descendants in the ACI model. You can check out the exact formula here. Since health scores represent status, they’re polled at the status interval in NPM. As always, you can adjust this interval. Incidentally, all of this data is polled via Cortex, our new polling framework that you previously saw powering PerfStack Real-Time Polling.

Health scores will be colored red, yellow, or green according to thresholds. There are thresholds on the APIC already for this that determine what color that score is in the APIC GUI. To stay consistent, NPM learns the thresholds from the APIC and applies those. If you customize the thresholds on the APIC, NPM will learn and apply the new threshold settings.
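The coloring logic boils down to a couple of comparisons. Here's a minimal Python sketch. The threshold values in the example (90 and 50) are placeholders I picked, not the APIC defaults, and `health_color` is a hypothetical helper rather than anything in NPM or the APIC.

```python
def health_color(score, yellow_below, red_below):
    """Map a health score to a color using thresholds learned from the controller."""
    if not 0 <= score <= 100:
        raise ValueError(f"health score out of range: {score}")
    if red_below > yellow_below:
        raise ValueError("red threshold must not exceed yellow threshold")
    if score < red_below:
        return "red"
    if score < yellow_below:
        return "yellow"
    return "green"


# Placeholder thresholds; in practice these come from the APIC.
for score in (100, 75, 20):
    print(score, health_color(score, yellow_below=90, red_below=50))
```

Because the thresholds are passed in rather than hard-coded, changing them on the controller changes the colors everywhere without touching the logic.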

You can click on a health score to get the history in the PerfStack dashboard:

Picture5.png

Thanks to this being in PerfStack, it’s easy to start correlating other metrics about the APIC, leaf switches, and spine switches. It gets more interesting when you start correlating to end node availability, latency, and other data NPM has. If you own other modules on the Orion Platform, you can correlate that data too; for example, application counters, database wait time, IOPS, logs, and all the rest. Seeing all this data normalized on the same shared timeline is powerful for troubleshooting. If a health score is in bad shape and you think the issue is on the controller, it’s time to log in to the APIC itself. The APIC can tell you what is causing the score to be what it is and has a bunch of additional ways to troubleshoot.

Returning to the sub-view menu on the left, let’s check out the Map tab.

When you first open the map, you’re only going to see the APIC in the center. To get more on the map, select the APIC. On the right side, the inspector panel will open. Here you can check the box next to related entities and press Add at the bottom to add them to the map. You can use this method to continue to spider through your ACI environment. This works well for creating a map of a small ACI environment or of a specific section of a larger ACI environment, like a tenant or an app. Once you’ve got a map you like, you can select to Save as a group in the top right. From that point forward, you can navigate to that group and press the Map tab to see the map again. Here’s an example of one I saved in my lab:

Picture6.png

Pretty slick! One important note: the APIC GUI already has some capability to map an ACI environment. In talking to NPM users who run ACI environments, I frequently heard that they would like to grant read-only access via a common platform for folks who don’t have access to the APIC directly, like NOC engineers. This accomplishes that goal and lets you correlate and visualize with all of the other data currently available in Orion Maps.

Next Steps

To upgrade now, customers with NPM under active maintenance can head over to the Customer Portal and download NPM 12.4. Thanks to the improved Orion Installer, upgrade is faster than ever with centralized upgrade of additional polling engines. Once you’re installed, add those ACI nodes and reply here to let us know how it’s working for you!

24 Comments
Level 15

If you're using some other flavor of SDN, let me know which one!

I've enjoyed reading this post as I'm playing catch-up on SDN, thank you!

PS: the link to the demo site is broken, as NodeID 2514 does not exist (around "...Those of you with NTA will get flow analysis. You can check all of that out on our demo site here.")

Level 15

Fixed!  The Online Demo now has all of the new releases so I've added a link to check it out live too: https://oriondemo.solarwinds.com/ui/netman-cisco/apic-members-list?NetObject=N:2579&ViewID=455&opid=...

Level 12

cobrien​,

I must say, I am loving the Centralized Upgrade feature. I have already used it twice and it has been awesome.

Best Regards,

Derik Pfeffer

Loop1 Systems: SolarWinds Training and Professional Services

    LinkedIN: Loop1 Systems

    Facebook: Loop1 Systems

    Twitter: @Loop1Systems

The infrastructure team I work with have been waiting for this!!!  WoohoO!!

Level 15

Yeah it makes a big difference in environments with a lot of scalability engines.  Thanks to serena​ and her team!

Level 9

Okay so yes you can do Cisco.   Why not SilverPeak?  Why not Talari?  Why not Riverbed?  Why not Fat Pipe?

There is more than Cisco out there.

I have been a huge supporter of SolarWinds since NPM 5.

I implement it as a primary monitoring solution at every company I go to but this love for just Cisco is getting worse and worse.

I agree there are reasons for running a single vendor network but there are also reasons for vendor diversity.

I am in the vendor diversity camp - SilverPeaks, Palo Altos, Dell switches, Unify APs, Microsoft Teams / SfB.  

Solarwinds is starting to lose its luster from this Cisco only direction and it hurts me.

MVP

SolarWinds has gotten better about monitoring new products and technologies as they come along. SDN is a game changer - at least according to the sales teams, seemingly everywhere. It's good to see this not just on the road map, but implemented.

Level 12

We use CloudGenix. The ION2000 more specifically.

MVP

How much have you "tinkered" with the Universal Device Poller? I've been able to monitor most anything with the use of that tool. (It seems a little intimidating at first, but it's a quick learn and once you get the hang of it you can do soooo much)

Level 9

I have done a lot with UDP - I can get a lot of things going there I know.  However, there is a lot of trial and error to get the data you want.   It seems like Solarwinds is doing that for Cisco devices but not much for others.

MVP

Agreed - I have had to "play" a bit to make it happy with some things.

Level 15

Thanks for the feedback.  It definitely helps us prioritize additional device support for the future.

We chose Cisco because it is the most commonly requested SDN solution.  More customers ask us for Cisco ACI support than all of those you mentioned combined.  For example, ACI monitoring has 164 votes whereas Talari has not been requested as a feature on Thwack: https://thwack.solarwinds.com/search.jspa?q=talari&place=%2Fplaces%2F1329&depth=ALL

That doesn't mean we won't build Talari, but it does mean we won't build it first. The good news here is that device support is not an on/off switch but a continuum. The best support is something like a Network Insight: advanced, role-specific support that requires near-zero configuration for the kind of in-depth monitoring we used to see only experts building. One step down from that is out-of-the-box support for a huge array of status, time series metrics, and relationship mapping, for example with network topology. That's where most devices fit in. We're in technology, though, so every day there are new devices and new metrics. Out-of-the-box support can't work for "zero day" devices. That's where Device Studio and Universal Device Poller (UNDP) come in. These are GUI-based tools that let you browse around the data set devices make available via SNMP and pick whatever data you want to monitor, then add on some math parsing logic to get the graphs and charts you need. Here's an example of someone doing that for Talari: Talari OID Not Valid?. Once a poller is built it can be shared on thwack so you can often find the customization you need without building it yourself. At the extreme end of customization, the SDK can be used for data sources outside of SNMP.

The first Network Insight we built was for F5 load balancers and the one we're working on right now is for Palo Alto firewalls.

Hope that helps!

Level 10

I'm struggling to add my Cisco ACI entity to SolarWinds NPM 12.4.  I get a successful test result back from the Poll for Cisco ACI username and Password but I cannot get my SNMP Community name accepted.  All I get back is 'Node does not respond with the supplied read/write community string'

Level 15

Try as an ICMP node to start.  That should give you all the API based data and verify that polling is working.  Then we can sort out the SNMP polling issue.

Level 10

Thanks Cobrien, I've successfully managed to add it as an ICMP node now and it's reporting back as node being up

Level 15

Does the ACI data show up, for example under the Members sub-view?

Level 12

Silver Peak

Level 12

I must agree!! SolarWinds seems to feel that if you are not a pure Cisco shop then their monitoring is not needed. Look how long it took them to acknowledge that there is a little company called F5. Just now getting around to Infoblox after how many years? CheckPoint, been asking for that for over 8 years. Yes, UnDP is great and that is what has saved me MANY times, but the time involved! It is becoming difficult as more platforms are being added to the network. I am starting to run into platforms that only support API with very minimal information opened to SNMP. And what about a very simple thing like Serial Number and Model of a device?

MVP

I am also receiving VMware NSX support questions. It would be great to receive support for this.

Level 13

Is this upgrade going to be as stable as 12.3...?  I'm not saying it was a really bad implementation, but...

Level 10

If it quacks like a duck...

Level 8

Does 12.4 currently support the multi-site functionality in the APICs (which is a separate set of controllers at each site) as well?

Level 7

Thanks Cobrien for such a descriptive article. We are also planning to move to Cisco ACI, and as per the requirement I have already upgraded to NPM 12.4.

I tried to add a few APIC devices (spine and leaf) into SolarWinds for testing purposes. I have a few questions:

What about alerting on ACI? Do we need to create alerts specifically for ACI devices, and what parameters should we consider when creating them? And do the thresholds for these health scores need to be configured on the SolarWinds end, or are they defined only on the APIC device?

Also, which reports are going to be covered here for ACI devices?

About the Author
Lifelong technology enthusiast. Network Engineer turned Product Manager for network products. By geeks, for geeks! I started my career as a call center agent at a wireless ISP. I moved into the Network Operations Center to operationally support their network. I moved to another company to be a Network Engineer, and fulfilled that role at several different companies in different verticals including Healthcare, Software, and Finance. Eventually, I found my calling as a PM, where I work with all of the functions of a business, and particularly Development, to determine what to build next to deliver the most value to our customers.