NPM 12.2 was made available in the Customer Portal on September 13th! The release notes are a great place to get a broad overview of everything in the release. Here, I'd like to go into greater depth on Network Insight for ASA including why we built it and how it works. Knowing that should help you get the most out of the new tech!
We live in amazing times. Every day new technologies are invented that change how we interact, how we build things, how we learn, how we live. Many (most?) of these technologies are only possible because of the relatively new ability for endpoints to talk to each other over a network. Networking is a key enabling technology today like electricity was in the 1800s and 1900s, paving the way for a whole wave of new technologies to be built. The better we build the networks, the more we enable this technological evolution. That's why I believe in building great networks.
A great network does exactly one thing well: connects endpoints. The definition of "well" has evolved through the years, but essentially it means enabling two endpoints to talk in a way that is high performance, reliable, and secure. Turns out this is not an easy thing to do, particularly at scale. When I first started maintaining, and later building, networks, I discovered that monitoring was one of the most effective tools I could use to build better networks. Monitoring tells you how the network is performing so you can improve it. Monitoring tells you when things are heading south so you can get ahead of the problem. Monitoring tells you if there is an outage so you can fix it, sometimes even before users notice. Monitoring reassures you when there is not an outage so you can sleep at night.
Over the past two decades, I believe as a company and as an industry we have done a good job of building monitoring to cover routers, switches, and wireless gear. That's great, but virtually every network today includes a sprinkling of firewalls, load balancers, and maybe some web proxies or WAN optimizers. These devices are few in number, but absolutely critical. They're not simple devices either. Monitoring tools have not done a great job with these other devices. The problem is that we mostly treat them like just another router or switch. Sure, there are often a few token extra metrics like connection counts, but that doesn't really represent the device properly, does it? The data that you need to understand the health and performance of a firewall or a load balancer is just not the same as the data you need for a switch. This is a huge visibility gap.
Network Insight is designed to fill that gap by finally treating these other devices as first-class citizens: acquiring and displaying exactly the right data set to understand the health and performance of these critical devices.
Network Insight for Cisco ASA
Network Insight for Cisco ASA is our second installment in the Network Insight story, following Network Insight for F5. As you saw with F5, Network Insight for ASA takes a clean slate approach. We asked ourselves (and many of you) questions like:
- What role does this device play in connecting endpoints?
- How can you measure the quality with which the device is performing that role?
- What is the right way to visualize that data to make it easiest to understand?
- What are the most common and severe problems that occur with this device?
- Can we detect those problems? Can we predict them?
With these learnings in hand, we built the best monitoring we could from the ground up. Let's take a look at what we came up with.
ACLs define what traffic is allowed or blocked. This is the most essential task of the firewall, and monitoring tools generally don't provide any visibility into it.
The first thing we found here is there's no good way to get all of this data via SNMP. We have to pull the config and analyze it. For that reason, we handed this piece off to the NCM team to work on. You'll see a post here in the product blog shortly covering this!
Site to Site VPN
Site to site VPN tunnels are the next most important service that ASAs provide. They are often used to connect offices to data centers, data centers to cloud providers, or one organization to a partner.
Yesterday, you could monitor these tunnels by testing connectivity to the other side of the tunnel, for example an ICMP monitor to a node that can only be reached through the tunnel. Today, we poll the ASA itself via SNMP and API to show a complete picture including:
- What tunnels are configured?
- Are my tunnels up or down?
- If a tunnel is up:
  - How long has the tunnel been up?
  - How much bandwidth is being used by the tunnel?
  - What protocols are securing the traffic transiting the tunnel?
- If a tunnel is down:
  - How long has the tunnel been down?
  - What phase did the tunnel negotiation fail at?
This means we automatically detect and add VPN tunnels as they're configured or removed and constantly keep an eye on these very important logical connections. I'll highlight a couple interesting things.
We're introducing a simple new concept called favorites. Marking a tunnel as a favorite by clicking the star on the right does two things. First, you can filter and sort based on this attribute. The page by default shows favorite tunnels first, so you will always see your favorites first until you change the sorting method. Second, it promotes that tunnel's status to the summary screen. We found for most ASAs there were a couple of VPN tunnels that were wildly more important than all of the other tunnels. Here at SolarWinds HQ for example, it's the tunnel to the primary data center. At the primary data center, it's the tunnel to the secondary data center. Favorites provide a super easy way to add extra focus to the tunnels that are so important that a big part of the story of the health and performance of the ASA is the health of the tunnels themselves.
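To make the favorites behavior concrete, here is a minimal Python sketch of "favorites first" sorting and of promoting favorites to a summary. The `Tunnel` class and both function names are hypothetical, made up for illustration; this is not how NPM implements it.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    """Hypothetical stand-in for a site-to-site VPN tunnel row."""
    name: str
    favorite: bool = False


def favorites_first(tunnels):
    # False sorts before True, so "not favorite" puts favorites at the top;
    # ties are broken alphabetically, ignoring case.
    return sorted(tunnels, key=lambda t: (not t.favorite, t.name.lower()))


def summary_tunnels(tunnels):
    # Only favorites are promoted to the (hypothetical) summary screen.
    return [t for t in tunnels if t.favorite]


tunnels = [Tunnel("branch-a"), Tunnel("primary-dc", favorite=True), Tunnel("Branch-b")]
ordered_names = [t.name for t in favorites_first(tunnels)]
summary_names = [t.name for t in summary_tunnels(tunnels)]
```

With the sample data, the favorite `primary-dc` lands at the top of the list and is the only tunnel promoted to the summary.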
What is the status of the tunnel?
Turns out this is a harder question to answer than it looks. Tunnels are established on-demand. If you just configured a tunnel, but have not sent any interesting traffic so the tunnel is not up, should we show it as down (red)? That doesn't seem right. What if the tunnel was up for 3 months, but interesting traffic stopped coming so the tunnel timed out and went back down, but is prepared to come back up as soon as interesting traffic is seen? The tunnel is definitely "down", but should it be red? Probably not! We spent a lot of time thinking about this and talking to you guys to determine the logic that decides if an administrator considers a tunnel down, up, or something in between. All of that logic is built into the statuses you see presented on this page.
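One way to picture the "up, down, or something in between" idea is a three-state classification. The sketch below is only an illustration in Python: the state names, the two inputs, and the rule itself are my assumptions, not the actual logic shipped in NPM.

```python
from enum import Enum


class TunnelStatus(Enum):
    UP = "up"              # tunnel is established
    INACTIVE = "inactive"  # not established, but nothing appears broken
    DOWN = "down"          # not established because negotiation failed


def classify_tunnel(is_up: bool, negotiation_failed: bool) -> TunnelStatus:
    """Hypothetical rule: only a failed negotiation counts as a real outage."""
    if is_up:
        return TunnelStatus.UP
    if negotiation_failed:
        return TunnelStatus.DOWN
    # Configured but never used, or timed out for lack of interesting traffic.
    return TunnelStatus.INACTIVE


idle_status = classify_tunnel(is_up=False, negotiation_failed=False)
broken_status = classify_tunnel(is_up=False, negotiation_failed=True)
```

In this sketch, a tunnel that simply timed out for lack of interesting traffic is reported as inactive rather than red, which matches the intuition described above.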
For years, my first troubleshooting step on a tunnel that was down was to review logs and find out what phase negotiation failed at. This tells you what set of variables you need to review for matching against your peer. I'm very pleased that this first data point is now right in the monitoring tool that identified the tunnel as down to start with. I hope it helps you guys get your tunnels back up faster.
Remote Access VPN
When users connect to the office using a software VPN client on their laptop, Cisco calls that Remote Access VPN. As with Network Insight for F5, we are careful here to use the same terms as the manufacturer so it's easy to understand what we're talking about.
Again, we have to use both SNMP and API to get all the data we need to answer the following questions:
- Who's connected?
- Who tried to connect in the past, and what was the result?
- How long have they been connected?
- How much data have they uploaded and downloaded?
- What is their session history?
Again, I'll highlight a few things.
One of the challenges is the sheer number of remote access connections there are. We know we do not do a good enough job of dealing with very large lists today, and our UI Framework team has been working on solving that. This page is one of the first implementations of the new List View that they created. This list view gives you the tools to easily deal with very large lists. The left side of the screen lets you filter on anything shown on the right. The filters available are derived from the data and values seen on the right, so we don't have useless filters. You can stack several filters and remove them individually. Finally, after filtering your list you can still sort and search through those filtered results to further hone your list.
You'll see this list view a lot more as time passes.
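For the curious, the filtering idea can be sketched in a few lines of Python. This is an illustrative sketch only: the row fields and function names are hypothetical, and the real List View is a UI component, not this code. The two ideas shown are that filter choices come from values actually present in the data, and that stacked filters combine with AND.

```python
def available_filters(rows, fields):
    # Offer only values that actually occur in the data, so no filter is useless.
    return {field: sorted({row[field] for row in rows}) for field in fields}


def apply_filters(rows, filters):
    # filters maps a field name to the set of allowed values.
    # A row must satisfy every stacked filter to be kept.
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in filters.items())
    ]


# Hypothetical remote access sessions.
sessions = [
    {"user": "alice", "status": "connected"},
    {"user": "bob", "status": "failed"},
    {"user": "carol", "status": "connected"},
]

choices = available_filters(sessions, ["status"])
connected = apply_filters(sessions, {"status": {"connected"}})
stacked = apply_filters(sessions, {"status": {"connected"}, "user": {"carol"}})
```

Removing one filter is just a matter of dropping its key from the `filters` dictionary and applying the rest again.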
Whereas interfaces are the main story on a switch or router, they're an important secondary story on an ASA. We rebuilt the interfaces view from the ground up based on the List View. Along the way, we made sure we were building it for a firewall.
As my fellow ASA Administrators know, nameif is not a typo. Nameif is the command you use to specify the name of an interface on an ASA. A nameif must be configured for an interface to function, and from the moment you specify the nameif onward, every other element in the interface references the nameif. ACLs, NAT, you name it. In other words, the identity of an interface on an ASA is its nameif (like CPLANE or OUTSIDE), not its physical name (like GigabitEthernet0/2). Accordingly, that is the primary name shown here, with the physical interface name shown only if the interface isn't in use and doesn't have a nameif.
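The naming rule boils down to a simple fallback. Here is a tiny Python sketch of it; the function name and parameters are hypothetical and only illustrate the rule described above.

```python
from typing import Optional


def display_name(physical_name: str, nameif: Optional[str]) -> str:
    """Prefer the nameif; fall back to the physical name when none is set."""
    if nameif and nameif.strip():
        return nameif.strip()
    return physical_name


in_use = display_name("GigabitEthernet0/2", "OUTSIDE")
unused = display_name("GigabitEthernet0/3", None)
```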
If you have NCM to pull access lists from configs, we will identify which access list is applied to each interface and provide a link to review the access list. This is super convenient in practice.
Security levels have some control over what traffic the ASA allows. They also provide a quick indicator of how much the administrator trusts the network connected to a specific interface. Kind of important things for a firewall.
Again, we're using the simple favorites concept. I expect a lot of ASAs to have the interface connected to the Internet favorited!
All of the things described above are technology services that are built on a platform. The platform must be healthy for the services to have any chance of being healthy. The platform sub-view helps you understand the health of the platform.
While high availability is a feature of many platforms, it seems to be particularly popular on ASAs. Additionally, it seems Administrators have to fiddle with it a lot. Administrators have to fail over to perform software upgrades, some choose to fail over to change circuits, fail over to upgrade hardware, fail over for all sorts of reasons. While I'm concerned we are all using failover so often, it is clear that NPM has to provide great coverage for H/A.
In speaking with lots of ASA administrators we found several different behaviors. Some administrators were unaware of whether their ASAs were really ready for failover or not. Some check manually every once in a while, but have had an active ASA go down only to discover failover could not occur. Some expert administrators were checking failover status, but were also checking the quality of failover that would occur by verifying configuration synchronization and state synchronization.
Our H/A resource takes the best practices we found expert administrators were using manually, automates the monitoring of them, and presents simple conclusions in the UI. If everything is green, you get simple checks and a phrase explaining what is healthy. If something goes wrong, you get a red X and a more verbose explanation. For example, if the standby is ready but the config is not in sync, failover can occur but the behavior of the firewall may change. Maybe your last ACL change was not copied to the standby, so it doesn't apply if there is a failover. If the standby is ready but connection state information is not synced, failover can occur but all of your users will have to reestablish their connections. Not good!
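The three checks and their conclusions can be sketched as a small Python function. This is a hedged illustration, not the product's code: the `FailoverState` fields, the function name, and the exact wording of the messages are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Hypothetical snapshot of an ASA failover pair."""
    standby_ready: bool
    config_synced: bool
    state_synced: bool


def ha_findings(state: FailoverState) -> list:
    """Return a list of problems; an empty list means everything is green."""
    if not state.standby_ready:
        # Nothing else matters if the standby cannot take over at all.
        return ["Standby is not ready: failover cannot occur."]
    findings = []
    if not state.config_synced:
        findings.append(
            "Config is not in sync: failover can occur, but firewall behavior may change."
        )
    if not state.state_synced:
        findings.append(
            "Connection state is not in sync: failover can occur, "
            "but users will have to reestablish their connections."
        )
    return findings


healthy = ha_findings(FailoverState(True, True, True))
stale_config = ha_findings(FailoverState(True, False, True))
```

The design choice here is to return plain sentences rather than codes, mirroring the idea of presenting a conclusion rather than raw values.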
Of course you can alert on all of these things.
Firewalls store information about each connection that is actively flowing through them at a given moment. Because of that, there is a limit to how many concurrent connections they can handle, and this is one of the primary values used to determine what size firewall you need to buy. It's obvious then that it should also be a crucial part of how we understand the load of the device in addition to RAM and CPU, so we've included it here.
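As a back-of-the-envelope illustration, connection load is just the current count relative to the platform limit. The Python sketch below uses made-up numbers and a hypothetical function name; the real limit depends on your ASA model and license.

```python
def connection_utilization(current: int, limit: int) -> float:
    """Percentage of the concurrent connection limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be a positive number of connections")
    return 100.0 * current / limit


# Hypothetical example: 25,000 active connections on a 100,000-connection platform.
utilization = connection_utilization(25_000, 100_000)
```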
Aggregating connection failure rates is an interesting way to get an indicator that something is amiss. Perhaps your firewall is blocking a DDoS attack, or maybe a firewall rule change went awry. Watching this one value can be a leading indicator of all sorts of specific problems.
Summary: Putting it all Together
If we've done our job, we're providing comprehensive coverage of the health and performance of an ASA on all of the sub-views. Now, we pull all the information together and summarize it on the Summary page.
One of the things that really weighed down the Node Details page for most nodes was the Details resource. This resource has historically been a catch all for lots of little bits of largely static data users have asked us to show on this page. The problem is that it kept growing and eventually took up nearly half the page with data that actually wasn't that commonly needed. Here we have rebuilt the resource to focus on the most important data, but with the additional data available within the "other details" drop down. This also allowed us to move away from the archaic pattern of Name:value pairs in our UI. Instead, we describe the device as your peer would. You can see how the resource reads more like "this is <hostname>, the <context name> context on a <hardware model> running <software version>".
Also, did you know that what we called "resources" in the previous UI framework are called "widgets" in the new UI Framework? There's your daily dose of useless trivia!
Did you notice it? The Load Summary and Bandwidth widgets on this page are powered by PerfStack charting. Try clicking around on them. It's oh so pleasant. More to come on this later.
The Bandwidth and Favorite Site-to-Site VPN widgets display information about the components you identified as your favorites on the other pages. I think it's about time we recognized that all VPN tunnels and all interfaces are not equally important. Some are so critical that their status alone is a big part of the answer to the question: how is the firewall running? Favorites makes it easy to give them the attention they deserve.
Set Up Network Insight for ASA
To get this visibility in your environment, jump on over to the Customer Portal to download the new version. After upgrading your NPM instance, the new ASA monitoring should "just work," but here are the specifics just in case.
Already monitoring ASAs?
The new monitoring will start up automatically. Give the new version a couple minutes to poll and jump over to Node Details for one of your ASAs. You'll get a bunch of new information out of the box. For complete coverage as seen in the screenshots above, you'll be prompted to edit the node, check the "Advanced ASA" monitoring check box, and enter CLI credentials. Make sure to look at the sub-views (mouse over to the left)!
There is one caveat. If you've assigned a custom view to your ASAs, we will not overwrite that! Instead, you will have to choose to manually change the view for your ASAs over to our new view.
Adding a new ASA?
Simply "Add Node" and select the "Advanced ASA" monitoring check box on the last step to enter CLI credentials. That's it. Give it a few minutes and check out the Node Details page for that ASA.
That does it for now. You can click through the functionality yourself in our online demo. I'd love to hear your feedback once you have it running in your environment!