

NetFlow Traffic Analyzer

Faster. Leaner. More Secure.

 

The new NetFlow Traffic Analyzer leverages the power of columnstore technology in MS SQL Server to deliver answers to your flow analysis questions faster than ever before. MS SQL 2016 and later runs in a more efficient footprint than previous flow storage technologies, making better use of your infrastructure. Support for TLS 1.2 communication channels and monitoring of TCP and UDP Port 0 traffic helps to secure your environment.

 

Version 4.4 also introduces a new installation process to confirm that you have the necessary prerequisites, and to guide you through the installation and configuration process.

 

NTA 4.4 is now available in the Customer Portal. Check out the Release Notes for an overview of the features.

 

Faster

The latest release of NTA makes use of Microsoft's latest SQL columnstore-based flow storage database. Columnstore databases organize and query data by column rather than by row index. They are the optimal technology for large-scale data warehouse repositories, such as massive volumes of individual flow records. Our testing and our beta customers' experiences indicate that columnstore indexes deliver substantial performance improvements in both query speed and data compression efficiency.
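To see why column-wise storage compresses so well, consider this illustrative sketch (not NTA's actual storage engine): grouping a table by column puts identical values next to each other, so even a simple encoding like run-length encoding (RLE) collapses large flow tables dramatically.

```python
# Illustrative only: column-oriented layouts group identical values,
# so run-length encoding (RLE) compresses far better than row storage.

def run_length_encode(values):
    """Compress a column into (value, run_count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

# A tiny flow table, stored by row vs. by column.
rows = [("10.0.0.1", 443, "tcp")] * 4 + [("10.0.0.2", 53, "udp")] * 2
columns = {
    "src":   [r[0] for r in rows],
    "port":  [r[1] for r in rows],
    "proto": [r[2] for r in rows],
}

# Each column collapses to just two runs; a row-oriented layout
# gets no such benefit because adjacent fields differ in type.
encoded = {name: run_length_encode(col) for name, col in columns.items()}
```

Real columnstore engines use more sophisticated encodings (dictionary, bit-packing), but the column-grouping principle is the same.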

 

NTA was an early adopter of columnstore technology to enhance the performance of our flow storage database. As Microsoft’s columnstore solutions have matured, we’ve chosen to adopt the MS SQL 2016 and later versions as the supported flow storage technology. That offers our customers the ability to standardize on MS SQL across the Orion platform, and to manage their monitoring data using a common set of tools with common expertise. We’ve made deployment and support simpler, more robust, and more performant.

 

Leaner

This same columnstore technology also runs more efficiently within the existing resource footprint. The solution builds and maintains columnstore indexes in memory, then manages bulk record insertions with much less intensive disk I/O. Building indexes also requires substantially less CPU than in previous versions. As a result, this version makes better use of the same resources and runs more efficiently.

 

More Secure

This version of NTA supports TLS 1.2 communication channels, required in many environments to secure communications with client users.
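As an illustrative sketch (not the product's actual configuration mechanism), here is what enforcing a TLS 1.2 floor looks like using Python's standard `ssl` module:

```python
# Sketch of enforcing TLS 1.2 as the minimum protocol version for
# client connections; the product configures this at the platform
# level, this only demonstrates the idea.
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0/1.1
context.check_hostname = True                     # verify server name
context.verify_mode = ssl.CERT_REQUIRED           # require a valid cert
```

Any client attempting to negotiate an older protocol version against a context like this will fail the handshake.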

 

Beginning in this version, NTA explicitly monitors network flows destined for TCP or UDP service port 0. Traffic addressed to TCP or UDP port 0 is either malformed or malicious. This port is reserved for internal use, and traffic on the wire should never be addressed to it. By highlighting and tracking flows addressed to port 0, NTA helps network administrators identify sources of malicious traffic that may be attacking hosts in their network, and provides the information they need to shut that traffic down.
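As a rough illustration of that detection rule (field names here are hypothetical, not NTA's schema), a collector-side check might look like:

```python
# Hypothetical sketch: flag flows addressed to TCP/UDP port 0 as
# suspect, since legitimate traffic should never target that port.

def classify_flow(flow):
    """Return 'port0-suspect' for flows destined to port 0."""
    if flow.get("protocol") in ("tcp", "udp") and flow.get("dst_port") == 0:
        return "port0-suspect"  # malformed or potentially malicious
    return "normal"

flows = [
    {"protocol": "tcp", "src_port": 51514, "dst_port": 443},
    {"protocol": "udp", "src_port": 4000,  "dst_port": 0},
]
labels = [classify_flow(f) for f in flows]
```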

 

NTA will surface port 0 traffic as a distinct application, so the information is available in all application resources.

NTA Port 0 Traffic

Supported Database Configurations

This version of NTA maintains a separate database for Flow Storage. NPM also maintains the Orion database for device and interface data. Both of these databases are built in MS SQL instances.

 

New installations of NTA and upgrades to version 4.4 and later will require an instance of MS SQL 2016 Service Pack 1 or later version for flow storage. For evaluation, the express edition is supported. For production deployments, we support the Standard and Enterprise editions.

 

When upgrading to this version from an older version on the FastBit database, data migration is not supported. The upgrade will build out a new, empty database in the new MS SQL instance. The existing flow data in the FastBit database will not be deleted or modified in any way. That data can be archived for regulatory requirements, and customers can run older product versions in evaluation mode to access it temporarily.

 

In the current NTA product, we require a separate dedicated server for Flow Storage. The simplest upgrade would use that dedicated server with the new release to install an instance of MS SQL 2016 SP1 or later for flow storage. Many of our customers will be interested in running both the Orion database and the NTA Flow Storage database in the same MS SQL instance. We support that, but for most customers that will take some planning to consolidate and to appropriately size that instance to support both databases.

 

Here's a more detailed discussion of NTA's New MS SQL Based Flow Storage Database. Also, a knowledge base article on NTA 4.4 Adoption is available, with frequently asked questions.

 

We’re doing some testing now to provide some performance guidance for key performance indicators to monitor. One of the benefits of using MS SQL technology for both of these databases is that there are many common tools and techniques available to monitor and tune MS SQL databases. We plan to provide guidance for both monitoring, and deployment planning.

 

Conclusion

Please visit the NetFlow Traffic Analyzer Forum on THWACK to discuss your experiences and new feature requests for NTA.

I am very excited to announce that SolarWinds NCM 7.8 is available for download in the Customer Portal! This release brings many valuable features, and the release notes are a great resource for exploring them.

 

Network Insight for Cisco Nexus
This is the third iteration in our Network Insight series, and in this release we have extended those insights to Cisco Nexus. We understand that your Cisco Nexus devices are a sizable investment that comes with a host of valuable features, and that you expect deeper insight from your SolarWinds monitoring and management tools as a result. That meant going back to develop some new features and expand existing ones to ensure the relevant information is presented properly, so that your workflows are logical and more time efficient.

 

 

Virtual Port Channels

One of the really awesome features of a Cisco Nexus, one that comes with a good deal of complexity, is the ability to create and deploy vPCs. A vPC operates as a single logical interface but is actually a group of interfaces working together. This means that managing vPCs can become a time drain as the number of vPCs increases and as the number of interfaces on each vPC pair grows. Network Insight provides a view that shows each vPC and its member interfaces. This is covered in the NPM v12.3 release blog.
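Conceptually, a vPC is one logical link backed by port-channels on two peer switches, each with its own member interfaces. A hypothetical data model (names here are illustrative) makes the fan-out that administrators otherwise track by hand explicit:

```python
# Hypothetical vPC model: one logical link, two peer switches,
# each contributing a port-channel with its own member interfaces.
vpc = {
    "id": 10,
    "peers": {
        "nexus-a": {"port_channel": "Po10", "members": ["Eth1/1", "Eth1/2"]},
        "nexus-b": {"port_channel": "Po10", "members": ["Eth1/1", "Eth1/2"]},
    },
}

def all_member_interfaces(vpc):
    """Flatten every (switch, interface) pair an admin would otherwise
    collect with repeated 'show interface' commands on both peers."""
    return [
        (switch, member)
        for switch, pc in vpc["peers"].items()
        for member in pc["members"]
    ]
```

Even this tiny two-member example yields four physical interfaces to track per vPC, which is why per-vPC views save so much time.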

 

In addition to this view, another layer of detail shows the configuration of each vPC and its member interfaces. To see this detail, click "View Configs" on the vPC page. This page displays the configuration details for each side of the vPC and the configuration of each member interface, letting you save time by more efficiently identifying configuration errors within the vPC and its member interfaces. I think we can all agree that not having to hop across multiple windows and execute manual searches or commands to find issues is a major workflow improvement!

 

The example below is a vPC with multiple member interfaces:

 

Virtual Device Contexts

As covered here, each VDC is essentially a VM on a Cisco Nexus (Cisco ASAs have them too!), and each context is configured separately and provides its own set of services. These configurations are downloaded and backed up by NCM, and they are referenced by all of the features in this release.

 

To manage a context in NCM, just click "Monitor Node" and NCM will walk you through the node addition process. Once that has concluded, each configuration is downloaded and stored separately.

 

Access Control Lists

ACLs define what to do with network traffic. They are complicated to manage because each ACL contains rules (Access Control Entries), and within those are object groups: containers that house specific information for a given rule, such as the interfaces a particular MAC address might be blocked from traversing. This layering creates problems. Manually, you need to verify that the rules are actually handling traffic by examining hit counts, and that none of the rules are shadowed or redundant. Lastly, to meet all of your needs for ACLs, we extended the existing Access Control List (ACL) functionality beyond Port Access Control Lists (PACLs) and VLAN Access Control Lists (VACLs) to include MAC ACLs and non-contiguous subnet masks.

 

ACLs are super easy to add: once the Nexus nodes are added to NCM, it will automatically discover ACLs and give you access to all the information inside them. You won't need to spend copious amounts of time digging into each ACL to determine whether changes occurred and what those changes were.

 

To see the list of ACLs for a particular Nexus, mouse over the entities on the side panel and select “Access Lists.”

Access Control List Entity View

 

With this view you can see the historical record of ACLs, including the date and time of each revision, and whether there are any overlapping rules inside each version of the ACL. To view a previous version, just expand the view. From this same screen you can view the ACL details and compare against the next most recent revision, an older revision, or a different node's ACL.

ACL detail view and rule alerts

 

When you navigate into the ACL, each of its rules is displayed, including all of the syntax for that ACL. In this view, each rule provides a hit counter, making it easy to see which rules are impacting traffic and which ones are not. You can also drill down into the object groups.

 

Viewing conflicting rules is simple in NCM. Expanding the alert, you can see the shadowed or redundant rules.

  • Redundant: a rule earlier in the list overlaps this rule, and does the same action to the matched traffic.
  • Shadowed: a rule earlier in the list overlaps this rule, and does the opposite action.
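The distinction above can be sketched in a few lines. This is a simplified illustration using exact-match rules only; real ACL analysis compares address and port ranges, but the earlier-rule-wins logic is the same:

```python
# Simplified sketch of redundant vs. shadowed detection: an earlier
# rule with the same match either repeats the action (redundant) or
# reverses it (shadowed). Exact-match only, for illustration.

def analyze(rules):
    """rules: ordered (match, action) pairs. Returns an issue per rule."""
    issues = []
    first_action = {}
    for match, action in rules:
        if match in first_action:
            issues.append(
                "redundant" if first_action[match] == action else "shadowed"
            )
        else:
            first_action[match] = action
            issues.append(None)
    return issues

acl = [
    ("10.0.0.0/24 -> any:80", "permit"),
    ("10.0.0.0/24 -> any:80", "permit"),  # same match, same action
    ("10.0.0.0/24 -> any:80", "deny"),    # same match, opposite action
]
```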

 

Interface Config Snippets

At some point during the course of your day, you will identify one or more interfaces that warrant deeper inspection. Based on feedback from many of you, we discovered that once you reached this point, you needed to see more information: specifically, information about that interface and its configuration. Previously, you would have had to dig into the overall running or startup configs, navigating away from the interface screen. This is why we created interface config snippets, and it is probably my favorite feature in this Network Insight release.

 

These snippets are the running configurations of the specific interface you are viewing.

Interface Config Snippet


Once you have found the snippet on the page, you can verify which configuration the snippet was pulled from and the date and time when it was downloaded.

Interface Config Snippet details + history

 

Conclusion

That is all I have for now on this release but I recommend you go check out our online demo and visit the customer portal to click through this functionality and see all the great features available in this release. My fellow cohort cobrien put together a great blog on Network Performance Monitor's v12.3 release for Network Insight and I highly recommend that you head over and give it a read! I look forward to hearing your feedback once you have this new release up and running in your environment!

 

Starting with NPM 12.2, SolarWinds embarked on a journey to transform your Orion deployment experience with fast and frequent releases of key deployment components. The first step was revamping the legacy installer into the new and improved SolarWinds Orion installer, which could deploy or upgrade an entire main poller in one seamless session. The second iteration of the installer added the capability to do the same for your scalability engines. In this release, NTA has been updated to utilize an MS SQL database, allowing us to happily say that the SolarWinds Orion installer is truly an all-in-one installer solution for your Orion deployment. For NPM 12.3, we have made tremendous scalability improvements that allow you to utilize even more scalability engines. As a result, your Orion deployment upgrades gain in complexity, so the installer team is providing additional updates to how you can stage your environment for minimal upgrade time.

 

Normal Upgrade Process

 

Using the All-in-One SolarWinds Orion installer, your upgrade process will look like the following.

 

Step one:

 

Review all system requirements, back up your database, and if possible snapshot the Orion deployment. This is especially important in this release, as the NTA Flow Storage database requirements have changed. Note: the Flow Storage database is the database instance that stores NTA-collected flow data. Previous versions utilized a FastBit database, but this release has been updated to use MS SQL, with a minimum version of 2016. The Orion database is the primary database that stores all polled data from NPM and other Orion products.

 

Step two:

 

Download the NPM 12.3 installer, selecting either the online or the offline variant according to your system requirements.

 

Step three:

 

Run the installer on your main poller and complete the upgrade. If you have any other Orion product modules installed, the installer will upgrade them to their latest versions at the same time to maintain compatibility with the new Orion Platform 2018.2. If there are new database instances to be configured, that is handled during the Configuration Wizard stage of the main poller upgrade. This release of the installer has a new type of preflight check that requires confirmation from you before proceeding; the example below is one for the NTA upgrade. Click for details to see the confirmation dialog and select yes or no.

 

Configuration Wizard step for NTA:

 

Step four:

 

If you don’t have any scalability engines (e.g., Additional Polling Engines, Additional Websites, or HA Backups), you’re ready to explore all of the new features available in this version!

 

Scalability Engines

 

For those environments utilizing scalability engines or for those who are looking to try them out, this section will guide you through the process of deployment. Even if you have not utilized scalability engines previously, trying them out to test the scale improvements is incredibly easy. Like every SolarWinds Orion product, they are available for an unlimited 30-day free evaluation.

 

Deploying a fresh scalability engine is handled with the same installer that you downloaded for the main poller.

 

1. Copy the installer to your intended server, right-click it, and select “Run as Administrator.”

 

Note: If you downloaded the offline installer, which is about 2 GB, copying it to your server can take some time, and it does not currently stage the scalability engine for faster upgrade. This is something we’d like to improve in the future, but it is not available in this release. If you’d like to shorten the initial transfer to the server, you can use the online installer to set up your scalability engine. That installer file is about 40 MB, so the transfer to the server is much shorter. It still meets offline requirements because, when you select the “Add a Scalability Engine” option, it downloads from the main poller to maintain version compatibility and does not require internet access. As always, the 40 MB scalability engines installer is also available for download from the All Settings -> Polling Engines page.

 

2. Select the “Add a Scalability Engine” option.

 

first screen of installer

 

3. Similar to the main poller upgrade process, at this point system checks that are specific to scalability engines will be run.

 

Note: Anything tagged as a blocker may need confirmation or action from you before proceeding.  If this is the case, address those issues and run the installer again. Things that are tagged as a warning or informational message are simply for your awareness and will not prevent your installation from proceeding.

 

4. Select the type of scalability engine that you are looking to deploy, and then complete the steps in the wizard to finish your installation per your normal process.

 

 

Upgrading a scalability engine is also handled through the same installer. However, this is where you have an opportunity to utilize our staging feature.

Note: If you were to proceed with your normal practice of putting the scalability engines installer on each server you need to upgrade, and then manually upgrading, that process will work perfectly well with no changes. Please read through the “Staging Your Environment for your Scalability Engines Upgrade” section below to see the alternative workflow that allows you to stage your environment.

 

Staging Your Environment for Your Scalability Engines Upgrade

 

For customers with more than a handful of scalability engines, or with some distributed over WAN links, we noticed occasional, extremely high download times from the main poller to the scalability engines. In addition, there was no centralized place to see the upgrade state of the scalability engines. Navigate to "All Settings" and click "High Availability Deployment Summary," and you will see the foundational pieces of an Orion deployment view.

 

The Servers tab contains the original High Availability Deployment Summary content, and is where you can continue to set up additional HA pools and your HA environment.

 

Check out the new Deployment Health tab! You may not have heard of our Active Diagnostics tool, but it comes prepackaged with every install of the Orion Platform, with test suites designed to catch our most common support issues. We've brought that in-depth knowledge to your web console in the new Deployment Health view. With tests run nightly across your Orion deployment, every time you visit this page you will see whether there are any issues that could affect the performance of Orion or your upgrades.

 

You are able to refresh a check if you're working on an issue and wish to see an updated test result. If there are tests that you don't want to address, silence them to hide the results from the web console. Click on the caret to the right and you'll be able to see more details and a link to a KB article that will give you remediation advice.

 

On the Updates tab is where you will be able to stage your scalability engines.

 

The first page of the wizard lets you know whether there are updates available to install on your scalability engines. At this point you've upgraded your main poller, so there are definitely updates available! Click "Start" to get started.

 

The second page is where we test the connection to each of the scalability engines. If we can determine the status of these engines, we'll give you the green light to proceed to the next step. A common issue that could prevent this from succeeding is that the SolarWinds Administration Service has not been updated to the correct version, or is not up and running at this point. Click "Start Preflight Checks" to proceed.

 

Similar to the Deployment Health tab, these are preflight checks run across your Orion deployment. You'll see all of the same preflight checks that were available through the installer client, centralized in one view. If blockers are present on this screen, you can still proceed as long as at least one scalability engine is ready to go, but note down the scalability engines with blockers: you will need to address those blockers before an upgrade can occur on those servers. Click "Start download" to begin the staging process.

 

 

 

At this point, we start downloading every MSI needed to upgrade your scalability engines. In this example, I'm only staging one scalability engine, but if you have multiple, you can see the time savings right away! All of the downloads are triggered in parallel.

Sit back and relax as we stage your environment for you. You can even open up RDP sessions to those servers with one click from this page.

 

When everything has finished downloading, we will let you know which servers are ready to install. Click the "RDP" icon to open your RDP session to the server.

 

On your desktop, you should see the SolarWinds scalability engines installer waiting for you to click and finish the upgrade.

 

Visually, you will run through the same steps you normally would when clicking through the installer wizard. However, when you get to the installation itself, you'll notice that no download step appears in the progress bar. Finish your upgrade and move on to the next!

 

I hope you enjoy this update to how you upgrade your Orion deployment. I'm always looking for feedback on how we can make this as streamlined as possible for you.

NPM 12.3 is available today, May 31st, on the Customer Portal!  The release notes are a great place to get a broad overview of everything in the release.  Here, I'd like to go into greater depth on Network Insight for Cisco Nexus including why we built it and how it works.  Knowing that should help you get the most out of the new tech!

 

Network Insight

What's all this "Network Insight" talk?  If you haven't heard of this big theme we've been building on for a few years, start here.  If you know the story, skip ahead to the Network Insight for Cisco Nexus section.

 

We live in amazing times.  Every day new technologies are invented that change how we interact, how we build things, how we learn, how we live.  Many (most?) of these technologies are only possible because of the relatively new ability for endpoints to talk to each other over a network.  Networking is a key enabling technology today, as electricity was in the 1800s and 1900s, paving the way for a whole wave of new technologies.  The better we build our networks, the more we enable this technological evolution.  That's why we believe in building great networks.

 

A great network does exactly one thing well: connect endpoints.  The definition of "well" has evolved through the years, but essentially it means enabling two endpoints to talk in a way that is high performance, reliable, and secure.  It turns out this is not an easy thing to do, particularly at scale.  When I first started maintaining, and later building, networks, I discovered that monitoring was one of the most effective tools I could use to build better networks.  Monitoring tells you how the network is performing so you can improve it.  Monitoring tells you when things are heading south so you can get ahead of the problem.  Monitoring tells you if there is an outage so you can fix it, sometimes even before users notice.  Monitoring reassures you when there is not an outage so you can sleep at night.

 

Over the past two decades, we believe as a company and as an industry we have done a good job of building monitoring to cover routers, switches, and wireless gear.  That's great, but virtually every network today includes a sprinkling of firewalls, load balancers, chassis switches, and maybe some web proxies or WAN optimizers.  These devices are few in number, but absolutely critical.  They're not simple devices either.  Monitoring tools have not done a great job with these other devices.  The problem is that we mostly treat them like just another router or switch.  Sure, there are often a few token extra metrics like connection counts, but that doesn't really represent the device properly, does it?  The data that you need to understand the health and performance of a firewall or a load balancer is just not the same as the data you need for a switch.  This is a huge visibility gap.

 

Network Insight is designed to fill that gap by finally treating these other devices as first class citizens; acquiring and displaying exactly the right data set to understand the health and performance of these critical devices.

 

Network Insight for Cisco Nexus

Network Insight for Cisco Nexus is our third installment in the Network Insight story, following Network Insight for F5 and Network Insight for ASA.  Nexus chassis switches are used to build high-performance, scalable, and virtually indestructible data center networks.  That's why Nexus switches are at the heart of many of the largest data centers.  Nexus devices are switches, so our traditional switching data is still important, but a $300k chassis switch has a lot of additional capabilities that a $5k switch does not.  As with F5 and ASA, Network Insight for Cisco Nexus takes a clean-slate approach.  We asked ourselves (and many of you) questions like:

 

  • What role does this device play in connecting endpoints?
  • How can you measure the quality with which the device is performing that role?
  • What is the right way to visualize that data to make it easiest to understand?
  • What are the most common problems that occur with this device?  What are the most severe?
  • Can we detect those problems?  Can we predict them?

 

With these learnings in hand, we built the best monitoring we could from the ground up.

 

VDC Aware

 

Similar to ASAs, a Nexus can be split into virtual instances.  Nexus calls them Virtual Device Contexts (VDCs), while ASA calls them Contexts.  VDCs are to a Nexus what VMs are to servers, allowing a single piece of hardware to be split into several logical nodes.  Each logical node, or VDC, is configured separately and provides a full set of technology services.  All of the features you read about below discover complete information about each VDC.

 

Adding the Admin VDC for a Nexus to monitoring lets NPM map out all of the VDCs, which will then appear on the Node Details screen:

 

Anytime you go to Node Details for any of the VDCs, you'll get this new resource, so it's easy to navigate between them.  NCM users will also find it easier than ever to make sure all of their VDCs are backed up.  If you're well set up for catastrophic failures, they're less likely to occur, right?  More info on what NCM is doing for VDCs can be found here.

 

So Many Interfaces

 

The first big difference between Cisco Nexus and most other devices is simple interface count.  Thanks to the distributed nature of a Nexus deployment, particularly Fabric Extenders, a single Nexus 7k is likely to have hundreds or even thousands of ports.  Dealing with thousands of ports on a single device is different than dealing with the usual couple dozen, and we wanted to make sure this fundamental part of Nexus monitoring was done right.

 

First, the Node Details page now contains a simple summary of all of the interfaces:

 

Like Network Insight for ASA, we have a new sub-view for each major technology service provided by the device.  Clicking on Interfaces, in the above resource or on the sub-view tabs on the left, will bring you to the Interfaces sub-view showing all interfaces.  Clicking on any of the status icons or numbers will bring you to a list of only those interfaces.

 

This is built on the relatively new List View that's part of our Unified Interface Framework (UIF).  UIF ensures that all Orion Platform-based tools from SolarWinds share a consistent UI experience, so when you learn how to do something in one tool, you know how to do it in all of them.  The List View is made for managing large lists, and includes:

  • Multi-level filtering, for example, interfaces with status Up AND (utilization Warning OR Critical).
  • Colored highlighting of values over your thresholds for that specific entity.
  • Sorting
  • Searching
  • Pagination control with up to 100 items per page.
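The multi-level filter in the first bullet, "status Up AND (utilization Warning OR Critical)", is just predicate composition. A sketch (the interface records here are illustrative, not the Orion data model):

```python
# Illustrative multi-level filter: status Up AND
# (utilization Warning OR Critical). Records are hypothetical.

interfaces = [
    {"name": "Eth1/1", "status": "Up",   "utilization": "Critical"},
    {"name": "Eth1/2", "status": "Up",   "utilization": "Normal"},
    {"name": "Eth1/3", "status": "Down", "utilization": "Warning"},
]

matches = [
    i["name"]
    for i in interfaces
    if i["status"] == "Up" and i["utilization"] in ("Warning", "Critical")
]
```

Only Eth1/1 passes: Eth1/2 fails the utilization clause, and Eth1/3 fails the status clause.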

 

I particularly like the search function for looking up ports on a certain module.  Entering a "1/" in the search field will show you all the ports on slot 1.  Easy.

 

These are straightforward improvements, but I think you'll find dealing with the large interface counts on your Nexus devices much more pleasant.  And good news: we extended this sub-view to all nodes, so you have a super polished interface interaction model on your smaller switches too.

 

Virtual Port Channels

 

A big part of why people are willing to shell out the huge cost of a Nexus is more reliable connectivity to endpoints like servers; a Nexus should provide an order of magnitude more reliable connectivity to them.  Cisco accomplishes this with vPCs, a Multi-Chassis EtherChannel (MCEC) technology that allows a single endpoint to uplink to multiple switches.  Traditional port channels can only connect to a single upstream switch, resulting in a single point of failure.

 

Believe it or not, vPCs are a serious departure from how networking normally works.  In fact, a pair of Nexus switches has to "conspire" (a fancy word for lie) to present itself as a single switch to the connected endpoint.  Cisco has a bunch of technology to make this work, and in our research we found it was making it hard for administrators to understand, monitor, and troubleshoot their vPCs.  When we dug in, we found that even expert administrators spend several minutes understanding the health of a single vPC.  They do things like:

  • Login to Nexus
  • "show vpc"
  • "show interface port-channel..."
  • "show interface...", repeat 2-4 times
  • "show run interface...", repeat 2-4 times
  • Find peer switch, login, and do all the commands again.

 

When all is said and done, they've mapped 5, 7, 9, or even more different components, each with its own status, performance, and config.  Our goal was to make this expert-level data set available to expert and non-expert users alike in seconds.  The vPC tab accomplishes that:

On the left we see the vPCs.  Each vPC is mapped to the local port-channel.  We find the peer switch and map the vPC to the port-channel on the peer.  Mousing over allows you to see the member ports of each port-channel and navigate to them:

Again, we're using the List View, so you have filtering, sorting, searching, pagination, and so forth as expected.  Click to drill into any interface for all the details we have about it.  Of course, all of this can be alerted upon and reported on to keep you ahead of problems without staring at monitoring all day.  There's some really cool additional stuff you can do with NCM specific to vPCs; if you're interested, check out their upcoming post.

 

During beta and RC, we found environments where folks had spent hundreds of thousands to more than a million dollars, and countless hours, setting up high resiliency.  Once they pointed NPM at their Nexus, they found that resiliency had deteriorated over time.  They had had failures and the redundancy saved them, but that also meant they didn't know a problem existed, so they never restored redundancy.  This left them one failure away from catastrophe in a multi-million dollar, high-redundancy environment.

 

If you're in IT, you're strapped for time.  Our monitoring tools have to help us do better here.  I'm happy that NPM will now help you keep your vPCs running clean!

 

Access Lists?!

 

One thing that surprised me is how many of you are running ACLs on your Nexus.  There's a trend of moving security closer to the endpoint, and Nexus devices are the access layer for many data center environments.  This results in lots of Port Access Control Lists (PACLs) and VLAN Access Control Lists (VACLs).  Fortunately, we recently worked on this for Cisco ASA.  The latest NCM release extends and enhances the ACL backup and analysis capability, including new support for MAC ACLs and non-contiguous subnet masks.  All of the Access List functionality is based on pulling and analyzing configs, so you'll need the NCM tool to get this feature.  Check out NCM's post - and also, bonus, my favorite part: Interface Config Snippets!

 

Traditional Routing and Switching

 

While working on the enhanced capabilities, we also revisited some of our core technology to make sure it was covering Nexus well.  Things like routing protocol monitoring and hardware health should work better than ever.  We think we've got everything covered, but there's a huge number of combinations of hardware (platform and modules) and software (trains and versions).  If you notice any gaps, please shoot me a private message with the data that's not showing up for you and an SNMP walk of your device.

 

Setup

 

I would have started this guide with setup if not for the fact that setup is so darn easy.  To get this feature working, add a node as usual and you'll notice a new check box on the last step of the Add Node Wizard:

 

 

Check that box, enter your CLI creds (read-only is fine), and you're good to go.  If you have existing Nexus under monitoring and you'd like to get the enhanced monitoring, head over to Manage Nodes.  You can edit an individual node and check this box, or you can find all of them by filtering on Machine Type and/or searching and enable them all at once.

 

There's nothing else you need to configure or define.  Simple, right?

 

Other Deep Dives

 

We've got a couple other deep dives for new Orion Platform features included in NPM 12.3.  Check 'em out!

 

Orion Platform 2018.2 Improvements - Chapter One

Orion Platform 2018.2 Improvements - Chapter Two - Intelligent Mapping

Orion Platform 2018.2 Improvements - Chapter Three

 

Conclusion

 

That does it for now.  You'll be able to click through the functionality yourself in our online demo starting around June 6th.  If you're on active maintenance for NPM, head over to the Customer Portal to get your upgrade now.  I'd love to hear your feedback once you have it running in your environment!

Starting with VMAN 8.0 and continuing with 8.1, we've streamlined how you deploy and use VMAN.  Virtualization Manager 8.2, the latest installment in these efforts, is now available on your Customer Portal.

 

One of the biggest pain points that surfaced over the last 2 releases was that the process for adding virtualization nodes to be monitored was not intuitive. This is solved with a new simplified workflow!

 

Whether choosing to add a Node or setting up a Discovery job, we've updated those entry points to direct you to the new, separate workflow.

 

Add a Node - Select VMware vCenter or Hyper-V devices
Network Discovery - Add VMware vCenter or Hyper-V devices
All Settings -  Add VMware vCenter or Hyper-V devices

 

Once you click on any of those entry points you'll be able to get started monitoring your environment with a few simple clicks.

 

Add a Virtual Object for Monitoring
See the thresholds that apply to your virtualization manager entities
Click Finish and you're successfully on your way to monitoring your virtualization environment

 

If you identified any thresholds that you'd like to tweak, simply navigate to All Settings -> Virtualization Settings to update your thresholds. Within a few clicks, you're ready to take advantage of capacity planning, recommendations and much more!

 

Get Started with Documentation

VMAN 8.2 Release Notes

VMAN 8.2 Getting Started Guide

VMAN 8.2 Administrator Guide

VMAN 8.2 Deployment Sizing Guide

Applications talk to each other, and you should know who they are talking to

 

Applications constantly rely on communication between different servers to deliver data to end users. The more applications end users require to do their jobs, the greater the complexity of application environments and those communication-based relationships.

With the release of Server & Application Monitor 6.6, we introduced an Orion Agent-based feature called Application Dependencies, which enables system administrators to quickly gain an understanding of which application servers are talking to one another, as well as see related metrics, to help with troubleshooting application performance issues.

 

How do you enable it?

The ability to discover and map Application Dependencies is enabled by default. This allows SAM to actively collect inbound and outbound communication at the application process level. This is paired with an ability to collect connection related metrics (latency and packet loss), which is disabled by default. You can find all of the configuration options in the Application Connection Settings section of the Main Settings & Administration screen.

 

What does it show you?

At its core, Application Dependencies help you understand if application performance issues are associated with server resource utilization or network communication. For example, Microsoft Exchange is heavily dependent on Active Directory for authentication and other services. Application Dependencies show you the relationship, and the communication, by adding a few new resources in SAM.

 

There are two main areas where you can see Application Dependency information. One is a new widget that is available on application and node details pages. This widget shows you the discovered application dependencies specific to that monitored application or node. Notice in the screen below that you can see where multiple Exchange servers have a dependency on the Active Directory server, ENG-AUS-SAM-62, and more specifically the Active Directory service that is running on it.

 

The second area where you can see Application Dependency information is the connection details page, which is linked from the above-mentioned connections widget. It lets you see all of the application monitors, and the associated processes, process resource metrics, and ports, responsible for the discovered communication between two specific nodes. You will also see latency and packet loss data if you have enabled the Connection Quality Polling component. The screen below shows the relationship between ENG-AUS-SAM-62 (Active Directory) and ENG-AUS-SAM63 (Exchange) in greater detail.

What’s going on under the covers?

There are two new Orion Agent plug-ins that deliver this new functionality: the Application Dependency Mapping plug-in and the Connection Quality Polling plug-in.

The Application Dependency Mapping plug-in is responsible for collecting the active connection data from the server. That information is then sent back to the Orion Server, where it is correlated with component monitor and node data, already being collected by SAM (Note: You must have at least one component monitor, like the process monitor, applied to the server). As SAM matches the collected data from the different application servers, it creates the connection details pages and populates the connection widget.

 

The Connection Quality Polling plug-in is responsible for a synthetic probe, which measures latency and packet loss. This is accomplished by sending TCP packets to the destination server on the specific port identified by the active connection information collected by the Application Dependency Mapping plug-in. It is important to note that the Connection Quality Polling plug-in includes the Npcap driver for use with this synthetic probe.
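To make the idea of a synthetic latency/loss probe concrete, here is a rough Python sketch. This is illustrative only and is not SAM's actual implementation: the real plug-in works at the packet level via Npcap, while this sketch simply times TCP connection attempts and treats any failed connect as "loss." All names here are hypothetical.

```python
import socket
import time

def tcp_probe(host, port, count=5, timeout=1.0):
    """Roughly estimate latency (ms) and loss by timing TCP connects.

    Illustrative sketch only -- a failed or timed-out connection attempt
    is counted as a lost probe.
    """
    latencies_ms = []
    lost = 0
    for _ in range(count):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                latencies_ms.append((time.monotonic() - start) * 1000.0)
        except OSError:
            lost += 1
    avg = sum(latencies_ms) / len(latencies_ms) if latencies_ms else None
    return avg, lost / count
```

A tool like this run on a schedule against each discovered (host, port) pair would yield the kind of latency/loss time series the connection details page charts.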

 

If you would like to read more about how this feature works, you can find more information in the SAM administrator guide.

 

Is that it?

Application Dependencies is not the only feature that was released in SAM 6.6. You can read more about the other features in the release notes. You can also check out Application Dependencies, live in action, in the online demo.

I am happy to announce General Availability of Storage Resource Monitor 6.6.  This release continues the momentum of supporting flash and hybrid arrays that were highly requested by you on THWACK!  We've also updated SRM to the latest version of the Orion® Platform and installer, so you'll enjoy the benefits of easier upgrades and participation in all the latest Orion® features.  Check out the SRM 6.6 Release Notes for more information about installing, upgrading, new features, and fixes.

 

New Array Support

SRM 6.6 adds monitoring support for the following arrays:

  • EMC Unity
  • HPE Nimble
  • INFINIDAT InfiniBox
  • IBM V9000

Support includes all the standard features you love: capacity utilization and forecasting, performance, end-to-end mapping in AppStack, and integrated performance troubleshooting in PerfStack.  We were also able to squeeze in Hardware Health for all of these arrays!

Now for some screenshots for your viewing pleasure!

 

Monitoring EMC Unity

[Screenshots: EMC Unity Summary, Block Storage, File Storage, and Hardware Health views]

 

Monitoring HPE Nimble

[Screenshots: HPE Nimble Summary, Block Storage, File Storage, and Hardware Health views]

 

Monitoring INFINIDAT InfiniBox

[Screenshots: INFINIDAT InfiniBox Summary, Block Storage, File Storage, and Hardware Health views]

 

Monitoring IBM V9000

You'll have to try this one yourself; it looks the same as our monitoring for IBM SVC.

 

WHAT'S NEXT?

Don't see what you are looking for here? Check out the What we are working on for SRM after v6.6 -- Updated on Apr 2, 2018 post for what our dedicated team of storage nerds and code jockeys are already looking at.  If you don't see everything you've been wishing for there, add it to the Storage Manager (Storage Profiler) Feature Requests.

 

Note: The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates.  All other trademarks are the property of their respective owners.

No, you haven't entered a multidimensional time warp. Nor are you having a '90s flashback. While the industry hype cycle is primarily focused on hot new trends like hybrid IT, SaaS, and containers, there lurks an unsung hero in the darkest dwellings of many of today's most established organizations. It oftentimes doesn't get the attention or appreciation it deserves, because to most, its existence is completely transparent. It sits there in the corner, just plugging away, day after day, hour after countless hour, without complaint or need for recognition. Yet these systems remain at the very core of the business, handling the most critical transactions. From maintaining patients' medical records, to keeping all your banking transactions in order, to running some of today's largest companies' CRM and ERP applications, AIX is still very much around us every day, touching our lives in ways you probably haven't even considered.

 

For as important as these systems remain even today, the monitoring of their performance and application health is far too often overlooked or completely forgotten. Perhaps it's because these workhorses were built to last and seldom fail at their important duties, making them fall into the dangerous category of out-of-sight, out-of-mind. More likely, however, is that these systems have traditionally been extremely difficult to gain visibility into using modern-day, multi-vendor monitoring solutions. That may be because a long time ago, IBM seemingly stole a page out of Sony's playbook of market dominance, which had propelled their proprietary Betamax and MiniDisc formats into the iPhone-like successes of their day. Oh, wait. That's not what happened! That's not what happened at all!!

 

Unfortunately, despite strict and very well-defined industry standards that govern how key operating system metrics should be exposed, allowing third-party monitoring solutions to provide necessary insight into their health and performance, IBM decided that standards didn't necessarily apply to them. This decision has historically made monitoring AIX systems challenging both for their customers and for third parties seeking to provide a monitoring solution for those organizations' most critical systems. Compounding this problem is the fact that the few monitoring solutions available to those customers have traditionally been wildly complex, difficult to deploy and configure, and even more challenging to maintain.  A new solution was needed. One which could bring with it unexpected simplicity, where none existed before. Its time has come, and that time is now.

 

 

AIX Agent Deployment

 

As one would expect from SolarWinds, deployment of the AIX agent is a simple turnkey affair, no different than deploying Agents to other operating systems, such as Windows or Linux. That's right, deploying an Agent to AIX is just as simple as it is for Windows and you don't need to be an expert in AIX. In fact, you don't even need any experience using AIX to be successful monitoring these systems with Server & Application Monitor (SAM) 6.6. If you can add a Node in Orion, then you, too, can monitor your AIX 7.1 and 7.2 systems.

 

Add Node Wizard - Push Deployment

 

To begin, navigate to [Settings -> All Settings -> Add Node], enter the IP address or fully qualified hostname of the AIX host you'd like managed in the "Polling Hostname or IP Address" field and select the "Windows & Unix/Linux Servers: Agent" radio button from the available "Polling Method" options. Next, on the 'Unix/Linux' tab, enter the credentials that will be used both to connect to the AIX host and to install the agent software. The credentials provided here should have 'root' or equivalent level permissions. Note that the credentials provided here are used only for initial deployment of the agent. Future password changes of the account credentials provided here will have no impact on the agent once it is deployed. Alternatively, if you authenticate to your AIX host via SSH using a client certificate rather than a username and password, click the 'Certificate Credential' radio button and upload your certificate in PEM format through the Orion web interface. This certificate will then be used to authenticate to the AIX host for the purpose of installing the Agent.

 

You can also optionally add SNMP credentials to the Agent if SNMP has already been configured properly on the AIX system. Rest assured, though, that this isn't needed and is used only if you want to utilize SAM's SNMP Component Monitors against the AIX system. Configuring this option will also populate the 'Location' and 'Contact' fields located on the 'Node Details' resource if those values have been properly populated in your SNMP configuration. Everything else will be polled natively through the AIX Agent with zero reliance upon SNMP.

 

 

Once you've entered your AIX credentials, click the 'Next' button at the bottom of the page. The Agent will then be deployed to the AIX host using a combination of SSH and SFTP, which requires TCP port 22 to be open from the Orion server (or additional polling engine) to the AIX endpoint you wish to manage for push deployment to function properly.

 

[Screenshots: Install Agent prompt, installation progress indicator, and List Resources]

 

Manual - Pull Deployment

 

In some scenarios, it may not be possible for the Orion server to push the agent to the AIX host over SSH. This is not uncommon when the host you wish to manage resides behind a NAT or access control lists restrict access to the AIX system via SSH from the network segment where the Orion server resides.  While firewall policy changes, port forwarding, or one-to-one address translations could be made to facilitate push deployment of the agent, in many cases, it may be far easier to perform a manual deployment of the agent to those hosts.

 

The Agent package can be downloaded from the Orion web interface to the AIX host by going to [Settings -> All Settings -> Agent Settings -> Download Agent Software] and selecting "Unix/Linux" from the options provided and clicking "Next".

 

Download Agent Software - Machine Type
Download Agent Software - Deployment Method

 

In the following step of the wizard, select "Manual Install" and click "Next". The third and final step of the wizard is where you select 'IBM AIX 7.x' from the 'Distribution' drop-down. Here you can also configure any advanced options the agent will use when it is installed, such as which polling engine the Agent should be associated with in Agent Initiated (Active) mode, or the listening port the Agent will use when running in Server Initiated (Passive) mode. Additionally, you can specify a proxy server the Agent should use to communicate with the Orion server or Additional Polling Engine in Agent Initiated (Active) mode. If you're deploying in an environment where proxy servers are used, fret not: the Agent's proxy configuration fully supports the use of authenticated proxies.

 

 

After selecting all the appropriate configuration options, click the "Generate Command" button at the bottom of the page. This will generate a dynamic installation command based upon the settings chosen above, which can then be copied and pasted into an SSH or X-Windows session on the AIX host. The AIX machine will then download and install the appropriate agent software from the Orion server using those pre-configured options.

 

[Screenshots: copy the generated agent installation command, paste it into an SSH terminal, agent installation success]

 

As soon as the Agent is registered with the Orion server, select your newly added agent node and click "Choose Resources" from the 'Manage Agents' view to select items on the node you would like to monitor.

 

 

Agent Advantages

 

So what's so great about Agents anyway? What's wrong with using the tried and true agentless methods for monitoring AIX hosts, like SNMP?

 

Encryption

 

Well, as anyone who has had the misfortune of using SNMP to monitor their AIX hosts can tell you, it's not all sunshine and lollipops, starting with configuring SNMP. Most environments today have strict security standards which mandate the use of encryption for virtually all network communication. While configuring SNMP v1/v2 on AIX isn't especially difficult for an experienced AIX administrator, neither of those versions of SNMP utilizes encryption. That necessitates SNMPv3, which, comparatively speaking, practically requires a Ph.D. from Big Blue University to properly enable and configure.  By comparison, the Orion AIX Agent natively utilizes 2048-bit TLS encryption for all network communication.

 

Visibility

 

IBM's proprietary SNMP daemon leaves much to be desired when compared to other standards-based SNMP daemons running on alternative operating systems. Chief among the complaints I hear regularly is that IBM's SNMP daemon doesn't support important standard MIBs, such as the HOST-RESOURCES-MIB, which exposes key pieces of information regarding running processes on the server and their respective resource consumption. This remains the primary reason why so many customers have chosen to replace IBM's proprietary SNMP daemon with NET-SNMP.  Even NET-SNMP, though, has issues reflecting critical metrics accurately, such as memory utilization. It seems odd that something so basic would present so many challenges, and be pervasive across both Linux and AIX, when monitored via SNMP.

 

Reliability

 

Like all Agents in Orion, the AIX Agent runs independently of the Orion server. This means the Agent continues to monitor the host where it's installed, even if the Orion server is down, or otherwise inaccessible due to a network outage. Once connectivity is restored or the Orion server is brought back online, the data collected by the AIX agent is then uploaded to the Orion server, filling gaps in the historical time series charts that would have otherwise existed if that node was being monitored via SNMP. This ensures that availability reporting is accurate, even if the server running Orion experiences a catastrophic failure.

 

Reachability

 

In today's highly secure and heavily firewalled environments which are riddled with network obstacles such as network address translation, port address translation, access control lists, and proxies, it's sometimes amazing that anything works at all. More and more the things that need to be monitored are oftentimes the most difficult to monitor. With the AIX agent, overcoming these obstacles is a snap, allowing users to monitor their AIX systems regardless of where they might be located in the environment. Have your Orion system deployed in the Cloud and running in Amazon's AWS or Microsoft's Azure? Not a problem. Deploy the Agent in Agent Initiated (Active) Mode and forget about VPN tunnels or 1:1 NAT mappings. Does all traffic leaving the network go through a proxy server? No problem. The Agent natively supports the use of authenticated proxies to access the Orion server, while conversely, Agent communication within Orion can be configured to utilize a proxy server to reach an Agent that might not be accessed directly. These are possibilities you previously could only dream about when using SNMP.

 

 

AIX Agent Exclusive Features

 

There have been several Orion features released throughout the years which had previously only been available for nodes running other operating systems, such as Linux or Windows. AIX had largely been left out in the cold. That is, until today.

 

Network Interface Monitoring

 

In Server & Application Monitor 6.6, network interfaces on your AIX server can now be monitored without needing Network Performance Monitor installed. This functionality is available exclusively through the AIX Agent and does not count against your SAM component monitor usage or NPM element license count, in the event you also have NPM installed. That means this functionality is provided essentially free and potentially even allows you to free up some of those valuable NPM element licenses for other nodes in your environment.

 

Volume Performance Monitoring

 

Today, storage is the leading cause of server and application performance issues. Having visibility into storage volume performance, such as disk reads/writes per second, and queued disk I/O from within the Orion web console alongside other key performance indicators, allows you to isolate where performance bottlenecks are occurring on your server and which applications are affected. With the AIX Agent, you now have visibility into the storage volume performance, similar to those that you've grown accustomed to on your Windows and Linux volumes.
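For a feel of the raw data behind metrics like this: on Linux, per-volume counters live in /proc/diskstats, and IOPS is simply the delta of completed operations between two samples divided by the interval. The sketch below parses that format. This is an illustration under the assumption of a Linux-style /proc; AIX exposes the equivalent through tools like iostat, and the AIX Agent collects all of this for you.

```python
def parse_diskstats(text):
    """Parse Linux /proc/diskstats text into {device: (reads, writes)}.

    The two values are cumulative completed read and write operations;
    IOPS = (sample2 - sample1) / interval_seconds for each counter.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue
        # fields: major minor device reads_completed ... writes_completed ...
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats
```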

Total Disk IOPSDisk Queue Length

 

 

Real-Time Process Explorer

 

When applications aren't running right, inevitably there's a culprit. It may be the processes that make up the application you're already monitoring, or it might be those you aren't. The Real-Time Process Explorer provides visibility into all processes and daemons running on your AIX server, along with their respective resource utilization. It's like a web-based command center where you can quickly determine which processes are running amok. No more firing up your SSH client, logging in, and running 'topas' to troubleshoot application issues. Now you can do it all from the comfort of your web browser. Spot a runaway process or one that's leaking memory like a sieve? You can also terminate those processes directly within that same web interface. Simply select the process(es) you want to 'kill' and click 'End Process'. Voila! It's just that easy.
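Under the covers, "list the processes and end the bad one" is a simple pattern. Here's a rough stdlib sketch of the idea using Linux's /proc layout, purely as an illustrative assumption; it is not how the AIX Agent is implemented, and AIX's own /proc differs from Linux's.

```python
import os
import signal

def find_processes(name_substring):
    """Return (pid, command) pairs whose command name contains the substring."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name_substring in comm:
            matches.append((int(entry), comm))
    return matches

def end_process(pid):
    """Ask the process to exit, like the Real-Time Process Explorer's 'End Process'."""
    os.kill(pid, signal.SIGTERM)
```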

 

You can also now select processes you wish to monitor on your AIX server directly through the Real-Time Process Explorer. To do so, simply select the process you're interested in monitoring and click 'Start Monitoring'. You'll then be walked through SAM's application template wizard, where you can choose to add this process to one of your existing Application Templates or create a new one.

 

 

 

Reboot Management Action

 

If you find yourself in a situation where terminating processes alone does not resolve your application issue, there's always the tried and true reboot. While usually the option of absolute last resort, it's comforting to have it easily at hand when and if you've exhausted all other options. Simply click the 'Reboot' button in the 'Management' resource on the 'Node Details' view and you'll be prompted to confirm that you really mean business.

 

 

Application Component Monitors

 

Last, and unquestionably most important, is the wide array of SAM Application Component Monitors supported by the AIX Agent. From these components, you can create templates to monitor virtually any application: commercial, open source, or homegrown.

 

AIX Agent Supported SAM Component Monitor Types

Directory Size Monitor

File Count Monitor

HTTPS Monitor

ODBC User Experience Monitor

DNS User Experience Monitor

File Existence Monitor

JMX Component Monitor

Process Monitor

File Age Monitor

File Size Monitor

Linux/Unix Script Monitor

SOAP Monitor

File Change Monitor

HTTP Monitor

Nagios Script Monitor

SNMP Component Monitor

TCP Port Monitor

Tomcat Server Monitor


SWQL Walkthrough

Posted by animelov Employee Mar 13, 2018

Hi, all!  And welcome back to our discussion on the SDK and all things SWQL. So, at this point, we’ve briefly introduced SWQL, but today we’re going to get down to building queries with SWQL and talk about what you can make with them.

 

But first, let's discuss why this is important. The Orion® Platform does an excellent job of giving you a single-pane-of-glass view for all of your products, while giving you the freedom to pick and choose which modules you wish to purchase. Because of this, the back-end database is fairly segregated, save for a handful of canned widgets and reports. For database experts, bridging that would be done by waving a magic wand and using the correct incantations of "Inner join!" or "Outer join!"  For the rest of us, SWQL can do the trick!

 

Now, what this is NOT going to be is a general guide on structured query language (SQL). This does require some level of knowledge of how to construct a basic query, but don't be scared off just yet.  SWQL, let alone SQL, is not something I knew before I started, but I picked it up quite easily.

 

Also, before we begin, if you haven’t picked up the latest SWQL studio, I highly recommend you do so and check out our most recent post here. This can be installed anywhere that has an Orion server connection, including your laptop.

 

Now that that is out of the way, let's get to the meat of the subject. SWQL, in and of itself, is very similar to SQL, but with a few restrictions. For example, SWQL (and its studio) is read-only. There are SWIS calls you can make with it for API/PowerShell®, but we'll get into that in a later post. For a quick rundown, check out this post.

 

If I wanted to see a list of applications that I’m monitoring on my SAM instance, I could write a query that looks like this:

select Displayname from Orion.APM.Application

 

Then I go to Query → Execute (or press F5), and that gets me an output that looks like this:

 

Pretty simple, right?  Let’s look and see what this does, though:

 

 

“select” and “from” should be self-explanatory if you know a little SQL. “select” means we want to retrieve/read information, and “from” is stating which table we’re getting that information from.  Orion.APM.Application is the table name (so, we’re getting information “from” “Orion.APM.Application”), and “DisplayName” is the title of the column we’re getting that information from.  Now, where did we get that column name from?  If you look at SWQL studio and find the table name and expand on it, you’ll get this:

 

 

The entries with the blue icons next to them? Those are the columns we can pick from. For the other icons, check out our previous post here.

 

Let’s add some more to that query. If we want to see the status of the applications (up/down/warning/etc.), we can just add the status to the query, like so:

select Displayname, Status from Orion.APM.Application

 

This will give us:

 

More info on the status values can be found here, but briefly: 1 is up, 2 is down, etc.
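And because the status is just a value, you can filter on it. For example, to list only the applications that are currently down (status 2), you can add a WHERE clause, just like in SQL (worth trying against your own instance):

select Displayname, Status from Orion.APM.Application where Status = 2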

 

Now, what if we wanted to select ALL of the columns in this table to see what we get? Unfortunately, this is one of the first places where SWQL differs from SQL: you cannot wildcard with *.  In other words:

 

select * from <tablename> does NOT work!!!

 

If you want all columns, you'll have to select each one of them, separated by commas. Luckily, SWQL Studio will do this for you. If you right-click on the table name, you have the option of creating a general select statement:

That will generate a query for you with EVERY column possible for that table.

 

Pretty neat, right? Now, let's get to the fun part of SWQL. One of the advantages of SWQL over SQL is its ability to automatically link related tables. If you are familiar with SQL, this is where things would normally get hairy with inner/outer join statements. Here, we make it easier.

 

Let’s use our application example again. Having the list of applications is great, but to me, it’s nothing unless I know which node it is tied to. There is a node id in that application table, but it returns a number, which means nothing to me.  Remember those chain link icons from earlier?  Those are linked tables, and if you look, there is a link to the Orion.Nodes table:

 

 

To get the linkage to work, we first need to give the application table an alias. To do so, I just need to give it a name after the table declaration.  So, let’s call it “OAA,” which is short for Orion APM Application. Note: you can name it anything EXCEPT for a reserved word, like “select” or “from.”

select Displayname, Status from Orion.APM.Application OAA

 

Now, we need to make sure our columns are referenced to OAA by adding that to the beginning of the column names:

select OAA.Displayname, OAA.Status from Orion.APM.Application OAA

 

Finally, we add the node name.  We can do this with the linked table from earlier by using dot notation.  In other words, if I write "OAA.Node.", I'm now allowed to pick any column from the Nodes table, including the name of the node (or "caption").  Now my query looks like this:

select OAA.Displayname, OAA.Status, OAA.Node.Caption from Orion.APM.Application OAA

 

And now, this is my output:

 

This is where things get interesting. Remember how I said that we can tie multiple products together?  The AppStack dashboard with Virtualization Manager and Storage Resource Monitor is extremely powerful, especially in terms of reporting. SWQL can help us with that.

 

If we keep going with our Application example above, we can continue tying information together from other tables.  So far, we’ve linked the nodes table, but let’s see what ESX host these belong to. From the Applications table, there isn’t a link to any of the VIM tables:

 

But it is related to the “Nodes” table, and if we look at the Nodes table:

Then we go to the Virtual Machines table and from Virtual Machines table…

… there’s the Hosts table!  So, let’s link that to our query using dot notation:

select OAA.Displayname, OAA.Status, OAA.Node.Caption, OAA.Node.VirtualMachine.Host.HostName from Orion.APM.Application OAA

 

Note, the host names that are NULL refer to machines that are not monitored via Virtualization Manager; they do not have a host.

 

That’s it for now. Later, we’re going to learn some more tricks for formatting with SWQL, and then how to apply this to Orion®. Stay tuned for the next one!

We are happy to announce the release of SolarWinds® Traceroute NG, a standalone free tool that finds network paths and measures their performance.

The original traceroute is one of the world’s most popular network troubleshooting tools but it works poorly in today’s networks. You can read about its shortfalls in this whitepaper.

SolarWinds® fixed these shortfalls with NetPath, a feature of NPM.  People love NetPath, but there are two problems.  First, NetPath takes a couple of minutes to find all possible paths in complex networks, much longer than a quick tool like traditional traceroute. Second, most people don’t own SolarWinds® NPM and so don’t have access to NetPath.

Traceroute is too important a tool to let languish.  That’s why we’ve taken what we’ve learned with NetPath and fixed traceroute.  We call it Traceroute NG.

Traceroute NG is a super-fast way to get accurate performance results for a network path, in a text format that’s easy to share.

Compared to traceroute, Traceroute NG:

  • Is super-fast
  • Is rarely blocked by firewalls
  • Is more accurate, thanks to path control
  • Updates latency and loss continuously
  • Detects path changes
  • Supports TCP or ICMP probes

 

You can download Traceroute NG here and launch the tool by double-clicking traceng.exe.

You’ll be presented with a help screen and the application will wait for your input. Type the domain name to start a trace.

 

You can also launch the free tool from Windows command prompt:

traceng www.google.com

Let’s look at some results.

 

Scenario 1: Endpoint is blocking TCP port

We all know that HTTP uses TCP 80 by default. What would traceroute show you if someone blocks that port on a firewall or webserver?

All good, it’s not the network. You know it’s not your issue. But what is the issue?  That’s where Traceroute NG will help:

Traceroute NG can mimic TCP application traffic, so its packets are treated the same way the application's traffic is. In this case, it detected that TCP port 80 on the destination webserver is closed. You know it's not the network, but now you can be more precise and tell your sysadmin to enable that port on the webserver.

 

Scenario #2: Network path change

To illustrate this scenario, I have created a simple network using GNS3.

 

I also have a loopback adapter configured, to point all IPv6 traffic to this lab:

 

I’d like to trace from my machine in Cloud1 to the PC (fc90::3). If the OSPF routing works, I should go through routers R1, R7 and R3. Traceroute confirms:

 

Traceroute NG as well:

 

What if I do maintenance on router R7? Will traceroute tell me when router R7 becomes unavailable and detect the new path? No. It runs once, and then you need to run it again. Manually.

With Traceroute NG, detecting a change is simple. You can tell Traceroute NG to warn you if the path changes and optionally log the output. An example command would be: -a warn -l -p 23 fc90::3

And this is the result:

 

So you know that your router is down, and once you press Enter, Traceroute NG will show you the new path. In the GNS3 lab, we expect the new path to go through R1, R5, R2, and R3. Traceroute NG confirms:

 

And the log file as well, showing you the original path and the new path:

In this use case, we have leveraged several features of Traceroute NG. First, it runs continuously. Second, it detects when a path is no longer available. Third, it can log results in a text format that's easy to share.

Now, enough boring reading; it's time to try it out! You can download Traceroute NG here: https://www.solarwinds.com/free-tools/traceroute-ng/

 

We’re super excited to share this tool with the world and hope you find it useful.  Let us know your thoughts!

INTRODUCING SOLARWINDS BACKUP

February 6, 2018 is a historic day. 

After months of planning, collaboration, and late-night efforts, SolarWinds Backup is ready for primetime!  We are pleased to announce that SolarWinds Backup officially launches and is generally available today to our core market of IT professionals.  This launch is one more way we’re driving even more innovation in our Orion portfolio of products, and backup is a natural fit with our existing systems management capabilities.  You rely on SolarWinds for comprehensive monitoring of your servers and applications, and for remote control and administration of these assets; now, we can help you solve your backup and recovery challenges as well.

 

While most in-house IT departments have some type of backup solution, there is a lot of dissatisfaction with the options they currently use.  In November, we conducted a survey on THWACK and heard from our customer base that:

  • Backups are too expensive, and managing them is too complex
  • Reliability of backups is an issue, requiring time-consuming manual checking
  • Managing and forecasting local storage requirements for backups is painful

 

You can read more on the full survey results here: Is Unnecessary Complexity Making Backups a Headache For You?

 

SolarWinds Backup changes the game by bringing a simple, powerful, and affordable alternative to the market.  It’s a modern, fully managed, cloud-first backup service, already chosen by thousands of managed service providers, now packaged for the enterprise and direct use by IT professionals.

 

It's tough to cover an entire product in a single blog post, so this inaugural post will provide the highlights of our new hotness!  In future posts, we'll follow up with more "how-to" guidance on specific topics and get you backing up your IT environment in minutes.  If you are chomping at the bit, feel free to browse our new SolarWinds Backup forum on THWACK, with the latest release notes, documentation, and step-by-step training videos.

 

THE ARCHITECTURE

There are two primary interfaces you'll encounter when you start your journey with SolarWinds Backup, and generally you should be able to use them without training or instruction.  As you move into more advanced topics, we will provide you with the resources you need.  On day one, these two parts are all you need to know to get started.

Part 1: The Console: http://backup.management

Deploy, monitor, report, manage users, create profiles, and remotely control all your backups from this unified, self-service console.

the best console in backup

Part 2: Backup Manager (the agent)

The Backup Manager agent does all the heavy lifting on protected endpoints (i.e., devices, servers, instances).  The agent scans data, finds block-level changes, deduplicates, compresses, encrypts, and optimizes data file size for speed, and much more, all behind the scenes.

 

 

UNDER THE HOOD

TrueDelta technology

SolarWinds Backup includes a unique TrueDelta technology that tracks block-level changes between backups, so you only back up (or restore) what has changed – not the entire file. This keeps backup windows short, only transmits the minimal amount of changed data to be backed up over the network, and improves performance overall.

 

Direct-to-cloud backup

The SolarWinds Backup services were designed from the ground up for fast, efficient, remote backups. You can skip the hassles of configuring local and remote backups, storage provisioning, and storage capacity planning. Instead, your backups go safely to our global purpose-built private cloud, with backup windows measured typically in minutes, not hours.  Manage your storage pool and capacity intuitively from the Console.

 

End-to-end security

Encryption is built into our backup process. Backup data is encrypted at the source and stays encrypted in transit and at rest.

 

Single, unified management console

Protect physical, virtual, and cloud servers, including all major operating systems and hypervisors, with a single product. One unified web-based dashboard shows you backup status at a glance, and frees you to check systems, and even perform restores, from any location, even from your mobile device.  Need to back up workstations, laptops, or just certain documents?  Not a problem; our solution covers those capabilities as well!

 

Recovery options

Whether you need to recover an entire server or VM, an application, or just a portion of a file, SolarWinds Backup handles it.

  • Recover at WAN speed or LAN speed, using the optional Local SpeedVault™ 
  • Physical-to-virtual and virtual-to-virtual recovery automates a full system restore to VMware vSphere® or Microsoft® Hyper-V®

  • Bare-metal recovery simplifies and reduces recovery time for Windows® servers, and can be used for migration to new hardware
  • Cloud recovery targets provide even more flexibility by allowing recovery to Microsoft® Azure®, or to any other virtual environment of your choice

 

Secure remote storage for your backup data, worldwide

SolarWinds Backup provides world-class storage for your backup data with around-the-clock security in our data centers located on four continents. For the Backup Services, our data centers’ certifications meet requirements for HIPAA compliance and similar legal and regulatory standards. The Backup Services are scalable and can grow with your business.

 

LICENSING AND ROI

Pricing for SolarWinds Backup is straightforward and based on an annual subscription with tiers based on the number of operating system instances (servers) being protected. Each tier includes a block of cloud storage, so there are no extra charges or hidden costs.

SolarWinds Backup is the simple, powerful, affordable option.  The product’s simplicity and ease of management translate to even greater savings, as personnel do not need extensive training or certification. There is no need to buy expensive local storage to support backups, or even to pay for a separate contract for cloud provided storage.  It’s all included.

 

WHAT'S INCLUDED

BACKUP

 

File System / System State (Microsoft)

Windows Server® 2008/2008 R2/2012/2012 R2/2016 and Windows SBS 2011; Windows Vista/7/8.x/10

File System (GNU/Linux)

CentOS® 5/6/7, Debian® 5/6/7, OpenSUSE® 11/12

File System (Apple)

Mac® OS X® 10.9 Mavericks/10.10 Yosemite®/10.11 El Capitan

Network Shares

Remotely protect network shares and Network Attached Storage (NAS) devices

Application Protection

Microsoft Exchange 2007/ 2010/ 2013/ 2016, MS SharePoint® 2007/ 2010/ 2013

Database Protection

Microsoft SQL Server® 2005/ 2008/ 2012/ 2014/ 2016, MySQL 5.0/ 5.1/ 5.5/ 5.6, Oracle® Database Standard Edition 11g for Windows

Open File Protection

Leverage the Microsoft Volume Shadow Copy Service (VSS) for open-file and application-aware backups

Pre/ Post Backup Scripts

Back up third-party applications and databases through custom scripting

Backup Filters & Exclusions

Exclude specific extensions, files, paths, or volumes from File System and Network Share backups

Backup Scheduling

Automate protection with multiple recurring backup schedules

 

 

RECOVERY

 

Search and Restore

Individual files across all recovery points from Backup Manager (excluding VSS plugin)

Self Service Restores

Use Virtual Drive technology to present historic backup sessions as a browseable file system

Application Restores

Full application- and database-level restores

Continuous Restore

Use the Recovery Console to automatically create / update standby images or remote recovery copies of selected data

Physical to Virtual (Source)

Windows Vista®/ 7/ 8.x/ 10, Windows Server 2008/ 2008 R2/ 2012/ 2012 R2/ 2016 & Windows SBS 2011

Physical to Virtual (Virtual Disk Target)

Create (.VHD/X or .VMDK) files for use in the virtual environment of your choice

Physical to Virtual (Hypervisor Target)

Microsoft Hyper-V Server 2008 R2/2012/2012 R2 (Hyper-V 2.0 and 3.0); VMware vSphere (ESXi) versions 4.1, 5.0, 5.1, 5.5, and 6.0

Physical to Virtual (Cloud Target)

Microsoft Azure is supported

Recovery Testing

Automated recovery with email confirmation and screenshots (VMware and Hyper-V)

 

Bare Metal Recovery (BMR)

Create bootable CD or USB media to recover systems without a reliable OS. Supports dissimilar hardware, dissimilar drives, single pass application recovery (VSS) and granular file / volume selections

 

 

MANAGEMENT

 

Web-based Management Console

Single view to monitor and manage all resources from anywhere

User Defined Roles

Multiple roles with defined permissions

Reports / Alerts

Custom views, daily dashboards, consolidated backup reports, real-time alerts, and disaster recovery testing

Audit Logs

User-accessible log of changes made on protected device

Remote Management & Control

Remotely launch the Backup Manager directly from the management console

Automated Deployment

Silent install, command-line options, and/or your favorite software deployment tool.  A single command works for an unlimited number of installations.

Remote Commands

Allows for remote operating commands for a device or a group of devices

API / Command Line

Simple Object Access Protocol (SOAP)

Automatic Updates

Monthly scheduled updates with version control

File Versioning & Retention

Three (3) versions minimum; 90-day retention.

Data Archiving

Extend beyond the standard retention model and help ensure compliance with multiple data archiving policies

Password Protection

Password protection for backup data to restrict changes or limit functions to restore only

Industry Compliance

Helps you meet compliance requirements, such as HIPAA, SOX, PCI-DSS, and others, with regard to encryption, through SSL for backup data in motion and at rest

Encryption Key Length

AES 256-bit encryption 

Data Center Security

Backup data is stored in one of seven SSAE 16 SOC 1 Type II and ISO 27001-certified private data centers worldwide

Data Center Locations

USA, Canada, United Kingdom, Netherlands, Germany, Italy, Switzerland, Norway, France, Spain, Australia, and South Africa.

 

 

WHAT ELSE?

Don't see what you are looking for here?

 

Visit the SolarWinds Backup Forum

Check out the What We're Working On for SolarWinds Backup post for what our dedicated team of backup developers and code jockeys are already looking at. 

If you don't see everything you've been wishing for there, add it to the SolarWinds Backup Feature Requests.

Greetings! And welcome to the second in our series of primers on customising the Orion® Platform. Today’s post will focus on first installing and then navigating through SWQL Studio. This free utility is simple to use, but can drastically lower the barrier to creating and editing SWQL queries.

Help Resources

Before we go any further, it’s worth highlighting two key resources for using the SDK.

  • The GitHub® repository
  • The THWACK® forum

 

The GitHub site is the main resource, where you can download the installer, browse code samples, and review the schema and wiki for the SDK itself. It’s also where issues are tracked, so if you do find a bug you can flag it as an issue there.

The other main resource is the Orion SDK forum here on THWACK. SolarWinds does not provide pre- or post-sales support on any customisations, including code using the Orion SDK. However, that does not mean you are alone. You can often find code samples that may address your use case by simply searching the forum. And the site is frequented not just by SolarWinds staff and THWACK MVPs, but also by other users who can provide feedback and guidance on code you may be working on.

Installing SWQL Studio

SWQL Studio is a Windows®-based utility, and is included with the Orion SDK. The installation is simple. Just navigate to the “releases” area of the GitHub site (https://github.com/solarwinds/OrionSDK/releases) and download and install “OrionSDK.msi” on your Windows workstation.

Once installed, connect it to your Orion server (of course, you will be working on your special dev server).

 

   SWQL Studio

 

Navigating through SWQL Studio

The connection itself is made to SWIS on the server (the same Information Service covered in the previous post). By accessing the data via SWIS, instead of the SQL database directly:

  • Separate logins to the actual SQL database itself are not needed
  • Account limitations are still applied
  • Any changes to the database schema are abstracted from the user

Once connected, you will see the server you connected to, as well as the credentials you used, and if you expand this connection icon, you will see the various data sources represented. Note that the list of objects will vary depending on the modules installed and the versions of those modules.

 

 

To the right of the object list you will see an editor screen. From here, you can type in SWQL, and execute the query to see the results. Running the query is very simple:

  • You can click Query > Execute in the menu
  • Or simply hit F5 or Ctrl-E

In the below example, I typed out a simple query. We will talk more in the next post about how to write queries. I just want to draw your attention to the fact that in this latest version of SWQL Studio (2.3.0.123), autocompletion is also possible.

 

A lot of the power of SWQL Studio however, lies in the fact that the necessary data can be viewed graphically.

Let’s expand on the Orion item in the list:

 

 

Then scroll down to Orion.Nodes, where we will see the resource referenced in the hand-written SWQL statement I used previously. If I expand this resource, I will then see all of the properties of this Orion.Nodes object, two of which I referenced in the aforementioned SWQL statement (Caption and IP_Address).

 

 

A key difference between SQL and SWQL is that there is no equivalent of SELECT * in SWQL. However, just by right-clicking on the entity within SWQL Studio, you can choose “Generate Select Statement,” which will populate the editor window with a query listing all of the entity's fields.
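Because there is no SELECT *, every query has to name its columns explicitly; “Generate Select Statement” just saves you the typing. A hand-written equivalent, sketched here with a few well-known Orion.Nodes fields, would look like:

```sql
SELECT NodeID, Caption, IP_Address, Status
FROM Orion.Nodes
```

Any column you leave out simply doesn't appear in the results, which is also a handy way to keep queries lean.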

 

 

Another useful aspect is the ability to identify the property type by its icon. To explain a bit further, let’s refer to Orion.Syslog as an example.

 

 

 

First up, we have the keys, which are indicated by the golden key icon. These are essentially the unique IDs used to reference each individual record in the data source.

 

 

The most common properties are the standard fields, indicated as blue rectangular prisms:

 

 

 

Whereas properties that are inherited are displayed as green cubes:

 

 

 

A very interesting aspect here is that related entities (depicted with the chain icon below) are also shown here. That means you can see which of these can be used to implicitly join data, without having to write an explicit join.
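As a sketch of what this buys you (assuming Orion.Syslog exposes a Node navigation property, as its chain icon suggests), a query can reach through the related entity with dot notation and no explicit join:

```sql
SELECT S.Message, S.DateTime, S.Node.Caption
FROM Orion.Syslog S
```

SWIS resolves the relationship behind the scenes, exactly as we saw with the Application-to-Host example earlier.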

 

 

And as a final point, verbs (represented as pink prisms) that can be used in scripts are listed here, as well as the parameters needed for each verb. We will cover this very cool feature in a future post.

 

 

 

Wrap Up

Now that we have taken a solid look at SWQL Studio, you should be in an excellent position to dive in and look at the data available within Orion, for use in reports and scripts. Not only does it allow you to easily create SWQL that you can use for various purposes, it also allows you to browse the entities within the database, including related objects.  In the next post, we will actually work with SWQL queries, and some of the constructs that differentiate SWQL from SQL. And in follow-up posts, we will see how the verbs can be used to provide management capabilities from your own scripts.

If you haven't enabled NBAR2 on your routers, you're not getting everything NetFlow offers.  You're missing the application data that's passing through your L3 interfaces.

 

And you're probably getting Alerts from NTA, telling you that it's receiving Netflow data that's missing NBAR2 information from an NBAR2-compatible device.

 

There are at least four places you'll see that Alert.  One is at the top of your Main NPM page, with the white alarm bell and a red instance counter.  Click on it and you can see the alerts:

 

 

A second place you'll see these errors is in the Events page:

 

 

A third place you'll find it is on the NetFlow Traffic Analyzer Summary page, if you have added in the "Last XX Traffic Analyzer Events" Resource

 

A fourth place it appears is in the main NPM page for an L3 device's Node Details / Summary:

 

Obviously, SolarWinds thinks not getting your full NBAR2 information is pretty important.  Nobody needs unnecessary alerts, and it's easy to change a router to use NBAR2.  Just do it.

 

 

While I was cleaning up configurations on routers and L3 switches that originally had "plain" NetFlow and needed NBAR2 settings added, I thought, "Maybe someone on THWACK could benefit from this information."  So I built this "before and after" comparison of their configs so you can see the extra commands needed:

Items in yellow are not part of the original non-NBAR2 NetFlow config on the left.  Don't be thrown off by the different flow names; they're just names and can be whatever you want, as long as you follow the right syntax.  SolarWinds puts some GREAT technical support links into the product that bring you right to the information you need to build NetFlow properly.  Use them and you'll be happy.

 

If you have a router or L3 switch that's missing NBAR2 info, you won't be able to edit the existing NetFlow settings until you remove the "ip flow monitor" statements (left column, bottom section) from every interface on which they are installed.  But once you take them out, it's easy to remove all the old flow settings completely using the "no" command, and then you're starting with a clean slate.

 

After the old NetFlow commands are removed, I can edit the right column's "destination x.x.x.x" to point at the APE I want to receive the NetFlow NBAR2 data, and then paste the entire column into the router, EXCEPT for the bottom two lines: "ip flow monitor NTAmon input" and "ip flow monitor NTAmon output".  Those lines must be inserted into the L3 interface(s) on the router or L3 switch.

 

You might want to monitor NetFlow NBAR2 data only on the North-South interfaces going upstream to a distribution or core switch.  Or you might want to catch North-South AND East-West NetFlow NBAR2 data by putting flow monitor statements on all sub-interfaces or VLAN interfaces (SVIs).
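For reference, a Flexible NetFlow config with NBAR2 enabled generally looks something like the sketch below. The record and exporter names are arbitrary (only NTAmon comes from my configs above), "destination x.x.x.x" is your collector or APE, the interface is just an example, and the "match application name" line is what adds the NBAR2 application data. Check your IOS version's documentation for the exact syntax.

```
flow record NTArec
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match application name        ! <-- this line adds NBAR2 application data
 collect counter bytes
 collect counter packets

flow exporter NTAexp
 destination x.x.x.x           ! point this at your NTA collector / APE
 transport udp 2055
 export-protocol netflow-v9

flow monitor NTAmon
 record NTArec
 exporter NTAexp

interface GigabitEthernet0/1
 ip flow monitor NTAmon input
 ip flow monitor NTAmon output
```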

 

Once you've completed your work, instead of seeing nothing in the "Top 5 Applications" area on an L3 device's NPM Device Summary page, you'll start seeing data added every ten minutes: data that tells you which applications are using that interface's bandwidth.  And that can be the secret ingredient to finding a bandwidth hog and correcting it!

 

Greetings All!

 

On the SolarWinds Sales Engineering team, my colleagues and I often get requests from customers about how to do something custom, whether it's simply viewing certain data about a node on its node details page, or something more complex, such as automating putting devices into maintenance mode as part of a workflow, or creating runbooks.

 

Over the coming weeks, we will publish a series of “primers” to equip you with the skills needed to create and adapt scripts and queries within the Orion® Platform.  If any of these use cases, or similar ones, apply to you, then this is the series for you:

  1. You need to include information that’s not covered in an out-of-the-box Orion report
  2. You need to automate the addition of nodes to Orion for monitoring, as part of an onboarding process for new VMs
  3. You require usage metrics for particular devices, so you can charge back to other departments or customers

 

To begin with, we will introduce some of the terms and concepts involved, starting with some architecture basics, and building through to more hands on examples, looking at custom reports & scripts.

Topics will include:

  1. Intro to API, SDK, & SWQL
  2. SWQL studio
  3. SWQL Walkthrough
  4. Examples of SWQL in reports/alerts/web/ etc
  5. Automating Orion using PowerShell®
  6. Automating Orion from Linux® & some bonus tips ‘n tricks

 

The overall goal here is to enable you to work through and find solutions for your particular use cases. What this series will not be is:

  • An introduction to SQL
  • An introduction to scripting/programming
  • Pre-built solutions for every custom use case

 

We want to help you help yourself.

 

Read First!

Before we look at a single piece of code or query, let’s take a moment to cover some important housekeeping. As with any customisation, especially when scripting, there is always a possibility that things may go wrong. While automating manual processes is a most excellent endeavour, accidentally deleting all your nodes is not! So before you begin working on any customisation, let’s cover a few simple best practices.

  • Set up a dev instance of the Orion Platform for experimentation
  • Don’t try untested scripts on production systems
  • Make a backup of your Orion database

 

So, with that, let’s get on with the show.

 

Terms and Concepts

First up, we will introduce a few terms. If you are an advanced user, you can probably skip ahead at this stage, but if you are new to writing queries and scripts, a strong understanding of these at the very beginning can save a lot of hardship further down the line.

 

SolarWinds Query Language (SWQL). SWQL (pronounced “swick-le”) is essentially a read-only subset of SQL with some SolarWinds conveniences added, and will be core to many of the topics that will be covered in the following posts. The third post in our series in particular will dive into SWQL in more detail, but at this stage we will look at some high-level points.

Application Programming Interface (API). In software development terms, an API can be thought of as the access point for one piece of software to access another. In an N-tier application, it allows different parts of an application to be developed independently. Orion, for example, is N-tier, and its web, polling, reporting, and coordination components communicate via service layers.

In the context of Orion, the API is what allows us to read data using SWQL, as well as to add, delete, and update data by “invoking” commands (which we will examine in more detail in our fifth and sixth posts).

SolarWinds Information Service (SWIS). The actual implementation of the API within the Orion Platform is SWIS, which manifests as a Windows® service, the SolarWinds® Information Service.  It is via SWIS that other Orion Platform components (such as Network Atlas, the Enterprise Operations Console (EOC), and Additional Web Servers) communicate. It is also via SWIS that various scripting and programming technologies can access Orion.  From a technical perspective, it can be accessed over two ports:

  • 17777 – net.tcp: high performance, but Microsoft® only
  • 17778 – JSON or SOAP over HTTPS: interoperable with other programming languages

 

Software Development Kit (SDK). An SDK is a set of tools and libraries, provided by a vendor, to allow others to more easily consume their API. In relation to Orion, the Orion SDK can be installed on Windows, and provides not only the files needed to use PowerShell scripts, but also includes SWQL Studio, which can be used to build custom SWQL queries and visually browse the available data. It is worth noting that since it’s possible to access the API using REST, you don’t need to have the Orion SDK deployed. Our next post will cover installing the SDK, and some tips for its use.

 

Intro to SWQL

SWQL can be hand-written, or, more commonly, SWQL Studio can be used to generate queries. For simplicity at this early stage, it’s worth noting that constructs from standard SQL such as

  • SELECT x FROM y
  • WHERE
  • GROUP BY
  • ORDER BY
  • JOIN
  • UNION

 

All exist in SWQL, along with functions such as

  • SUM
  • MAX
  • MIN
  • AVG
  • COUNT
  • ISNULL
  • ABS

 

A key point to note here, however, is that UPDATE, INSERT, and DELETE are not supported via SWQL itself. Those use cases are supported outside of SWQL and will be covered at a later point.
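To see a few of these constructs working together, here is a small sketch that counts monitored nodes per vendor, assuming the standard Vendor and NodeID fields on Orion.Nodes:

```sql
SELECT Vendor, COUNT(NodeID) AS NodeCount
FROM Orion.Nodes
GROUP BY Vendor
ORDER BY COUNT(NodeID) DESC
```

Aggregates, grouping, and ordering all behave as you would expect from SQL; only the entity names are Orion-specific.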

A major differentiator, however, is that SWQL automatically links many related objects without explicit joins. This makes writing queries much simpler and more efficient.

 

For example, if we want to select the caption of the nodes in an Orion instance, and also list the name of each interface on those devices, using traditional SQL we would end up with something similar to:

 

SELECT TOP (5)
    N.[Caption],
    [InterfaceName]
FROM [Interfaces] I
LEFT JOIN [Nodes] N ON N.NodeID = I.NodeID

 

Running this would output:

Caption                InterfaceName

ORION11            vmxnet3 Ethernet Adapter

mysql01              eno16777984

mysql01              lo

mysql01              eno16777984

bas-2851.local   VoIP-Null0

 

With SWQL, this simply becomes:

SELECT TOP 5 Caption, N.Interfaces.Name
FROM Orion.Nodes N

which gives the same results! Moreover, because SWQL is read-only, you cannot really break anything.

 

Wrap Up

With today’s post, we’ve laid the foundations for customizing the Orion Platform. We’ve identified some use cases where the API can be used to both read information from, and make changes to, your Orion Platform.  And to make the series “real,” we’ve seen a short SWQL example that gives a good introduction to the power of using SWQL over SQL within the Orion Platform.  In the next post, we will begin to get hands-on by installing and navigating through the Orion SDK. But in the meantime, you can discover more about the topics covered in the SolarWinds Lab episode SWIS API PROGRAMMING CLASS.

Companies are moving their email to cloud in droves.

Let's face it: administering Microsoft Exchange is one of those jobs where, when everything goes right, no one knows you exist, and when things go wrong, everyone knows you exist. The good news is that many companies are offloading their Exchange environments to Microsoft through the use of Microsoft Office 365.  If you doubt that Office 365 is big, consider that in July of this year, the Office 365 online workplace tools brought in more revenue than the traditional version of Office that’s installed on people’s computers. When you think about it, email server replacement is the perfect SaaS application.  It's well defined without huge deviation from one organization to the next, scales well across multiple servers, needs to be accessible from anywhere, and often needs permanent retention of records: all things that the cloud is good at.

 

Moving to the cloud means I'll never have to worry about email again, right?

It's important to remember that while moving to the cloud alleviates your responsibility for the servers that run email, you are still responsible for monitoring the email itself and your company's connectivity to the cloud.  Monitoring cloud-based applications is different from monitoring on-premises applications.  Where you may have been concerned with memory and disk capacity on your servers, or server-to-server communication, in the past, those are not concerns with SaaS.  But some potential issues still exist.  Here are just a few of the metrics you may need to be concerned with in an Office 365 environment:

 

  • Portal Access - Rather than server availability, it's important to know portal availability. This includes the user portal, the administration portal and the billing portal.  These may each be used by different users in your company but are all important.
  • Forwarded Exchange Users -  Are these mailboxes really necessary?  Are they violating company or government policies? What if a healthcare worker is forwarding messages containing patient information to a personal account, for example? 
  • Inactive Exchange Users - While sometimes you may keep a user's mailbox for a period of time after they are gone, sometimes you just forget to delete them and are paying for unneeded accounts.
  • Groups Accepting External Email - Do you really want external entities to be able to bulk mail these groups?
  • Top Senders - This is a handy metric for telling if your accounts have been hacked and are being used by spammers.
  • Administrative Roles - Did the number of administrators change unexpectedly? 
  • License Usage - Get a handle on how quickly your license usage grows.  How many licenses are being used?  What percentage of my total?  You still need capacity planning for SaaS, just a different type of capacity.
  • Last Password Change - Number of users with a password that is 90 days old or more.  How many users have a password that never expires?
  • User Mailbox Security - How many users have access to a large number of mailboxes?  Should they? 

 

Earlier this year, in collaboration with Loop1 Systems, we developed a set of templates for Microsoft Office 365 to monitor these metrics and many more.  The templates have been very popular with customers, but there are a few things you can do to improve their implementation and function. Since these templates monitor Software as a Service, they aren't exactly like other templates that we typically provide.

 

Microsoft Office 365 is Software as a Service and it doesn't run on any of your servers.  What node should you apply the template to?

Since these templates are PowerShell scripts that run against a Microsoft URL, the best solution is to create an external node and apply the templates to it.  You can use "outlook.office365.com" as the node; this is the URL for the mail API requests.  Technically, for the Portal, Subscription, Security Statistics, and License Statistics templates, the scripts use "api.admin.microsoftonline.com", but splitting the Office 365 templates between two nodes can be confusing and forces the SAM user to understand which components of the service reside on each node.

 

You can also use an ICMP node rather than an external node

External nodes don't report status; they always display a purple "arrow" icon.  By using an ICMP node instead, you get a rudimentary status indication on the node icon based on a ping of the URL.  However, the URL "api.admin.microsoftonline.com" doesn't appear to respond to ping requests, so it will always show as down if you point an ICMP node there.
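If you'd rather not depend on ICMP at all, a TCP connection attempt to the portal's HTTPS port gives a similar rudimentary up/down signal, even where ping is blocked. A minimal standalone Python sketch (not part of the SAM templates; the demo uses a throwaway local listener rather than a live portal):

```python
import socket

def portal_reachable(host, port=443, timeout=3):
    """Rudimentary availability probe: attempt a TCP connection to the
    portal's HTTPS port. Works even where ICMP ping is blocked, as it
    appears to be for api.admin.microsoftonline.com."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Local demo with a throwaway listener (no network access needed):
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
up = portal_reachable("127.0.0.1", srv.getsockname()[1])
srv.close()
```

A successful TCP handshake on port 443 only proves the listener is up, not that the application behind it is healthy, so treat this as a complement to the portal templates, not a replacement.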

Get a real picture of Office 365 availability with NetPath

Another way to determine the responsiveness of the Office 365 application is to set up a NetPath service for "outlook.office365.com".  If you have NetPath, you can use it to get a detailed view of the bottlenecks between your site and the application portal.

 

Improving responsiveness to queries by polling less frequently

Depending on the number of mailboxes in your environment and the number of templates implemented, you may experience throttling of your requests by the Office 365 API.  If you are throttled, the choices are to either run fewer component monitors or poll less frequently on some templates.  Most users can reduce the polling frequency substantially on most or all Office 365 templates, since the majority of the metrics don't change often.  To ensure enough data points to avoid gaps in history, keep the polling interval under an hour; for example, try setting the frequency to 1200 seconds (20 minutes) rather than the default of 300 seconds (5 minutes). If you want to know more about Microsoft API throttling, see Avoid getting throttled or blocked in SharePoint Online | Microsoft Docs for a description.  The article is about SharePoint, but the concept is the same for Office 365.
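Independent of any particular template, the general pattern Microsoft recommends for coping with throttling is to honor the Retry-After header on a 429 response and back off between retries. A hedged Python sketch of that pattern (the `request_fn` callable and the simulated responses are hypothetical, standing in for whatever HTTP client you use):

```python
import time

def call_with_backoff(request_fn, max_retries=5, sleep=time.sleep):
    """Retry a request when the service throttles it (HTTP 429),
    honoring the Retry-After header when present and otherwise using
    exponential backoff. `request_fn` is a hypothetical callable that
    returns (status_code, headers, body)."""
    delay = 1
    for attempt in range(max_retries):
        status, headers, body = request_fn()
        if status != 429:
            return body
        wait = int(headers.get("Retry-After", delay))
        sleep(wait)
        delay = min(delay * 2, 60)  # cap the backoff at one minute
    raise RuntimeError("still throttled after %d retries" % max_retries)

# Simulated endpoint: throttled twice, then succeeds.
responses = iter([(429, {"Retry-After": "2"}, None),
                  (429, {}, None),
                  (200, {}, "ok")])
result = call_with_backoff(lambda: next(responses), sleep=lambda s: None)
```

Lengthening the polling interval, as described above, reduces how often you hit the throttle in the first place; backoff handles the occasions when you still do.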

 

I don't like the output of the detailed data from the templates.  Can I make it more readable?

The data comes back from the API in a comma-delimited format which is great for programming but not so readable.  To make the data more readable, you can modify your own copies of the scripts as follows:

Replace:

[string]::Join( ", ", $users) 

With

[string]::Join( "<br/>", $users)

NOTE: You should be aware that this modification injects HTML directly into the output of the PowerShell script.  When viewed on the SAM console it will display correctly.  However, this change could produce unexpected results in areas of SAM that don't render HTML, such as reports.

 

Comparing Exchange 2013/2016 templates with the Office 365 template.  They are both Exchange, so why are they so different?

Since Office 365 is SaaS, many of the metrics in our previous Exchange templates are either not available or not meaningful.  Metrics like disk I/O and disk latency aren't available for a cloud service where the hardware is abstracted away from the user.  Similarly, monitoring processes and services on the hosts is not possible.  With Office 365, we primarily monitor application data, which is available through the Office 365 API.

 

There was a MAPI round trip template available for Exchange.  Can I run this template against Office 365?

The MAPI round trip template was intended to check connectivity between multiple Exchange servers.  Since Office 365 is SaaS, you don't control the physical servers that are used for your accounts.  With cloud-based applications, you should instead check connectivity between your network and the Office 365 website.  You can get a sense of this connectivity by using the portal templates and the ICMP option discussed above.  As also mentioned above, you can use NetPath to show the actual path your connections take to Microsoft.  Another option is to use Web Performance Monitor to record a typical mail transaction and get perspective on each part of the session.

 

A comprehensive approach to monitoring Microsoft Office 365

Hopefully, this post has given you some ideas about why and how to monitor Office 365. SolarWinds offers many tools to help you, from SAM templates to network tools to user simulation.
