
Storage Manager relies on querying the array vendor's SMI-S providers for all data pertaining to the array. The Storage Manager development team researches the new metrics that vendors add in new versions of their providers so we can expose them to end users through the Storage Manager GUI. As I mentioned in the release blog post about Storage Manager 5.3.1, we added a ton of new metrics for IBM SVC, EMC VMAX (including latency!), and EMC VNX. I'd like to take a little time in this post to outline these new metrics.

 

IBM SVC

 

The IBM SVC architecture presents a front-end interface to virtualize the storage arrays that are sitting behind the SVC. Key to making this architecture work is the ability to efficiently manage the cache, especially the write cache. This enables hosts to get acknowledgement that a write has completed without needing to wait for that write to be destaged to back-end storage. Monitoring the efficiency of the cache is therefore important to managing the overall responsiveness of the entire SVC storage system.

 

Cache Performance Metrics

 

These cache metrics can all be found in the Node Performance report under the Performance tab.

SVC1.PNG

 

The following statistics are collected for the overall cache on a per-node basis:

 

  • Cache CDCBs - Demote Ready List - Current count of the number of cache directory control blocks (CDCBs) on the demote ready list
  • Cache CDCBs - Global Copies List - Current count of the number of CDCBs on the global copies list
  • Cache CDCBs - Modified List - Current count of the number of CDCBs on the modified list, including cache partitions
  • Average Write Cache Destage Latency - Average write cache destage latency in milliseconds for the statistics collection period
  • Lowest Write Cache Destage Latency - Lowest write cache destage latency in milliseconds for the statistics collection period
  • Highest Write Cache Destage Latency - Highest write cache destage latency in milliseconds for the statistics collection period
  • Average Prestage/Readahead Latency - Average prestage/read-ahead latency in milliseconds for the statistics collection period
  • Lowest Prestage/Readahead Latency - Lowest prestage/read-ahead latency in milliseconds for the statistics collection period
  • Highest Prestage/Readahead Latency - Highest prestage/read-ahead latency in milliseconds for the statistics collection period
  • Average Read Cache Stage Latency - Average read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Lowest Read Cache Stage Latency - Lowest read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Highest Read Cache Stage Latency - Highest read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Average Cache Fullness - Average cache fullness in percent for the statistics collection period
  • Lowest Cache Fullness - Lowest cache fullness in percent for the statistics collection period
  • Highest Cache Fullness - Highest cache fullness in percent for the statistics collection period
  • Average Write Cache Fullness - Average write cache fullness in percent for the statistics collection period
  • Lowest Write Cache Fullness - Lowest write cache fullness in percent for the statistics collection period
  • Highest Write Cache Fullness - Highest write cache fullness in percent for the statistics collection period
  • Average Data Transfer Latency - Average data transfer latency in milliseconds for the statistics collection period
  • Lowest Data Transfer Latency - Lowest data transfer latency in milliseconds for the statistics collection period
  • Highest Data Transfer Latency - Highest data transfer latency in milliseconds for the statistics collection period
  • Average Track Access Latency - Average begin track access latency in milliseconds for the statistics collection period
  • Lowest Track Access Latency - Lowest begin track access latency in milliseconds for the statistics collection period
  • Highest Track Access Latency - Highest begin track access latency in milliseconds for the statistics collection period
  • Average Track Lock Latency - Average track lock latency in milliseconds for the statistics collection period
  • Lowest Track Lock Latency - Lowest track lock latency in milliseconds for the statistics collection period
  • Highest Track Lock Latency - Highest track lock latency in milliseconds for the statistics collection period

 

CPU

 

Monitoring the CPU on any array is important to understanding whether the array itself can handle the load it is being put under by the hosts. The CPU on the IBM SVC will always read 100% if monitored from the OS because it spends its spare cycles scanning the Fibre Channel fabric. IBM has exposed the following counter that allows you to see how much time the CPU actually spends servicing IO requests.

 

CPU Utilization - This statistic reports the pseudo-CPU utilization. It can also be found under Node Performance with the Cache Performance statistics.

 

Port Performance

 

1-28-2013 5-59-52 PM.png

The following stats are reported for each of the four ports of an SVC node and can be found in the Port Performance report under the Performance tab:

 

Commands Initiated to Controllers - Commands initiated to controllers (targets) [always zero but provided for completeness]

Commands Initiated to Hosts - Commands initiated to hosts (initiators)

Commands Received from Controllers - Commands received from controllers (targets) [probably always zero but provided for completeness]

 

The following stats are provided primarily for debugging suspected fabric issues. Each of the statistics below (except for Zero Buffer-Buffer Credit Timer) is a cumulative count of occurrences since the last node reset.

 

Link Failure Count

Loss-of-synchronization Count

Loss-of-signal Count

Primitive Sequence Protocol Error Count

Invalid Transmission Word Count

Invalid CRC Count

Zero Buffer-Buffer Credit Timer - This counter is reset after each collection interval and should be shown as a percentage of the interval time: (bbcz / (interval in microseconds)) * 100
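As a quick illustration of that calculation, here is a minimal sketch (bbcz is the raw timer value in microseconds reported for the interval; the five-minute interval is just an example, not a fixed Storage Manager setting):

# Convert the raw Zero Buffer-Buffer Credit Timer value (microseconds spent
# at zero credit) into a percentage of the collection interval.
def zero_bb_credit_percent(bbcz_microseconds: float, interval_seconds: float) -> float:
    interval_microseconds = interval_seconds * 1_000_000
    return (bbcz_microseconds / interval_microseconds) * 100

# Example: 4.5 seconds spent at zero credit during a 5-minute (300 s) interval
print(zero_bb_credit_percent(4_500_000, 300))  # -> 1.5 (%)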

 

MDisk Group Performance and Utilization

 

 

An IBM SVC MDisk group is a collection of MDisks tied to a set of associated VDisks, much like a Storage Pool in other array architectures. Understanding the load on the entire MDisk group can be incredibly important when tracking down issues in your storage environment: although one VDisk may not be busy, it can easily be starved by a "noisy neighbor." Without a full understanding of which MDisks (and VDisks) are sitting on shared resources, it can be nearly impossible to make those connections.

1-29-2013 10-33-23 AM.png

In Storage Manager 5.3.1, we created new performance reports specifically for the MDisk Group. Within the MDisk Group Performance chart, you can report on all of the standard performance metrics for that group, including:

  • Overall Response Time
  • Read Blocks
  • Read Blocks/Sec
  • Read IOs
  • Read IOs/Sec
  • Read Response Time
  • Total IOs
  • Total IOs/Sec
  • Worst Read Response Time
  • Worst Write Response Time
  • Write Blocks
  • Write Blocks/Sec
  • Write IOs
  • Write IOs/Sec
  • Write Response Time

 

We also added a specific report and charts for MDisk Group Utilization to track capacity information for the MDisk Group.

 

1-29-2013 10-46-45 AM.png

1-29-2013 10-47-22 AM.png

The MDisk Group Utilization reports allow you to track the following key capacity metrics:

  • % Free
  • % Used
  • Free (GB)
  • Total (GB)
  • Used (GB)

 

For a detailed overview of all the performance statistics provided by the IBM SVC SMI-S Provider, please see this document provided by IBM.

 

EMC Symmetrix VMAX

 

For VMAX, we added metrics for Device Performance and for Disk Performance.

 

Device Performance

 

One concern we've received from customers in the past was the inability to get latency metrics from VMAX LUNs. As of Storage Manager 5.3.1, we now incorporate read and write latency metrics for LUNs. These do not yet roll up into a top-level metric in the Top 10 LUNs report on the Main Console, but we will be looking at addressing that in future releases.

 

1-29-2013 12-37-42 PM.png

The two latency metrics we've added for read and write latency are:

  • Samples Average Writes Time (ms) and
  • Samples Average Reads Time (ms)

 

Hopefully, customers will find this information extremely valuable as they troubleshoot latency problems on their VMAX arrays.

 

Disk Performance

 

1-29-2013 12-30-12 PM.png

In addition to the existing I/O metrics, we added these metrics:

  • MB Read
  • MB Read/sec
  • MB Written
  • MB Written/sec
  • MB Transferred
  • MB Transferred/sec

 

EMC VNX

 

We actually did quite a bit of work to add new metrics for VNX FAST Cache in Storage Manager 5.2.4. However, because 5.2.4 was a service release, we didn't make a lot of noise about the new metrics. That doesn't mean they are unimportant, though! VNX FAST Cache is an incredibly useful technology for improving the overall responsiveness of your storage subsystem, so it's important to know how your cache is performing. FAST Cache metrics added in Storage Manager 5.2.4 include:

 

FAST Cache Metrics

 

Asset Info Report - Cache High/Low Water Mark

 

1-29-2013 1-22-13 PM.png

Asset - FAST Cache (new report)

 

1-29-2013 1-27-25 PM.png

 

Cache Enabled (Shows on 2 reports)

 

Storage - LUNs Report

 

1-29-2013 1-23-36 PM.png

Storage - Storage Pool/RAID Group Utilization

 

1-29-2013 1-25-48 PM.png

Also, we added performance metrics for Cache to multiple Performance charts in the product including:

 

Array Performance

1-29-2013 1-31-29 PM.png

Here we added support for these metrics:

  • FAST Cache MB Flushed SPA
  • FAST Cache MB Flushed SPA/sec
  • FAST Cache MB Flushed SPB
  • FAST Cache MB Flushed SPB/sec
  • % Dirty FAST Cache SPA
  • % Dirty FAST Cache SPB

 

LUN Performance

 

1-29-2013 1-43-22 PM.png

Here we added support for these metrics:

  • FAST Cache Read Hits
  • FAST Cache Read Hits/sec
  • FAST Cache Read Misses
  • FAST Cache Read Misses/sec
  • FAST Cache Write Hits
  • FAST Cache Write Hits/sec
  • FAST Cache Write Misses
  • FAST Cache Write Misses/sec
  • % FAST Cache Read Hits
  • % FAST Cache Write Hits

 

Storage Pool/RAID Group Performance

 

1-29-2013 1-46-59 PM.png

  • FAST Cache Read Hits
  • FAST Cache Read Hits/sec
  • FAST Cache Read Misses
  • FAST Cache Read Misses/sec
  • FAST Cache Write Hits
  • FAST Cache Write Hits/sec
  • FAST Cache Write Misses
  • FAST Cache Write Misses/sec

 

For a full overview of all the metrics available through the EMC SMI-S Provider, please reference these two guides provided by EMC:

EMC® SMI-S Provider Version 4.4.0 Programmer’s Guide with Additional Extensions

EMC SMI-S Provider Version 4.5 Programmer’s Guide

Here is the content that the Syslog dev team is currently looking at for the next version of Syslog (current is v9.3.4). We will update this post with details once we get through the planning phase.

  • Moving to a new web server (UltiDev Web Server Pro)
  • Active Directory authentication for web access
  • Bug fixes

 

Disclaimer: Comments given in this forum should not be interpreted as a commitment that SolarWinds will deliver any specific feature in any particular time frame. All discussions of future plans or product roadmaps are based on the product teams' intentions, but those plans can change at any time.

Here is the content that the CatTools dev team is currently working on for the next version of CatTools (current is v3.8).

  • Migration to SolarWinds Licensing Framework
  • Support for MikroTik devices
  • Improved support for several other devices
  • Bug Fixes

 

Disclaimer: Comments given in this forum should not be interpreted as a commitment that SolarWinds will deliver any specific feature in any particular time frame. All discussions of future plans or product roadmaps are based on the product teams' intentions, but those plans can change at any time.

Probably every network administrator knows this problem: How do I know what equipment in my network is going End-of-Life (EoL) and when?

This blog post discusses various aspects of this issue and offers some possibilities for how the EoL problem can be solved using the newly updated EOL Lookup service from SolarWinds together with Network Configuration Manager.

 

Why Do I Need to Know the EoL Status of My Devices?

Basically, you need to know EoL status for planning purposes. Let's see why. The EoL process usually has a few common stages, although specific details may depend on the type of product and vendor:

 

  • End-of-Life Announcement -- The vendor publishes an EoL schedule for a particular product (or product line).
  • End-of-Sale Date -- After this date, you will not be able to purchase additional units of the product.
  • Last Shipment Date -- Depends on the previous item. Not all vendors publish it.
  • End of Software Maintenance -- No new bug fixes will be provided after this date.
  • End of Hardware Maintenance -- There is no guarantee you will be able to obtain spare parts after this date.
  • Last Date of Support -- This is the actual end of life. The product will no longer be supported by its vendor. Some companies offer support after this date based on a special contract.

 

For your business-critical production environment, you obviously cannot afford to use unsupported products. You need your equipment to run the most up-to-date firmware version, you need the ability to order spare parts, you need someone from technical support to help you troubleshoot problems, and so on. That is why you need to check regularly which devices are going end-of-life and plan appropriate replacements. The planning is important not only because of the technical aspects but also because of budget: if you have many units of a certain piece of hardware on your network and that hardware goes end-of-life, the replacement may require considerable investment.

 

How Can I Find EoL Information?

Now we know why EoL information is important. The question is: How do I find it? There are two options:

 

Pay Someone Else

This is the easiest but also the most expensive option. Besides a big fat wallet, you need a software tool that will create a complete inventory report for all your devices. Then you send the report to the contractor, who should return a list of equipment going end-of-life within a predefined time interval. This service may also be provided by the vendor, but then it is limited to the vendor's products. Additionally, the vendor and/or contractor may only accept inventory reports from their own asset management software, which can incur extra costs.

 

Do It Yourself

If you do not like the previous option, you can always manage the EoL information yourself. When you try to do that, you will quickly find out that it is not so easy. Let's take Cisco as an example. Their EoL summary page lists lots of products -- routers, switches, firmware, extension modules, etc. However, for many items the EoL information is not actually included, or the linked page does not contain the required details.

 

Cisco EoL Summary Page          Cisco Card No EoL

 

You must also realize that a device often consists of several parts, some of which reach end of life sooner than the rest (e.g., a certain firmware version). Last but not least, different vendors publish EoL statements in different formats or even require a partner account for access. Google may sometimes help, but not in 100% of cases.

 

Why Is It a Problem to Get a Good EoL Report?

You finally managed to collect various pieces of EoL information and you have inventoried your devices. What comes next? Well, you have to match those two. And that is really not simple. On one hand, your device inventory may include data such as System OIDs or serial numbers that identify the model very well. On the other hand, vendors publish their EoL statements in terms of product names or some kind of internal codes. These pieces of information are not always exposed to the usual SNMP inventory data collection. The next section shows how our new EOL Lookup service can be used together with inventory data from NCM to get as much information as possible.

 

EOL Lookup and NCM

The EOL Lookup space does not aim to replace the 100% accurate services you can purchase from vendors or professional services providers. It is intended as a tool for the do-it-yourself approach. How do you use it with NCM? First, select an inventory report that enumerates all your devices. Good candidates are 'All Nodes' and 'System Information of Each Device'. (By default, these can be found in the 'Node Details' category.)

 

NCM Inventory All Nodes

 

NCM Inventory System Info

 

Let's assume you want to find out the EoL status of the 'Core-3640' router. In the EOL Lookup tool, you enter the appropriate information:

 

EOL Lookup Enter Details

 

and get the result:

 

EOL Lookup 3640 Details

 

You have various options for recording the EoL information for your devices, e.g.:

  • Export the NCM inventory report in Excel format and add the information there. As you will probably have multiple devices of the same kind, you can group the result according to Machine Type and attach EoL data in bulk.
  • Create a custom property and fill in the EoL information. Again you can select all nodes of the same type and define the EoL info at once.
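If you end up doing this for a lot of device types, a little scripting can take the drudgery out of the matching step. The sketch below is purely an illustration, not part of NCM: it assumes you exported the inventory report to a CSV with a "Machine Type" column and that you maintain your own small lookup table of dates gathered from the vendor EoL announcements (the file name, column name, and dates are all placeholders).

import csv

# Hypothetical lookup table you maintain by hand from vendor EoL announcements.
EOL_DATES = {
    "Cisco 3640": "placeholder-date",
    "Cisco 2950": "placeholder-date",
}

# "nodes.csv" stands in for an NCM inventory export with a "Machine Type" column.
with open("nodes.csv", newline="") as src, open("nodes_with_eol.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["Last Date of Support"])
    writer.writeheader()
    for row in reader:
        row["Last Date of Support"] = EOL_DATES.get(row.get("Machine Type", ""), "unknown")
        writer.writerow(row)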

 

Further Resources

You can watch the following video to learn about the EoL/EoS feature available in NCM: How To Use the EoL Feature in NCM - YouTube.

After SolarWinds acquired DNSstuff, we interviewed many long-standing, recent, and brand-new customers to find out what they did with the site and where it was lacking. We heard loud and clear that Toolset - DNSstuff's toolbox of DNS, network, and mail troubleshooting tools - is where most people spend most of their time, but that it suffered from some long-standing usability issues. People reported:

  1. They frequently ran 1 or 2 of the same tools, sometimes from bookmarks
  2. They had to memorize where the tools were on the page, or CTRL+F to find them every time they visited (especially for the less frequent user)
  3. It wasn't clear when new tools were added, since there were so many things on the page it was hard to tell what was what - people just focused on what they needed to use
  4. Tools were not logically grouped according to the way people think

 

We took your feedback to heart, came up with some new designs, reviewed them with some of those same customers, and went live with the first in a set of changes today. These changes look awesome, but more importantly they make Toolset easier to use.

 

What's Changed?

 

Tabbed Categories

We've separated the tools out into their relative categories in tabs, so you can narrow down to just the troubleshooting you're interested in.

 

Inline Collapsible Categories

 

 

Those same categories appear on the page, and you can expand or collapse different sections. Just hit the + or - to expand or collapse. You can also use the grab handle on the left (the dots) to move the sections around.

 

You'll also see inline links to SolarWinds tools or products that relate to each section.

 

Consistent Look & Feel


 

Each tool has:

  1. A hint to expected input in the input box (does this tool expect a domain, an IP, something else, all of the above?)
  2. Hover help that tells you what each tool does
  3. A "Learn More" link to tell you more about that tool
  4. Grab handles (dots on the left) to rearrange tools
  5. The same size, color, button arrangement, fonts, and the other details whose inconsistency probably drove you nuts on the old site.

 

What We Didn't Change

We have been addressing issues in each tool as we continue to receive feedback, but outside of those fixes, we did not make major changes to any tools or their output - just how they look on the page. You can still bookmark each tool's results page, the results pages are still formatted the same as they were before, and we didn't change MSTC, RBL Alerts, or Domain Doctor. You'll still need a (free trial) account to run the tools after a certain number of runs.

 

What's Next?

Here's a peek at where we're going from here with Toolset.

 

Favorites. In order to make it easy for you to find your most used tools, we're going to add a "favorites" section that lets you pin your frequently used tools to the top.

More page customization. We want you to be able to reorder items on the page, even customize a page theme, and store that in your user settings.

Multi-tools. Some customers frequently run groups of tools, and the new DNSstuff architecture makes it possible for us to present a "set" of tools that can be run together.

 

In addition, we'll keep working on issues you've reported.

 

Feedback on Toolset, Thoughts on Future Features?

We've put up a survey on SurveyMonkey related to the new Toolset features and what you'd like to see next. You can also post over on the DNSstuff forum here on Thwack, where product management, development, and support all respond to issues.

 

If you think you've encountered a bug or issue you can't work around (either with a specific tool or the site in general), or have an idea for a new DNSstuff tool or site feature that would really help you, post it over in the DNSstuff forum here on Thwack, too.

Java: a name that strikes annoyance into the hearts of sysadmins everywhere. With the recent rash of 0-day exploits affecting it, there has been a lot of media attention focused on patching Java. It can be said that many users spend more time patching Java than using it. Fortunately for Java, however, it doesn't get lonely: there's plenty of exploit code available to run right from the sketchy torrent site your users like to visit. The most secure fix would be to disable Java entirely, but you likely can't do that without generating a ton of helpdesk calls / breaking internal applications. And for your admin systems, none of those legacy admin consoles will work any longer. (And by admin console, I mean Minecraft.) For the rest of us, we need to patch Java. If you are relying on Java auto-updates to sort this mess out: A) you are a brave soul, and B) systems on your network are sending me unsolicited offers of knock-off handbags as we speak. There are many good solutions for patching Java, like <shameless plug> SolarWinds Patch Manager </shameless plug>, but let's focus on step 1: finding out which systems on our network are vulnerable. If you have DameWare DRS installed, you can use the Exporter function to make this a quick and easy task.

 

How To:

 

First, let's open Exporter, select the machines we want to poll from the populated AD list (or import your own), and check "Software" under "Standard properties":

 

DameWare_Exporter_Oracle_Patching1.jpg

 

Then we change to CSV format and export to a single file:

 

DameWare_Exporter_Oracle_Patching2.jpg

And let's start the fun:

 

DameWare_Exporter_Oracle_Patching3.jpg

Once the export completes, open the resulting CSV file in a spreadsheet application of your choosing and locate any systems that may not be in compliance. The Excel geeks amongst us will note that a pivot table would make short work of the analysis, but let's use a quick sort to find the machines with unpatched Oracle versions:

 

DameWare_Exporter_Oracle_Patching4.jpg

 

Compare the results against the latest Oracle security bulletin: http://www.oracle.com/technetwork/topics/security/alerts-086861.html

SPOILER ALERT: Virtually every version of Java 7 earlier than 7 Update 11 is impacted, and at the time of this writing there is at least one unpatched vulnerability.
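If spreadsheets aren't your thing, the same check can be scripted. This is only a rough sketch, not part of DameWare: the file name and column names below are assumptions, so adjust them to match the CSV your Exporter run actually produces.

import csv
import re

MIN_UPDATE = 11  # Java 7 Update 11, per the Oracle bulletin referenced above
pattern = re.compile(r"Java(?:\(TM\))?\s*7\s+Update\s+(\d+)", re.IGNORECASE)

vulnerable = []
# "dameware_export.csv", "Software Name", and "Machine Name" are placeholders --
# rename them to match your export before running.
with open("dameware_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        match = pattern.search(row.get("Software Name", "") or "")
        if match and int(match.group(1)) < MIN_UPDATE:
            vulnerable.append((row.get("Machine Name", "?"), match.group(0)))

for host, product in sorted(vulnerable):
    print(f"{host}: {product}")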

 


Armed with this latest information, you can set about the fun task of remediating these errant systems, and your boss can stop asking you about this "Java thing." Just leave some extra time to fix all the apps the update just broke.

We feel your pain.

We've been developing our centralized IT alert management/escalation system for a while now (see: Say Goodbye to Your Pager: We’re working on a new, multi-vendor, centralized alert management product) and the good news is we're ready to welcome everyone who's interested in participating in the beta to do so. There's more info in this post, but if you already know you want in, visit the Alert Central website and sign up.

 

What's Alert Central?

Alert Central is a product intended to help you get the right alerts to the right people at the right time. Core features include:

  • Alert centralization and escalation
  • Group-centric management (with Active Directory integration)
  • Multiple on-call calendars per group
  • Out of the box two-way integration with Orion alerting
  • Support for almost any third party and non-Orion source via email
  • Acknowledgement and escalation via both email and web interface
  • Advanced routing rules that let you slice and dice routing policies based on properties of the email/alerts
  • A weekly summary report with important stats
  • A web UI that works in browsers and mobile devices (as long as you can reach the AC server via VPN or internal network)
  • Plain text, rich text, and short text (for use with SMS) email templates

 

It's different from a helpdesk or ticket-tracking system in that Alert Central's focus is on on-call management and escalation. When you need to wake someone up to deal with an issue, when you need to be sure that something is handled promptly, when something is affecting business/people, or when an issue is time sensitive, it's a good candidate for Alert Central. When you're tracking ongoing issues, requests for help or new equipment, and things that aren't necessarily time-sensitive, a helpdesk system is a great fit (we happen to know of a good one - Web Help Desk).

 

Alert Central is deployed as a standalone virtual appliance, not an Orion module or add-on. Anyone can download and install it, and integrate it with SolarWinds and non-SolarWinds products alike. As long as your product sends emails and you want to route them to the right people, Alert Central will work for you.

 

Guess what, it's free!

You heard that right - Alert Central is free. Not just the beta. The product. Free. $0. Also zero in Euros, Canadian dollars, and all other currencies. Except maybe your feedback; that's a currency we really appreciate.

 

How does it work?

This handy infographic is a great visual of how Alert Central works (borrowed from the Alert Central website).

 

http://content.solarwinds.com/creative/richcontent/alertcentral/images/infographic-alertcentral.png

 

This video (humorously) shows you why Alert Central is awesome:

 

 

These 3 images are also a really good summary of the highlights and top features (also on the Alert Central website) - left side has feature callouts, right side high-res for the curious among you:

 

Managing Alerts:

http://content.solarwinds.com/creative/richcontent/alertcentral/images/01-ac-screenshot.jpgAlertCentral-AlertsScreenWithData.png

On Call Scheduling:

http://content.solarwinds.com/creative/richcontent/alertcentral/images/02-ac-screenshot.jpg AlertCentral-WhosOnCallNowWithGroups.png

Alert Routing:

http://content.solarwinds.com/creative/richcontent/alertcentral/images/03-ac-screenshot.jpg AlertCentral-SourcesWithHelpdeskExample.png

 

How do I get access to the beta?

 

Easy! Go to the Alert Central website and click the big "Sign up for the beta" links. Be sure to check out the beta contest, the winner gets a pretty sweet trip to Austin.

 

Give us feedback - we want to know what you think

 

Speaking of feedback... Take a look at the website, install the beta, give us your thoughts. Anything got you stumped? Think you ran into a bug? Think this is the best UI since Netscape Navigator busted open the web? Tell us what you think.

 

To report bugs, issues, confusion, or praise for the beta, use the Alert Central Beta group. There's an important post there with known issues that you should be sure to check first - Alert Central 1/2013 Beta Notes.

 

If you have a suggestion for something we didn't get in v1/beta that you think would make Alert Central even more awesome for you, please post (and vote) in the Ideas/Feature Requests area of the beta group. A shortcut: http://thwack.solarwinds.com/groups/alert-central-beta/content?filterID=content~objecttype~idea

Microsoft's Windows Management Instrumentation, better known as WMI, is a powerful remote API that has been available with all Windows desktop and server operating systems since Windows NT 4.0. It's enabled by default and requires no configuration to utilize. It's this simplicity, coupled with the vast array of valuable system information it exposes, that makes it invaluable for agentlessly monitoring Windows performance and collecting system inventory information remotely.

 

However, WMI is not all rainbows and lollipops. There are a few notable areas lacking in its implementation. For example, unlike SNMP, WMI is not very latency friendly. It's a protocol best suited to local area networks, or WANs where latency normally averages less than 100ms. That makes the WMI protocol less than ideal for remotely monitoring hosts via satellite, or over heavily congested, high-latency, long-distance, low-bandwidth WAN links. In these scenarios it's recommended to monitor those remote hosts using SNMP, or to deploy a dedicated Orion instance at the remote location where polling can occur locally, leveraging the Enterprise Operations Console to provide a single-pane-of-glass view of your entire organization.

 

WMI Protocol Security

 

I'm often asked about the security of the WMI protocol. While it's true that the majority of implementations lack payload encryption, the most meaningful component (the credentials used to gain access to the remote computer) is fully encrypted using NTLM or Kerberos. This means that even in the most insecure environment, the most sensitive information a potential hacker intercepting your packets is likely to uncover is your server's current CPU or memory utilization. Hardly top-secret stuff. The image below shows what a raw WMI packet looks like when SAM is monitoring the IIS World Wide Web Publishing Service "W3SVC" remotely.

 

Service Name.pngPacket Capture RAW.png

 

In the packet capture above you can see the actual WMI query starting with "SELECT", all the way down to the name of the service being monitored by SAM. Again, this isn't really anything to be too concerned about. After all, 294 billion emails, many of which contain far more sensitive information than how much free disk space you have remaining on your server C:\ drive, are sent completely unencrypted via the internet every day.
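For context, the traffic in that capture corresponds to a WQL service query along the lines of the one below. This is purely an illustration of the kind of query that travels in the clear, shown through the third-party Python "wmi" package rather than how SAM itself is implemented; the host name and credentials are placeholders.

# Illustration only: issue a WQL service query similar to what shows up in the capture.
# Requires the third-party "wmi" package (pip install wmi) plus pywin32, run from a
# Windows host whose credentials have rights on the target machine.
import wmi

conn = wmi.WMI("remote-server", user=r"DOMAIN\monitor", password="********")
for svc in conn.query("SELECT Name, State, Status FROM Win32_Service WHERE Name = 'W3SVC'"):
    print(svc.Name, svc.State, svc.Status)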

 

However, for environments where regulatory mandates dictate the highest standard of network encryption, such as the Department of Defense, there are a variety of methods for encrypting WMI communications. Most commonly used are the IPsec policies built into Windows, which ensure that each and every packet communicated between the two hosts is fully encrypted and safe from prying eyes. It's not something I'd recommend for every environment, but chances are good that if you fall under a similar regulatory mandate then you're already all too familiar with many of the options available to you for encrypting communications between two hosts.

 

Restricting WMI Port Usage

 

Undoubtedly the most common question posed to me as Product Manager of both Server & Application Monitor and Patch Manager, both of which heavily utilize WMI, is what port WMI uses to communicate with a remote host over the network. This is a somewhat loaded question, since WMI isn't limited to any particular port at all. In fact, the closest, best, and most accurate answer I can provide is that WMI is built on DCOM and leverages the RPC Portmapper service for dynamic port allocation. Put simply, this means that WMI uses TCP port 135 to initiate communication with the remotely managed host, then switches to a random high port anywhere between TCP ports 1024 and 65535.
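A quick way to sanity-check the first half of that exchange from your polling server is a plain TCP connect to port 135. This is just a connectivity probe under the assumption that the RPC endpoint mapper is listening on the target; it says nothing about whether the dynamic high ports are reachable.

import socket

def can_reach_epm(host: str, timeout: float = 3.0) -> bool:
    # TCP 135 is the RPC endpoint mapper WMI contacts first; the follow-on
    # connection then lands on a dynamic high port negotiated through it.
    try:
        with socket.create_connection((host, 135), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach_epm("remote-server"))  # "remote-server" is a placeholder host name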

 

WMI Packet Capture.png

 

This is normally the point in the conversation where the person I'm speaking with grimaces and moans, uttering that there's no way the individual in charge of network security is going to be "okay" with opening 64512 TCP ports on the Firewall or router's access control list, simply to monitor their servers or push patches. Essentially this would be the equivalent of having no firewall at all!

 

Not to worry: there are a variety of methods that can be used to limit the range of ports WMI uses to a much more meager number. It's important to note, however, that Microsoft recommends you don't limit your WMI port range too tightly. Each allowed port in the designated range allows one concurrent WMI connection, so the more applications you have that leverage WMI, the more ports you're likely to consume. I've personally had long-term success limiting the WMI port range on SAM managed nodes to as few as 10 ports, but your mileage may vary. So, for the purposes of this blog post, I'll stick with Microsoft's recommended 100 ports. There are some additional notes you should keep in mind while reading this post and before implementing any changes.

 

Prerequisites

 

  • Consider your port range carefully before proceeding to ensure there are no conflicts with ports already in use by critical business applications in your environment. For instance, if you plan to use ports 5000-5100 as provided in these examples, you might not realize that some of these ports overlap with other applications such as FileMaker Pro. While this shouldn't be a major issue, it will reduce the total number of ports available to WMI for concurrent connections. It does, however, become vitally important if you plan to reduce the WMI port range to only a handful of ports. It's probably worth spending a little time determining which ports are being used in your environment by monitoring your network traffic with something like our NetFlow Traffic Analyzer. Similarly, you could use Network Configuration Manager to report on all open ports that your Windows hosts are listening on.

 

  • Group policy can take as long as 90 minutes to update a client's configuration. If you plan to use method #2, or implement method #3 below via group policy, it's important to remain patient. You can always verify a client's configuration using Regedit and validating the contents of the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\Internet key. If you have easy access to the machines receiving their configuration settings via group policy, you can also force a policy refresh by opening a command prompt on the host and running the "gpupdate /force" command.

 

  • These changes require rebooting the remote host before they take effect. In the case of workstations, you may be able to rely on your users to shut down their machines at the end of the day and turn them back on when they return. For servers, you will more than likely have to wait for a maintenance window, such as Patch Tuesday, before you're allowed to reboot and these changes can take effect.

 

  • Neither SAM nor Patch Manager requires WMI. It's a powerful but completely optional component of both products. In the case of SAM, there are SNMP and RPC alternatives for almost every instance where WMI is available. Patch Manager includes optional WMI providers that allow for the immediate deployment of patches to end hosts, eliminating the need to wait for a regularly scheduled phone-home event from the client to check for new updates. Regularly scheduled patch deployments, however, have no reliance on WMI.

 

  • WMI Ports are opened (listening) only as needed, based on an initial request sent via TCP 135. So while the examples below demonstrate allocating a range of 100 ports to WMI, a port will only be opened for use once the connection has been properly authorized and authenticated during that initial RPC request.

 

  • Be sure you've made provisions for adding TCP port 135, as well as the range of ports you've chosen to allocate to WMI, to the Windows Firewall exclusion list. For added security, you can optionally limit this exclusion to the source IP address of your SAM or Patch Manager server.

Windows Firewall Protocols and Ports.png Windows Firewall Scope.png

 

 

Method 1 - Modify The Registry Directly

 

Perhaps the simplest and most straightforward way to limit the port range WMI uses is directly through the registry of the remotely managed host. This can be done manually following the steps below.

 

  • Add the Internet key under:  HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc
  • Under the Internet key, add the values "Ports" (MULTI_SZ), "PortsInternetAvailable" (REG_SZ), and "UseInternetPorts" (REG_SZ).

In this example, ports 5000 through 5100 inclusive have been arbitrarily selected to illustrate how the new registry key can be configured. Configured this way, the new registry key appears as follows:

 

Ports: REG_MULTI_SZ: 5000-5100
PortsInternetAvailable: REG_SZ: Y
UseInternetPorts: REG_SZ: Y

 

WMI_Regedit.png

  • Restart the server. All applications that use RPC dynamic port allocation use ports 5000 through 5100, inclusive. In most environments, a minimum of 100 ports should be opened, because several system services rely on these RPC ports to communicate with each other.

 

I’ve also attached the WMI Ports 5000-5100.zip, which packages these changes into an easy-to-use single registry file. Simply extract the archive, then double-click the "WMI Ports 5000-5100.reg" file to make the above changes on the machine and reboot.
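If you'd rather script the change than merge a .reg file, here is a minimal sketch using Python's built-in winreg module. It simply writes the same three values shown above (5000-5100 is just the example range from this post), must be run as Administrator on the managed host, and the host still needs a reboot afterwards.

import winreg

# Create HKLM\SOFTWARE\Microsoft\Rpc\Internet and populate the three values
# described above; adjust the port range to whatever you settled on.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE,
                        r"SOFTWARE\Microsoft\Rpc\Internet",
                        0, winreg.KEY_WRITE) as key:
    winreg.SetValueEx(key, "Ports", 0, winreg.REG_MULTI_SZ, ["5000-5100"])
    winreg.SetValueEx(key, "PortsInternetAvailable", 0, winreg.REG_SZ, "Y")
    winreg.SetValueEx(key, "UseInternetPorts", 0, winreg.REG_SZ, "Y")

print("Registry updated; reboot the host for the change to take effect.")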

 

Method 2 - Active Directory Group Policy Administrative Template

 

Alternatively, these settings can be centrally configured and deployed via group policy to any machine within the domain. While the manual method referenced above may be suitable for a small handful of machines, such as those in your DMZ, it simply doesn't scale should you need to make these changes to hundreds or possibly thousands of managed hosts.

 

For instances such as this I’ve packaged an Active Directory Administrative Template, WMI Group Policy.zip, that can be used to configure the WMI port range through Active Directory. It's important to note that this Administrative Template can only be used with domains at a Windows Server 2008 or later functional level, though any machine joined to the domain, from Windows 2000 and XP through Windows 8 and Server 2012, will respect these policy settings.

 

Once extracted, copy the “WMI.ADMX” file to the “%systemroot%\PolicyDefinitions” directory on your domain controller. Then copy the “WMI.ADML” file to the “%systemroot%\PolicyDefinitions\en-US” directory. If you share your administrative templates across multiple domain controllers and would like these to be available to all DCs you can alternatively place WMI.ADMX in the “%systemroot%\sysvol\domain\policies\PolicyDefinitions” and the “WMI.ADML” in the “%systemroot%\sysvol\domain\policies\PolicyDefinitions\EN-US” directories.

 

Now that you've extracted the Administrative Template into the appropriate directories, you can utilize it in the same manner as any other group policy setting. From within the Group Policy Management Editor, either open an existing group policy that’s already applied to the devices you’d like this change applied to, or create a new group policy. Right-click the group policy and click “Edit”.

Group Policy Management.png

Within the Group Policy Management Editor expand “Administrative Templates” – “Networking” – “RPC Static Pool” and double click on “RPC Static Pool Definition”.

Group Policy Editor.png

Inside the RPC Static Pool Definition select the “Enabled” radio button to apply the following settings to the objects this group policy applies to.

 

“Use Internet Ports” = “Y”

“Ports Internet Available” = “Y”

“Define Internet ports for RPC below” = “5000-5100”

RPC Static Pool Definition.png

When done click “Apply”. This group policy change should take effect within 90 minutes for all nodes contained in the OU.

 

Method 3 - Create Your Own Group Policy Setting

 

The last method for distributing these policy changes that I'll discuss in this blog post is creating your very own Group Policy setting. It's perhaps not as elegant or neatly packaged as the Administrative Template method above, but it does have the distinct advantage of working on older Windows 2003 Domain Controllers. Also, unlike Administrative Templates you can follow essentially the very same steps below on any Windows machine that's handy and define a local RPC port range for that workstation. This means you can play around using this method without needing to implement any changes on your Domain Controller. Once you're comfortable with these settings however, you can implement them via Group Policy in a similar manner to distribute them across your domain.

 

  • Before we begin you'll first want to make a backup copy of the "sceregvl.inf" file located in "C:\WINDOWS\INF". This will allow you to quickly and easily revert any changes made should you need to for any reason.
  • Next, you'll need to take ownership of the "sceregvl.inf" file by right-clicking the file, selecting "Properties", then selecting the "Security" tab and clicking "Advanced". Inside the "Advanced Security Settings", select the "Owner" tab and click the "Edit" button to select your domain "Administrators" group, of which you should already be a member. Lastly, click "Apply" and "OK" to save these settings.
  • While still inside the security properties of the file, we need to ensure the Domain Administrators group has all the permissions required to modify the file. To do so, select your Domain Administrators group from the list of "Group or user names" and select both "Allow" check boxes next to "Modify" and "Write" under "Permissions for Administrators", as pictured below.

Take Ownership sceregvl.inf.pngModify Permissions sceregvl.inf.png

 

  • Now that security permissions have been properly modified, it's time to get down to business. Start by launching Notepad as Administrator: hold down the "Control" key on your keyboard while right-clicking Notepad, and select "Run as administrator". Once launched, open "C:\WINDOWS\INF\sceregvl.inf" from within Notepad using the "File" menu.

Open Notepad as Administrator.png

  • Next we'll need to add a few lines to the top section of the "sceregvl.inf" configuration file, just above the other registry keys, as pictured below. These configuration settings can be copied as written here and pasted directly into the "sceregvl.inf" configuration file. Once completed, don't forget to save the file.

 

MACHINE\SOFTWARE\Microsoft\Rpc\Internet\Ports,7,RPC_Ports,4

MACHINE\SOFTWARE\Microsoft\Rpc\Internet\PortsInternetAvailable,1,RPC_PortsInternetAvailable,2

MACHINE\SOFTWARE\Microsoft\Rpc\Internet\UseInternetPorts,1,RPC_UseInternetPorts,2    

Edit sceregvl.inf.png

 

  • Before we can start playing with our new policy settings, we first need to re-register "scecli.dll" with Windows. To do this, you'll first need to open a command prompt window as Administrator in the same manner we launched Notepad in the previous step: right-click "Command Prompt" while holding down the "Control" key and select "Run as administrator". Then run the command "regsvr32 scecli.dll" to re-register "scecli.dll". When executed, you should receive confirmation that the DLL was successfully registered.

regsvr32 scecli.dll.png

 

  • Similar to the Administrative Template method above, you can now configure these settings in the same manner as any other group policy. From within the Group Policy Management Editor, either open an existing group policy applied to an organizational unit that already contains the devices you’d like this change applied to, or create a new group policy. Right-click the group policy and click “Edit”. Within the Group Policy Management Editor, expand “Windows Settings” – “Security Settings” – "Local Policies" and select "Security Options". There you will find the new configuration options, which can be defined using the following suggested settings.

 

“RPC_UseInternetPorts” = “Y”

“RPC_PortsInternetAvailable” = “Y”

“RPC_Ports” = “5000-5100”

RPC_Ports.pngGroup Policy Settings.png

 

Show me the Money

 

Regardless of which method you choose, the end result is essentially the same. Instead of WMI using any random TCP port greater than 1024, it's now confined to a limited range of ports that both you and the security czar can agree strikes the right balance between safe and functional. The image below demonstrates this result in a packet capture. You can still see the initial request via TCP port 135, followed by the switch to TCP port 5002. This is because there are already two other WMI connections to this host using TCP ports 5000 and 5001. The WMI port range recycles ports as connections are terminated or time out; therefore, provided your port range is adequately sized, there should be no concern of port exhaustion.

 

Packet Capture Result.png
