
Geek Speak


If you have been troubleshooting networks for any length of time (or hanging out on the SolarWinds Thwack Geek Speak forum), it should be obvious that packet inspection is a technique well worth learning. The depth of insight that packet capture tools like Wireshark provide is hard to overstate.


It's also hard to learn how to do.


(Although galactic props to glenkemp for walking through some implementation basics recently on Geek Speak.)


But unless you have a specific use case, or there's a crisis and your boss is breathing down your neck, it's not easy to find the motivation to actually PERFORM a packet capture and analyze the results. Finding the right data sources, identifying the protocol and port, and calculating the time-to-first-byte or the TCP three-way handshake are - for all but the geekiest of the geeks - simply not something we do for kicks and giggles.


That was the driving motivation for us to include Deep Packet Inspection in our latest version of SolarWinds NPM. But even though NPM 11 does a lot of the heavy lifting, as an IT Pro you still need to know what you are looking at and why you would look there versus any of the other amazing data displays in the tool.


Which is why we have created a FREE email course on Deep Packet Inspection for Quality of Experience monitoring. Detailed lessons explain not only HOW to perform various tasks, but WHY you should do them. Lessons are self-contained - no cliffhangers or "please sign up for our next course to find out more" - and broken into manageable chunks so you aren't overwhelmed. Finally, delivery to your inbox means you don't have to remember to go to some website and open a course, and you can work on each lesson at your own pace and on your own schedule.


Monitoring tools have advanced past ping and simple SNMP and can now perform packet inspection. Shouldn't you?


Use this link to find out more and sign up: DPI Online Course & Study Group

IT admins are constantly being challenged to do more with less in their production environments. An example is performance optimization of an n-tier virtualized application stack while minimizing cost. IT admins need to deliver the best possible quality of service (QoS) and return on investment (ROI) with limited time and resources.


Plan on Optimal Performance: A 6-Step Framework

The first step is a plan that needs to produce consistent and repeatable results while maintaining cost-efficiency at scale. One approach is to:

    1. Establish a baseline measurement for the application’s performance with accompanying performance data.
    2. Monitor and log any changes and key performance counters over time.
    3. Define data-driven criteria for good as well as bad, worse, and critical.
    4. Create alerts for bad, worse, and critical.
    5. Integrate a feedback loop with fixes for the degraded states.
    6. Repeat steps 1-5.

This 6-step framework provides a disciplined methodology to troubleshoot performance bottlenecks like disk contention and noisy-neighbors.

Back to Good From Contention and Noisy-Neighbors

Virtualization performance issues usually involve the storage infrastructure and announce their presence in the form of degraded application performance. Storage performance issues often stem from disk contention and noisy-neighbor virtual machines (VMs). Disk contention occurs when multiple VMs try to simultaneously access the same disk, leading to high response times and potential application timeouts. Noisy-neighbor VMs, on the other hand, periodically monopolize storage IO resources to the detriment of the other VMs on the shared storage.


Leveraging the framework, any application slowness or abnormality should generate an alert based on triggers like high disk latency (avg. sec per read/write), high application response times (milliseconds), low disk throughput (bytes per second), and high CPU and memory utilization. Next, the environment should be examined from a current top-down view of the resources and applications so the degraded state can be compared to the known good state. Afterward, drill down into the specific application or storage subsystem.
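As a quick sanity check outside of any monitoring tool, the latency counters behind those "avg. sec per read/write" numbers can be spot-checked from PowerShell on a Windows host; the sample interval and count below are arbitrary, and the threshold in the comment is only a commonly cited rule of thumb, not a hard limit:

  Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Read', '\PhysicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 12
  # sustained values above roughly 0.020-0.030 seconds (20-30 ms) are a common warning sign of storage latency trouble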


If the bottleneck is disk contention, there will be high IOPS and high response times on a disk. If the bottleneck is noisy-neighbor VMs, those VMs will show high IO metrics (IOPS, bandwidth) while other VMs on the shared storage are starved, with low IO metrics. Once the issue is identified, countermeasures and preventative steps can be taken.

Three Tips for Contention and Noisy-Neighbors

Tip #1: As a general rule of thumb, RAID 5 can sustain about 150 IOPS per spindle in its group and RAID 10 about 200 IOPS per spindle in its group. For example, an 8-spindle RAID 10 group would be good for roughly 1,600 IOPS, so the combined IOPS of the VMs placed on it should stay below that figure. Distribute all of the VMs' IOPS across the RAID groups according to these rules to avoid disk contention.

Tip #2: If the disk contention is occurring on a VMFS datastore, an IT admin can adjust the disk shares for all VMs accessing the datastore from the same ESXi host and/or move some of the VMs from the VMFS datastore to another datastore on a different LUN to resolve the contention.

Tip #3: To address noisy-neighbor VMs, IOPs or bandwidth restrictions can be applied to the VMs via features like VMware’s SDRS and SIOC or Storage Quality of Service for Hyper-V.


A plan in hand coupled with tools like SolarWinds Server & Application Monitor and Virtualization Manager provides an IT admin a complete solution to manage, optimize, and troubleshoot virtualized application performance through its entire lifecycle. Let me know what you think in the Comments section. Plus, join the Community conversations on thwack.

This past year, American retailers have seen their fair share of data breaches. To name a few: Target®, Michaels®, Neiman Marcus®, Goodwill®, and Home Depot®. What stands out is that there have been so many breaches in such a short period of time!


The impact of such incidents is not only a serious dent in the company's revenue, but also damage to its reputation. It is obvious that there has been an increase in cybercriminal activity and that these lawbreakers are becoming more and more aggressive. So, with the holiday season underway, are you worried that more data breaches may occur, and are companies prepared?


This blog will focus on data breaches, PCI DSS, the challenges retailers face with compliance, and finally some useful tips to achieve PCI compliance.


Data Breach – What could possibly have gone wrong?

Some vendors have flawed or outdated security systems which allow customer information to be stolen. Often, too little attention is paid to ensuring that all devices are updated and patched. Moreover, administrators have limited means to monitor for suspicious behavior and fail to take the necessary steps to check for existing security holes by performing regular vulnerability scans. To top it off, there may be minimal to no documentation of network changes, or simply poor communication between various IT departments.



To help companies that deal with financial information protect their customer data, the Payment Card Industry Data Security Standard (PCI DSS) defines a set of security controls for processing, storing, or transmitting credit card information in a secure environment. To help network administrators maintain such a network, PCI DSS version 3.0 broadly defines the following control objectives, which apply equally to network routers and switches:

  1. Build and maintain a secure network and systems
  2. Protect cardholder data
  3. Maintain a vulnerability management program
  4. Implement strong access control measures
  5. Regularly monitor and test networks
  6. Maintain an information security policy

Challenge with Compliance

PCI DSS defines security-specific objectives, but doesn't lay down specific security controls or a method for implementing them. Simply using firewalls, intrusion detection, anti-virus, patch management, and related technologies may not be sufficient unless they are used with the operational controls specified by PCI DSS policies. Provided below are a few challenges administrators face when trying to implement and maintain compliance:

  • Uncertainty about what’s on the network
  • Insufficient mechanisms for vulnerability assessment and immediate remediation
  • Absence of compliance reporting and continuous monitoring
  • Not just implementing PCI DSS 3.0, but also continuously maintaining compliance

Best Practices to Achieve PCI Compliance

PCI DSS objectives are satisfied when firewalls, intrusion detection, anti-virus, and patch management are used together with the necessary operational controls. Here are a few best practices to achieve compliance while saving valuable time:

  • Segment specific parts of your network to define controls and protection where sensitive data resides
  • Ensure the use of the right protocols and security best practices to plug possible network vulnerabilities
  • Implement and follow supporting operational controls like device inventory management, configuration change approvals, regular backups, and task automation, in addition to compliance with internal and external standards

These practices become even more important in networks with hundreds of multi-vendor devices and device types operating across many locations. The overall time, effort, and cost involved in achieving compliance are very high. However, the cost of being non-compliant cannot be ignored. Stay tuned for my next post, where I will provide a detailed list of tips to achieve PCI compliance while saving time and money.

It goes without saying that patching and updating your systems is a necessity.  No one wants to deal with the aftermath of a security breach because you forgot to manually patch your servers over the weekend, or because your SCCM/WSUS/YUM solution wasn't configured correctly.  So how do you craft a solid plan of attack for patching?  There are many different ways to approach it. In previous posts I talked about what you are patching and how to patch Linux systems, but we still need to discuss creating a strategic plan for ensuring patch and update management doesn't let you down.  What I've done is lay out a step-by-step process for creating a Patching Plan of Attack, or PPoA (not really an acronym, but it looks like one).


Step 1: Do you even know what needs to be patched?

The first step in our PPoA is to do an assessment or inventory to see what is out there in your environment that needs to be patched: servers, networking gear, firewalls, desktop systems, etc.  If you don't know what's out there in your environment, then how can you be confident in creating a PPoA?  You can't!  For some this might be easy due to the smaller size of their environment, but for others who work in a large enterprise with hundreds of devices it can get tricky.  Thankfully, tools like SolarWinds LAN Surveyor and SNMP v3 can help you map out your network and see what's out there.  Hopefully you are already doing regular datacenter health checks where you actually set your Cheetos and Mt. Dew aside, get out of your chair, and walk to the actual datacenter (please clean the orange dust off your fingers first!).
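If you want a quick-and-dirty first pass before (or alongside) a proper discovery tool, a simple ping sweep will at least show which addresses answer; nmap is one common way to do it (not one of the tools mentioned above, and the subnet below is just a placeholder):

  nmap -sn 10.0.0.0/24    # -sn = host discovery only, no port scan; repeat for each of your subnets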


Step 2:  Being like everyone else is sometimes easier!

How many flavors of Linux are in your environment?  How many different versions are you supporting?  Do you have Win7, XP, and Win8 all in your environment?  It can get tricky if you have a bunch of different operating systems out there, and even trickier if they are all at different service pack levels.  Keep everything the same; if everything is the same, you'll have an easier time putting together your PPoA and streamlining the process of patching.  Patching is mind-numbing and painful, so don't add complexity to it if you can avoid it.


Step 3:  Beep, beep, beep.... Back it up!  Please!

Before you even think about applying any patches, your PPoA must include a process for backing up all of your systems prior to and after patching.  The last thing anyone wants is an RGE (resume-generating event) on their hands!  We shouldn't even be talking about this; if you aren't backing up your systems, run and hide and don't tell anyone else (I'll keep your secret).  If you don't have the storage space to back up your systems, find it.  If you are already backing up your systems, good for you, here's a virtual pat on the back!


Step 4:  Assess, Mitigate, Allow

I'm sure I've got you all out there reading this super excited and jonesing to go out and patch away.  Calm down - I know it's exciting, but let me ask you a question first.  Do you need to apply every patch that comes out?  Are all of your systems "mission critical"?  Before applying patches and creating an elaborate PPoA, do a risk assessment to see if you really need to patch everything that you have.  The overhead that comes with patching can get out of hand if you apply every patch available to every system you have.  For some (e.g., federal environments), you have to apply them all, but for others it might not be necessary.  Can you mitigate the risk before patching it?  Are there things you can do ahead of time to reduce the risk or exposure of a certain system or group of systems?  Finally, what kind of risk are you willing to allow in your environment?  These are all aspects of good risk management that you can apply to your planning.


Step 5:  Patch away!

Now that you have your PPoA and you are ready to get patching, go for it.  If you have a good plan of attack and you feel confident that everything has been backed up and all risks have been assessed and mitigated, then have at it.  Occasionally you are going to run into a patch that your systems aren't going to like, and they will stop working.  Hopefully you've backed up your systems, or better yet, you are working with VMs and can revert to an earlier snapshot.  Keep these 5 steps in mind when building out your PPoA so you can feel confident tackling probably the most annoying task in all of IT.

The technology industry is changing at a rapid pace, but let me put that into perspective. It took radio 38 years to reach 50 million users; it took television 13 years, the Internet 4 years, Facebook 1,096 days, and Google+ only 88 days. So what does this rapid change in how business is done mean for us in the IT industry? It means our skill sets are going to need to change just as rapidly. It can no longer be "I only do storage, so that's not my problem"; there has to be a fundamental shift in how we move forward in the IT industry. We have to be in tune with others outside our group and be more aligned with the goals of the business. I've felt in some of my previous roles that the infrastructure and operations teams gained little visibility into the goals of the business. This is not to say that we don't want to know, or that we have no clue what they are, but a lot of context is lost once it makes it to these teams. For example, you may have management ask you to go research this or go do that, but with little context around how it affects the business or other teams.


So how do we change this? Well, part of it is a culture change, but a big part is changing some of our skills. Below I've listed a few skills I think are important to IT operations management.


  • Understands business goals
  • Problem-solving skills oriented toward organizational challenges
  • Effective communicator across teams
  • Holistic view across various technology silos


I would love to hear what others think about the skills needed for this role and how you've changed some of your skill set in this rapidly changing IT industry.


A quick Google image search of Christopher Kusek will show you the most important thing you need to know about him: He’s yet to find a set of fake ears he doesn’t look great in! Of course, you might also be able to discern a few other arguably less impressive things about him: CISSP, vExpert, VCP, BDA, EMCTA, EMCCA, CCNP, CCNA, MCSE, MCTS, MCITP, NCIE, NCSA, NACE, SCP.


He’s also the proud owner of PKGuild.com, which is why he’s the focus of this month’s IT blogger spotlight. And in case you’re one of the two people left on the Internet not yet following him on Twitter, where he goes by @cxi, you should be!


Read on to learn a little more about Christopher, including his affinity for unique headwear, what it’s like to be an IT pro in the middle of Afghanistan and his thoughts on the most significant trends in IT right now, including SDS, SDN and more.


Also, if you have your own questions for Christopher, feel free to leave a comment.


SW: OK, so I’ve got to ask, what’s up with the fake ears?


CK: There’s a whole other backstory to that, but this section would end up being far too long if I were to tell it, so I’ll stick to the one that most closely relates to the images you see in the aforementioned Google image search. It all began in May 2011 when I was hosting a party in Las Vegas for EMC World. The invitation page for the event had a section requesting a “Logo Image.” I sat there thinking to myself, “Doh, what would be a good image?” So, I scoured through my hard drive and found some pictures I thought were just ridiculous enough. They happened to be of me wearing cat ears. You see, when I was writing my first VMware book back in 2011, I would go to a Starbucks on North Avenue in downtown Chicago and just sit there pegging away at the pages and chapters. I thought, what better way to both get my work done and bring joy to people who would pass through over the course of the hours I’d sit working there than by wearing some cat ears. I mean, everyone loves cats, right?! Eventually the whole idea evolved into brainwave controlled cat ears.


SW: Brainwave controlled cat ears, huh? I don’t think there’s enough space here to cover such an in-depth and socially important topic. So, let’s talk about PKGuild. What are you writing about over there?


CK: For the most part I tend to write about things that I’m either really passionate about—something that solves a problem, is just absolutely awesome or will benefit other people. It turns out that a lot of the time that tends to fall into the realms of virtualization, storage, security, cloud, certification, education and things that broach realms of Innovation. I’m not limited exclusively to writing about those subjects, but a majority of the stuff I write about tends to cross those spectrums.


SW: OK, shifting gears again—outside of the blog, how do you pay the bills?


CK: I just recently returned from a two year stint in Afghanistan and am now in a new role as CTO at Xiologix, an IT solutions provider headquartered in Portland. I’m responsible for the technical direction and engineering of the business and for helping customers solve complex technology and IT problems.


SW: What was the two year stint in Afghanistan all about?


CK: I was the senior technical director for datacenter, storage and virtualization for Combined Joint Operations-Afghanistan. Honestly—and this is something covered at length in various blog posts I’ve written—it was a unique opportunity to do something I enjoy and that I do very well as a way of serving my country. While I may not be able to pick up a gun or run down a group of insurgents, I was able to build some of the most comprehensive, resilient and versatile networks in the world, and help lead others to achieve those same results.


SW: So, what was it like being an IT pro in the middle of a warzone?


CK: The first thought that comes to mind is, “It sucked!” Because, quite frankly, it did. I mean the living accommodations and food were horrible, there was risk at every avenue and the chances for you to be hurt, maimed or worse were all very real. But let’s consider the facts, I didn’t go there because I was expecting there to be good food, living quarters or for it to be relatively safe. Once you get past all that and realize you have a job to do, it was pretty much just that. Go in and try to make everything and anything you do better than you found it, and better for the person who comes after you. I found lots of decisions were made on 6, 9 or 12 month “plans,” as in someone rotated in and would be there for a certain duration and would “do stuff,” whether right or wrong, and then rotate out. This was true whether it came to licensing software, attempting to procure something to solve a problem or maintaining operational environments for an enduring environment that had been there for 10 years prior to them and would continue to be there long after they were gone. This differs greatly from how corporations or nearly any other mission critical environment is run.


SW: Based on your impressive collection of certifications, which includes SolarWinds Certified Professional, I‘m guessing this whole IT gig isn’t new to you.


CK: Not exactly, no. I’ve been working in IT for over 20 years. Back in the early 1990s, I was a security researcher. During that time, I would also build and simulate large corporate networks—yes, for fun…and to assist friends who worked at consulting companies. After I returned from a memory buying trip to Japan in 1996 to support my home labs, I decided to get a job at a consultancy in the Chicagoland area, where I went on to work for 13 years before moving onto the vendor life at NetApp and EMC.


SW: OK, so when you’re not working or blogging—or keeping our armed forces digital backbone up and running—what are some of your hobbies?


CK: When I’m not working or blogging, I’m usually working or blogging! But seriously, I enjoy reading. I even write a book on occasion. I spend a lot of time with my family, and as a vegan foodie I also enjoy discovering new food options. I also enjoy the occasional trip to Las Vegas because I love applying the principle of chaos theory to games of chance. Being that I now live in the Pacific Northwest, I also look forward to the opportunity to get out and explore nature. Finally, I really enjoy getting out there in the community, working with others and helping them grow themselves and their careers, whether that be through mentorship, presenting at conferences and user groups or other kinds of involvement and engagement.


SW: OK, for my last question, I want you to really put your thinking cap on—what are the most significant trends you’re seeing in the IT industry?


CK: With the maturity and wide scale adoption of virtualization, there are related changes happening in the IT landscape that we're only beginning to realize the benefits of. This includes software defined storage and software defined networking. SDS and SDN provide such potential benefits that the market hasn't been ready for them up until this point, but eventually we'll get there. Cloud is another, though the term is so often repeated, it really isn't worth talking about outside of the further extension of internal datacenters into public-side datacenters with hybrid cloud services. Lastly, the further commoditization of flash storage, which is driving prices down significantly, is increasingly making "speeds and feeds" a problem of the past; in turn making the value of data far exceed the speed of data access on disk.

IP address conflicts are usually temporary, but you can't always expect them to resolve themselves. In my previous blog, we looked at the various causes of IP conflicts and the difficulties administrators face when determining the source of a network issue and whether it's actually an IP conflict. In this post, I would like to walk through troubleshooting IP conflicts and the fastest methods of resolution to minimize network downtime.


So what happens when you see that blatant message staring at you from the screen: "There is an IP address conflict with another system on the network"? Network administrators typically want to know, as quickly as possible, what system owns that address and where it is located. A relatively easy way to find the MAC address behind an IP address on the same network or subnet is to ping the IP address and then immediately inspect the local ARP table. If you use a Windows PC, the following steps will guide you through this search (an example session follows the list):


  • Click on the Windows 'Start' menu, type 'cmd,' and press 'Enter' to open the command prompt
  • At the command prompt, ping the IP address that you want to locate
    • For example, ping xxx.xxx.xx.xx. If the ping is successful, you should see a reply from the remote device. If the ping request doesn't reach the host, you won't be able to proceed with the next step
  • Now, at the command prompt, type arp -a. The command returns a table listing the IP addresses your PC has recently communicated with. Locate the IP address you're looking for in this table; the corresponding column shows its MAC address
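For example, a lookup for a made-up address might look like this (the IP, MAC, and output below are purely illustrative):

  C:\> ping 192.168.1.25
  Reply from 192.168.1.25: bytes=32 time<1ms TTL=64

  C:\> arp -a
    Internet Address      Physical Address      Type
    192.168.1.25          00-1b-2c-3d-4e-5f     dynamic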


This method of finding the MAC address with 'ping' and 'arp' typically works. However, if it does not, you will have to spend more time and effort to locate the offending MAC address. If you do not find what you are looking for on the first attempt, you will need to repeat this process on all routers until you find the offending IP and MAC address. Once you have located the MAC address, you need to find the switch and switch port that the offending device is connected to. Knowing this will let you disconnect the device from the network. The following steps help locate the MAC addresses connected to a switch (example output follows the list):


  • Issue this command on each switch in your network: 'show mac-address-table' (this is for Cisco IOS or compatible switches).
  • The command returns a list of MAC addresses associated with each active switch port. Check if this table contains the MAC address that you are looking for.
  • If you find the MAC address, then immediately consider creating new ACL rules or temporarily blocking the MAC address. In critical cases, you might want to shut down the switch port and physically disconnect the offending device from the network.
  • If you do not find the MAC address, repeat the command on the next switch until you find your device.
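On a Cisco IOS switch, that exchange might look something like this (the VLAN, port, and MAC values are illustrative; many versions also accept an address filter so you don't have to scan the whole table, and newer releases spell the command 'show mac address-table'):

  Switch# show mac-address-table
    Vlan    Mac Address       Type        Ports
    ----    -----------       ----        -----
      10    001b.2c3d.4e5f    DYNAMIC     Gi0/14

  Switch# show mac-address-table address 001b.2c3d.4e5f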


While these procedures help you locate a device on the network, they can be very time-consuming and require technical expertise and login access to network switches and routers.

There are two factors that complicate the effort of locating a device on the network. The first is network complexity; the other relates directly to the availability of historical data. The technique above relies heavily on ARP caches, and unfortunately these caches are cleared from time to time. If this data is not available, it becomes very difficult to determine the location of a system. During a crisis, you want a system that can help you locate issues quickly and easily. Being alerted about an IP conflict before users start complaining or a critical application goes down is important to network reliability. Being able to quickly search for a device by its IP or MAC address and locate it on the network reduces the time and effort involved in troubleshooting and eliminating issues caused by IP conflicts.


Today, many IT solutions are available that aid in effective monitoring and resolution of problems like IP conflicts. These are much faster than manually searching for offending devices. Solutions such as these should offer the ability to:


  • Constantly monitor the network for IP Conflicts by setting up alert mechanisms
  • Quickly search, identify, and verify details of the offending device
  • Locate the offending device and immediately issue remediation measures to prevent further problems


So, what method do you find to be the most effective for troubleshooting IP conflicts? If it’s an automated solution, which do you use?


Patching the Penguin!

Posted by gregwstuart Dec 8, 2014

Let's talk about patching for our good friend Tux the Linux penguin (if you don't know about Tux, click here).  How many of us out there work in a Linux-heavy environment?  In the past it might have been a much smaller number; however, with the emergence of virtualization and the ability to run Linux and Windows VMs on the same hardware, it's become common to support both OS platforms.  Today I thought we'd talk about patching techniques and methods specifically related to Linux systems.  Below I've compiled a list of the 3 most common methods I've used for patching Linux systems.  After reading the list you may have a dozen methods that are more successful and easier to use than the ones I've listed here; I encourage you to share your list with the forum so we get the best coverage of methods for patching Linux systems.


Open Source Patching Tools

There are a few good open source tools out there for patching your Linux systems.  One tool that I've tested in the past is called Spacewalk.  Spacewalk is used to patch systems that are derivatives of Red Hat, such as Fedora and CentOS.  Most federal government Linux systems are running Red Hat Enterprise Linux; in that case you would be better off utilizing the Red Hat Satellite suite of tools to manage patches and updates for your Red Hat systems.  If your government or commercial client allows Fedora/CentOS as well as open source tools for managing updates, then Spacewalk is a viable option.  For a decent tutorial and article on Spacewalk and its capabilities, click here.



YUMmy for my tummy!

No, this has nothing to do with Cheetos, everybody calm down.  Configuring a YUM repository is another good method for managing patches in a Linux environment.  If you have the space (and even if you don't, you should make the space), configure a YUM repository.  Once you have the repository created, you can build some of your own scripts to pull down patches and apply them on demand or on a configured schedule.  It's easy to set up a YUM repository, especially when utilizing the createrepo tool.  For a great tutorial on setting up a YUM repository, check out this video.
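As a rough sketch of what that can look like on a Red Hat-style box (the paths, host name, and repo ID below are invented for illustration, and the directory is assumed to be served over HTTP by a web server you already run):

  # on the repo server: install createrepo, stage the RPMs, and build the metadata
  yum install -y createrepo
  mkdir -p /var/www/html/localrepo
  cp /path/to/downloaded/*.rpm /var/www/html/localrepo/
  createrepo /var/www/html/localrepo

  # on each client: drop a file like /etc/yum.repos.d/localrepo.repo containing
  # [localrepo]
  # name=Local patch repository
  # baseurl=http://repo.example.local/localrepo
  # enabled=1
  # gpgcheck=0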



Manual Patching from Vendor Sites

Obviously the last method I'm going to talk about is manual patching.  For the record, I abhor manual patching; it's a long process and it can become quite tedious if you have a large environment.  I will preface this section by stating that if you can test a scripted/automated process for patching and it's successful enough to deploy, then please, by all means, go that route.  If you simply don't have the time or aptitude for scripting, then manual patching it is.  The most important thing to remember when you are downloading patches from an FTP site is to ensure that it's a trustworthy site.  With Red Hat and SUSE, you get their trusted and secured FTP sites for downloading your patches; however, with other distros of Linux such as Ubuntu (Debian-based) or CentOS, you're going to have to find a trustworthy mirror site that won't introduce a Trojan to your network.  The major drawback of manual patching is security; unfortunately, there are a ton of bad sites out there that will help you introduce malware into your systems and corrupt your network.  Be careful!

That's all folks!  Does any of this seem familiar to you?  What do you use to patch your Linux systems?  If you've set up an elaborate YUM or apt-get repository, please share the love!

Tux out!!

When implementing a SIEM infrastructure, we’re very careful to inventory all of the possible vectors of attack for our critical systems, but how carefully do we consider the SIEM itself and its logging mechanisms in that list?


For routine intrusions, this isn’t really a consideration. The average individual doesn’t consider the possibility of being watched unless there is physical evidence (security cameras, &c) to remind them, so few steps are taken to hide their activities… if any.


For more serious efforts, someone wearing a black hat is going to do their homework and attempt to mitigate any mechanisms that will provide evidence of their activities. This can range from simple things like…


  • adding a static route on the monitored system to direct log aggregator traffic to a null destination (a one-line example follows the second list below)
  • adding an outbound filter on the monitored system or access switch that blocks syslog and SNMP traffic


… to more advanced mechanisms like …


  • installing a filtering tap to block or filter syslog, SNMP and related traffic
  • filtering syslog messages to hide specific activity
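To make the first of the "simple" items above concrete: on a Linux host, a single line is enough to silently drop traffic to a syslog collector (the collector address here is a documentation placeholder):

  ip route add blackhole 203.0.113.10/32    # 203.0.113.10 standing in for the log aggregator

Which is exactly why an unexpected routing or ACL change on a monitored system should itself be treated as an alertable event.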


Admittedly, these things require administrator-level or physical access to the systems in question, which is likely to trigger an event in the first place, but we also can't dismiss the idea that some of the most significant security threats originate internally. I also look back at my first post about logging sources and wonder whether devices like L2 access switches are being considered as potential vectors. They're not in the routing path, but they can certainly have ACLs applied to them.


I don’t wear a black hat, and I’m certain that the things I can think of are only scratching the surface of possible internal attacks on the SIEM infrastructure.


So, before I keep following this train of thought and start wearing a tin foil hat, let me ask these questions:


Are we adequately securing and monitoring the security system and supporting infrastructure?

If so, what steps are we taking to do so?

How far do we take this?


It's Always The Storage

Posted by strebeld Dec 8, 2014

One of the most difficult things storage admins face on a day-to-day basis is that "it's always the storage's fault." You have virtualization admins constantly calling to tell you there's something wrong with the storage, and application owners telling you their apps are slow because of the storage. It's a never-ending fight to prove that it's not a storage issue, which leads to a lot of wasted time in the work week.


Why is it always a storage issue? Could it possibly be an application or compute issue? Absolutely, but the reason these teams start pointing fingers is that they don't have insight into each other's IT Operations Management tools. In a lot of environments, the application team doesn't have insight into the IOPS, latency, and throughput metrics for the storage supporting their application. On the other hand, the storage team doesn't have insight into application metrics such as paging, TTL, memory consumption, etc.


So, for example, let's look at the following scenario:


The application team starts noticing their database is running slow, so what comes to mind? We better call the storage team, as there must be a storage issue. The storage team looks into it, doesn't find anything unusual, and verifies they haven't made any changes recently. Hours go by, then a couple of days, and they still haven't gotten to the bottom of the issue. Both teams keep finger-pointing, lose trust in each other, and just decide they must need more spindles to increase the performance of the application. A couple more days go by and the virtualization admin comes to the application team and says, "Do you know you've over-allocated memory on your SQL Server?" So what happened here? An exorbitant amount of time was spent troubleshooting the wrong issue. Why? Because each of these teams had no insight into the other teams' operations management tools. This type of scenario is not uncommon and happens more often than we would like; it causes disruption to the business and wastes a lot of time that could have been spent on valuable activities.


So the point is, when looking at operations management tools or processes, you must ensure that these tools are transparent across the infrastructure groups and application teams. By doing this we can provide better time-to-resolution, which allows us to reduce the impact on the business.


I would love to hear if other users in the community have these types of scenarios and how they have changed their processes to avoid these issues.

Hello thwack! My name is Kong Yang and I recently joined SolarWinds as the Virtualization Head Geek aka vHead Geek. I am super stoked to join my fellow Head Geeks and THE community for IT pros – thwack.


A little background - I earned my BS in EE and MS in ECE from UNR and UIUC, respectively. After that, I spent 13 years grinding out experience in performance tuning & troubleshooting of enterprise stacks, virtualization sizing & capacity planning best practices, tech community management, and tech evangelism at Dell. For the last 14 months, I was the Cloud Practice Leader at Gravitant, a hybrid cloud software startup.


I am passionate about understanding the behavior of the entire application ecosystem – the analytics of the interdependencies as well as qualifying & quantifying the results to the business bottom line. This encompasses: 

  • Virtualization & Cloud technologies.
    • VMware vSphere and vCloud Air.
    • Microsoft Hyper-V and Azure.
    • Amazon Web Services (AWS).
    • IBM SoftLayer.
    • Google Compute Engine (GCE).
  • Application performance.
    • Tier 1 Application performance best practices – bridging what’s done in ideal lab environments, i.e. vendor reports & benchmark results, and real world IT lab environments.

    • Best practices for proactive performance optimization and reactive performance troubleshooting.
  • Hybrid Cloud best practices – on-premises, off-premises, private & public cloud services.
    • How do I efficiently and effectively monitor & optimize my application assets across hybrid cloud ecosystems?

    • What skills do IT pros need to add in order to not only survive but thrive?
  • Containers, hypervisors, cloud native best practices – vehicles for IT application stacks.
  • DevOps conversations. Gene Kim co-wrote an awesome book entitled The Phoenix Project that articulates DevOps well.
  • Converged Infrastructure technologies.
    • Nutanix, SimpliVity, Scale Computing.
    • VMware EVO family.
    • VSPEX, FlexPod.
    • VCE Vblocks.
    • Microsoft Cloud Platform System.
  • Data analytics – ask the right questions, pivot points, and correctly interpreting & applying results.


Rather than continuing to bore you with my CV, I will leave you with my seven tips for a long and prosperous IT career:


  1. Do what you love and love what you do – be passionate about IT, technologies, and people.
  2. Know your IT and do IT – there is no substitute for experience and know-how.
  3. Don't be afraid to fail. My greatest successes have followed failures. Character is built from failures so always learn & keep moving forward.
  4. Don't strive for perfection. Perfection limits innovation by setting an arbitrary & unnecessary ceiling. Innovation is unbounded!
  5. Build your network of trusted advisors - techie friends, peers, professional mentors, colleagues and resources – know whose info you can trust. Return that trust by continually earning & maintaining their trust.
  6. Strength and honor – policies, processes & people-in-charge change; but your principles should never waver.
  7. Remember those who have helped you grow and those who have stood in your way. Be thankful for both of them.


I look forward to the opportunity to make your acquaintance and earn your trust. I am @KongYang on Twitter and Kong.Yang on thwack.


Let’s close with some fun because IT work can be a real PITA at times. Below is a picture of me with two of my friends - @virtualTodd, who is a Sr. Staff Engineer on VMware’s Perf R&D team, and @CiscoServerGeek, who is a Cisco Consulting Systems Engineer. I’m wearing the green & yellow jester’s hat with the green feather boa and throwing the peace sign, while Todd is posing as Captain America and Scott is sporting Wolverine’s claws and the red & black top hat.









Last month I took part in our regular #datachat on Twitter. The topic was “Rolling SQL Server® Upgrades”, and my guest was Argenis Fernandez (blog | @DBArgenis) from SurveyMonkey. I’ve enjoyed doing the #datachat for the past year and I’m excited that they will be continuing in 2015.


The discussion that night was wonderful. Lots of data professionals talking about their experiences with upgrades, both good and bad. And the discussion wasn’t just one-way, either. We took the time to field questions from anyone participating in the #datachat hashtag.


When the night was done, and I reviewed the tweets the following day, I found myself grouping many of them into some common themes regarding upgrades. Here’s what I saw:


  1. Have a plan
  2. Test the plan, repeatedly
  3. Have a plan for rollbacks
  4. Understand complexities
  5. Involve the business stakeholders


Let’s break those down a bit.


Have a plan

That goes without saying…or does it? You’d be surprised at the lack of planning when it comes to upgrades. Many, many times I have seen upgrades go wrong, and because there is no actual plan in place, the upgrade continues forward anyway. This is just madness, really, as changes are now being hastily applied in production in an effort to get things working as expected.


Test the plan, repeatedly

I’ve seen situations where plans were developed, but not thoroughly tested. Sometimes, the plans weren’t tested at all. The end result is similar to not having any plan in place (see above).


Have a plan for rollbacks

Rolling back is something that must be considered. If your company is unwilling to roll back when things go wrong, then you might as well not have any plan in place (see above). The idea that the changes MUST be deployed at all costs is the wrong mentality to have. You might think it is driving the business forward, but the reality is that you are letting chaos rule your change deployment process.


Understand complexities

As a server or database administrator you need to understand the complexities of the technologies you are using. The easiest way I have found to get this done is to ask yourself “what if?” at every stage of your plan. What if the scripts fail? What if we upgrade the wrong instance first? What if mirroring breaks while the upgrade is in progress? Answering these questions helps everyone to understand what actions may be needed.


Involve the business stakeholders

I’m kinda surprised at this, but apparently there are shops out there performing upgrades without notifying the business end-users. Perhaps my experience in a regulated industry like financial services means I have some blinders on here, but I cannot imagine the business not being involved in signing off on the changes when they are completed. You simply must have them involved in some way throughout the upgrade process, if nothing else to serve as reassurance that things are working correctly.


Thanks again to everyone who participated in the #datachat, we always have fun putting these together and I’m looking forward to many more!

One of the roles in my IT career was managing a large IT Operations Management platform. This was probably the most challenging role I have had in IT, as I quickly found out it was a thankless job. The majority of this role was focused on providing forecasting, log management, alert management, problem management, and service-level management. These tasks all rolled up to what I called "The Thankless Engineer."  This was not because the job wasn't important, but because it needed to satisfy many different technology silos.  IT Operations Management needs to satisfy not only the operations teams, but also the requirements and workflows of infrastructure, security, and application teams. This becomes very tricky business when trying to satisfy multiple IT silos' workflows. The role becomes even more of a pain when ops and apps teams start receiving false-positive alerts; we all know how much fun it is to be paged in the middle of the night for a non-issue. The biggest issue I see with traditional IT Operations Management is that it tends to fall on a general operations group to set requirements and needs. This method doesn't always allow a lot of insight into the needs/requirements of infrastructure and application owners.


So is it possible to take such a "thankless" role and convert it into a role that provides business value? Does "cloud" change the way we need to think about operations management? Does this thing called "DevOps" change operations management? I would say "yes" to all of these, and we need to change how we think about IT Operations Management quickly, or we are going to fail to innovate. Efficiency and agility are two key traits that companies need in order to drive innovation. IT Operations Management is a key part of allowing companies to deliver services to their organization and to their customers.


When changing the IT Operations Management process, there are a few concepts I think we should practice so we can move from "thankless" to "IT superhero":


  • Utilize operation-aware tools for application teams
  • Provide application teams insight into the infrastructure
  • Provide infrastructure teams insight into applications
  • Utilize tools that span heterogeneous private/public cloud infrastructures
  • Utilize application analytics to gain insight into the end-user experience
  • One tool does not rule all


I would love to hear from the community on what patterns they think need to change in IT Operations Management and any thoughts you have on "The Thankless Engineer".





A system administrator’s roles and responsibilities span various dimensions of an IT organization. As a result, keeping tabs on what’s going on in the world of technology - vendors and their products, the latest product releases, end-user experience, and performance troubleshooting - is just one area of focus. Over time, system administrators turn into thought leaders thanks to the technology, industry, and domain experience they gain and use. They pass on their knowledge to colleagues and technology aficionados, and even organizations turn to such experts to hear what they have to say about where IT is headed.


On that note, we at SolarWinds® are glad to have brought together IT gurus, geeks, and fellow system administrators to share their thoughts on system and application performance management. This event took place recently in the form of a #syschat on Twitter. For those who didn’t get a chance to tune in, here are some highlights:


Application monitoring: Generally, there is a consensus that application downtime affects business performance. Given that businesses are paranoid about this, why hasn’t the adoption of application monitoring in some organizations taken off like it should? Experts like @standaloneSA and @LawrenceGarvin feel that, “Some of it has to do with need.” Or, as @patcable points out, “Admins don’t know what to monitor, and apps don’t provide the right data.” This is true for various reasons. Often, IT pros are given a mandate by business groups saying that all apps are critical. Therefore, they have to watch apps closely for performance issues. Before answering the “what to monitor” question, IT pros need to ask, “Why should I monitor these apps, are they really that critical?” Knowing the answer to this question eliminates the additional noise, and you can focus only on what to do with the really critical apps and ensure that you’re monitoring the right metrics.


Apps in the cloud: Monitoring the performance of apps in the cloud is, again, not a direct solution to solving a performance problem that can arise from your apps running in the cloud. As more applications are being deployed in the cloud, the level of difficulty in monitoring those apps gets higher. IT pros have to really get down to understanding the “how,” which takes time. For example, @vitroth said, “Ops finds it hard to monitor what engineering doesn't instrument. Give me counters, categories and severities!” When IT pros have difficulties managing apps running on a physical server, the cloud layer is certainly going to be an unfamiliar place, and new complications will arise.


Skill sets for SysAdmins: A lot of buzz is going around about whether SysAdmins will need to have coding skills one day. It may not be mandatory for IT pros to have programming skills, but they might want to develop them so they can create and automate tools. While this was only one opinion, others like @patcable suggested that “sysadmins are going to have to become more comfortable writing stuff in some language other than shell.” Learning and understanding your IT infrastructure and environment are essential. IT pros should be willing to learn, and learn quickly, because ‘things aren’t slowing down.’ Where gaining technical knowledge and skills is concerned, it always helps to “learn a programming language, version control w/git, config management, and keep an eye on Hadoop,” as recommended by @standaloneSA.


What are your thoughts on these topics? Where are you with application monitoring in your organization? What difficulties do you see with monitoring apps in the cloud? Do you see DevOps improving the adoption of application monitoring? We’re happy to hear your views and opinions. Follow us on @SWI_Systems to learn more.

The modern network handles a higher volume of data and more applications than ever before. Many of these applications are sensitive to delay and latency. In such situations, network engineers need QoS to prioritize delay-sensitive business apps over others or to drop non-business traffic.


A QoS implementation method used to classify and mark applications or protocols in the network is the Modular QoS CLI (MQC). With MQC QoS, the traffic you need to prioritize or drop is grouped into a class-map. The class-map is then assigned to a policy-map, which performs the QoS actions. If you are not familiar with QoS, check out this blog for getting started with MQC QoS.


An option available under MQC QoS for grouping traffic into a class-map is the “match protocol” statement. This statement allows users to match a desired application or protocol, such as FTP or HTTP, into a class-map and then perform QoS actions on it. Here, the ‘protocol’ keyword can refer either to regular protocols like bgp, citrix, and dhcp, or to Network Based Application Recognition (NBAR) recognized protocols.
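As a minimal sketch of what that looks like on a Cisco IOS device (the class, policy, and interface names and the bandwidth figure are purely illustrative, not a recommendation):

  class-map match-any BUSINESS-APPS
   match protocol citrix
   match protocol ftp
  !
  policy-map WAN-EDGE
   class BUSINESS-APPS
    priority percent 30
   class class-default
    fair-queue
  !
  interface GigabitEthernet0/1
   service-policy output WAN-EDGE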

What is NBAR?


NBAR is a classification technology from Cisco that can identify and classify applications and protocols, including those that use dynamic port numbers. NBAR goes beyond TCP/UDP port numbers and can inspect the payload to identify a protocol. NBAR classifies applications using the default Packet Description Language Modules (PDLMs) available in the IOS.


Cisco also has NBAR2, which is the next generation version of NBAR that enhances the existing NBAR functionality to classify even more applications. It also provides additional classification capabilities, such as field extraction and attributes-based categorization. Cisco routinely releases updated protocol packs for NBAR2, which can be accessed from the NBAR protocol library for new signatures, signature updates, and bug fixes.


Conveniently, Cisco NBAR is supported on most Cisco IOS devices and NBAR2 is supported on devices such as ISR-G2, ASR1K, ASA-CX, and Wireless LAN controllers. And to make it easy, NBAR2 configuration is exactly the same as NBAR.



Many network engineers use Access Control Lists (ACLs) for application classification when defining their QoS policies. But sometimes NBAR is a better choice than ACLs because of NBAR’s ability to automatically recognize applications and protocols that would otherwise have to be defined manually.


NBAR is also easier to configure than ACLs, and it provides collection statistics (if you need them) via the NBAR protocol discovery MIB for each application it identifies.
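If you do want those per-protocol statistics, protocol discovery is enabled per interface; something along these lines (the interface name and top-n count are illustrative):

  interface GigabitEthernet0/1
   ip nbar protocol-discovery
  !
  show ip nbar protocol-discovery interface GigabitEthernet0/1 top-n 10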

Finally, the biggest advantage of NBAR is that it can be used for custom protocol identification.

Custom Protocols with NBAR


There are many applications that are designed to use dynamic port numbers. Such dynamic changes in port numbers can make it difficult to identify applications using regular monitoring tools, and sometimes even with NBAR. While NBAR2 has signatures for a wide range of applications, there is a good chance you are using an internally built application that is not defined in NBAR2, which is a good reason to define your own custom protocol for NBAR.


NBAR custom protocol support is quite extensive, too. You can define custom protocols to be identified by the NBAR engine based on IP address, port, and transport protocol, and even by inspecting specific bytes of the payload for keywords.


Another advantage involves HTTP. Every network allows ingress and egress HTTP, which also makes it the protocol used by many non-business applications, rogue applications, and even malware to gain access into the enterprise. With custom protocol matching, NBAR can classify HTTP traffic based on URL, host, MIME type, or even HTTP header fields. So imagine the possibilities: allow HTTP traffic from specific sources and block everything else, stop unwanted HTTP traffic and allow all business applications, block only YouTube but not Salesforce, or allow only Salesforce and block everything else - and many more permutations.


So, here it is. You do not have to explicitly enable NBAR on your device to use it with QoS policies unless you need either NBAR protocol discovery or NBAR custom protocol identification. There are two options that Cisco reference sites mention for defining custom NBAR protocols, depending on your IOS version: the ip nbar custom command and the ip nbar custom name transport command. Provided below is the syntax for both:


ip nbar custom name [offset [format value]] [variable field-name field-length] [source | destination] [tcp | udp ] [range start end | port-number]


In the above command, offset refers to the byte location in the payload to inspect. The format and its value can be a term (when used with the ascii format), a hexadecimal value (used with the hex format), or a decimal value (used with the decimal format). For complete information on what each option refers to, check this link:



The other command, mostly referenced with NBAR2 or newer IOS versions, is:


ip nbar custom name transport {tcp | udp} {id id} {ip address ip-address | subnet subnet-ip subnet-mask} | ipv6 address {ipv6-address | subnet subnet-ipv6 ipv6-prefix} | port {port-number | range start-range end-range} | direction {any | destination | source}


Check the link below for a reference on the above command:



Once you have your custom protocol defined with NBAR, create a class-map and use the match protocol statement with your custom protocol name to classify the matching traffic into that class-map. You can then prioritize, drop, or police the traffic based on your requirements.
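Putting the pieces together, a hedged example using the classic syntax shown above might look like the following for a hypothetical in-house application on TCP port 8081 (the protocol name, class name, port, and bandwidth figure are all invented for illustration, and the policy simply extends the earlier illustrative WAN-EDGE policy):

  ip nbar custom internal-app tcp 8081
  !
  class-map match-all INTERNAL-APP
   match protocol internal-app
  !
  policy-map WAN-EDGE
   class INTERNAL-APP
    bandwidth percent 10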


Well, I hope this information eases your implementation of NBAR. More importantly, I hope you enjoy the many benefits of NBAR and a trouble-free network!
