[Comic: Corporate Dilemma]



Have you ever faced this problem before? Maybe you are brand new to the industry, a technology, or a position, or maybe you have been fighting issues and troubleshooting in the trenches for decades. Either way, this problem rears its ugly head, often in the form of “We don’t have the budget to train our employees,” or, as the comic above so eloquently puts it, “If we train our employees, they might leave.”


This is a quandary I have personally experienced in my lifetime, and one I know people in every role within an organization, from technician to engineer to architect, have equally faced. That is not to say that all organizations suffer the same fate. Technology vendors, often those with their own certification programs, tend to support education, training, and certification. Partners and resellers also tend to support and embrace education in the workplace. So do organizations that dictate requirements such as “Your position must have ‘x’ education/certification in order to advance within the organization.” It would be awkward to have such requirements yet no organizational support to achieve them.


Yet even given those few scenarios where I have seen organizations support training, if I pulled a dozen administrators into a room (system, network, SysOps, and DevOps), probably only two or three out of each group would say their organization supports education, whether financially or by providing time, training, and resources to pursue it.


I can think of more admins than not who spend countless hours educating themselves, seeking out and researching problems, and constantly staying current on tomorrow’s technology while supporting the systems of yesterday, including those who regularly read and comment on forums like this one within the Thwack community. You are the heroes, the rock stars, who continue to pursue your own evolution in spite of your organization’s lack of support.


In the past, I’ve published countless resources for others to educate themselves, including free or discounted certification programs. One I’ll mention here is SolarWinds’ free training and certification program to become a SolarWinds Certified Professional (SCP): Interested in becoming a SolarWinds Certified Professional FOR FREE?!?!?

(I just checked, and it still seems to work two years after I wrote that blog post [someone let me know if it's no longer free!], so it's worth adding to your arsenal if you’re cert-limited and want to know what it feels like to have the resources to learn something and then get certified in it.) But enough about me!


What are some of the ways you, the people, are able to keep up to date and continually grow and educate yourselves? Share your experiences of how you cope with organizations that do not support your advancement for fear you may ‘jump ship’ with your new knowledge. Or share how it works if you are one of the few with a supportive organization, whether company, partner, or vendor, that gives you the fuel to light the fire of your mind, or simply to support and maintain your existing environments. And anywhere in between.

In my last post, we talked about the Business going outside of your I.T. controls and self-provisioning Software as a Service solutions. Most of you were horrified and could identify a number of internal policies that a SaaS solution wouldn’t comply with. You understood that these policies are there to protect the organization’s confidential information or intellectual property, so why is it so hard for the Business to grasp those implications?


I’ve just finished reading “The Phoenix Project” by Gene Kim, Kevin Behr and George Spafford. Actually, I couldn’t put it down.


One of the characters is an overzealous Chief Security Officer who wants to tie I.T. down tightly enough to meet every point in a third-party security audit. In reality, the business already has processes and procedures in the finance department that make these I.T. controls unnecessary.


In fact, it made better business sense for a human responsible for the money in the organization to watch out for these red flags instead of coding the computer systems to do it. I’m not saying that’s going to be the case in every situation, and I.T. controls certainly have their place in mitigating organizational risk, but do they have to prevent every possible risk?


The UK government’s CESG, the information security arm of GCHQ, is now advising that passwords should only be changed ‘on indication or suspicion of compromise’, throwing the old 30-day or 90-day expiry out of the window. While that may seem insane, CESG argues that regular password changes force people to store passwords somewhere to remember them, or to reuse the same base password with a minor variation.


Tough I.T. controls or policies can often lead to people inventing workarounds. I can bet you that someone in the organization has given their password out to a co-worker because they were away sick, and it was quicker than asking I.T. to sort out delegated access to their mailbox. Or perhaps they ran late at a meeting and someone needed to unlock their computer.


If you could wave a magic wand, what I.T. policies or controls would you relax to make life easier for you and the end users? Could you do this and retain (or even improve) the security and stability of your systems? Or is this all just crazy talk and you should be locking down your systems even more?


Let me know your thoughts!



Tomorrow is the first day of the Caribbean hurricane season, and that means: named storms, power outages and the need for IT emergency preparedness. And now is a great time to make sure your disaster toolbox is well stocked, before a major calamity strikes. And as a federal IT manager, you always have to be prepared for the unnatural disaster, such as a cyber-attack.


The scary thing is that even the idea of creating a disaster recovery plan has been put on the backburner at many government agencies. In fact, according to a federal IT survey we conducted last year, over 20 percent of respondents said they did not have a disaster preparedness and response plan in place.


We suggest that you make sure you have a plan in place, and follow these best practices:


Continuously monitor the network. Here’s a phrase to remember: “collect once, report to many.” This means installing software that automatically and continuously monitors IT operations and security domains, making it easier for federal IT managers to pinpoint – or even proactively prevent – problems related to network outages and system downtime.


Continuous monitoring can give IT professionals the information needed to detect abnormal behavior much faster than manual processes. This can help federal managers react to these challenges quickly and reduce the potential for extended downtime.
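Detecting abnormal behavior automatically usually comes down to comparing each new measurement against a recent baseline. The Python below is a minimal sketch of that idea, assuming a simple rolling mean and standard deviation. The function name, window size, 3-sigma threshold, `min_delta` floor, and sample latency values are all illustrative assumptions, not settings from any particular monitoring product.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, List, Tuple


def detect_anomalies(
    samples: Iterable[float],
    window: int = 10,
    k: float = 3.0,
    min_delta: float = 1.0,
) -> List[Tuple[int, float]]:
    """Flag samples that deviate from the rolling baseline by more than k sigma.

    `min_delta` is a hypothetical floor so a perfectly flat baseline
    (stdev == 0) does not flag every tiny wobble, while real spikes still trip it.
    """
    history: Deque[float] = deque(maxlen=window)
    flagged: List[Tuple[int, float]] = []
    for i, value in enumerate(samples):
        # Only judge a sample once a full baseline window has been collected
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if abs(value - mean) > max(k * stdev, min_delta):
                flagged.append((i, value))
        history.append(value)
    return flagged


# Hypothetical response times (ms) polled once a minute; sample 11 is a spike
latency_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 20, 85, 21]
print(detect_anomalies(latency_ms))  # → [(11, 85)]
```

Appending every sample, spikes included, lets a sustained shift eventually become the new baseline; excluding flagged samples keeps the baseline clean but never adapts to a genuine change. Which trade-off fits depends on your environment.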


Monitor devices, not just the infrastructure. You need to keep track of all of the devices that impact your network, including desktops, laptops, smartphones and tablets.


For this, consider implementing tools that can track individual devices. First, devise a whitelist of devices acceptable for network access. Then, set up automated alerts that notify you of non-whitelisted devices tapping into the network or any unusual activity. Most of the time, these alerts can be tied directly to specific users. This tactic can be especially helpful in preventing those non-weather-related threats I referred to earlier.
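The whitelist-and-alert approach can be sketched in a few lines of Python. This is a minimal illustration, assuming devices are identified by MAC address and that something else (for example, a switch's MAC address table or DHCP logs) supplies the list of devices seen on the network. The `Sighting` record, `find_unapproved` function, sample addresses, and alert format are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' separators so formats compare equal."""
    return mac.strip().lower().replace("-", ":")


@dataclass(frozen=True)
class Sighting:
    """One device observed on the network (hypothetical input record)."""
    mac: str
    ip: str
    user: Optional[str] = None  # user tied to the device, if known


def find_unapproved(sightings: Iterable[Sighting], whitelist: Set[str]) -> List[str]:
    """Return an alert message for every sighting whose MAC is not whitelisted."""
    approved = {normalize_mac(m) for m in whitelist}
    alerts: List[str] = []
    for s in sightings:
        mac = normalize_mac(s.mac)
        if mac not in approved:
            alerts.append(
                f"ALERT: non-whitelisted device {mac} at {s.ip} (user: {s.user or 'unknown'})"
            )
    return alerts


# Hypothetical whitelist and sightings (e.g. pulled from switch MAC tables)
whitelist = {"00:1A:2B:3C:4D:5E", "00-1a-2b-3c-4d-5f"}
seen = [
    Sighting("00:1a:2b:3c:4d:5e", "10.0.0.10", "alice"),
    Sighting("de:ad:be:ef:00:01", "10.0.0.99", "bob"),
    Sighting("aa:bb:cc:dd:ee:ff", "10.0.0.120"),
]
for alert in find_unapproved(seen, whitelist):
    print(alert)  # in practice, route to email, a ticketing system, etc.
```

MAC addresses can be spoofed, so a whitelist like this works best as one signal among several rather than as an access control on its own.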


Plan for remote network management. There’s never an opportune time for a disaster, but some occasions are just, well, disastrous. For example, when a hurricane knocks out electricity in your data center and you’re stuck at home thinking, “Yeah, right.” In such cases, you’ll want to make sure you have software that allows you to remotely manage and fix anything that might adversely impact your network.


Remote management technology typically falls into two categories: in-band and out-of-band. In-band tools reach your devices over the production network itself, while out-of-band tools use a separate path, such as a console server or cellular link, so they keep working when the production network is down. Both get the job done in their particular circumstances. There are, however, some instances where remote management is insufficient. It’s perfectly adequate when your site loses power or your network goes offline, but in the face of a major catastrophe (massive floods, for example), you’ll need onsite management. In many cases, though, remote management tools will be more than enough to get you through some rough spots without having to get to the office.


Each of these best practices, and the technologies associated with them, is like a backup generator. You may never need to use it, but when and if you do, you’ll be glad you have it at your disposal.


Find the full article on Government Computer News.


The DevOps Days - Washington, DC event is right around the corner and it turns out I happen to be sitting on 3 tickets. You know what that means!


As I did for the Columbus DevOps Days, I'm going to be giving these babies away 100% free to the first people who post a selfie.


A selfie that is worthy of its own $55,000 Kickstarter. A selfie that shows your creativity, humor, patriotism, and flair.


That's right, I want a



Now, I'm not here to tell you what that means (heck, I just made up that sentence 10 seconds ago). Even if you don't celebrate Memorial Day (looking at you, jbiggley), if you can get your geeky cheeks down to DC next week, those tickets could be yours.


All I'm saying is that if you are able to attend the DevOps Days event in DC on June 8 & 9, and you post a selfie in the comments below, then one of those golden tickets is yours yours YOURS!!


So, have a safe, responsible, geeky, fun, relaxing, non-stressful Memorial Day (or as you call it in other countries, "Monday") and start selfie-ing!

An all-flash storage array (AFA) provides two major benefits for the data center. First, AFA enables capacity efficiency with consistent performance and a reduced storage footprint. Second, AFA usually includes a software overlay that abstracts storage hardware functions into software (think software-defined storage). The overlay's features typically include deduplication (the elimination of duplicate copies of data), data compression, and thin provisioning.


These two qualities combine to form a dynamic duo of awesomeness for infrastructure teams looking to optimize their applications and maximize the utility of their storage arrays. Because hosts spend far less time waiting on storage, all-flash storage essentially improves CPU utilization: each host can drive more I/Os per second (IOPS), which in turn reduces the number of host servers needed to service the I/O load.


So all that glitters is gold, right? Not so fast. The figure below shows how AFA affects the data ecosystem. In the past, traditional storage performance was measured in IOPS, and the most influential variable was the number of spindles. With AFA, spindle count doesn’t matter, so performance centers on average latency, and that latency is influenced by the number of applications piled onto the array. This means the bottleneck shifts from spindle count to the array's storage capacity limit, as well as to other subsystems in the overall application stack that start running hot.

[Figure: traditional storage vs. AFA]
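The link between application count and latency can be made concrete with Little's Law, a general queueing result: average latency equals the number of I/Os in flight divided by the throughput (IOPS). The Python below is a back-of-the-envelope sketch, assuming a hypothetical array pinned at a ceiling of 200,000 IOPS and applications that each keep 32 I/Os outstanding; neither number comes from any real product.

```python
def avg_latency_ms(outstanding_ios: float, iops: float) -> float:
    """Little's Law (L = lambda * W) solved for W, converted to milliseconds."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops * 1000.0


MAX_IOPS = 200_000  # hypothetical ceiling of a saturated array
QD_PER_APP = 32     # hypothetical I/Os each application keeps in flight

for apps in (1, 4, 16):
    queue_depth = apps * QD_PER_APP
    print(f"{apps:>2} apps, {queue_depth:>3} I/Os in flight: "
          f"{avg_latency_ms(queue_depth, MAX_IOPS):.2f} ms")
# prints 0.16 ms, 0.64 ms and 2.56 ms for 1, 4 and 16 apps
```

Once throughput can no longer rise, every extra application shows up as added latency rather than added IOPS, which is why latency, capacity, and the rest of the stack become the things to watch.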

Are you considering AFA in your data center environment? Have you already implemented AFA? What issues have you run into, if any? Let me know in the comments below.


And don't forget to join SolarWinds and Pure Storage as we examine AFA beyond the IOPS to highlight performance essentials and uptime during a live webcast on June 8th at 2PM EST.

This week's edition comes to you on the day of my 19th wedding anniversary, which apparently is the Bronze anniversary, so I'm going to arrange for my wife and me to visit the Basketball Hall of Fame and look at the various bronzed pairs of sneakers therein. OK, I'm kidding. We are actually going to spend our anniversary at the middle school band concert, which is as romantic as it sounds.


Anyway, here is this week's list of things I find amusing from around the internet...


Google Patented a Sticky Car Hood That Traps Pedestrians Like Flies

Seriously. Human flypaper. Every car will be tinted yellow during pollen season. The future is dumb.


For Microsoft, Its Achilles’ Heel Is Excel

Excel is the dirty secret that powers millions of businesses worldwide every day. Achilles' heel? Sure. But Microsoft keeps pushing it forward with things like Power BI. Face it, Excel runs the world, and Microsoft runs Excel. Neither is going away anytime soon. Personally, I'm looking forward to seeing Excel on Linux just to see the look on adatole's face.


Ditch the data dump

At first I thought they were talking about backups, but the article is about analytics and how your business end users need data (lots of data, fast and accurate data) more than ever before. This is a trend that is not going away, and if you ignore your business end users' need for data analytics tools, you will find yourself with a growing Shadow IT problem.


Shows You How Much Your ISP is Screwing You

I think I'd like this site better if it weren't backed by Netflix, which has an axe (or two) to grind with ISPs. If only there were a tool that would help you get similar details...


D.C.’s Metro Catches Fire More Than Four Times A Week

That's a bad thing, right? I have to admit I am a bit surprised this article didn't end with a #ThanksObama.


The Changing American Diet

Nice visualization of how Americans have changed their diets over the past 30+ years. Sadly, bacon isn't at the top of the list.


The Electric Car Revolution Is Finally Starting

I'm filing this under "But this time we REALLY mean it!"


Spring is finally here, which means we can get started on training the next generation of systems administrators:





Despite my protestations to the contrary, and my sincere belief that everything was better when we all used command lines, time marches on, and my get-off-my-lawn routine just isn't carrying much water anymore. We humans are visual creatures, and visualizing the good and the bad helps us make sense of our environment. It is one reason among many why video conferencing has taken off over the last few years; we get so much more out of a conversation when we can see the other person’s facial expressions along with hearing their voice.


So it goes in the world of network troubleshooting as well. For years we stared at text data streaming across our screens, whether from real-time monitoring applications (homegrown or otherwise) or from the output of tools we ran manually, like ICMP echoes and path traces. But we were limited in what we could see in the proverbial code. We looked at the matrix, but never experienced it.


Eventually we started to take what we already had, our formerly static network maps, and automate them somewhat to show real-time data. Now our NOC operators, or anyone else who regularly looked at the network, could “see” problems as they were happening. And if a picture is worth a thousand words, then a network map is worth a hell of a lot of static textual printouts. But even though we could now see trouble spots in the network, and bathe the boss in the radiating confidence from a big green button, we still had our feet firmly planted in the old ways.


Half-in, half-out might be the best way to describe where we had gotten to by the time our green buttons and network maps became firmly lodged in the standard operating practices of NOC operators everywhere. We had graphical maps with flashing buttons, but all the real meat still came in the form of text boxes filled with data. Even more limiting was the fact that a lot of our visibility came from legacy tools that had not kept up with the ever-changing reality of our networks. Those tools were built for networks that were somewhat flat, open, and largely free of asymmetrical routing or very complex application stacks. And god knows the cloud was not even a buzzword at that point, let alone a concept with real solutions behind it.


Fast-forward to today, where we do have the application stacks, mass virtualization, the cloud, containers, virtualized networks, and a roughly eighty-percent chance of asymmetrical routing both inside and outside of our networks. It is pretty clear that the legacy tools we have been using are just not cut out for today’s reality. What is really needed is a new set of tools, a new way of looking at our networks and our traffic patterns, that takes into account all of the challenges we face in modern networks. We need something that can keep up with the speed and agility that our networks, our response times, and the business demand.


Our tool sets going forward are certainly going to have to cope with a much more complex and ever-changing landscape than at any prior time in our history. We’ll need to be able not just to monitor network paths and look for latency, which may or may not be artificial, but also to look much more accurately into the performance of our businesses’ applications quickly, visually, and in a way that allows us to home in on a particular area or device in our networks. A tool that solves all of these complex problems, and helps those of us in the network world look like heroes to our bosses, would be in high demand in the marketplace. One step closer to hearing and seeing no evil, and to not having to speak it as well. Of course, we will still give our boss that shiny green button.

If you've ever read the “Adventures of Sherlock Holmes” by Sir Arthur Conan Doyle, you're probably familiar with some of the plot contrivances. They usually entail a highly complex scheme that involves different machinations, takes twists and turns, and requires the skills of none other than The World's Greatest Detective to solve.


Today's government networks are a bit like a Holmes story. They involve many moving parts, sometimes comprising new and old elements working together. And they are the central nervous system of any IT application or data center infrastructure environment – on-premises, hosted, or in the cloud.


That's why it's so important for IT pros to be able to quickly identify and resolve problems. But the very complexity of these networks can often make that task a significant challenge.


When that challenge arises, it requires skills of a Sherlockian nature to unravel the diabolical mystery surrounding the issue. And, as we know, there's only one Sherlock Holmes, just as there's only one person with the skills to uncover where the network problems lie.


That would be you, my dear federal IT professional.


Your job has changed significantly over the past couple of years. Yes, you still have to "keep the lights on," as it were, but now you have even greater responsibilities. You've become a more integral, strategic member of your agency, and your skills have become even more highly valued. You're in charge of the network, the foundation for just about everything that takes place within your organization.


To keep things flowing, you need to get a handle on everything taking place within your network, and the best way is through a holistic network monitoring approach.


Holistic network monitoring requires that all components of the network puzzle – including response time, availability, performance and devices -- are analyzed and accounted for. These days, it also means taking into consideration the many applications that are tied together across wireless, LAN, WAN, and cloud networks, not to mention the resources (such as databases, servers, virtualization, storage) they use to function properly.


Network monitoring and performance optimization solutions help solve the mystery entwined within this diabolical complexity. They can help you identify and pinpoint issues before they become real problems, whether security threats, such as malware and rogue devices, or productivity threats, including hiccups that can cause outages and downtime.


And let's not forget a key perpetrator of poor application performance: network latency. Network monitoring tools allow you to automatically and continuously monitor packets, application traffic, response times, and more. Further, they give you the ability to respond quickly to potential issues, which is absolutely critical.
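As a minimal sketch of one kind of response-time check such tools automate, the Python below times a TCP handshake to a host and port using only the standard library. It measures connect time (plus any DNS lookup), not full application response time, and the function name and 2-second timeout are illustrative assumptions rather than any product's behavior.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return TCP connect time in milliseconds, or None if the connection fails.

    Hypothetical helper: it times only name resolution plus the TCP handshake,
    not the application's own response time.
    """
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the handshake
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return None
    return (time.perf_counter() - start) * 1000.0
```

Calling something like `tcp_connect_ms("example.com", 443)` on a schedule and graphing the results gives a crude latency trend, and a probe that returns `None` is itself worth alerting on.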


As Sherlock said in “A Study in Scarlet,” "there is nothing like first-hand evidence." Network monitoring solutions provide just that – first-hand evidence of issues as they arise, wherever they may take place within the network. As such, implementing a holistic approach to network management can make solving even the biggest IT mysteries elementary.


Find the full article on Defense Systems.

Patrick Hubbard

DevOpsDays Daze

Posted by Patrick Hubbard, May 23, 2016

I attended DevOpsDays last week and have had time to get my head around what’s going on with SolarWinds customers. And this being thwack (my safe place), I want to brag on you all a bit. Something amazing is happening, and you, the members of thwack, are at the center of it. Sharp IT admins, from very small companies all the way to our largest multinational corporate users, are actually making the move to DevOps.



You’re not doing it because someone like me told you it can help. You’re doing it because it’s helping you get a handle on production and the crazy complexity you didn’t ask for, and it’s reducing breakage. It’s letting you test (scratch that, QA) changes before you make them. For some of you, it’s even enabling the holy grail of IT we all dream about: business-hours production changes (without the need for maintenance windows) using continuous delivery processes.


Speaking to you on the phone, or meeting with you at tradeshows, gives me hints that the movement is growing. But DevOpsDays Austin gave me a chance to experience a full immersion class in what you’re actually doing in the field. First, and this seems to be the same everywhere, more than half of the attendees were current customers or former users working to bring SolarWinds to their new gig. That was a little surprising at the cloud mothership show (AWS re:Invent), but it makes sense here: DevOpsDays engineers didn’t materialize out of thin air into Linux and Scrum. They got the DevOps religion while keeping the lights blinking on all the same gorp that everyone else deals with.


Tuesday I gave a presentation about using our SWIS API to turn Orion into an IT automation platform, which is not something we normally talk about from the stage. SolarWinds’ primary and eternal internal design requirement is to be easy to use, right out of the box. It’s a secret – okay, an open secret – that Orion is also hugely customizable, and you regularly stun us with how powerfully you’re integrating it into your operations. I concluded my remarks by inviting customers interested in integrating SolarWinds into their DevOps processes to visit our table, and – wow – did you take me up on it.


You often start with inventory management-driven discovery and automated monitoring, followed by network config change automation using NCM. The third step seems to be split between integration with your helpdesk systems, and sophisticated alert suppression and report customization based on custom properties updated via the API. Most amazing is how many of you are doing this integration and customization solely with the help of thwack threads and a little bit of Tim Danner magic.


My takeaway is to ask the community if this is something we should talk about more. Should we surface more of the amazing customizations you advanced users are doing, or will that confuse new users who, like you once did, are starting out of the box and just want to get going quickly?  Let me know what you think. Keep your comments coming in the SolarWinds Lab live chat, or say hello at a live event like SWUG, Cisco Live, VMworld, or Microsoft Ignite. You are advancing the future of IT by embracing your internal programmers. Let me know how I can help.

IT is changing at an accelerating rate with plenty of IT jobs at stake. And yet, doing a job may not be enough in this IT-as-a-Service paradigm that hybrid IT is ushering in. With IT jobs evolving and forking into multiple paths, deciding which path to take and when becomes integral to continuing one's prosperous IT career. Too bad there's not a tool like SolarWinds NPM 12 with NetPath for the IT career path, because it would be cool to visualize one's IT career on a hop-by-hop basis.




The IT career path is one of the hottest topics that I regularly discuss with my industry friends and peers, when we aren't talking about current IT trends. This brings up an important task that many IT professionals often overlook, which is building a competent, trusted network of techie friends, peers, colleagues, and resources. This is one of my golden rules that has served me incredibly well in my career. I make a concerted effort to continually do what I can to earn -- and return -- mutual trust.


Throughout my career, I have formed and leveraged a network of trusted IT advisors who have helped me progress along my career path. From the Office of the CTO, working on virtualization, to Global Solutions Engineering, working on the first instances of converged infrastructure, every opportunity was presented and taken largely thanks to my trusted network of IT advisors. This extends to my time at a cloud start-up, and even my decision to accept the role of SolarWinds Head Geek. My circle of trusted advisors continues to play a major role in my life and my career, especially with so many opportunities presenting themselves in this hybrid IT world.


So I ask you again: do you have a network of trusted IT advisors helping you advance your career? If not, what are you waiting for? Let me know below in the comment section.


As I mentioned a while ago, I've returned to the world of the convention circuit after a decades-long hiatus. As such, I find I'm able to approach events like Cisco Live and Interop with eyes that are both experienced ("I installed Slack from 5.25 floppies, kid. Your Linux distro isn't going to blow my socks off,") and new (“The last time I was in Las Vegas, The Luxor was the hot new property”).


This means I'm still coming up to speed on tricks of the trade show circuit. Last week I talked about the technology and ideas I learned. But here are some of the things I learned while attending Interop 2016. Feel free to add your own lessons in the comments below.


  • A lot of shows hand you a backpack when you register. While this bag will probably not replace your $40 ThinkGeek Bag of Holding, it is sufficient to carry around a day's worth of snacks, plus the swag you pick up at vendor booths. But some shows don’t offer a bag. After a 20-minute walk from my hotel to the conference, I discovered Interop was the latter kind.
    LESSON: Bring your own bag. Even if you're wrong, you'll have a bag to carry your bag in.
  • What happens in Vegas – especially when it comes to your money – is intended to stay in Vegas. I'm not saying don't have a good time (within the limits of the law and your own moral compass), but remember that everything about Las Vegas is designed to separate you from your hard-earned cash. This is where your hard-won IT pro skepticism can be your superpower. Be smart about your spending. Take Uber instead of cabs. Bulk up on the conference-provided lunch, etc.
    LESSON: As one Uber driver told me, "IT guys come to Vegas with one shirt and a $20 bill and don't change either one all week."
  • Stay hydrated. Between the elevation, the desert, the air conditioning, and the back-to-back schedule, it's easy to forget your basic I/O subroutines. This can lead to headaches, burnout, and fatigue that you don't otherwise need to suffer.
    LESSON: Make sure your bag (see above) always has a bottle of water in it, and take advantage of every break in your schedule to keep it topped off.
  • Be flexible, Part 1. No, I'm not talking about the 8am Yoga & SDN Session. I mean that things happen. Sessions are overbooked, speakers cancel at the last minute, or a topic just isn't as engaging as you thought it would be.
    LESSON: Make sure every scheduled block on your calendar has a Plan B option that will allow you to switch quickly with minimal churn.
  • Be flexible, Part 2. As I said, things happen. While it's easy in hindsight (and sometimes in real time) to see the mistake, planning one of these events is a herculean task with thousands of moving parts (you being one of them). Remember that the convention organizers are truly doing their best. Of course, you should let staff know about any issues you are having, and be clear, direct, and honest. But griping, bullying, or making your frustration ABUNDANTLY CLEAR is likely not going to help the organizers regroup and find a solution.
    LESSON: Instead of complaining, offer suggestions. In fact, offer to help! That could be as simple as saying, "I see your room is full. If you let me in, I'll Periscope it from the back and people in the hall can watch remotely." They might not take you up on your offer, but your suggestion could give them the idea to run a live video feed to a room next door. (True story.)
  • VPN or bust. I used to be able to say, "You’re going to a tech conference and some savvy person might..." That's no longer the case. Now it is, "You are leaving your home/office network. Anybody could..." You want to make sure you are being smart about your technology.
    LESSON: Make sure every connected device uses a VPN 100% of the time. Keep track of your devices. Don't turn on radios (Bluetooth, Wi-Fi, etc.) that you don't need and/or can't protect.
  • Don't bail. You are already in the room, in a comfortable seat, ready to take notes. Just because every other sentence isn't a tweetable gem, or because you feel a little out of your depth (or above it), doesn't mean the session will have nothing to offer. Your best interaction may come from a question you (or one of the other attendees) ask, or a side conversation you strike up with people in your area.
    LESSON: Sticking out a session is almost always a better choice than bailing early.
  • Tune in. Many of us get caught up in the social media frenzy surrounding the conference and feel the urge to tweet out every idea as it occurs to us. Resist that urge. Take notes now (maybe even with pen and paper) and tweet later. A thoughtfully crafted post on social media later is worth 10 half-baked live tweets now.
    LESSON: You aren't working for the Daily Planet. You don't have to scoop the competition.
  • Pre-game. No, I'm not talking about the after-party. I mean make sure you are ready for each session before it starts. Have your note-taking system (whether that's paper and pen, Evernote, or email) preloaded with the session title, the speaker name, related info (Twitter handle, etc.), and even a list of potential going-in questions (if you have them). It will save you from scrambling to capture things as they slide off the screen later.
    LESSON: Ten minutes prepping the night before is worth the carpal tunnel you avoid the following day.
  • Yes, you have time for a survey. After a session, you may receive either an electronic or hard copy survey. Trust me, you aren't too busy to fill it out. Without this feedback, organizers and speakers have no way of improving and providing you with a better experience next time.
    LESSON: Take a minute, be thoughtful, be honest, and remember to thank people for their effort, in addition to offering constructive criticism.


Do you have any words of advice for future conference attendees? Do you take issue with anything I’ve said above? I’d love to hear your thoughts! Leave a note in the comments below and let’s talk about it!

I remember a dark time in my life when I didn't know where I was going. I scrambled to find direction but I couldn't understand the way forward. It was like I was lost. Then, that magic moment came. I found the path to my destination. All thanks to GPS.

It's hard to imagine the time before we had satellite navigation systems and very accurate maps that could pinpoint our location. We've come to rely quite a bit on GPS and the apps that use it to find out where we need to go. Gone are the huge road atlases. Replacing them are smartphones and GPS receivers that are worlds better than the paper maps of yesteryear.

But even GPS has limitations. It can tell you where you are and where you need to be. It can even tell you the best way to get there based on algorithms that find the fastest route. But what if that fastest route isn't so fast any more? Things like road construction and traffic conditions can make the eight-lane super highway slower than a one-lane country road. GPS is infinitely more useful when it is updated with fresh information about the best route to a destination for a given point in time.

Let's use GPS as a metaphor for your network. You likely have very accurate maps of traffic flows inside your network. You can tell which path traffic is going to take at a given time. You can even plan for failure of a primary link. But how would you know that such a failure occurred? Can you tell at a moment's notice that something isn't right and you need to take action? Can you figure it out before your users come calling to find out why everything is running slow?

How about the traffic conditions outside your local or data center network? What happens when the links to your branch offices are running suboptimally? Would you know what to say to your provider to get that link running again? Could you draw a bullseye on a map to say this particular node is the problem? That's the kind of information that service providers will bend over backwards to get from you to help meet their SLAs.

This is the kind of solution we need. We need instant visibility into the network and how it's behaving. We need to know where the issues are before they become real problems. We need to know how to keep things running smoothly for everyone, so every trip down the network is as pleasant as an afternoon trip down the highway.

If you read through this entire post nodding your head and wanting a solution just like this, stay tuned. My GPS tells me your destination is right around the corner.

I am in Redmond this week to take part in the SQL Server® 2016 Reviewer's Workshop. Microsoft® gathers a handful of folks into a room and we review details of the upcoming release of SQL Server (available June 1st). I'm fortunate to be on the list so I make a point of attending when asked. I'll have more details to share later, but for now let's focus on the things I find amusing from around the internet...


How do you dispose of three Petabytes of disk?

And I thought having to do a few of these for friends and family was a pain, I can't imagine having to destroy this many disks. BTW, this might be a good time to remind everyone that data can never be created or destroyed, but it most certainly can be lost or stolen.


The Top Five Reasons Your Application Will Fail

Not a bad list, but the author forgot to list "crappy code, pushed out in a hurry, because agile is an excuse to be sloppy". No, I'm not bitter.


Audit: IT problems with TSA airport screening equipment persist

"The TSA's lack of server updates and poor oversight caused a plethora of IT security problems". Fortunately no one has any idea how many problems are in a plethora. Also? I know a company that makes tools to help fix such issues.


AWS Discovery Service Aims To Ease Legacy Migration Pain

Something tells me this tool is going to cause more pain when companies start to see just how much work needs to be done to migrate anything.


Bill Gates’ open letter

Wonderful article on how much the software industry has changed over the past 40 years. It will keep changing, too. I see the Cloud as a way for the software industry to change its licensing model from feature driven (Enterprise, Standard, etc.) to one driven by scalability and performance.


How to Reuse Waste Heat from Data Centers Intelligently

While this might sound good to someone, the reality is the majority of companies in the world do not have the luxury of building a data center from scratch, or even renovating existing ones. Still, it's interesting to understand just how much electricity data centers consume, and understand that the power has to come from somewhere.


How Much Does the Xbox One’s “Energy Saving” Mode Really Save?

Since we're talking about power usage, here's a nice example to help us understand how much extra it costs us to keep our Xbox always on. If it seems cheap to you then you'll understand how the cost of a data center may seem cheap to a company.


This week marks the fifth anniversary of my seeing the final launch of Endeavour, so I wanted to share something related to STS-134:



Lastly, if you've been enjoying The Actuator please like, share, and/or comment. Thanks!

In the past few years, there has been a lot of conversation around the “hypervisor becoming a commodity.” It has been said that the underlying virtualization engines, whether they be ESXi, Hyper-V, KVM, etc., are essentially insignificant, stressing the importance of the management and automation tools that sit on top of them.


These statements do hold some truth: in its basic form, the hypervisor simply runs a virtual machine, and as long as end users have the performance they need, there's nothing else to worry about. The three major hypervisors on the market today (ESXi, Hyper-V, KVM) all do this, and they do it well, so I can see how the “hypervisor becoming a commodity” argument works in these cases. But to SysAdmins, the people managing everything behind the VM, the commoditized hypervisor theory isn't bought quite so easily.


When we think about the word commodity in terms of IT, it’s usually defined as a product or service that is indistinguishable from its competitors, except perhaps on price. By that definition, if hypervisors were a commodity, we shouldn’t care which hypervisor our applications run on. We should see no difference between the VMs sitting inside an ESXi cluster and those in a Hyper-V cluster. In fact, to be a commodity, these VMs should be able to migrate freely between hypervisors. Today, however, VMs are not interchangeable between hypervisors, at least not without changing their underlying anatomy. While it is possible to migrate between hypervisors, there is a process we have to follow, covering configurations, disks, and so on. The files that make up a VM are proprietary to the hypervisor it runs on and cannot simply be moved to, and run by, another hypervisor in their native form.


Also, we stressed earlier the importance of the management tools that sit above the hypervisor, and how the hypervisor matters less than the management tools do. This is partly true. The management and automation tools we put in place are the heart of our virtual infrastructures, but these tools often create a divide in the features they support on different hypervisors. Take, for instance, a storage array that supports VVOLs, VMware’s answer to per-VM, policy-based storage provisioning. This standard allows us to completely change the way we deploy storage, eliminating LUNs and making VMs and their disks first-class citizens on their respective storage arrays. That said, these are storage arrays connected to ESXi hosts, not Hyper-V hosts. Another example, this time in favor of Microsoft, is in the hybrid cloud space. With Azure Stack coming down the pipe, organizations will be able to easily deploy and deliver services from their own data centers, but with Azure-like agility. The comparable VMware solution, involving vCloud Air and vCloud Connector, is simply not at the same level as Azure when it comes to simplicity, in my opinion. These are two very different feature sets, each available only on its respective hypervisor.


So with all that, is the hypervisor a commodity? My take: No! While all the major hypervisors on the market today do one thing – virtualize x86 instructions and provide abstraction to the VMs running on top of them – there are simply too many discrepancies between the compatible third-party tools, features, and products that manage these hypervisors for me to call them commoditized. So I’ll leave you with a few questions. Do you think the hypervisor is a commodity? When (or if) the hypervisor fully becomes a commodity, what do you foresee our virtual environments looking like? Single or multi-hypervisor? I'm looking forward to your comments.

The other day we were discussing the fine points of running an IT organization and the influence of People, Process and Technology on Systems Management and Administration, and someone brought up one of their experiences. Management was frustrated that it took days to get snapshots from their storage and virtualization platform, and was looking to replace the storage platform to solve the problem. Clearly, as this was a technology problem, they sought out a solution that would tackle it and address the technology needs of their organization! Chances are one or more of us have been in this situation before, so they did the proper thing and looked at the solutions! Vendors were brought in, solutions spec’d, technical requirements established, and features vetted. Every vendor was given the hard-and-fast requirement: “must be able to take snapshots in seconds and present them to the operating system for use in a writable fashion.” Once all of the options were reviewed, confirmed, demo’d, and validated, they had selected a solid solution!


Months followed as they migrated off their existing storage platform onto the new one. The light at the end of the tunnel was there; the panacea to all of their problems was in sight! And finally, they were done. The old storage system was decommissioned and the new one put in place. Management patted themselves on the back and moved on to their next project, first and foremost of which was standing up a new Dev environment based on their production SAP data. This being a pretty reasonable request, they followed their standard protocol to get it stood up, snapshots taken, and presented. Several days later, the snapshot was presented to the SAP team as requested so they could stand up the Dev landscape. And management was up in arms!


What exactly went wrong here? Clearly a technology problem had existed for the organization, and a technology solution was delivered to meet those requirements. Yet had they taken a step back and looked at the problem for its cause rather than its symptoms, they would have noticed that their internal SLAs and processes were really at fault, not the choice of technology. Don’t get me wrong: sometimes technology truly is at fault and a new technology can solve it, but to say that is the answer to every problem would be untrue, and some issues need to be looked at in the big picture. The true cause of their problem (their original storage platform COULD have met the requirements) was their ticketing process. It required multiple sign-offs for Change Advisory Board management, approval, and authorization, and the SLA given to the storage team allowed a 48-hour response time. In this particular scenario, the storage admins were actually pretty excited to present the snapshot, so instead of waiting until the 48th hour to deliver, they provided it within seconds of the ticket reaching their queue.


Does this story sound familiar to you or your organization? Feel free to share some of your own personal experiences where one aspect of People, Process or Technology was blamed for the lack of agility in an organization, and how you (hopefully) were able to overcome it. I’ll do my best to share some other examples, stories, and morals over these coming weeks!


I look forward to hearing your stories!

It was all about the network


In the past, when we thought about IT, we primarily thought about the network. When we couldn’t get email or access the Internet, we’d blame the network. We would talk about network complexity and look at influencers such as the number of devices, the number of routes data could take, or the available bandwidth.


As a result of this thinking, a myriad of monitoring tools were developed to help network engineers keep an eye on the availability and performance of their networks. These tools provided basic network monitoring.


It’s now all about the service


Today, federal agencies cannot function without their IT systems being operational. It’s about providing critical services that will improve productivity, efficiency, and accuracy in decision making and mission execution. IT needs to ensure the performance and delivery of the application or service, and understand the application delivery chain.


Advanced monitoring tools for servers, storage, databases, applications, and virtualization are widely available to help diagnose and troubleshoot the performance of these services, but one fact remains: the delivery of these services relies on the performance and availability of the network. And without these critical IT services, the agency’s mission is at risk.


Essential monitoring for today’s complex IT infrastructure


Users expect to be able to connect from anywhere, on any device. On top of that, IT needs to manage legacy physical servers, new virtual servers, and cloud infrastructure, as well as cloud-based applications and services, so it is easy to see why basic monitoring simply isn’t enough. This growing complexity requires advanced monitoring capabilities that every IT organization should invest in.


Application-aware network performance monitoring provides visibility into how network performance affects the performance of applications and services, by tapping into the data provided by deep packet inspection and analysis.
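Tools in this category differ in exactly how they compute their metrics, but the core idea can be sketched. The example below assumes a simplified, hypothetical packet record (timestamp, TCP flags label, direction, payload size) rather than any product's real data model: the TCP handshake round trip stands in for network latency, and the delay between a client request and the server's first reply, minus that round trip, approximates time spent in the application.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, minimal view of one captured packet for this sketch."""
    ts: float           # capture timestamp, in seconds
    flags: str          # simplified TCP flags label, e.g. "SYN", "SYN-ACK", "ACK"
    from_client: bool   # True if the client sent this packet
    payload_len: int = 0


def response_times(packets):
    """Estimate network and application response time for one TCP transaction.

    Network response time (NRT): client SYN to server SYN-ACK (handshake round trip).
    Application response time (ART): first client payload to first server payload,
    minus the NRT, as a rough proxy for server-side processing time.
    """
    syn = next(p for p in packets if p.from_client and p.flags == "SYN")
    syn_ack = next(p for p in packets if not p.from_client and p.flags == "SYN-ACK")
    request = next(p for p in packets if p.from_client and p.payload_len > 0)
    reply = next(p for p in packets if not p.from_client and p.payload_len > 0)
    nrt = syn_ack.ts - syn.ts
    art = (reply.ts - request.ts) - nrt
    return nrt, art


# A made-up capture of a single request/response exchange.
capture = [
    Packet(0.000, "SYN", True),
    Packet(0.040, "SYN-ACK", False),
    Packet(0.041, "ACK", True),
    Packet(0.042, "PSH-ACK", True, payload_len=300),
    Packet(0.282, "PSH-ACK", False, payload_len=1400),
]
nrt, art = response_times(capture)
print(f"network: {nrt * 1000:.0f} ms, application: {art * 1000:.0f} ms")
```

In this invented capture, a slow user experience would point at the application (about 200 ms) rather than the network (about 40 ms), which is exactly the distinction this kind of monitoring is meant to surface.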


With proactive capacity forecasting, alerting, and reporting, IT pros can easily plan for future needs, making sure that forecasting is based on dynamic baselines and actual usage instead of guesses.
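As a minimal sketch of trend-based forecasting, assume daily peak utilization samples (in percent) for a single link and a plain least-squares straight-line fit; real products typically build richer baselines that account for seasonality. The 80% threshold and the sample data below are illustrative assumptions, not recommendations.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_peak_pct, threshold_pct=80.0):
    """Project how many days remain until the fitted trend reaches threshold_pct.

    Needs at least two samples. Returns None when utilization is flat or
    falling, since the trend then never reaches the threshold.
    """
    if len(daily_peak_pct) < 2:
        raise ValueError("need at least two samples to fit a trend")
    days = list(range(len(daily_peak_pct)))
    slope, intercept = linear_fit(days, daily_peak_pct)
    if slope <= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])


samples = [50, 51, 52, 53, 54, 55, 56]  # one week of daily peak utilization (%)
print(days_until_threshold(samples))
```

Here utilization grows one point per day from 50%, so the trend crosses 80% on day 30, about three and a half weeks after the last sample: time to order bandwidth based on actual usage rather than a guess.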


Intelligent topology-aware alerts with downstream alert suppression will dramatically reduce the noise and accelerate troubleshooting.
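The idea behind downstream suppression can be shown with a small sketch. It assumes a hypothetical tree-shaped topology in which each node records its upstream parent (real tools derive these dependencies from discovered topology). When a node is unreachable, alerts for everything behind it are suppressed, so only the most upstream failure raises an alert.

```python
def alerts_to_raise(parent_of, down):
    """Return the unreachable nodes that should alert, suppressing downstream ones.

    parent_of maps each node to its upstream parent (absent for the root).
    down is the set of nodes currently failing their polls.
    """
    def has_down_ancestor(node):
        seen = set()  # guards against accidental loops in the dependency map
        parent = parent_of.get(node)
        while parent is not None and parent not in seen:
            if parent in down:
                return True
            seen.add(parent)
            parent = parent_of.get(parent)
        return False

    return sorted(node for node in down if not has_down_ancestor(node))


# Hypothetical branch-office topology: child -> upstream parent.
topology = {
    "branch-router": "core-switch",
    "branch-switch": "branch-router",
    "branch-ap": "branch-switch",
    "branch-server": "branch-switch",
}
unreachable = {"branch-router", "branch-switch", "branch-ap", "branch-server"}
print(alerts_to_raise(topology, unreachable))
```

Four unreachable devices collapse into a single alert on the branch router, which is where troubleshooting should start; a lone failed access point, by contrast, still alerts on its own.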


Dynamic real-time maps provide a visual representation of a network with performance metrics and link utilization. And with the prevalence of wireless networks, adding wireless network heat maps is an absolute must to understand wireless coverage and ensure that employees can reach critical information wherever they are.


Current and detailed information about the network’s availability and performance should be a top priority for IT pros across the government. Federal IT pros and the networks they manage are responsible for delivering services and data that ensure critical missions around the world succeed and that services are available to all citizens whenever they need them. This is no small task. Each network monitoring technique I discussed provides a wealth of data that federal IT pros can use to detect, diagnose, and resolve network performance problems and outages before they impact missions and services that are vital to the country.


Find the full article on our partner DLT’s blog, TechnicallySpeaking.

The increasing rate of change in applications, and the growing magnitude of that change, are causing a lot of consternation within IT organizations. It’s no coincidence, either, since everything revolves around the application, which is innovation personified. It’s the revenue-generating, value-added differentiator, and it's potentially an industry game changer. Think Uber, Facebook, Netflix, Airbnb, Amazon, and Alibaba.


Accordingly, the rate and scale of change are reflected in the application lifecycle. For instance, applications deployed in a virtualization stack will live for months or years, while applications deployed in a cloud stack will live for hours or weeks. Applications deployed in containers or as microservices will live for microseconds or milliseconds.


From my Interop 2016 DART Framework presentation.


For IT professionals, it’s good to know where the job security is. With that in mind, I’ve been keeping monthly tabs on the number of job postings with the key words virtualization, cloud, or (containers AND microservices). Over the past year, since June 2015, the number of jobs with the key word "virtualization" has remained flat at around 2,600 openings. In that same time frame, the number of cloud jobs has increased by over 30% to 8,900 openings, while the number of container/microservices jobs has more than doubled to almost 600 openings.


These trends reaffirm the hybrid IT paradigm and IT organizations' need to deal efficiently and effectively with change in their application ecosystems. Let me know what you think in the comment section below.

The vast majority of my customers are highly virtualized, and quite possibly using Amazon or Azure in a shadow IT kind of approach: some groups within the organization have deployed workloads into these large public provider spaces. These groups simply need to gain access to resources and deploy them as rapidly as possible.


Certainly, development and testing groups have been building systems and destroying them as testing moves toward production. But marketing and other groups may also find that the IT team is less than agile in providing these services in a timely manner. So a credit card is swiped, and development occurs. Often, the first indication that this is taking place is when the bills arrive.


Often, the best solution is a shared environment in which certain workloads are deployed into AWS, Azure, or even SoftLayer, while others go into peer data centers for shared but less public workloads, providing ideal circumstances for the organization.


Certainly these services are quite valuable to organizations. But are they secure, or do they potentially expose company data to vulnerabilities, or even provide an entrée into the corporate network? Are there compliance issues? How about the costs? If your organization could provide these services in a way that satisfied the user community, would that be a more efficient, cost-effective, compliant, and consistent platform?


These are really significant questions, and the answers are rarely simple. Today, there are applications, such as Cloudgenera, that will analyze a new workload and advise the analyst as to whether any of these issues are significant. Such a tool will also advise on current cost models to prove out the costs over time. Having that knowledge prior to deployment could be the difference between agility and vulnerability.


Another issue to address when opening your environment up to a hybrid or public workload is the learning curve of adopting a new paradigm within your IT group. This can be daunting. To address these kinds of shifts in approach, a new world of public ecosystem partners has emerged. These tools create workload deployment methodologies that bridge the gap between your internal virtual environment and the public cloud, easing or even facilitating that transition. Platform9, for example, essentially provides a software tool that lets the administrator decide, from a Platform9 panel within vCenter, where to deploy a workload. Deploying the tool is as simple as downloading an OVF and deploying it into your vCenter. Platform9 leverages the VMware APIs and the AWS APIs to integrate seamlessly into both worlds. It's simple and elegant, and the learning curve is minimal.


There are other avenues to be addressed, of course. For example, what about latencies to the community? Are there storage latencies? Network latencies? How about security concerns?


Analytics against these workloads, as well as those within your virtual environment, are no longer a nice-to-have but a must-have.


Lately, I’ve become particularly enthralled with the sheer level of log detail provided by Splunk. There are many SIEM (Security Information and Event Management) tools out there, but in my experience, no other tool is as functional as Splunk. To be sure, other tools, like SolarWinds, provide this level of analytics as well, and do so with aplomb. As a data collector, Splunk is unparalleled, but beyond that, it lets you tailor your dashboards to show the trends, analytics, and pertinent data across that entire volume of data in a functional, at-a-glance way. The tool can stretch itself to cover all your workloads, security, thresholds, and so on, and present it all so that a monitor panel or dashboard shows you very simply where your issues and anomalies lie.


There is also a large open-source community around SIEM and related security software. Tools such as OSSIM, Snort, OpenVAS, and BackTrack are all viable options, but remember that, as open-source projects, they rarely provide the robust dashboards that SolarWinds or Splunk do. They will cost far less, but may require much more hand-holding, and support will likely be far less functional.


When I was starting out in the pre-sales world, we began talking about the Journey to the Cloud. It became a trope. We’re still on that journey. The thing is, the ecosystem that surrounds the public cloud is becoming as robust as the ecosystem surrounding standard, on-prem workloads.


I'm flying home after another incredible Interop experience. It’s the perfect time to capture the conversations, ideas, and feelings I experienced this week in the desert, before they fade like the tan lines I got while waiting ten minutes outside for an Uber.


100Gbps (The summary)


If money were no object, I would honestly say that this should be on our MUST ATTEND list every year. Even as a conference newbie who probably missed a ton of opportunities along the way, Interop generated an incredibly diverse set of interactions, stories, and ideas.


Even if money is an object (which happens to be true for most people and organizations), I would still say that making Interop a priority would reap rewards that totally justify the expense.


While vendors are certainly present at Interop, the overall tone is refreshingly agnostic compared to events like Cisco Live, Microsoft Ignite, and VMworld. That means sessions are more focused on the real shortcomings of products and solutions, which allows for conversations about work-arounds, alternatives, and comprehensive solutions.


It's not hard to guess what the big stories were at the show this year: cloud, security, and SDN all had places in the sun. More surprising was the level to which the DevOps narrative bled into conversations that were once considered pure networking.


Fat Pipe (The details)

  • One example of that DevOps/NetOps transition was a talk by Jason Edelman about using Ansible to perform configuration backups on legacy (meaning SSH-connected, command-line-driven) network devices. While it might sound strange to the THWACK community, familiar as we are with tools like NCM, it represents an extension of existing skills and technology to teams that are used to using Ansible to deploy and manage cloud- and hybrid-cloud-based environments.


  • There were also a few deep-dive sessions on building and leveraging coding skills, such as Python, for network outcomes, mostly in relation to SDN, NFV, and the like.


This, in turn, led to an ongoing dialogue between speakers and attendees in several sessions on the best ways for network professionals to identify, acquire, and develop new skills that will allow them to make the leap to the new age of networking.


All of this built up to a narrative that was best championed during Martin Casado’s keynote. In one of the best comparisons I've heard to date, Casado compared the current movement away from traditional data centers, networking, server, and storage to the evolution from in-car navigation systems to running Waze on your phone.


He pointed out that every layer of the data center that once featured specialized hardware-based solutions is now completely contained at the software layer.


This overall shift is leading to the “rise of the developer,” as Casado put it. This means no silo will be safe from having its hardware optimized by a software solution. It also means developers will have more influence over choosing operational frameworks, i.e., the solutions that run the business.


  • Developers, Casado pointed out, care little for Gartner, vendor-specific certifications that tie IT pros to specific solutions, sales relationships, or the vagaries of bureaucratic procurement cycles.


The result is that this shift to software-as-infrastructure has the potential to disrupt everything we used to know about the business of IT.


Packet Footer (Summary)

Were you at Interop and see, hear, or discuss something I missed? Do you have a different take than mine? Do you want to hear more on a specific topic? Let me know in the comments below!


All of this and more (I haven't even gotten into the discussions about IoT, SDN, or IPv6 that I was able to participate in) made this one of the best conferences I have attended in a very long time.


It got me even more excited for conferences to come. Next up is Cisco Live in Las Vegas, July 10-14. I hope to see you there!

Interop 2016 kicked off the week with two days of IT summits that covered an amazing range of topics, including cloud, containers, and microservices, IT Leadership, and cybersecurity, plus hands-on hacking tutorials. The following three days included the Expo floor opening as well as the session tracks.


Since the IT Leadership Summit was sold out, I decided to join the Dark Reading Cyber Security Summit Day 1. I was only planning on attending Day 1, but the content was so good that I eschewed Container Summit and attended Dark Reading's Day 2. To kick things off, the editors at Dark Reading shared some interesting insights followed by industry thought leaders.


DevOps - SecOps Relational image via @petecheslock and his Austin DevOps Days 2015 presentation.


My top 10 takeaways from the Dark Reading Cybersecurity Summit Days are below.

  1. $71.1B was spent on cybersecurity last year.
  2. Security pros spend most of their time patching legacy stuff and fixing vulnerabilities versus addressing targeted, sophisticated attacks, which happens to be their primary security concern. Number two is phishing and social engineering attacks.
  3. Security is one of the most important priorities and one of the least resourced by IT organizations. Security pros make policy decisions, but non-security people make purchasing decisions.
  4. The weakest link is the end user, who makes up the surface area of vulnerability.
  5. There are not enough skilled security ops people. 500K to 2M more security pros are needed by 2020.
  6. The most talented security pros are hackers.
  7. The average time to detect an intrusion is 6-7 months.
  8. 92% of the intrusions, incidents, and attacks of the past 10 years fall into nine distinct patterns, which can be further reduced down to three.
  9. The cost of a breach is roughly $254 per record for breaches involving 100 records, versus $0.09 per record for breaches involving 100M records. Note that the cost is a multi-variable function with many dimensions to factor in.
  10. Only 40% of attacks are malware, so stopping malware is not enough.
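To put item 9 in perspective, multiplying the quoted per-record figures by the record counts gives a total for each breach. This sketch is only simple arithmetic on the two numbers cited above, using Decimal to keep the currency math exact; as noted, real breach cost depends on many more variables.

```python
from decimal import Decimal


def total_breach_cost(records, cost_per_record):
    """Total cost = number of records x cost per record (a deliberate simplification)."""
    return records * Decimal(cost_per_record)


small = total_breach_cost(100, "254")
large = total_breach_cost(100_000_000, "0.09")
print(f"100-record breach:  ${small:,.2f}")
print(f"100M-record breach: ${large:,.2f}")
print(f"Total-cost ratio:   {large / small:.0f}x")
```

So although the per-record cost falls by a factor of roughly 2,800 (254 / 0.09), the 100M-record breach still costs about 354 times more in total ($9M versus $25,400).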


Attached below is my DART IT Skills Framework presentation from my Interop IT Leadership speaking session. Security is one of the CIO's SLAs, so the Cybersecurity Summit was timely.


Let me know what you think of the security insights, as well as my presentation below, in the comment section. I would be happy to present my DART session to our community if there is enough interest, so let me know and I will make it so.

I'm back from Liverpool and SQLBits. It was a brilliant event, as always. If you were there I hope you came by to say hello.


Here's this week's Actuator, filled with things I find amusing from around the Internet...


What is ransomware and how can I protect myself?

You recover from backups. If you don't have backups then you are hosed.


Ivy League economist ethnically profiled, interrogated for doing math on American Airlines flight

To be fair, he is a member of the al-Gebra movement, and was carrying weapons of math instruction.


The Year That Music Died

Wonderful interactive display of the top five songs every day since 1958. Imagine if you had this kind of interaction with your monitoring data, with some machine learning on top.


Apple Stole My Music. No, Seriously.

Since we are talking about music, here's yet another reason why reading the fine print is important.


Apple's Revenue Declines For The First Time In 13 Years

I am certain it has *nothing* to do with the issues inherent in their software and services like Apple Music. None.


The Formula One Approach to Security

This article marks the first time I have seen the phrase "security intelligence" and now I'm thinking it will be one of the next big buzzwords. Still a great read and intro to NetFlow for those that haven't heard about that yet.


Study: Containers Are Great, but Skilled Admins Are Scarce

I wonder how long they spent studying this. I believe it's always been the case that skilled admins are scarce, which is why we have so many accidental admins in the world. There's more tech work available than tech people available.


My secret to avoiding jet lag for events revealed:


In the world of networking, you would be hard pressed to find a more pervasive and polarizing topic than that of SDN. The concept of controller-based, policy-driven, and application-focused networks has owned the headlines for several years as network vendors have attempted to create solutions that allow everyone to operate with the same optimization and automation as the large Web-scale companies do. The hype started in and around data center networks, but over the past year or so, the focus has sharply shifted to the WAN, for good reason.


In this three-part series we are going to take a look at the challenges of current WAN technologies, what SD-WAN brings to the table, and what some drawbacks may be in pursuing an SD-WAN strategy for your network.


Where Are We Now?


In the first iteration of this series, we’re going to identify and discuss some of the limitations in and around WAN technology in today’s networks. The lists below are certainly not comprehensive, but speak to the general issues faced by network engineers when deploying, maintaining, and troubleshooting enterprise WANs.


Perspective – The core challenge in creating a policy-driven network is perspective. For the most part, routers in today's networks make decisions independently of the state of peer devices. While there certainly are protocols that share network state information (routing protocols being the primary example), actions based on this exchanged information are determined exclusively through the lens of the router's localized perspective of the environment.


This can cause non-trivial challenges in coordinating desired traffic behavior, especially for patterns that may not follow the default/standard behavior that a protocol chooses for you. Getting every router to make uniform decisions, each from a different perspective, can be difficult and can add significant complexity, depending on the policy being enforced.


Additionally, not every protocol shares every piece of information, so it is entirely possible that one router is making decisions based on considerably different information than other routers are using.


Application Awareness - Routing in current-generation networks is remarkably simple. A router checks whether it knows the destination prefix and, if so, forwards the packet to the next hop along the path. Information other than the destination IP address is not considered when determining path selection. Deeper inspection of the packet payload is possible on most modern routers, but that information does not play into route selection decisions. Because of this limitation in how we identify forwarding paths, it is incredibly difficult to differentiate routing policy based on the application traffic being forwarded.
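Here is a minimal Python sketch of destination-only forwarding: a longest-prefix-match lookup over a toy routing table. The table contents and the `lookup` helper are hypothetical. The point is that only the destination address enters the decision, so a voice call and a bulk backup to the same host always get the same next hop.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-a",
    ipaddress.ip_network("10.0.0.0/8"): "mpls",
    ipaddress.ip_network("10.20.0.0/16"): "mpls-backup",
}

def lookup(dst: str, app: str = "") -> str:
    """Return the next hop for dst using longest-prefix match.

    The app argument is accepted only to show that it is ignored:
    destination-based routing never looks at it.
    """
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]  # default route always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(lookup("10.20.1.5", app="voice"))   # mpls-backup
print(lookup("10.20.1.5", app="backup"))  # mpls-backup
print(lookup("8.8.8.8"))                  # isp-a
```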


Error Detection/Failover – Error detection and failover in current-generation routing protocols is a fairly binary process. Routers exchange information with their neighbors, and if they don’t hear from them within a pre-determined time window, they tear down the neighbor relationship and remove the information learned from that peer. Only at that point will a router choose to take what it considers to be an inferior path. This solution works well for black-out style conditions, but what happens when there is packet loss or significant jitter on the link? The answer is that current routing protocols do not take these conditions into consideration when choosing an optimal path. It is entirely possible for a link to have 10% packet loss, which significantly impacts voice calls, while the router plugs along as if everything is okay, because it never loses contact with its neighbor long enough to tear down the relationship and choose an alternate path. Meanwhile, a perfectly suitable alternative may be sitting idle, providing no value to the organization.
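A small simulation shows why loss alone rarely triggers failover. This sketch assumes hellos every 5 seconds and a 15-second hold time (EIGRP's defaults on most high-speed interfaces). It also assumes the neighbor was heard at time zero. The loss patterns are hypothetical. The neighbor is torn down only when no hello arrives for a full hold time, so a link that drops one hello in ten never fails over.

```python
HELLO_INTERVAL = 5   # seconds between hellos (assumed)
HOLD_TIME = 15       # seconds of silence before the neighbor is declared down (assumed)

def neighbor_goes_down(hello_received: list[bool]) -> bool:
    """Return True if the hold timer ever expires.

    hello_received[i] says whether the hello sent at i * HELLO_INTERVAL
    arrived. The neighbor is assumed heard at time 0.
    """
    last_heard = 0
    for i, received in enumerate(hello_received):
        now = i * HELLO_INTERVAL
        if received:
            last_heard = now
        elif now - last_heard >= HOLD_TIME:
            return True
    return False

ten_percent_loss = [i % 10 != 9 for i in range(100)]  # every tenth hello lost
hard_outage = [True] * 5 + [False] * 5                # link goes dark
print(neighbor_goes_down(ten_percent_loss))  # False: voice suffers, no failover
print(neighbor_goes_down(hard_outage))       # True
```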


Load Balancing/Efficiency - Also inherent in the way routing protocols choose links is the fact that all protocols are looking to identify the single best path (or paths, if they are equal cost) and make it active, leaving all other paths passive until the active link(s) fail. EIGRP could be considered an exception to this rule, as it allows for unequal-cost load balancing, but even that is less than ideal since it won’t detect brown-out conditions on a primary link and move all traffic to the secondary. This means that organizations have to purchase far more bandwidth than necessary to ensure each link, passive or active, can support all traffic at any point. Since routing protocols cannot load balance based on application characteristics, load balancing and failover are an all-or-nothing proposition.
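EIGRP's unequal-cost load balancing can be modeled in a few lines. This is a simplified model, not Cisco's implementation. The best path is always used. Another path is used only if it is a feasible successor, meaning its reported distance is below the best path's metric, and only if its own metric is less than `variance` times the best metric. Traffic is then shared roughly in inverse proportion to metric. The metrics are invented. They are also static, so nothing in this calculation reacts to loss or jitter on the primary.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    metric: int             # this router's distance via the neighbor
    reported_distance: int  # the neighbor's own distance to the destination

def traffic_shares(routes: list[Route], variance: int = 1) -> dict[str, float]:
    """Return {route name: share of traffic} under a simplified EIGRP model."""
    best = min(r.metric for r in routes)  # feasible distance of the successor
    chosen = [
        r for r in routes
        if r.metric == best
        or (r.reported_distance < best and r.metric < variance * best)
    ]
    # Lower metric -> larger share of traffic.
    weights = {r.name: 1 / r.metric for r in chosen}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

routes = [Route("primary", 1000, 500), Route("secondary", 3000, 800)]
print(traffic_shares(routes))              # {'primary': 1.0}
print(traffic_shares(routes, variance=4))  # roughly 75% primary, 25% secondary
```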


As stated previously, the above list is just a quick glance at some of the challenges faced in designing and managing the WAN in today’s enterprise network.  In the second part of this series we are going to take a look at what SD-WAN does that helps remediate many of the above challenges.  Also keep your eyes peeled for Part 3, which will close out the series by identifying some potential challenges surrounding SD-WAN solutions, and some final thoughts on how you might take your next step to improving your enterprise’s WAN.

Did the title of this blog entry scare you and make you think, "Why in the world would I do that?"  If so, then there is no need to read further.  The point of this blog post is not to tell you why you should be doing so, only why some have chosen to do so, and what issues they find themselves dealing with after having done so. If you still think that the idea of moving any of your data center to the cloud is simply ludicrous, you may go back to your regularly scheduled programming.


If the demand on your company's IT resources is consistent throughout the week and year, then the biggest reason for moving to the cloud really doesn't apply to you. Consider how Amazon Web Services (AWS) got built. They discovered that most of the demand on their company's IT resources came from a few days of the year: Black Friday, Mother's Day, Christmas, etc. The rest of the year, the bulk of their IT resources were going unused. They asked themselves whether other people might need their IT resources when they weren't using them, and AWS was born. It has, of course, grown well beyond the simple desire to sell excess capacity into one of their most profitable business lines.

If your company's IT systems have a demand curve like that, then the public cloud might be for you. Why pay for servers to sit there for an entire year when you can rent them when demand is high and give them back when demand is low?  In fact, some companies even rent extra computing capacity by the hour when the demand is high. Imagine being able to scale the capabilities of your data center within minutes in order to meet the increased demand created by a Slashdot article or a viral video. This is the reason to go to the cloud. Then, once the demand goes down, simply give that capacity back.
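A back-of-the-envelope comparison shows why a spiky demand curve favors renting. Every number below is hypothetical: the yearly cost per owned server, the hourly rental rate, and the length of the peak. Substitute your own figures.

```python
HOURS_PER_YEAR = 24 * 365

def owned_cost(peak_servers: int, annual_cost_per_server: float) -> float:
    """Owning means paying for peak capacity all year, used or not."""
    return peak_servers * annual_cost_per_server

def rented_cost(baseline_servers: int, peak_servers: int, peak_hours: int,
                hourly_rate: float) -> float:
    """Renting means paying for the baseline all year plus extra servers only at peak."""
    baseline = baseline_servers * HOURS_PER_YEAR * hourly_rate
    burst = (peak_servers - baseline_servers) * peak_hours * hourly_rate
    return baseline + burst

# Hypothetical shop: 10 servers most of the year, 100 for about five days of peaks.
own = owned_cost(peak_servers=100, annual_cost_per_server=3000)
rent = rented_cost(baseline_servers=10, peak_servers=100, peak_hours=5 * 24, hourly_rate=0.50)
print(f"own: ${own:,.0f}  rent: ${rent:,.0f}")  # own: $300,000  rent: $49,200
```

With flat demand the math reverses. Renting 100 servers all year at the same rate is `rented_cost(100, 100, 0, 0.50)`, or $438,000. That is more than owning, which is the point made above about consistent demand.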


The challenge for IT people looking to replace portions of their data center with the public cloud is automating it, and making sure that what they automate fits within the budget.  While a public cloud vendor can typically scale to whatever demand level you find yourself with, the bill will automatically scale as well. Unless the huge spike in demand is directly related to a huge spike in sales, your CFO might not take kindly to an enormous bill when your video goes viral. Make sure you plan for that ahead of time so you don't end up having to pay a huge and unexpected cost. Perhaps the decision will be made to just let things get slow for a while. After all, that ends up in the news, too. And if you believe all publicity is good publicity, then maybe it wouldn't be such a bad thing.
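One way to encode that decision is to cap scaling at what the budget can absorb and accept slower service beyond that point. This sketch is illustrative only. The function names, per-instance capacity, hourly rate, and budget are all hypothetical. Real cloud autoscalers express such limits in their own configuration (for example, the maximum size of an AWS Auto Scaling group) rather than in code like this.

```python
import math

def desired_instances(requests_per_sec: float, capacity_per_instance: float,
                      min_instances: int = 2) -> int:
    """Instances needed to serve the load, never fewer than min_instances."""
    return max(min_instances, math.ceil(requests_per_sec / capacity_per_instance))

def budget_capped(desired: int, hourly_rate: float, hours_left: float,
                  remaining_budget: float) -> int:
    """Largest instance count we can keep running for hours_left within budget."""
    affordable = math.floor(remaining_budget / (hourly_rate * hours_left))
    return min(desired, affordable)

# Hypothetical viral spike: 5,000 req/s at 100 req/s per instance.
want = desired_instances(5000, 100)
run = budget_capped(want, hourly_rate=0.50, hours_left=24, remaining_budget=300)
print(want, run)  # 50 25: the site runs slower rather than blowing the budget
```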


There are plenty of companies that have replaced all their data centers with the cloud. Netflix is perhaps the most famous company that runs their entire infrastructure in AWS.  But they argue that the constant changes in demand for their videos make them a perfect match for such a setup. Make sure the way your customers use your services is consistent with the way the public cloud works, and make sure that your CFO is ready for the bill if and when it happens. That's how to move things into the cloud.

As an avid cloud user, I'm always amused by people who suggest that moving things to the cloud means you don't have to manage them. And, of course, when I say "amused," what I really mean is I feel like Inigo Montoya in The Princess Bride: "You keep using that word. I do not think it means what you think it means."


Why do I say this?  Because I am an avid cloud user and I manage my cloud assets all the time.  So where do we get this idea?  I'd say it starts with the idea that you don't have to manage the hardware.  Push a few buttons and a "server" magically appears in your web browser.  This is so much easier than creating a real server, which actually works similarly these days.  Push a few buttons on the right web site, and an actual server shows up at your front door in a few days.  All you have to do is plug it in, load the appropriate OS and application stack and you're ready to go.  The cloud VM is a little bit easier.  It appears in minutes and comes preloaded with the OS and application stack that you specified during the build process.


I think what most people think when they say their cloud resources don't need to be managed is that they don't have to worry about the hardware.  They know that the VM is running on highly resilient hardware that is being managed for them.  They don't have to worry about a failed disk drive, network controller, PCI card, etc.  It just manages itself. But anyone who thinks this is all that needs to be managed for a server must never have actually managed any servers.


There are all sorts of things that must be managed on a server that have nothing to do with hardware.  What about the filesystems?  When you create the VM, you create it with a volume of a certain size.  You need to make sure that volume doesn't fill up and take your server down with it.  You need to monitor the things that would fill it up for no reason, such as web logs, error logs, database transaction logs, etc.  These need to be monitored and managed.  Speaking of logs, what about those error logs?  Is anyone looking at them? Are they scanning them for errors that need to be addressed?  Somebody should be, of course.
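Even a trivial check catches a volume that is quietly filling with logs before it takes the server down. This is a minimal sketch using Python's standard `shutil.disk_usage`. The mount points and the 85% threshold are assumptions. In practice you would feed the result into whatever monitoring and alerting you already run.

```python
import shutil

def check_volumes(mounts: list[str], threshold_pct: float = 85.0) -> list[str]:
    """Return a warning for each mount point whose used space exceeds threshold_pct."""
    warnings = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        used_pct = usage.used / usage.total * 100
        if used_pct > threshold_pct:
            warnings.append(f"{mount} is {used_pct:.1f}% full")
    return warnings

if __name__ == "__main__":
    # Hypothetical mount list; adjust for your VM's layout.
    for warning in check_volumes(["/"]):
        print("WARNING:", warning)
```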


Another thing that can fill up a filesystem is an excessive number of snapshots. They need to be managed as well. Older snapshots need to be deleted, and certain snapshots may need to be kept for longer periods of time or archived off to different media. Snapshots do not manage themselves.
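Retention rules are easy to state and easy to forget to enforce. This sketch applies one possible policy: keep the newest N snapshots, plus any snapshot flagged for longer retention. The `Snapshot` type and its `keep` flag are hypothetical, and the policy is an example rather than a recommendation. Actually deleting or archiving snapshots would go through your cloud provider's own API.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Snapshot:
    name: str
    created: datetime
    keep: bool = False  # hypothetical flag for snapshots with longer retention

def snapshots_to_delete(snapshots: list[Snapshot], keep_newest: int = 7) -> list[str]:
    """Return names of snapshots that fall outside the retention policy."""
    newest_first = sorted(snapshots, key=lambda s: s.created, reverse=True)
    expired = newest_first[keep_newest:]
    return [s.name for s in expired if not s.keep]

snaps = [Snapshot(f"snap-{day:02d}", datetime(2016, 5, day)) for day in range(1, 11)]
snaps[0].keep = True  # e.g. a pre-upgrade snapshot kept on purpose
print(snapshots_to_delete(snaps))  # ['snap-03', 'snap-02']
```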


What about my favorite topic of backups?  Is that VM getting backed up?  Does it need to be?  If you configured it to be backed up, is it backing up?  Is anyone looking at those error logs?  One of the biggest challenges is figuring out when a backup didn't run. It's relatively easy to figure out when a backup ran but failed; however, if someone configured the backup to not run at all, there's no log of that.  Is someone looking for backups that just magically disappeared?
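The way to catch a backup that never ran is to check for missing successes instead of looking for failures. This sketch assumes your backup tool can report the time of each system's last successful backup; the dictionary stands in for that report. The 26-hour limit is a hypothetical allowance for a daily job. Walking your own inventory, rather than the backup tool's job list, means that a system whose job was deleted or never configured is still flagged.

```python
from datetime import datetime, timedelta
from typing import Optional

def stale_backups(inventory: list[str], last_success: dict[str, Optional[datetime]],
                  now: datetime, max_age: timedelta = timedelta(hours=26)) -> list[str]:
    """Return systems with no successful backup within max_age."""
    stale = []
    for system in inventory:
        last = last_success.get(system)  # None or absent: no success on record
        if last is None or now - last > max_age:
            stale.append(system)
    return stale

now = datetime(2016, 5, 4, 8, 0)
inventory = ["web01", "db01", "app01"]
last_success = {"web01": datetime(2016, 5, 4, 1, 0), "db01": datetime(2016, 5, 1, 1, 0)}
print(stale_backups(inventory, last_success, now))  # ['db01', 'app01']
```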

Suffice it to say that the cloud doesn't remove the need for management. It just moves it to a different place. Some of these things may be able to be offloaded to the cloud vendor, of course. But even if that's the case, someone needs to watch the watcher. There is no such thing as a free lunch, and there is no such thing as a server that manages itself.

Network variation is hurting us

Network devices like switches, routers, firewalls and load-balancers ship with many powerful features. These features can be configured by each engineer to fit the unique needs of every network. This flexibility is extremely useful and, in many ways, it's what makes networking cool. But there comes a point at which this flexibility starts to backfire and become a source of pain for network engineers.

Variation creeps up on you.  It can start with harmless requests for some non-standard connectivity, but I've seen those requests grow to the point where servers were plugging straight into the network core routers.  In time, these one-off solutions start to accumulate and you can lose sight of what the network ‘should’ look like.  Every part of the network becomes its own special snowflake.

I’m not judging here. I've managed quite a few networks, and all of them ended up with high degrees of variation and technical debt. In fact, it takes considerable effort to fight the storm of snowflakes. But if you want a stable and useful network, you need to drive out variation. Of course you still need to meet the demands of the business, but only up to a point. If you're too flexible, you will end up hurting your business by creating a brittle network that cannot handle change.

Your network becomes easier and faster to deploy, monitor, map, audit, understand and fix if you limit your network to a subset of standard components. Of course there are great monitoring tools to help you manage messy networks, but you’ll get greater value from your tools when you point them towards a simple structured network.

What’s so bad about variety?

Before we can start simplifying our networks we have to see the value in driving out that variability. Here are some thoughts on how highly variable (or heterogeneous) networks can make our lives harder as network engineers:

  • Change control - Making a safe network change is extremely difficult without standard topologies or configurations. Making a change safely requires a deep understanding of the current traffic flows, and this will take a lot of time. Documentation makes this easier, but a simple standardized topology is best. The most frustrating thing is that when you do eventually cause an outage, the lessons learned from your failed change cannot be applied to other, dissimilar parts of your network.
  • Discovery time can be high. How do you learn the topology of your network in advance of problems occurring? A topology mapping tool can be really helpful to reduce the pain here, but most people have just an outdated Visio diagram to rely on.
  • Operations can be a nightmare in snowflake networks. Every problem will be a new one, but probably one that could have been avoided, and it's likely that you'll go slowly mad. Often you'll start troubleshooting a problem and then realize, ‘oh yeah, I caused this outage with the shortcut I took last week. Oops.’ By the way, it’s a really good sign when you start to see the same problems repeatedly. Operations should be boring. Boring means you can re-orient your Ops time towards 80/20 analysis of issues, rather than spending your days firefighting.
  • Stagnation - You won't be able to improve your network until you simplify and standardize it. Runbooks are fantastic tools for your Ops and Deployment teams, but a runbook will be useless if the steps are different for every switch in your network. Think about documenting a simple task: if network Y, do step 1, except if feature Z is enabled, then do something else, except if it’s raining or if it's a leap year. You get the message.
  • No Automation - If your process is too complicated to capture in a runbook, you shouldn't automate it. Simplify your network, then your process, then automate.
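A first step toward driving out variation is to measure it. This sketch checks each device's configuration against a standard set of required lines and reports which lines are missing and which are extra. The "golden" lines and the sample configurations are hypothetical. A real tool would parse configurations hierarchically rather than compare flat lines, so treat this only as an illustration of the idea.

```python
# Hypothetical "golden" lines every access switch should carry.
STANDARD = frozenset({
    "service timestamps log datetime msec",
    "no ip http server",
    "ntp server 10.0.0.10",
    "logging host 10.0.0.50",
})

def deviations(config_text: str, standard: frozenset[str] = STANDARD) -> dict[str, set[str]]:
    """Compare a flat config against the standard; return missing and extra lines."""
    lines = {line.strip() for line in config_text.splitlines()
             if line.strip() and not line.strip().startswith("!")}  # skip blanks/comments
    return {"missing": set(standard - lines), "extra": lines - standard}

snowflake = "ntp server 10.9.9.9\nno ip http server\n"
report = deviations(snowflake)
print(sorted(report["missing"]))
print(sorted(report["extra"]))  # ['ntp server 10.9.9.9']
```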



Network variation can be a real source of pain for us engineers. In this post we looked at the pain it causes and why we need to simplify and standardize our networks. In Part 2 we'll look at the root causes of these complicated, heterogeneous networks and how we can begin tackling the problem.

Data center consolidations have been a priority for years, with the objectives of combating server sprawl, centralizing and standardizing storage, streamlining application management, and establishing shared services across multiple agencies.


But, consolidation has created challenges for federal IT professionals, including:

  • Managing the consolidation without an increase in IT staff
  • Adapting to new best practices like shared services and cloud computing
  • Shifting focus to optimizing IT through more efficient computing platforms


Whether agencies have finished their consolidation or not, federal IT pros have definitely felt the impact of the change. But how do the remaining administrators manage the growing infrastructure and issues while meeting SLAs?


One way data center administrators can stay on top of all the change is to modernize their monitoring systems, with the objective of improving visibility and troubleshooting.


The Value of Implementing Holistic Monitoring


A holistic approach to monitoring provides visibility into how each individual component is running and impacting the environment as a whole. It can bridge the gap that exists between the IT team and the program groups through connected visibility.



Who is responsible for what? Shared services can be hard to navigate.


Even though the data center team now owns the infrastructure and application operations, the application owners still need to ensure application performance. Both teams require visibility into performance with a single point of truth, which streamlines communication and eases the transition to shared services.


Application Performance

Application performance is critical to executing agency missions, so when users provide feedback that an application is slow, it is up to data center administrators to find the problem and fix it—or escalate it—quickly.


Individually checking each component of the IT infrastructure (the application, servers, storage, database, or virtualized environment) can be tedious, time-consuming, and difficult. End-to-end visibility into how each component is performing allows for quick identification and remediation of issues.
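As a rough illustration of how end-to-end visibility shortens troubleshooting, this sketch compares each tier's current response time against that tier's own baseline. It then lists the tiers that have drifted, worst first. The tier names, baseline figures, and the 2x threshold are hypothetical. Real monitoring tools use far richer baselining.

```python
def suspect_tiers(current_ms: dict[str, float], baseline_ms: dict[str, float],
                  ratio: float = 2.0) -> list[str]:
    """Return tiers running at least `ratio` times their baseline, worst first."""
    drift = {tier: current_ms[tier] / baseline_ms[tier] for tier in baseline_ms}
    flagged = [tier for tier, r in drift.items() if r >= ratio]
    return sorted(flagged, key=drift.get, reverse=True)

baseline = {"web": 20, "app": 50, "database": 10, "storage": 5}
current = {"web": 22, "app": 60, "database": 45, "storage": 12}
print(suspect_tiers(current, baseline))  # ['database', 'storage']
```

The user only sees "the application is slow". Comparing every tier against its own history points first at the database rather than the web tier.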



Virtualization can introduce complexities and management challenges. In a virtual environment, virtual machines can be cloned and moved around so easily and often that the impact on the entire environment can be missed, especially in a dynamically changing infrastructure.


Consolidated monitoring and comprehensive awareness of the end-to-end virtual environment is the answer to effective change management in the virtualized environment.



Efficiency was a key driver behind consolidation, but it can seem nearly impossible to achieve for the remaining data centers. With integrated monitoring that provides end-to-end visibility, however, data center administrators can troubleshoot issues in seconds instead of hours or days and proactively manage their IT. With the right tools, administrators can provide end users with high service levels.


Consolidation is part of the new reality for data center administrators. Holistic, integrated monitoring and management of the dynamically changing IT environment will help to refine the new responsibilities of being a shared service, ensure mission-critical applications are optimized and improve visibility into virtualized environments.


Find the full article on Signal.

Practitioners in nearly every technology field are facing revolutionary changes in the way systems and networks are built. Change, by itself, really isn't all that interesting. Those among us who have been doing this a while will recognize that technological change is one of the few reliable constants. What is interesting, however, is how things are changing.


Architects, engineers, and the vendors that produce gear for them have simply fallen in love with the concept of abstraction. The abstraction floodgates have metaphorically flown open following the meteoric rise of the virtual machine in enterprise networks. As an industry, we have watched the abstraction of the operating system from the hardware it lives on give us an amazing amount of flexibility in the way we deploy and manage our systems. Now that the industry has fully embraced the concept of abstraction, we aim to implement it everywhere.


Breaking away from monolithic stack architecture


If we take a look at systems specifically, it used to be that the hardware, the operating system, and the application all existed as one logical entity.  If it was a large application, we might have components of the application split out across multiple hardware/OS combos, but generally speaking the stack was a unit. That single unit was something we could easily recognize and monitor as a whole. SNMP, while it has its limitations, has done a decent job of allowing operators to query the state of everything in that single stack.


Virtualization changed the game a bit as we decoupled the OS/Application from the hardware. While it may not have been the most efficient way of doing it, we could still monitor the VM like we used to when it was coupled with the hardware.  This is because we hadn't really changed the architecture.  Abstraction gave us some significant flexibility but our applications still relied on the same components, arranged in a similar pattern to the bare-metal stacks we started with.  The difference is that we now had two unique units where information collection was required, the hardware remained as it always had and the OS/Application became a secondary monitoring target.  It took a little more configuration but it didn't change the nature of the way we monitored the systems.


Cloud architecture changes everything


Then came the concept of cloud infrastructure. With it, developers began embracing the elastic nature of the cloud and started building their products to take advantage of it. Rather than sizing an application stack based off of guesstimates of the anticipated peak load, it can now be sized minimally and scaled out horizontally when needed by adding additional instances. Previously, just a handful of systems would have handled peak loads. Now those numbers could be dozens, or even hundreds of dynamically built systems scaled out based on demand. As the industry moves in this direction, our traditional means of monitoring simply do not provide enough information to let us know if our application is performing as expected.


The networking story is similar in a lot of ways. While networking has generally been resistant to change over the past couple of decades, the need for dynamic/elastic infrastructure is forcing networks to take several evolutionary steps rather quickly.  In order to support the cloud models that application developers have embraced, the networks of tomorrow will be built with application awareness, self-programmability, and moment-in-time best path selection as core components.


Much like in the systems world, abstraction is one of the primary keys to achieving this flexibility. Whether the new model of networks is built upon new protocols, or overlays of existing infrastructure, the traditional way of statically configuring networks is coming to an end. Rather than having statically assigned primary, secondary, and tertiary paths, networks will balance traffic based off of business policy, link performance, and application awareness. Fault awareness will be built in, and traffic flows will be dynamically routed around trouble points in the network. Knowing the status of the actual links themselves will become less important, much like physical hardware that applications use. Understanding network performance will require understanding the actual performance of the packet flows that are utilizing the infrastructure.
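The kind of policy described here picks a path per application from live link measurements instead of a static primary/secondary order. It might look something like this sketch. The application classes, thresholds, link names, and measurements are invented for illustration and do not reflect any vendor's SD-WAN implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LinkStats:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    cost: int  # relative preference when several links qualify (lower is better)

# Hypothetical per-application thresholds.
POLICY = {
    "voice": {"latency_ms": 150, "jitter_ms": 30, "loss_pct": 1.0},
    "bulk": {"latency_ms": 500, "jitter_ms": 200, "loss_pct": 5.0},
}

def pick_link(app: str, links: list[LinkStats]) -> Optional[str]:
    """Return the cheapest link currently meeting the app's thresholds, if any."""
    limits = POLICY[app]
    ok = [
        link for link in links
        if link.latency_ms <= limits["latency_ms"]
        and link.jitter_ms <= limits["jitter_ms"]
        and link.loss_pct <= limits["loss_pct"]
    ]
    return min(ok, key=lambda link: link.cost).name if ok else None

links = [LinkStats("mpls", 40, 5, 3.0, cost=1), LinkStats("broadband", 60, 10, 0.2, cost=2)]
print(pick_link("voice", links))  # broadband: mpls is too lossy for voice right now
print(pick_link("bulk", links))   # mpls: still good enough for bulk transfers
```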


At the heart of the matter, the end goal appears to be an ephemeral state for both network path selection and systems architecture.


So how does this change monitoring?


Abstraction inherently makes application and network performance harder to analyze. In the past, we could monitor hardware state, network link performance, CPU, memory, disk latency, logs, etc. and come up with a fairly accurate picture of what was going on with the applications using those resources. Distributed architectures negate the correlation between a single piece of underlying infrastructure and the applications that use it.  Instead, synthetic application transactions and real-time performance data will need to be used to determine what application performance really looks like. Telemetry is a necessary component for monitoring next generation system and network architectures.
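A synthetic transaction is a scripted user action that is run on a schedule and timed. In this sketch, the hypothetical `run_probe` times any callable and compares the result with a latency objective. In a real deployment the callable would perform an actual request against the application, such as a login or a search. Here a stand-in function is used instead, and the 500 ms objective is an assumption.

```python
import time
from typing import Callable

def run_probe(transaction: Callable[[], object], slo_ms: float = 500.0) -> dict:
    """Run one synthetic transaction and report latency and pass/fail."""
    start = time.perf_counter()
    try:
        transaction()
        succeeded = True
    except Exception:
        succeeded = False  # any error counts as a failed transaction
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"ok": succeeded and elapsed_ms <= slo_ms, "latency_ms": round(elapsed_ms, 1)}

def fake_checkout() -> None:
    """Stand-in for a real scripted request (hypothetical)."""
    time.sleep(0.05)

print(run_probe(fake_checkout))
```

Run from several locations on a schedule, results like these describe what users actually experience, independent of which instance or path served the request.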


Does this mean that SNMP is going away?


While many practitioners wouldn't exactly shed a tear if they never needed to touch SNMP again, the answer is no. We still will have a need to monitor the underlying infrastructure even though it no longer gives us the holistic view that it once did. The widespread use of SNMP as the mechanism for monitoring infrastructure means it will remain a component of monitoring strategies for some time to come. Next generation monitoring systems will need to integrate the traditional SNMP methodologies with deeper levels of real-time application testing and awareness to ensure operators can remain aware of the environments they are responsible for managing.

“With me, everything turns into mathematics.”

– Rene Descartes



Ransomware is not new. Symantec traces ransomware deployments (including crypto lockers) back to 2005, when they began as misleading ads and warnings that your computer was infected.[1] Early crypto-locking extortion scams were not that successful. However, current business owners face increasing risk of cyber extortion, and crypto-locking ransomware has been on the rise over the past two years. It has become so prevalent that the FBI issued a warning highlighting the increasing threat to businesses.[2] Given the increasing velocity of deployment, the ease of infiltration, and the dire consequences of infection, we believe ransomware is a significant risk to businesses.


There are two primary factors contributing to the rise of ransomware:


  1. More real-time business data has been digitized, especially in health care and loan processing, which has increased the available pool of targets.
  2. Anonymous payment systems make monetizing ransomware easy, efficient, and risk-free for cyber criminals.


Observed samples of ransomware in 2014 totaled almost 9 million, yet in Q2 2015 alone, samples hit 4 million. This run rate is doubling year over year. Ransomware, unlike many vulnerabilities and malware, does not require administrative privileges, as its purpose is to encrypt the files useful to the end-user. Furthermore, the same types of scams and hooks that make ransomware successful on Windows are being deployed against other platform targets. 

What systems are at risk?

Cyber criminals have built ransomware kits that target a wide range of systems, including Windows, Linux, Android, and recently (March 2016) Mac OS. While the majority of ransomware successes are still on Windows, users should be alert to the increasing risk of ransomware on Android, which is on the rise.  Android ransomware could become particularly troubling in dedicated devices used in health care, manufacturing, and retail.

How does ransomware behave?

On Windows, ransomware works to impair your computer in one of three common ways:


  1. Encrypt your files (Locky and Cerber).
  2. Prevent you from accessing certain apps (FakeBsod – locks browser).[3]
  3. Restrict access to the operating system itself (Revton – locks PC).


On Android, ransomware falls generally into one of two types:


  1. Screen locking.
  2. File encrypting.


Unfortunately for Android users, both forms of ransomware are increasingly seen in the wild. The chronology of Android ransomware follows a similar pattern to the Windows chronology: it begins with a fake antivirus, then fake police demands, followed by full cryptographic file locking. Versions of Simplocker malware on Android encrypt files on the SD card; versions of Lockerpin acquire administrative privileges and prevent access to the device.[4]


On Linux, the most common targets are web servers. The ransomware Linux.Encoder.1 has been reported in the wild since November 2015. This variant does require root privileges, and it walks the web server's directory structure as well as nginx directories, /root, and others.[5] The reported ransom for this variant is one bitcoin.


Fortunately for Mac OS users, the first reported ransomware that encrypts Mac OS files has not been widely deployed or successful. With only 6500 downloads identified, Mac OS ransomware is a drop in the proverbial bucket.

What organizations are likely targets?

As mentioned above, real-time access needs for critical data create the easiest targets for ransomware. While no individual or business is free from worry, public service (police stations) and health care (hospitals) have been successfully targeted in the last 12 months. We can infer that other businesses, such as title companies, car dealerships, and other loan processors are likely targets as well. The criticality of data in these organizations is intuitive, and most cyber criminals keep the ransom amount “reasonable” (around $10,000). This amount is low enough that it appears to be economically rational for businesses that need to restore access quickly. Additionally, setting up a bitcoin wallet is relatively straightforward, with a number of YouTube how-to videos readily accessible. For an individual system, or business with less real-time critical data, the price is usually a single bitcoin.  


What defensive steps can you take?

Prevention is, of course, the goal. However, given the range of infection vectors (SMS on Android, browser exploitation, spam malware, and exploit kits) and the volume of ransomware samples observed in the wild, the risk of initial infection is difficult to eliminate. Therefore, a combination of preventative tactics and planning for incident remediation is the best risk-mitigating course of action.


Preventative Actions


  1. Educate your users on the risk. Users who process a large number of inbound attachments and emails, such as accounts receivable processors, account managers, and marketing personnel, are particularly vulnerable.
  2. Maintain patches on desktop users’ systems, as well as critical data servers. Desktop systems are often updated in a haphazard manner, or not at all, which makes them vulnerable to exploitation.
  3. Reduce or eliminate automatic mapping of drives. Recommended by thwack community member Stephen Black, eliminating automatic drive mapping means the ransomware won’t be able to walk your network from one initial infected system.
  4. Monitor for infections to prevent contagion.  If you use LEM, there is a monitoring rule you can download and use.
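Independent of any particular product, one common heuristic for spotting encrypting ransomware is a burst of file modifications or renames from a single host in a short window. This sketch illustrates that idea and is not the LEM rule mentioned above. The event format, the 60-second window, and the threshold of 100 changes are assumptions that you would tune against your own normal activity.

```python
from collections import defaultdict, deque

def noisy_hosts(events: list[tuple[float, str]], window_s: float = 60.0,
                threshold: int = 100) -> set[str]:
    """Return hosts that changed more than `threshold` files within any window.

    events: (timestamp_seconds, host) pairs, one per file write or rename,
    assumed sorted by timestamp.
    """
    recent: dict[str, deque] = defaultdict(deque)
    flagged = set()
    for ts, host in events:
        q = recent[host]
        q.append(ts)
        while ts - q[0] > window_s:  # drop events older than the window
            q.popleft()
        if len(q) > threshold:
            flagged.add(host)
    return flagged

burst = [(i * 0.1, "pc-042") for i in range(500)]    # 500 changes in 50 seconds
normal = [(i * 10.0, "pc-007") for i in range(500)]  # one change every 10 seconds
print(noisy_hosts(sorted(burst + normal)))  # {'pc-042'}
```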


Incident remediation

If you find yourself in the unfortunate situation where a system has become locked with ransomware, you have limited options. While some researchers have succeeded in reverse engineering ransomware, doing so takes time and depends on vulnerabilities in the ransomware code itself. If you are lucky enough to be hit by one of these variants, you can use the techniques the researchers have published.[6] But, realistically, in most situations there are only two real options:


  1. Restore from backup.
  2. Pay the ransom.


If your business fits in the class of organizations currently being targeted, or shares characteristics with organizations being targeted, it would be prudent to actually test your ability to restore from your backup media, whether that is a cloud backup, local backup, or offsite backup. Businesses with Android users are encouraged to explore mobile device backup, or at least to educate their users on their options.[7] Unfortunately, the restore-from-backup process is usually tested or validated only during an audit or a test of a business continuity or disaster recovery plan, which may be too late.
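A restore test does not have to be elaborate to be useful. Restore to a scratch location, then confirm the files match what you expected to get back. This sketch hashes every file under two directory trees with SHA-256 and reports any differences. The restore step depends entirely on your backup product, so a plain copy stands in for it here. The function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different after restore."""
    want, got = manifest(original), manifest(restored)
    return sorted(name for name in want.keys() | got.keys()
                  if want.get(name) != got.get(name))

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "data")
    src.mkdir()
    (src / "ledger.csv").write_text("invoice,amount\n1001,250\n")
    restored = Path(tmp, "restored")
    shutil.copytree(src, restored)           # stand-in for your backup tool's restore
    clean = verify_restore(src, restored)    # [] means every file came back intact
    (restored / "ledger.csv").write_text("corrupted")
    damaged = verify_restore(src, restored)  # ['ledger.csv']
print(clean, damaged)
```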


Do you have a favorite way to use LEM to look for malware? 

When did you last test your business continuity plan? 

Know anyone who has successfully recovered files after a ransomware attack?

Share your stories so we can all benefit.

[1] Symantec, Internet Security Threat Report, 2016, p. 58.







I'm on my way to Liverpool for SQLBits. So if you are reading this and find yourself near Liverpool this week, head on over to SQLBits and say hello.


Things I find amusing from around the Internet...


Star Wars: A Bad Lip Reading

In case you haven't seen this yet, I figured this was a good way to celebrate May the Fourth. Also? I want a wooden snowman.


Nearly All of Your ATMs are Insecure

Not sure what I find more amusing here, the fact that 'ATM not secure' is seen as something new or the fact that it's a Russian firm cited in the report.


Automating Change With Help From Fibonacci

As a math geek and IT pro, this article is full of so much win that I want to place it inside a Golden Rectangle and place it on top of a Klein Bottle.


Scientists Have Figured Out How To Put Electronics Inside Your Body

"We've always dreamed of infusing our bodies with technology". Um, no, we haven't. And as if building robots wasn't enough, now we have this to help usher in the singularity. (Darth Vader)


The advent of the citizen developer

Otherwise known as "shadow IT", and not something new. End users will turn to whatever tools they can find in an effort to do their jobs better than the day before.


FBI Says It Won't Disclose How It Accessed Locked iPhone

Because they don't know what they did, kinda like how I can never explain to my mother what I did to her computer to get her email working again.


Digital Genies

How can we ensure safety for humans when the robots rise up? Fascinating post here about how AI could go horribly wrong if it *thinks* it has the right data, but doesn't.



All industry-changing trends have an uncomfortable period where the benefit of adoption is understood but real-world use is often exaggerated. Because the modern use of containers fundamentally changes the way operations folks run their data centers, the case for adoption needs to be extremely compelling before anyone will move forward.


Also, since change is hard, major industry-shifting trends come with lots of pushback from people who have built a career on the technology being changed, disrupted, or even displaced. In the case of containers, there is a sizeable assembly of naysayers, and, not shockingly, they generally come from an Operations (and specifically virtualization) background.


To that end, I decided to dig deep into a handful of case studies and interview industry acquaintances about their experiences with containers in production. Making the case that containers can be handy for two developers on their laptops is easy; I was curious to find out what happens when companies adopt a container-based data center practice throughout the entire software lifecycle and at substantial scale. Here is what I found.

It’s Getting Better

One of the major challenges many people reported in the early days of containerization, with products like Docker Engine and rkt, was that these tools were very difficult to manage at scale. Natively, they didn’t include any sort of single-pane-of-glass management or higher-level orchestration.


As the container paradigm has matured, tools like Docker Swarm, Kubernetes, and Cloud Foundry have helped adopters make sense of what’s happening across their entire environment and begin to more successfully automate and orchestrate the entire software development lifecycle.
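To make "higher-level orchestration" a little more concrete, here is a toy sketch of the desired-state reconciliation idea that orchestrators such as Kubernetes are built around: you declare how many replicas of each service should run, and a control loop compares that with what is actually running and closes the gap. The `Cluster` class and `reconcile()` function are hypothetical and only simulate the bookkeeping; they are not the API of Docker Swarm, Kubernetes, or Cloud Foundry.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: service name -> number of running replicas."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """One pass of a desired-state control loop; returns the actions it 'took'."""
    actions = []
    # Scale each declared service up or down to its target replica count.
    for service, target in desired.items():
        current = cluster.running.get(service, 0)
        if current < target:
            actions.append(f"start {target - current} x {service}")
        elif current > target:
            actions.append(f"stop {current - target} x {service}")
        cluster.running[service] = target
    # Stop anything still running that is no longer declared at all.
    for service in list(cluster.running):
        if service not in desired:
            actions.append(f"stop {cluster.running.pop(service)} x {service}")
    return actions


cluster = Cluster(running={"web": 1, "legacy": 2})
print(reconcile({"web": 3, "worker": 2}, cluster))
# ['start 2 x web', 'start 2 x worker', 'stop 2 x legacy']
print(reconcile({"web": 3, "worker": 2}, cluster))
# [] : a second pass finds nothing to do
```

The second, empty pass is the point of the design: because the loop is idempotent, it can run continuously and quietly repair drift across the whole environment instead of someone doing it by hand, host by host.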

Small Businesses Are Last, As Usual

As with other pivotal data center technologies like server virtualization, small businesses are sometimes the least likely to see a valuable return from jumping on the bandwagon. Because of their small data center footprint, they don’t see the dramatic impact to the bottom line that enterprises do when changing the way their data center operates. While that’s obviously not always the case, my discussions with colleagues in the field and research into case studies seem to indicate that, just like all the big shifts before it, full-steam-ahead containerization is primarily for large-scale data centers, at least for now.


One way this might change in the future is software distribution by manufacturers in a container format. While small businesses might not need to leverage containers to accelerate their software development practice, they may start getting forced into containerization by the software manufacturers they deal with. Just as many, many ISVs today deliver their offering in an OVA format to be deployed into a virtualized environment, we may begin to see lots of software offerings delivered as containers.

Containers are Here to Stay

As much as the naysayers and conservative IT veterans speculate about containers being mostly hype, the anecdotal evidence I’ve collected seems to indicate that many organizations have indeed seen dramatic improvements in their operations and fewer defects, and have ultimately seen a positive impact on their bottom line.


I try to be very careful about buying into hype, but it doesn’t look like containers are slowing down any time soon. The ecosystem developing around the paradigm is quite substantial, and as part of the overall DevOps methodology trend, I see container-based technologies enabling the overall vision as much as any other sort of technology. It will be interesting to see how the data center landscape looks with regard to containers in 2020; will it be like the difference between virtualization in 2005 and 2015?

Today’s users demand access to easy-to-use applications even though the IT landscape has become a complex mishmash of end-user devices, connectivity methods, and siloed IT organizations, some of which contain further silos for applications, databases and back-end storage.


These multiple tiers of complexity, combined with end users’ increasing dependency on accessible applications, create significant difficulties for IT professionals across the globe, but especially in government agencies, with all their regulations and policies.


Figuring out how to maintain application performance in these complex environments has become a key objective for federal IT staff. Here are five methods for preserving a high-performance app stack:


1. Simplifying application stack management


A significant part of the effort lies in simplifying management of the application stack (app stack) itself, which includes the application, middleware and the extended infrastructure the application requires for performance. Think about the entire environment.


Rather than looking at networks, storage, servers and clients as distinct silos of individual responsibility, federal IT departments can reduce the complexity of the sometimes conflicting information they use to manage these silos. The simplification lies in monitoring all applications and the resources they use as a single application ecosystem, recognizing the relationships between them.


Working through the entire app stack lets federal IT pros pinpoint where performance is degraded and speeds up troubleshooting.
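One way to picture "a single application ecosystem, recognizing the relationships" is to model the app stack as a dependency graph. The sketch below is a minimal illustration under that assumption: the component names and the `STACK` map are hypothetical, and the rule it applies (a degraded component with nothing degraded beneath it is the best place to start troubleshooting) is a simple heuristic, not the logic of any particular monitoring product.

```python
# Hypothetical app stack: each component maps to the components it depends on.
STACK: dict[str, list[str]] = {
    "web-app": ["app-server"],
    "app-server": ["database", "vm-host-01"],
    "database": ["san-volume", "vm-host-02"],
    "san-volume": [],
    "vm-host-01": [],
    "vm-host-02": [],
}


def all_dependencies(component: str, stack: dict[str, list[str]]) -> set[str]:
    """Everything `component` relies on, directly or indirectly (iterative depth-first walk)."""
    seen: set[str] = set()
    to_visit = list(stack.get(component, []))
    while to_visit:
        dep = to_visit.pop()
        if dep not in seen:
            seen.add(dep)
            to_visit.extend(stack.get(dep, []))
    return seen


def likely_root_causes(degraded: set[str], stack: dict[str, list[str]]) -> set[str]:
    """Degraded components that have no degraded component anywhere beneath them."""
    return {c for c in degraded if not (all_dependencies(c, stack) & degraded)}


# The app, its middleware, the database and a SAN volume all look unhealthy at once.
print(likely_root_causes({"web-app", "app-server", "database", "san-volume"}, STACK))
# {'san-volume'}: start troubleshooting at the storage tier
```

Looked at silo by silo, four teams would each see a red light; looked at as one graph, the alerts collapse to a single starting point.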


2. Monitoring servers


Server monitoring is a significant part of managing the app stack. Servers are the engines that provide application services to the end user. And applications need sufficient CPU cycles, memory, storage I/O and network bandwidth to work effectively.


Monitoring current server conditions and analyzing historical usage trends is key to resolving problems rapidly, or preventing them altogether.


3. Monitoring virtualization


Monitoring the virtualization infrastructure is also key. Federal IT pros should track how and when VMs move from one host or cluster to another, as well as the status of shared hosts, networks and storage resources, especially if they are over-subscribed.


Federal IT pros should pay particular attention to how individual VMs on a host are working together, whether resource contention is occurring on a host or a cluster, and which applications are causing those conflicts. In addition, they should keep tabs on network latency.
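A quick first check for over-subscription is the ratio of vCPUs allocated to VMs versus physical cores on each host. The sketch below computes that ratio and lists hosts above a threshold, worst first. The `Host` class, host names and the threshold value are all hypothetical; what counts as an acceptable ratio depends heavily on the workload, and actual contention is better measured with the hypervisor's own metrics (CPU ready time on vSphere, for example) than with allocation alone.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    physical_cores: int
    vm_vcpus: list[int]  # vCPUs allocated to each VM placed on this host

    @property
    def vcpu_ratio(self) -> float:
        """Allocated vCPUs per physical core; above 1.0 means the host is over-subscribed."""
        return sum(self.vm_vcpus) / self.physical_cores


def oversubscribed(hosts: list[Host], max_ratio: float) -> list[tuple[str, float]]:
    """Return (host name, ratio) for hosts whose vCPU:pCPU ratio exceeds max_ratio, worst first."""
    flagged = [(h.name, h.vcpu_ratio) for h in hosts if h.vcpu_ratio > max_ratio]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


hosts = [
    Host("host-01", physical_cores=16, vm_vcpus=[4, 4, 8, 8]),
    Host("host-02", physical_cores=16, vm_vcpus=[8, 8, 8, 8, 8, 8]),
    Host("host-03", physical_cores=32, vm_vcpus=[4, 4]),
]
print(oversubscribed(hosts, max_ratio=2.0))  # [('host-02', 3.0)]
```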


4. Monitoring user devices


Today’s users are running applications on all types of devices with a range of capabilities and connectivity options, all of which are significant factors in maintaining a healthy app ecosystem.


5. Bring it together with alerting


The last component is alerting, which notifies technicians of an issue with a component of the app stack before the first end user notices the problem.


The ability to set proactive performance baselines for devices and applications to signal when app stack issues arise helps both in day-to-day monitoring and future capacity planning.
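A minimal sketch of a baseline-driven alert, assuming the simplest possible baseline: the mean of recent readings plus k sample standard deviations. The function names, the k=3 default and the response-time samples are hypothetical. This kind of baseline assumes a fairly stable metric; in practice you might keep separate baselines per time of day, since a Monday-morning peak is normal and a 3 a.m. peak is not.

```python
from statistics import mean, stdev


def baseline_threshold(history: list[float], k: float = 3.0) -> float:
    """Alert threshold: historical mean plus k sample standard deviations (needs >= 2 samples)."""
    return mean(history) + k * stdev(history)


def should_alert(history: list[float], current: float, k: float = 3.0) -> bool:
    """True when the current reading is above the learned baseline threshold."""
    return current > baseline_threshold(history, k)


# Hypothetical response times (ms) for an app over recent polling intervals.
response_ms = [200.0, 210.0, 190.0, 205.0, 195.0]
print(round(baseline_threshold(response_ms), 1))  # 223.7
print(should_alert(response_ms, 215.0))  # False
print(should_alert(response_ms, 260.0))  # True
```

The same history that drives the alert also feeds capacity planning: if the baseline itself keeps creeping upward week over week, that is a trend worth acting on before any alert fires.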


In short, it’s critical for federal IT pros to be aware of, monitor and set up notifications across the app stack – from back end storage, through application services and processes to front-end users – and provide high performance from a holistic perspective.


Find the full article on Government Computer News.

At the recent AWS Summit in Australia, a case was presented that left most I.T. folks in shock. A business user had gone outside his organization’s I.T. controls to test a business capability in the Cloud. The organization was Australia’s largest provider of electricity and LPG, and this guy was on stage being presented as a hero.


In the post session write-up, the media was quick to clarify that only dummy data was used and no customer data was at risk. The person who initiated this didn’t want to go through the long and tedious process of an I.T. proof of concept just to run some data analytics. His heart was in the right place, with a drive to improve their business, but I.T. was getting in the way. You can read an article about it here.


So why did the rest of us have a heart attack at this news? Well, not only was AWS not on the organization’s approved vendors list, access to the platform had actually been blocked from the corporate network. The workaround? Use the free Wi-Fi across the road.

I’m sure this isn’t the only example of the business going around I.T.


When you work so hard to keep the Enterprise (or even SMB) secure, stable and legally compliant, it’s frustrating to know that those efforts can be completely bypassed with a corporate credit card (or even a free trial)! What’s the solution when even blocking the website from your network isn’t enough?

SaaS is the hardest Cloud capability to integrate into an existing environment. It can affect so much of your I.T. footprint, through a system you have very little control over. Secure data integration, identity management, access management, data storage, terms of use, APIs … the list goes on. There’s no point running a proof of concept if you don’t have answers for the longer-term operation, maintenance and security of a SaaS application. But if it’s not needed as a long-term capability (such is the beauty of SaaS), is it worth having ALL the answers before we allow a dummy data test? Or do we want to raise the business users’ hopes, only to tell them there’s no way it would work with live data because it doesn’t meet our compliance regulations? Is it a chicken-and-egg question?

The current reality is that it IS easy for the business to go ahead without I.T. backing, though I’d love to see the reactions from the Legal & Compliance teams. With dummy data available, the business CAN try some cool stuff without touching Production systems or real data, minimising some of the risk. Are we making it too hard for the business to innovate, or are we protecting the business from itself?

Do you have a way to support fast initiation of SaaS proof of concept initiatives?  Does the risk just make it too hard? Is someone else in your organization holding up the NO card when it comes to Cloud (and SaaS in particular)?  Let me know what you think.




P.S. I'll be at Interop in Las Vegas this week from May 4-6, where I'll get to meet some SolarWinds Head Geeks in person! It's a long flight from Brisbane, Australia, so come and find me and say Hi if you are attending.


In Logging We Trust

Posted by SomeClown May 3, 2016


Everyone in IT loves to log. We love to log our servers, our networks, our security devices, and our security. We log all the things. Sometimes we even look at those logs, but mostly we dump them to a tool which paints a nice, happy little dashboard… just right there, and then we forget about it until we get those pesky notices that something has gone wrong.


The challenges here are myriad, however, and not always easy to address because they require political as well as technical fixes. IT personnel are generally great with technology, but not so much with the politicking. Kissing hands and shaking babies is apparently the wrong approach to take, and so when rebuffed by the suits, or sometimes the Bobs, we retreat to our happy little world of dashboards and log data.


One major challenge to logging is mentally getting beyond logging. We don’t need logging for logging’s sake. What we actually need is correlation. What do I mean? I mean that all of the bits of information we collect from all of our disparate systems sit idly by, locked in their own little bubbles, sending occasional notices that send us squirreling off to solve a problem. None of the information we collect is analyzed collectively; it’s not correlated at all, and so we miss patterns.


Think about it. We collect from all of our systems in a structured way. We also collect vast amounts of machine data from systems as diverse as badge readers, BLE beacons, tweets, failed domain lookups, etc., but we don’t do anything with it as a whole. What we need to do is start treating it as one body of data instead of a scatter of random data points. If we normalize as much of that data as possible, make intelligent connections between it all, and analyze it collectively, we can start to make sense of the random noise on the wire and stop chasing squirrels.
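Here is a minimal sketch of "normalize, then correlate," assuming two sources: the badge readers mentioned above and a VPN log (the VPN source, both raw record formats, the `Event` schema and the correlation rule are all hypothetical). Each source gets a small adapter that maps its quirks (timestamp format, username casing) into one common schema, and only then can a cross-source rule run: here, flagging someone who badged into the building and logged in remotely within minutes of each other, which might point to shared credentials.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema that every log source is normalized into."""
    time: datetime
    source: str
    user: str
    action: str


def from_badge_reader(raw: dict) -> Event:
    # Assumed badge-reader format: ISO timestamp, mixed-case name, door ID.
    return Event(datetime.fromisoformat(raw["ts"]), "badge",
                 raw["badge_holder"].lower(), f"entered {raw['door']}")


def from_vpn_log(raw: dict) -> Event:
    # Assumed VPN format: US-style timestamp, upper-case username.
    return Event(datetime.strptime(raw["when"], "%m/%d/%Y %H:%M:%S"), "vpn",
                 raw["username"].lower(), f"vpn {raw['result']}")


def correlate(events: list[Event], window: timedelta) -> list[tuple[Event, Event]]:
    """Pair badge entries with VPN events for the same user within `window` of each other.

    Pairwise comparison keeps the sketch short; it will not scale to real log volumes.
    """
    badges = [e for e in events if e.source == "badge"]
    vpn = [e for e in events if e.source == "vpn"]
    return [(b, v) for b in badges for v in vpn
            if b.user == v.user and abs(v.time - b.time) <= window]


events = [
    from_badge_reader({"ts": "2016-05-03T08:01:00", "badge_holder": "Alice", "door": "DC-1"}),
    from_badge_reader({"ts": "2016-05-03T09:30:00", "badge_holder": "Bob", "door": "Lobby"}),
    from_vpn_log({"when": "05/03/2016 08:05:00", "username": "ALICE", "result": "login"}),
    from_vpn_log({"when": "05/03/2016 14:00:00", "username": "bob", "result": "login"}),
]
for badge, vpn_login in correlate(events, timedelta(minutes=15)):
    print(badge.user, badge.action, "/", vpn_login.action, vpn_login.time - badge.time)
# alice entered DC-1 / vpn login 0:04:00
```

Note that without the normalization step the rule never fires: "Alice" and "ALICE" would not match, and the two timestamp formats could not be compared at all.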


The other problem we face is the major impediment to what I’ve just described: silos. Let’s face it, within the IT industry as a whole we segment ourselves off by specialty. Security, networking, systems, applications, storage, voice, wireless, and probably even more subcategories are all common areas of expertise, and those areas are very frequently operated as separate departments within the larger IT organization, either de facto or de jure. And as often as not, those departments don’t work together, don’t always like each other, and sometimes even work against one another.


So, each segment of the IT organization is gathering data using different tools and methodologies, and with a varying amount of fidelity to what the data is telling them. Data correlation in the big picture is mostly worthless if it’s not done in a deliberate way across the entire organization. To get a detailed picture of your organization you need everything collected, not just the bits from groups who get along. Without that, you won’t realize the benefits of big data analytics, what I’ve been calling correlation, in any meaningful way. You won’t be able to connect the proverbial dots to a place from which valid, useful conclusions may be drawn. And without that, we might as well go back to our insular worlds, and work on our squirrel chasing.


Save the date(s) September 14th & 15th for THWACKcamp, our annual virtual conference. This year’s going to be bigger and better than ever before! EMEA THWACKcamp will live-stream in local time on September 15th!


For an overview of what happened last year, and to access the on-demand sessions, head on over to: THWACKcamp 2015


What are you looking forward to most with this year’s THWACKcamp? Tell us below, or join the conversation on social using #THWACKcamp.
