The title of this post raises an important question and one that seems to be on the mind of everyone who works in an infrastructure role these days. How are automation and orchestration going to transform my role as an infrastructure engineer? APIs seem to be all the rage, and vendors are tripping over themselves to integrate northbound APIs, southbound APIs, dynamic/distributed workloads, and abstraction layers anywhere they can. What does it all mean for you and the way you run your infrastructure?


My guess is that it probably won’t impact your role all that much.


I can see the wheels turning already. Some of you are vehemently disagreeing with me and want to stop reading now, because you see every infrastructure engineer only interacting with an IDE, scripting all changes/deployments. Others of you are looking for validation for holding on to the familiar processes and procedures that have been developed over the years. Unfortunately, I think both of those approaches are flawed. Here’s why:


Do you need to learn to code? To some degree, yes! You need to learn to script and automate the repeatable tasks where running a script saves you time. The thing is, this isn’t anything new. If you want to be an excellent infrastructure engineer, you’ve always needed to know how to script and automate tasks. If anything, this newly minted attention being placed on automation should make it less of an effort to achieve (anyone who’s had to write expect scripts for multiple platforms should be nodding their head at this point). A focus on automation doesn’t mean that you just now need to learn how to use these tools. It means that vendors are finally realizing the value and making this process easier for the end-user. If you don’t know how to script, you should pick a commonly used language and start learning it. I might suggest Python or PowerShell if you aren’t familiar with any languages just yet.


Do I need to re-tool and become a programmer? Absolutely not! Programming is a skill in and of itself, and infrastructure engineers will not need to be full-fledged programmers as we move forward. By all means, if you want to shift careers, go for it. We need full-time programmers who understand how infrastructure really works. But automation and orchestration aren’t going to demand that every engineer learn how to write their own compilers, optimize their code for obscure processors, or make their code operate across multiple platforms. If you are managing infrastructure through scripting, and you aren’t the size of Google, that level of optimization and reusability isn’t going to be necessary to see significant improvement in your processes. You won’t be building the platforms, just tweaking them to do your will.


Speaking of platforms, this is the main reason why I don’t think your job is really going to change that much. We’re in the early days of serious infrastructure automation. As the market matures, vendors are going to be offering more and more advanced orchestration platforms as part of their product catalog. You are likely going to interface with these platforms via a web front end or a CLI, not necessarily through scripts or APIs. Platforms will have easy-to-use front ends with an engine on the back end that does the scripting and API calls for you. Think about this in terms of Amazon Web Services (AWS). Their IaaS products are highly automated and orchestrated, but you primarily control that automation from a web control panel. Sure, you can dig in and start automating some of your own calls, but that isn’t really required by the large majority of organizations. This is going to be true for on-premises equipment moving forward as well.


Final Thoughts


Is life for the infrastructure engineer going to drastically change because of a push for automation? I don’t think so. That being said, scripting is a skill that you need in your toolbox if you want to be a serious infrastructure engineer. The nice thing about automation and scripting is that it requires predictability and standardization of your configurations, and this leads to stable and predictable systems. On the other hand, if scripting and automation sound like something you would enjoy doing as the primary function of your job, the market has never been better or had more opportunities to do it full time. We need people writing code who have infrastructure management experience.


Of course, I could be completely wrong about all of this, and I would love to hear your thoughts in the comments either way.

You may be wondering why, after creating four blog posts encouraging non-coders to give it a shot, select a language, and break down a problem into manageable pieces, I would now say to stop. The answer is simple, really: not everything is worth automating (unless, perhaps, you are operating at a similar scale to somebody like Amazon).


The 80-20 Rule


Here's my guideline: figure out which tasks take up the majority (i.e. 80%) of your time in a given period (a typical week, perhaps). Those are the tasks where the time invested in developing an automated solution is most likely to pay back. The other 20% are usually much worse candidates, because the cost of automating them likely outweighs the time savings.
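If you keep even a rough log of where your hours go, the guideline can be applied mechanically. The following is a minimal Python sketch; the function name and the 80% default are illustrative assumptions, not a prescribed method.

```python
def top_time_consumers(hours_by_task, share=0.8):
    """Return the largest tasks, biggest first, that together cover `share` of total time."""
    total = sum(hours_by_task.values())
    if total <= 0:
        return []
    picked, running = [], 0.0
    # Walk tasks from most to least time-consuming until the target share is covered.
    for task, hours in sorted(hours_by_task.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        picked.append(task)
        running += hours
    return picked
```

Whatever ends up on the returned list is where automation effort is most likely to pay for itself; everything else probably is not worth the development time yet.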


As a side note, the tasks that take up the time may not necessarily be related to a specific work request type. For example, I may spend 40% of my week processing firewall requests, and another 20% processing routing requests, and another 20% troubleshooting connectivity issues. In all of these activities, I spend time identifying what device, firewall zone, or VRF various IP addresses are in, so that I can write the correct firewall rule, or add routing in the right places, or track next-hops in a traceroute where DNS is missing. In this case, I would gain the most immediate benefits if I could automate IP address research.
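To make the IP address research idea concrete, here is a minimal Python sketch of such a lookup. The subnet table, device names, zones, and VRFs are all hypothetical; in practice that data would have to come from your own IPAM export or device configurations.

```python
import ipaddress

# Hypothetical inventory: subnet -> (device, firewall zone, VRF).
# Real data would be loaded from IPAM or parsed from device configs.
SUBNETS = {
    "10.1.0.0/16": ("fw-core-01", "inside", "CORP"),
    "10.1.20.0/24": ("fw-dc-01", "dmz", "DMZ"),
    "192.168.50.0/24": ("rtr-lab-01", "lab", "LAB"),
}


def lookup(ip_text):
    """Return (subnet, device, zone, vrf) for the most specific match, or None."""
    ip = ipaddress.ip_address(ip_text)
    best = None
    for cidr, info in SUBNETS.items():
        net = ipaddress.ip_network(cidr)
        # Prefer the longest prefix, the same way a routing table would.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, info)
    if best is None:
        return None
    net, (device, zone, vrf) = best
    return (str(net), device, zone, vrf)
```

A single helper like this serves the firewall, routing, and troubleshooting work alike, which is why it is a better first target than automating any one request type.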


I don't want to be misunderstood; there is value in creating process and automation around how a firewall request comes into the queue, for example, but the value overall is lower than for a tool that can tell me lots of information about an IP address.


That Seems Obvious


You'd think that it was intuitive that we would do the right thing, but sometimes things don't go according to plan:


Feeping Creatures!


Once you write a helpful tool or an automation, somebody will come back and say, "Ah, what if I need to know X information too? I need that once a month when I do the Y report." As a helpful person, it's tempting to immediately try and adapt the code to cover every conceivable corner case and usage example, but having been down that path, I counsel against doing so. It typically makes the code unmanageably complex due to all the conditions being evaluated and, worse, it goes firmly against the 80-20 rule above. Feeping Creatures is a Spoonerism of Creeping Features, i.e. an ever-expanding feature list for a product.


A Desire to Automate Everything


There's a great story in Surely You're Joking, Mr. Feynman! (Richard Feynman) that talks about Mr. Frankel, who had developed a system using a suite of IBM machines to run the calculations for the atomic bomb that was being developed at Los Alamos.


"Well, Mr. Frankel, who started this program, began to suffer from the computer disease that anybody who works with computers now knows about. [...] Frankel wasn't paying any attention; he wasn't supervising anybody. [...] (H)e was sitting in a room figuring out how to make one tabulator automatically print arctangent X, and then it would start and it would print columns and then bitsi, bitsi, bitsi, and calculate the arc-tangent automatically by integrating as it went along and make a whole table in one operation.


Absolutely useless. We had tables of arc-tangents. But if you've ever worked with computers, you understand the disease -- the delight in being able to see how much you can do. But he got the disease for the first time, the poor fellow who invented the thing."


It's exciting to automate things or to take a task that previously took minutes, and turn it into a task that takes seconds. It's amazing to watch the 80% shrink down and down and see productivity go up. It's addictive. And so, inevitably, once one task is automated, we begin looking for the next task we can feel good about, or we start thinking of ways we could make what we already did even better. Sometimes the coder is the source of creeping features.


It's very easy to lose touch with the larger picture and drift away from the tasks that will generate measurable gains. I've fallen foul of this myself in the past, and have been delighted, for example, with a script I spent four days writing, which pulled apart log entries from a firewall and ran all kinds of analyses on them, allowing you to slice the data any which way and generate statistics. Truly amazing! The problem is, I didn't have a use for most of the stats I was able to produce, and actually I could have fairly easily worked out the most useful ones in Excel in about 30 minutes. I got caught up in being able to do something, rather than actually needing to do it.


And So...


Solve A Real Problem


Despite my cautions above, I maintain that the best way to learn to code is to find a real problem that you want to solve and try to write code to do it. Okay, there are some cautions to add here, not the least of which is to run tests and confirm the output. More than once, I've written code that seemed great when I ran it on a couple of lines of test data, but then when I ran it on thousands of lines of actual data, I discovered oddities in the input data, or a bug in the processing loop, such as a carelessly reused variable. Just like I tell my kids with their math homework, sanity check the output. If a script claims that a 10Gbps link was running at 30Gbps, maybe there's a problem with how that figure is being calculated.
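A sanity check like that can live in the script itself. Here is a minimal Python sketch, assuming the script works from a count of bits transferred over a polling interval; the function names are illustrative only.

```python
def utilization_percent(bits_transferred, interval_seconds, link_bps):
    """Return average utilization as a percentage of link capacity."""
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    rate_bps = bits_transferred / interval_seconds
    return 100.0 * rate_bps / link_bps


def check_utilization(bits_transferred, interval_seconds, link_bps):
    """Compute utilization, refusing to return a physically impossible figure."""
    pct = utilization_percent(bits_transferred, interval_seconds, link_bps)
    if pct < 0 or pct > 100:
        # A link cannot carry more than its capacity, so the inputs must be wrong.
        raise ValueError(f"implausible utilization {pct:.1f}%; check counters and units")
    return pct
```

When a check like this fires, likely suspects include mixing up bytes and bits, a wrapped counter, or the wrong interface speed, but that is for a human to investigate rather than for the script to guess.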


Don't Be Afraid to Start Small


Writing a Hello World! script may feel like one of the most pointless activities you may ever undertake, but for a total beginner, it means something was achieved and, if nothing else, you learned how to output text to the screen. The phrase, "Don't try to boil the ocean," speaks to this concept quite nicely, too.


Be Safe!


If your ultimate aim is to automate production device configurations or orchestrate various APIs to dance to your will, that's great, but don't start off by testing your scripts in production. Use device VMs where possible to develop interactions with different pieces of software. I also recommend starting by working with read commands before jumping right in to the potentially destructive stuff. After all, after writing a change to a device, it's important to know how to verify that the change was successful. Developing those skills first will prove useful later on.


Learn how to test for, detect, and safely handle errors that arise along the way, particularly the responses from the devices you are trying to control. Sanitize your inputs! If your script expects an IPv4 address as an input, validate that what you were given is actually a valid IPv4 address. Add your own business rules to that validation if required (e.g. a script might only work with 10.x.x.x addresses, and all other IPs require human input). The phrase Garbage in, garbage out, is all too true when humans provide the garbage.
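As one possible shape for that validation, here is a minimal Python sketch using the standard library's ipaddress module. The 10.0.0.0/8 restriction mirrors the example business rule above; the function name is hypothetical.

```python
import ipaddress

# Example business rule from the text: only 10.x.x.x addresses are handled automatically.
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def validate_target(ip_text):
    """Return an IPv4Address if the input is valid and allowed; raise ValueError otherwise."""
    try:
        ip = ipaddress.IPv4Address(ip_text.strip())
    except (ipaddress.AddressValueError, AttributeError) as exc:
        # AttributeError covers non-string input such as None.
        raise ValueError(f"not a valid IPv4 address: {ip_text!r}") from exc
    if ip not in ALLOWED_NETWORK:
        raise ValueError(f"{ip} is outside {ALLOWED_NETWORK}; needs human review")
    return ip
```

Rejecting bad input with a clear error before any device is touched is usually cheaper than trying to interpret what the human probably meant.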


Scale Out Carefully


To paraphrase a common saying, automation allows you to make mistakes on hundreds of devices much faster than you could possibly do it by hand. Start small with a proof of concept, and demonstrate that the code is solid. Once there's confidence that the code is reliable, it's more likely to be accepted for use on a wider scale. That leads neatly into the last point:


Good Luck Convincing People


It seems to me that everybody loves scripting and automation right up to the point where it needs to be allowed to run autonomously. Think of it like the Google autonomous car: for sure, the engineering team was pretty confident that the code was fairly solid, but they wouldn't let that car out on the highway without human supervision. And so it is with automation; when the results of some kind of process automation can be reviewed by a human before deployment, that appears to be an acceptable risk from a management team's perspective. Now suggest that the human intervention is no longer required, and that the software can be trusted, and see what response you get.


A coder I respect quite a bit used to talk about "blast radius": what's the impact of a change beyond the box on which the change is taking place? What's the potential impact of the change as a whole? We do this all the time when evaluating change risk categories (is it low, medium, or high?) by considering what happens if a change goes wrong. Scripts are no different. A change that adds an SNMP string to every device in the network, for example, is probably fairly harmless. A change that creates a new SSH access-list, on the other hand, could end up locking everybody out of every device if it is implemented incorrectly. What impact would that have on device management and operations?
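One way to start small and keep the blast radius contained is to roll a change out in growing batches and stop the moment verification fails. The following Python sketch assumes you supply your own apply_change and verify callables; those names, and the default batch sizes, are hypothetical.

```python
def staged_rollout(devices, apply_change, verify, batch_sizes=(1, 5, 25)):
    """Apply a change in growing batches, halting after the first batch that fails.

    Returns (done, failed): devices in fully verified batches, and the devices
    that failed verification in the batch that stopped the rollout.
    """
    if not batch_sizes:
        raise ValueError("need at least one batch size")
    sizes = list(batch_sizes)
    remaining = list(devices)
    done = []
    while remaining:
        # Step through the batch sizes, then keep reusing the last one.
        size = sizes.pop(0) if len(sizes) > 1 else sizes[0]
        batch, remaining = remaining[:size], remaining[size:]
        for device in batch:
            apply_change(device)
        failed = [d for d in batch if not verify(d)]
        if failed:
            return done, failed
        done.extend(batch)
    return done, []
```

With this shape, a bad change touches at most one batch before the rollout halts, and the first batch is a single device. For something like the SSH access-list example, verify would need to confirm that management access still works.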




I really recommend giving programming a shot. It isn't necessary to be a hotshot coder to have success (trust me, I am not a hotshot coder), but having an understanding of coding will, I believe, positively impact other areas of your work. Sometimes a programming mindset can reveal ways to approach problems that didn't show themselves before. And while you're learning to code, if you don't already know how to work in a UNIX (Linux, BSD, macOS, etc.) shell, that would be a great stretch goal to add to your list!


I hope that this mini-series of posts has been useful. If you do decide to start coding, I would love to hear back from you on how you got on, what challenges you faced and, ultimately, if you were able to code something (no matter how small) that helped you with your job!

I'm in Vancouver, BC this week, and jet lag has given me some extra hours to get this edition of the Actuator done. Maybe while I am here I can ask Canadians how they feel about Amazon taking over the world. My guess is they won't care as long as hockey still exists, and they can get equipment with Prime delivery.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Windows 10 Source Code Leak is a Minor Incident at Best

Seems to be some inconsistent information regarding the Windows "leak" last week.


Recycled Falcon 9 rocket survives one of SpaceX's most challenging landings yet

I enjoy watching SpaceX make the possibility of regular space travel a real thing to happen in my lifetime.


The RNC Files: Inside the Largest US Voter Data Leak

Including this link not for the politics, but for the shoddy data security. Reminder: data is valuable; treat it right.


Is Continuing to Patch Windows XP a Mistake?

This is a great question. I really believe that it *is* a mistake, but I also understand that Microsoft wants to help do what they can to keep users safe.


Amazon Is Trying to Control the Underlying Infrastructure of Our Economy

I'm not trying to alarm you or anything with this headline, but in related news, Walmart can be a jerk, too.


This Company Is Designing Drones To Knock Out Other Drones

Because they read the other article and know that this is the only way to save humanity.


Harry Potter: How the boy wizard enchanted the world

Because it's the 20th anniversary and I've enjoyed watching my kids' faces light up when they watch the movies.


Vancouver is one of my favorite cities to visit and jog along the water:


FullSizeRender 1.jpg

By Joe Kim, SolarWinds EVP, Engineering & Global CTO


There continues to be pressure on government IT to optimize and modernize, and I wanted to share a blog written in 2016 by my SolarWinds colleague, Mav Turner.


Federal IT professionals are in the midst of significant network modernization initiatives that are fraught with peril. Modernizing for the cloud and achieving greater agility are attractive goals, but security vulnerabilities can all too easily spring up during the modernization phase.


A path paved with danger, ending in riches


Last year, my company, SolarWinds, released the results of a Federal Cybersecurity Survey showing that the road to modernization is marked with risk. Forty-eight percent of respondents reported that IT consolidation and modernization efforts have led to an increase in IT security issues. These primarily stem from incomplete transitions (according to 48 percent of respondents), overly complex management tools (46 percent), and a lack of training (44 percent).


The road to modernization can potentially lead to great rewards. Twenty-two percent of respondents actually felt that modernization can ultimately decrease security challenges. Among those, 55 percent cited the benefits of replacing old, legacy software, while another 52 percent felt that updated equipment offered a security advantage. Still more (42 percent) felt that newer software was easier to use and manage.


The challenge is getting there. As respondents indicated, issues are more likely to occur in the transitional period between out-with-the-old and in-with-the-new. During this precarious time, federal administrators need to be hyperaware of the dangers lurking just around the corner.


Here are a few strategies that can help.


Invest in training


Federal IT professionals should not trust their legacy systems or modern IT tools to someone without the proper skill sets or knowledge.


Workers who do not understand how to use, manage, and implement new systems can be security threats in themselves. Their inexperience can put networks and data at risk. Agencies must invest in training programs to help ensure that their administrators, both new and seasoned, are familiar with the deployment and management of modern solutions.


Maximize the budget


If the money is there, it’s up to federal CIOs to spend it wisely. Some funds may go to the aforementioned training, while others may go to onboarding new staff. Yet another portion could go to investing in new technologies that can help ease the transition from legacy to modernized systems.


Avoid doing too much at once


That transition should be gradual, as a successful modernization strategy is built one win at a time.


Administrators should start upgrades with a smaller set of applications or systems, rather than an entire infrastructure. As upgrades are completed, retrospective analyses should be performed to help ensure that any security vulnerabilities that were opened during the transition are now closed. Connected systems should be upgraded simultaneously. Further analyses should focus on length of time for the transition, number of staff required, and impact on operations, followed by moving on to the next incremental upgrade.


Find the full article on Federal News Radio.


I realized that I do so many of these "travelog" type posts that, like Tom's "Actuator" newsletter, I might as well have a snappy name to go with it. So here we are.


Next week I'll be heading out to Las Vegas with 20 of my SolarWinds peeps (including patrick.hubbard and Dez) for a week of madness called CiscoLive. So what are my goals for this trip, besides claiming my very own NetVet badge and avoiding bursting into flame whenever I walk outside? (The weather forecasts temperatures between 105 and 111 °F.)


First off, CiscoLive kicks off once again with the unofficial event known as #KiltedMonday. While I will NOT be sporting a kilt, I have acquired a scarf woven with the official Jewish tartan and will be wearing it proudly...


...along with socks. Because #SocksOfCLUS are also A Thing, and we at SolarWinds are very proud to be offering one of three new patterns for anyone who shows up and registers for a THWACK account.


Jumping ahead, a lot of my focus is going to be on Wednesday morning, when Destiny and I will sit for Cisco exams. I'm renewing my CCNA and Destiny is going for the CCNA+Security cert - because of course she is! While I cannot wait to talk to people in the booth, I admit that whenever there's a quiet moment I will probably be huddled in a corner reviewing ACLs, OSPF routing, and IPv6.


Finally, I'm looking forward to renewing old friendships that have become a yearly tradition. I hope to spend time on the dCloudCouch with Silvia Spiva (@silviakspiva) and Anne Robotti (@arobotti), to get in a podcast with both Lauren Friedman (@lauren) and Amy Lewis (@CommsNinja), and to find a minute to chill with Roddie Hasan (@eiddor).


And, of course, that ignores the fact that 20 SolarWinds folks will all be in the same place at the same time for more than 30 minutes in a row. So you can expect jokes, videos, tweets, interviews, and more!


Next week I'll let you know what I actually DID see, as well as anything unexpected that I happened across.


User, Help Thyself

Posted by scuff Jun 23, 2017

I’m going to switch gears on the automation topic now. It’s natural to think of scripts, packages, images, tools, triggers, and actions when you think of automating IT tasks. We automate our technical things with more tech. But what if we removed ourselves from the equation some other way? Are there tasks that we are holding onto that we could use to empower other, non-technical humans to do instead?


Don’t rush off to outsource your monitoring checks to someone on Fiverr. Instead, we’re going to talk about what we can get our users to do for themselves, without breaking anything.


Self-service password reset - In a cloud SaaS world, we’re used to a lovely little “Forgot your password?” link that will send a reset link to our recorded email address. For better security, you’d want 2FA or some secret questions as well. Microsoft’s Azure Active Directory lets you enable this for your users and, for a change, it’s not on by default. If you have directory synchronization and password write-back turned on, presto! Your users have just reset their own on-premises AD password, too. If your AD isn’t connected to the cloud, a ton of third-party vendors have jumped on this need and created paid tools of their own to achieve this for you. Might be worth a look if you have high help desk stats for password resets.


AD automation – While we’re on the subject of AD, how manual is your process for creating new user accounts? Have you played with CSVDE or PowerShell as a scripted input method? Could you take that to the next level and wrap a workflow around it that gets HR to enter the correct data (first name, last name, role, department, etc.) that could then feed into your script and run an automated user creation process? There are third-party tools that handle this, including the workflow and an approval step.
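As a rough sketch of the "HR data feeds the script" step, the Python below turns HR-supplied CSV text into account records and rejects incomplete rows before anything touches the directory. The column names and the username convention are assumptions for illustration, and actual account creation (via PowerShell, CSVDE, or a third-party tool) is deliberately left out.

```python
import csv
import io

# Hypothetical column names that the HR form would need to supply.
REQUIRED = ("first_name", "last_name", "role", "department")


def build_accounts(csv_text):
    """Turn HR-supplied CSV text into account records, rejecting incomplete rows.

    Returns (accounts, rejected), where rejected lists the CSV line numbers
    that were missing a required value.
    """
    accounts, rejected = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        values = {k: (row.get(k) or "").strip() for k in REQUIRED}
        if not all(values.values()):
            rejected.append(line_no)
            continue
        # Hypothetical naming convention: first initial + last name, lowercase.
        username = (values["first_name"][0] + values["last_name"]).lower().replace(" ", "")
        accounts.append({"username": username, **values})
    return accounts, rejected
```

This sketch does not handle duplicate usernames or approvals; those are the kinds of gaps a workflow with an approval step would be expected to cover.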


Azure Active Directory Premium also offers dynamic group membership. You can set attributes on a user object (such as Department) and have a group that queries AAD and automatically adds/removes members based on that attribute, for access to resources. Now, if you could automate HR submitting a web form that changes the Department value in AD, you would be hands-off. Sounds good in theory, but is anyone using it?


Chatbots as help desks – We've previously talked about chatbots saving the world (or not). They are good at providing answers to FAQ-style queries, for sure. Facebook for Work and Microsoft Teams certainly think it’s important to support bots in their collaboration tools. Has anyone replaced their help desk with a bot, yet? Are users helping themselves with this modern day Clippy replacement? Or is this tech gone mad?


Aside from the bots, are we seeing collaboration tools enabling users to help each other with questions they may otherwise call the help desk for? Are we using those tools to communicate current IT known issues, to reduce incoming call volumes? Is it working?


Let me know if you’ve managed to automate yourself out of a process by enabling someone else to do it, instead.

IT Right Equals Might?

Posted by kong.yang Jun 23, 2017

If I learned anything from Tetris, it’s that errors pile up and accomplishments disappear.

– Andrew Clay Shafer (@littleidea on Twitter).


In IT, we make our money and keep our jobs by being right. And we have to be right more often than not because the one time we are wrong might cost us our job. This kind of pressure can lead to a defensive, siloed mentality. If might equals right, then expect an IT working environment that is conducive to hostilities and sniping.


I’ve witnessed firsthand the destructive nature of a dysfunctional IT organization. Instead of working as a cohesive team, that team was one in which team members would swoop in to fix issues only after a colleague made a mistake. It was the ultimate representation of trying to rise to the top over the corpses of colleagues. Where did it all go wrong? Unfortunately, that IT organization incentivized team members to outdo one another for the sake of excellent performance reviews and to get ahead in the organization. It was a form of constant hazing. There were no mentors to help guide young IT professionals to differentiate between right and wrong.


Ultimately, it starts and ends with leadership and leaders. If leaders allow it, bad behaviors will remain pervasive in the organization’s culture. Likewise, leaders can nip such troubling behavior in the bud if they are fair, firm, and consistent. That IT team’s individual contributors were eventually re-organized and re-assigned once their leaders were dismissed and replaced.


Rewards and recognition come and go. Sometimes they’re well deserved, and other times we don’t get the credit that’s due. Errors, failures, and mistakes do happen. Don’t dwell on them. Continue to learn and move forward. A career in IT is a journey and a long one at that. Mentees do have fond memories of mentors taking the time to help them become a professional. Lastly, remember that kindness is not weakness, but rather an unparalleled kind of strength.


I’ve attempted to locate a manager or company willing to speak publicly about corporate pushback against a hybrid mentality. I’ve had many conversations with customers who’ve had this struggle within their organizations, but few are willing to go on record.


As a result, I’m going to relate a couple of personal experiences, but I’m not going to be able to commit customer references to this.


Let’s start with an organization I’ve worked with a lot lately. They have a lot of unstructured data, and our goal was to arrive at an inexpensive “SMB 3.0+” storage format that would satisfy this need. We recommended a number of cloud providers, both hybrid and public, to help them out. The pushback came from their security team, who’d decided that compliance issues were a barrier to going hybrid. Obviously, most compliance issues have been addressed. In the case of this company, we, as a consultative organization, were able to make a good case for the storage of the data, the costs, and an object-based model for access from their disparate domains. As it turned out, this particular customer chose a solution that placed a compliant piece of storage on-premises that could satisfy its needs, but as a result of the research we’d submitted for them, their security team agreed to be more open in the future to these types of solutions.


Another customer wanted to launch a new MRP application and was evaluating hosting it in a hybrid mode. This customer had a particular issue with relying on the application being hosted entirely offsite. As a result, we architected a solution in which the off-prem components augmented the physical/virtual architecture built for them onsite. It was a robust solution that ensured uptime for the environment, with a highly available approach toward application uptime and failover, which was just what the customer had requested. The pushback here wasn’t about compliance, because the hosted portion of the application would lean on our highly available and compliant data center. Instead, they objected to the cost, which seemed to us to be a reversal of their original mandate. We’d provided a solution based on their requests, but they changed those requests drastically. Ultimately, they chose to host the entire application suite in a hosted environment. Their move toward a cloudy environment for the application was, in this case, driven by the cost of their originally desired approach.


Most of these objections were real-world concerns, and the decisions were ones our customers had sought. They’d faced issues they had not been entirely sure were solvable. In these cases, pushback came in the form of either security or cost concerns. I hope we delivered solutions that met their objections and helped the customers achieve their goals.


It’s clear that the pushback we’d received was due to known or unknown real-world issues facing their business. In the case of the first customer, we’d been engaged to solve their issues regardless of objections, and we found them the storage solution that was the best on-premises fit. But in the latter, by promoting a solution that was geared toward satisfying all they’d requested, we were bypassed in favor of a lesser solution provided by the application vendor. You truly win some and lose some.


Have you experienced pushback in similar situations? I'd love to hear all about it.

Hope everyone enjoyed Father's Day this past weekend, and that your day was filled with good food and good times with family. This week's Actuator is timed with the summer solstice, the longest day of the year. But as any SysAdmin knows, the longest day of the year is any day you are working with XML.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Amazon to buy Whole Foods for $13.7 billion, wielding online might in brick-and-mortar world

In the biggest news story last week, Amazon agreed to purchase Whole Foods. I am cautiously optimistic for what this could mean with regards to world hunger. By purchasing Whole Foods, Amazon gets a brand name AND a distribution channel they need not build themselves. Combined with drone delivery, Amazon could find a way to provide food to remote locations. Heck, if Amazon partners with a real estate company such as McDonald's (who already feeds 1% of the world each day), Amazon could be feeding 5% of the global population within ten years.


Divide and Conquer: How Microsoft Researchers Used AI to Master Ms. Pac-Man

Good news for SkyNet fans, we've now created AI smart enough to defeat video games. It won't be long now before the AI decides that the best way to win is to not play and instead eliminate the game creators.


Complete list of wifi routers from WikiLeaks' Cherry Blossom release detailing CIA hacking tools

If your home router is on this list, you might want to make sure you've protected yourself against the exploits that have been publicly released.


Forget Autonomous Cars—Autonomous Ships Are Almost Here

And now I have something else to write about other than just autonomous cars. Autonomous ships!


Marissa Mayer Bids Adieu to Yahoo

Only in America can someone be given the opportunity to run an already failing corporation into the ground and then walk away with a quarter of a billion dollars.


Block Untrusted Apps Using AppLocker

For anyone looking to add an extra layer of protection against malware. As much as I know users are a large security surface area to control, I also know that a lot of SysAdmins take and run scripts they find on internet help forums. Running random scripts you find on blogs is also a risk. Be careful out there, folks.


20 Percent of Users Still Don’t Know about Phishing or Ransomware, Reveals Survey

That 20% seems like a low estimate, IMO.


For all the fathers out there:


Screen Shot 2017-06-20 at 12.47.16 AM.png


IT professionals are a hardworking group. We carry a lot of weight on our shoulders, a testament to our past and future successes. Yet, sometimes we have to distribute that weight evenly across the backs of others. No, this is not because we don’t want to do something. I’m sure that any of you, while capable of performing a task, would never ask another person to do something you wouldn’t willingly do yourself. No. Delegating activities to someone else is actually something we all struggle with.


Trust is a huge part of delegating. You're not only passing the baton of what needs to be done to someone else, but you’re also trusting that they’ll do it as well as you would, as quickly as you would, and -- this is the hard part -- that they'll actually do it.


As the world continues to evolve, transition, and hybridize, we are faced with this challenge more often. I’ve found there are some cases where delegation works REALLY well, and other cases where I’ve found myself banging my head against the wall, desk, spiked mace, etc. You know the drill.


One particular success story that comes to mind involves the adoption of Office 365. Wow! My internal support staff jumped for joy the day that was adopted. They went from having to deal with weird, awkward, and ridiculous Exchange or Windows server problems on a regular basis to... crickets. Sure, there were and still are some things that have to be dealt with, but it went from daily activity to monthly activity. Obviously, any full-time Exchange admin doesn't want to be replaced by Robot365, but if it's just a small portion of your administrative burden that regularly overwhelms, it's a good bet that delegating is a good idea. In this particular use-case, trust and delegation led to great success.


On the other hand, I’ve seen catastrophes rivaled only by the setting of a forest fire just for the experience of putting it out. I won’t name names, but I've had rather lengthy conversations with executives from several cloud service providers we all know and (possibly) love. Because I’m discussing trust and delegation, let’s briefly talk about what we end up trusting and delegating in clouds.


  • I trust that you won’t deprecate the binaries, libraries, and capabilities that you offer me
  • I trust that you won’t just up and change the features that I use and my business depends on
  • I trust that when I call and open a support case, you’ll delegate activities responsibly and provide me with regular updates, especially if the ticket is a P1


This is where delegating responsibility, and trusting someone to act in your best interest rather than in their own or in service of some greater need beyond you, can be eye-opening.


I’m not saying that all cloud service providers are actively seeking to ruin our lives, but if you talk to some of the folks I do and hear their stories, THEY might be the ones to say that. This frightful tale is less about the fear and doubt of what providers will offer you, and more about being aware and educated about the things that could possibly happen, especially if you aren’t fully aware of the bad things that happen on the regular.


In terms of trust and delegation, cloud services should provide you with the following guarantees:

  • Trust that they will do EXACTLY what they say they will do, and nothing less. Make sure you are hearing contractual language around that guarantee versus marketing speak. Marketing messages can change, but contracts last until they expire.
  • Trust that things DO and WILL change, so be aware of any deprecation schedules, downtime activities, impacts, overlaps of changes, and dependencies that may lie within your business.
  • Delegate to cloud services only those tasks and support that may not matter to your production business applications. You want to gauge how well they can perform and conform to an SLA. It’s better to be disappointed early on when things don’t matter than to be in a fire-fight and go looking for support that may never come to fruition.


This shouldn't be read as an attack or assault on cloud services. Instead, view this as being more about enlightenment. If we don’t help make them better support organizations, they won’t know they need to, and they will not improve. They currently function on a build-it-and-they-will-come support model, and if we don’t demand quality support, they have no incentive to give it to us.


Wow! I went from an OMG Happy365 scenario to cloudy downer!


But what about you? What kinds of experiences with trust and delegation have you had? Successes? Failures? I’ll offer up some more of my failures in the comments if you’re interested. I would love to hear your stories, especially if you've had contrary experiences with cloud service providers. Have they gone to bat for you, or left you longing for more?


IT Is Everywhere

Posted by shidoshi1000 Employee Jun 20, 2017

By Joe Kim, SolarWinds Chief Technology Officer


This is evident from two surveys we conducted last year. First, we asked more than 800 employed, non-IT adult end-users in North America a series of questions about how they use technology at work, and the types of technologies being used within their organizations. We also asked more than 200 IT professionals to give their impressions on these end-users’ expectations. Here’s a sample of what we found:


Users are taking IT everywhere. Forty-seven percent of end-user respondents said they connect more electronic devices, whether personally or company-owned, to their employers’ networks than they did 10 years ago. In fact, they connect an average of three more devices than they did a decade ago, two of which they own themselves.


The cloud has taken IT outside the agency. Most organizations allow some form of cloud-based applications, such as Google® Drive or Dropbox®, and 53 percent of respondents said they use these applications at work. Forty-nine percent said they regularly use work-related applications outside the office, on either personally or company-owned devices. Our survey also found that end-users will occasionally use non-IT-sanctioned cloud applications, such as iTunes® or something similar, while at work.


IT professionals must manage technology that may be outside their comfort zones. They must be versed in cloud-driven applications, mobile devices, open source software, and, increasingly, hybrid IT environments that incorporate aspects of on-premises and outsourced components. They must also continually be aware of and monitor the security risks that these solutions – and the actions of end-users – can present, adding one more layer of complexity to an already intricate set of concerns.


Eighty-seven percent of end-user respondents said they expect their organizations’ IT professionals to help ensure the performance of the cloud-based applications they use at work. Further, 68 percent blamed their IT professionals if these applications did not work correctly (“Dropbox isn’t working! Someone call IT!”).


According to the IT is Everywhere survey, 62 percent of IT professional respondents felt that the expectation to support users’ personally-owned devices on their networks is significantly greater than it was 10 years ago. Meanwhile, 64 percent of IT professionals said that end-users expect the same time to resolution for issues with both cloud-based and local applications. The inference is that users do not draw a distinction between cloud and on-premises infrastructures, despite the many differences between the two, and the fact that hybrid IT operations can be exceedingly complex and difficult to manage.


All of this is to say that IT is indeed everywhere. It’s in our offices and homes. It’s on our desktops and smartphones. It’s onsite and in the cloud.


IT professionals are constantly on deck to help ensure always-on availability and optimal performance, regardless of device, platform, application, or infrastructure. The end-users don’t care, as long as things are working.


Find the full article on GovLoop.

As technology professionals, we live in an interruption-driven world; responding to incidents is part of the job. All of our other job duties go out the window when a new issue hits the desk. Having the right information and understanding the part it plays in the organization is key to handling these incidents with speed and accuracy. This is why it's critical to have the ability to compare apples-to-apples when it comes to the all-important troubleshooting process.


What is our job as IT professionals?

Simply put, our job is to deliver services to end-users. It doesn't matter if those end-users are employees, customers, local, remote, or some combination of these. This may encompass things as simple as making sure a network link is running without errors, a server is online and responding, a website is handling requests, or a database is processing transactions. Of course, for most of us, it's not a single thing, it's a combination of them. And considering the fact that 95 percent of organizations report having migrated critical applications and IT infrastructure to the cloud over the past year, according to the SolarWinds IT Trends Report 2017, visibility into our infrastructure is getting increasingly murky.


So, why does this matter? Isn't it the responsibility of each application owner to make sure their portion of the environment is healthy? Yes and no. Ultimately, everyone is responsible for making sure that the services necessary for organizational success are met. Getting mean time to resolution (MTTR) down requires cooperation, not hostility. Blaming any one individual or team will invariably lead to a room full of people pointing fingers. This is counterproductive and must be avoided. There is a better way: prevention via comprehensive IT monitoring.


Solution silos

Monitoring solutions come in all shapes and sizes. Furthermore, they come with all manner of targets. We can use solutions specific to vendors or specific to infrastructure layers. A storage administrator may use one solution while a virtualization and server administrator may use another and the team handling website performance a third solution. And, of course, none of these tools may be applicable to the database administrators.

At best, monitoring infrastructure with disparate systems can be confusing; at worst, it can be downright dangerous. Consider the simple example of a network monitoring solution seeing traffic moving to a server at 50 megs/second, while the server monitoring solution sees incoming traffic at 400 megs/second. Which one is right? Maybe both of them, depending on whether they mean 50 MBps (megabytes) and 400 Mbps (megabits). This is just the start of the confusion. What happens if your virtualization monitoring tool reports in Kb/sec and your storage solution reports in MB/sec? Also, when talking about kilos, does it mean 1,000 or 1,024?


You can see how the complexity of analyzing disparate metrics can very quickly grow out of hand. In the age of hybrid IT, this gets even more complex since cloud monitoring is inherently different than monitoring on-premises resources. You shouldn't have to massage the monitoring data you receive when troubleshooting a problem. That only serves to lengthen MTTR.


Data normalization

In the past, I've worked in environments with multiple monitoring solutions in place. During multi-team troubleshooting sessions, we've had to handle the above calculations on the fly. Was it successful?  Yes, we were able to get the issue remedied. Was it as quick as it should have been? No, because we were moving data into spreadsheets, trying to align timestamps, and calculating differences in scale (MB, Mb, KB, Kb, etc.). This is what I mean by data normalization: making sure everyone is on the same page with regard to time and scale.
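To make the scale problem concrete, here is a minimal Python sketch of that kind of normalization. It is only an illustration, not a feature of any particular monitoring product: the unit labels, the `to_bits_per_second` helper, and the choice of decimal (1,000-based) prefixes are all assumptions made for the example.

```python
# Hypothetical unit table: multipliers that convert a reading to bits per second.
# Assumes decimal prefixes (kilo = 1,000); use 1,024 if your tool reports binary prefixes.
UNIT_TO_BITS = {
    "bps": 1,
    "Kbps": 1_000,
    "Mbps": 1_000_000,
    "KBps": 8 * 1_000,
    "MBps": 8 * 1_000_000,
}


def to_bits_per_second(value, unit):
    """Convert a throughput reading to bits per second."""
    if unit not in UNIT_TO_BITS:
        raise ValueError(f"unknown unit: {unit}")
    return value * UNIT_TO_BITS[unit]


# The two "megs/second" readings from the earlier example.
network_view = to_bits_per_second(50, "MBps")
server_view = to_bits_per_second(400, "Mbps")
print(network_view == server_view)  # True: both are 400,000,000 bits per second
```

Once every reading is in one base unit, the two tools stop disagreeing. The 1,000 versus 1,024 question still has to be answered per tool, which is why it lives in a single table here rather than being scattered through the math.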


Single pane of glass

Having everything you need in one place with the timestamps lined up and everything reporting with the same scale — a single pane of glass through which you see your entire environment — is critical to effective troubleshooting. Remember, our job is to provide services to our end-users and resolve issues as quickly as possible. If we spend the first half of our troubleshooting time trying to line up data, are we really addressing the problem?


About the Post

This is a cross-post from my personal blog @ [Link].



At least once a week I read or hear the familiar refrain, “SQL Server® is a memory hog,” or “SQL Server uses all the memory.” If you, or anyone you know, are saying these things, I am here today to tell you something.




Just no.


Stop. Saying. This.


It’s like hearing fingernails on a chalkboard when people say such things. It’s time to put an end to this myth.


And, as always, you’re welcome.


SQL Server Is a Software Program

That’s right. SQL Server is a piece of software. And software programs are good at doing what they have been programmed to do. Typically, software programs are programmed and configured by humans.


That’s where you come in, my fellow humans.


SQL Server will, by design, read data pages from disk into memory. SQL Server will store as many pages as you tell it to store, and will only evict them from memory as needed. My conclusion, for which I’ve done no research, is that 95% of the people complaining about SQL Server using all the memory on a server are 100% responsible for not configuring SQL Server memory properly.


And there’s the crux of the problem. The people that don’t understand how SQL Server uses memory also don’t understand that it is up to them to decide how much memory SQL Server will use.


That’s the #hardtruth folks. It’s been you all along.


(Editor’s note: I think Oracle®/UNIX® folks don’t have these complaints about memory because Windows® makes it easier to see memory consumption in Task Manager. Perhaps this myth would have died a long time ago if it weren’t for giving RDP access to people who don’t understand how SQL Server works, or that Task Manager is a dirty, filthy liar. But I digress.)


95 < 100

For those Generation Next-ers out there (people who install software by clicking Next-Next-Finish), you should know that SQL Server will not try to use all the available memory for data pages. The default setting allows for SQL Server to dynamically manage the memory consumption and it will not allocate more than 95% of the total physical memory.


For those of us old, er, experienced enough to remember database servers with 8 GB of RAM, that 95% is close enough to “all” for appearance's sake. And SQL Server has other memory needs than just database pages. Over the years, we have seen different data objects share the buffer cache with data pages. These days we can query the sys.dm_os_memory_clerks dynamic management view to find out how much of our memory is assigned to the various memory clerks.


The bottom line is that 95% is not 100%. SQL Server will not try to use all the memory by default. And setting the minimum memory will not cause SQL Server to start allocating memory, either. SQL Server will not allocate pages without being asked to do so.


By a human, most likely.


Deciding the Max Memory Setting

Assuming you have gotten this far, you understand that you are responsible for how much memory SQL Server will use. The next logical question becomes: What should the max memory be set to by default?


I have no idea. And neither does anyone else. If someone tells you they know exactly how much memory your SQL Server needs, they are either (1) lying, (2) trying to sell you something, or (3) both.


There is no shortage of formulas out there for prognosticating the initial amount of memory to set as a max value for SQL Server. I’ve even seen suggestions that you use the size of a database to guess at your max memory setting. That’s absurd. It’s not the size of the database that determines the amount of memory needed, it’s the workload that matters.


Here is the formula I offer to clients and customers that ask for help with finding a max memory setting. This formula assumes you are trying to right-size the memory for a dedicated database server (engine only, no SSAS, SSRS, SSIS, etc., or any other significant applications), and this is for physical servers (but holds mostly true for virtualized servers, too):


• Take physical memory (say, 128 GB RAM)

• Subtract memory for the O/S itself (1 GB for every 8 GB of RAM; 16 GB in this example)

• Subtract memory needed for thread stack size (the number of worker threads multiplied by thread size; a typical example for an x64, 4 CPU system would be 512 * 2 = 1024 MB, or 1 GB in our example here)


That would give us a max memory setting of about 111 GB in this example. Again, this doesn’t consider any other applications that might be running. The formula also does not consider the use of features such as Columnstore indexes or In-Memory OLTP. These features will require you to adjust your settings further.
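For anyone who prefers to see the arithmetic written down, the formula above can be sketched in Python. The function name and its default arguments are my own for illustration (the defaults mirror the 512 worker threads and 2 MB thread size from the example); treat the result as a starting point, not a setting handed down from on high.

```python
def suggested_max_memory_gb(physical_gb, worker_threads=512, thread_stack_mb=2):
    """Starting-point max memory (in GB) for a dedicated SQL Server engine host."""
    os_reserve_gb = physical_gb / 8  # 1 GB for the O/S for every 8 GB of RAM
    thread_stack_gb = worker_threads * thread_stack_mb / 1024  # MB converted to GB
    return physical_gb - os_reserve_gb - thread_stack_gb


print(suggested_max_memory_gb(128))  # 111.0
```

The sketch deliberately ignores everything the formula ignores: other applications, Columnstore indexes, and In-Memory OLTP.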


Once you arrive at your number, you set your max memory and then monitor memory consumption, adjusting the settings as necessary. The 111 GB figure is not an absolute; it is meant as a decent starting point in the absence of any other information regarding the specific workload for the server.


Do You Need More Memory?

This is a common question that comes up in SQL Server and memory discussions. How do you know if you need more?


The first thing you need to know is if memory is the resource constraint you are facing. If so, then yeah, maybe you need more memory. One way to know is if your instance is using all the memory you assigned following the formula above. Another is if you are seeing memory errors in the SQL error logs. Still another way to know is if you are seeing a lot of disk activity (because SQL Server is not able to keep pages in memory). Any one of those items could mean that you need to allocate more memory to your instance.


However, it could be the case that by adding more memory you end up hurting performance. For example, in a virtualized environment it could be possible that the additional memory is spread over multiple physical NUMA nodes, resulting in slower performance than if the entire memory could fit inside one NUMA node.


I advocate that you monitor memory consumption over time, noting if you are trending upwards. Measuring and monitoring memory consumption is the best way to understand if your database server needs more memory.
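One simple way to put a number on "trending upwards" is a least-squares slope over evenly spaced samples. The sketch below assumes you already collect memory-in-use readings somewhere; the `trend_per_sample` helper and the sample values are made up for illustration.

```python
def trend_per_sample(samples):
    """Least-squares slope of evenly spaced samples (units per sampling interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Made-up daily readings of memory in use, in GB.
daily_used_gb = [90, 92, 94, 96, 98]
print(trend_per_sample(daily_used_gb))  # 2.0 (GB per day)
```

A slope that stays positive week after week is a better argument for more memory than a single busy afternoon.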


Anything else is just a wild guess.



It’s not SQL Server, it’s you.


You have been in control all along.


It’s about time you understand, and accept, that you are responsible for what SQL Server is doing.


Blaming SQL Server for using all the memory that you have allowed it to access is like blaming a coffee maker for using all the water you placed inside.

Home again after a trip to Austin where I was filming sessions for THWACKcamp 2017. Excited yet? You should be because you can register now by going here. It happens in October, and although it's four months away, it feels like it will be here in a week. I can't wait for you to see all the buttery goodness we have in store for you this year!


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Ex-Admin Deletes All Customer Data and Wipes Servers of Dutch Hosting Provider

Remember this the next time someone asks you for elevated permissions. Insider threats are a real thing, folks. (HT to Radioteacher for pointing me to this story over the weekend.)


Microsoft realigns its cloud, AI, data organizations

If only there was a sign telling us that traditional data storage technologies were shifting toward the cloud and integrating with new technologies, like machine learning and artificial intelligence. Then we might be able to better prepare ourselves for this shift.


It's so windy in Britain the electricity price went negative

Because every now and then I see someone comment about how renewable energy sources aren't able to produce enough energy to meet demand. I believe they can produce enough if we are willing to invest enough.


UK cops arrest man picked out by automatic facial recognition software

We are just one step away from arresting people because we *think* they are about to commit a crime.


Microsoft buys security-automation vendor Hexadite

Interesting acquisition in the wake of WannaCry, although I am certain the wheels were in motion for many months prior. I believe this is yet another example of how Microsoft is taking data security seriously, and being as proactive as possible to minimize risks as Azure continues to gobble up data.


Why You Shouldn’t Use SMS for Two-Factor Authentication (and What to Use Instead)

I was somewhat aware of the risks with using SMS, and I liked how this article was able to explain the issue and possible workarounds.


Gamestop hacked. Financial data of online shoppers accessed by crooks

Yep. We got a letter last week about this matter. The letter, however, didn't specify that it was for online purchases only, as this article indicates.


As soon as this Evil-Clown-as-a-Service (ECaaS) becomes available in the US, I know what I'm getting some folks for their birthday:




Want to know a secret?


I'm going to start at the end.


If your environment collects syslog and trap messages, no matter what vendor solution you are using, create a filtration layer that will take all those messages, process them, and forward just the useful ones along.


Now, moving from the end back to the beginning, here's what you want to do: Get some copies of Kiwi Syslog Server, set up a load balancer like an F5 to do UDP round robin between all those servers, and set rules on the first server to filter out everything but the alerts you want to keep. For the messages you want to keep, set up rules to transparently forward them to the system(s) that will process and act on them. Export that rule set and import it to the other servers sitting behind the load balancer. Finally, update all of the devices in your enterprise to send their trap and syslog messages to the VIP presented by the load balancer.
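To show what "filter out everything but the alerts you want to keep" can look like in logic terms, here is a small Python sketch. It is not Kiwi Syslog's rule engine and it does no forwarding. It only illustrates one possible rule, keeping messages by severity, using the standard syslog PRI header (PRI = facility * 8 + severity). The threshold and the sample messages are hypothetical.

```python
import re

# Syslog messages start with "<PRI>", where PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")

# Hypothetical policy: keep only severity 0-3 (emergency through error).
MAX_SEVERITY_TO_FORWARD = 3


def severity_of(message):
    """Return the syslog severity (0-7), or None if there is no valid PRI header."""
    match = PRI_PATTERN.match(message)
    if not match:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid PRI is facility 23 * 8 + severity 7
        return None
    return pri % 8


def should_forward(message):
    """Decide whether a message passes the filter."""
    severity = severity_of(message)
    return severity is not None and severity <= MAX_SEVERITY_TO_FORWARD


# Illustrative messages only; the host name and text are invented.
messages = [
    "<187>Jun 20 10:15:02 core-sw1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
    "<190>Jun 20 10:15:03 core-sw1 %SYS-6-LOGGINGHOST_STARTSTOP: Logging started",
]
kept = [m for m in messages if should_forward(m)]
print(len(kept))  # 1
```

In practice your rules will probably key on more than severity (source, message text, time of day), but the shape is the same: decide per message, drop the noise, and pass along the rest.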


That's the secret! Now that I've explained the trick, the bottom line, are you curious to know WHY I am telling you all this?


This is why: I've seen the following scenario a half-dozen times. I'm brought in to consult on a monitoring project and someone announces, "My monitoring sucks! It's dog slow and just doesn't work. Find me something else!" So, I poke around and realize that all of their traps and syslog messages are going to a single system, which also happens to be the monitoring system. In SolarWinds terms, that's the primary poller.


In my experience, network devices generate a metric buttload (yes, that's a scientifically accurate measurement) of messages per hour. In more boring terms, we're talking about roughly 4,000 messages per hour per machine.
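If you want to see how quickly that adds up, here is a back-of-the-napkin calculation in Python. The 4,000 figure is the one quoted above; the device count is purely hypothetical, so substitute your own.

```python
MESSAGES_PER_HOUR_PER_DEVICE = 4_000  # figure quoted above
device_count = 500  # hypothetical; plug in your own inventory size

per_hour = MESSAGES_PER_HOUR_PER_DEVICE * device_count
per_second = per_hour / 3600

print(per_hour)           # 2000000
print(round(per_second))  # 556
```

That is a couple of million unscheduled messages an hour from a fairly modest environment, every one of them arriving whenever it feels like it.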


If you have a server that is trying to manage pinging a set of devices (and collecting and storing those metrics) along with pulling SNMP or WMI data from that same set of devices (again, and storing that data), along with presenting that information in the form of views and reports, and checking the database for exceeded thresholds to create alerts, and analyzing that data to provide baselines, and... Well, you get the point. Polling engines have a lot of work to do. And one of the ways they stay on top of that work is having a finely tuned scheduler that manages all those polling cycles.


If you then start throwing in a few million spontaneous messages, which must be processed in real-time, what you have is a VERY unhappy system. What you have is a monitoring solution that "sucks" through no fault of its own.


Once I am able to point this out to clients, the next question is, "Should we turn off syslog or traps?" Of course not. That is a rich and vital source of information. What you need is to put something in front of the monitoring system to filter those messages.


Which brings me back to the "filtration system."


BUT... there's a catch! The catch is that most syslog and trap receivers expect to also process those messages themselves - to create alerts, to store the data, etc. What is needed in my example is to be able to ignore the messages that are unimportant, but then FORWARD the ones that matter to another system that is able to act upon them. The challenge here is to forward them without changing the source machine.


Many trap and syslog handlers can forward messages, but they replace the original machine with themselves as the source. That's not helpful when you want to correlate a syslog message with data collected another way, say SNMP polling, for example. To do that, you need to perform what is called "transparent" forwarding, which keeps the original source machine information intact.


Kiwi Syslog has done this for years. But not so with SNMP traps. For a variety of reasons, which I won't get into now, that capability didn't exist until 9.6, the latest version.


Now that this essential function within your monitoring infrastructure is available (not to mention really, REALLY affordable) you can impact the performance of your monitoring system in a great big, positive way.


So, take a minute and check out the new version. Forwarding traps transparently isn't the only new feature, by the way. There's also IPv6 support, SNMP v3 support, use of VarBinds in output, logging to Papertrail, and more! Try it and let me know what you think in the comments below.


By Joe Kim, SolarWinds Chief Technology Officer


With container adoption on the rise, I wanted to share a blog written in 2016 by my SolarWinds colleague, Kong Yang.


While the initial inroads are primarily still in the education phase, container technology has started making its way into federal IT networks, and the appeal is clear. Container-based technology provides value specifically in the areas of efficiency, optimization, and security, particularly as networks grow. This combination is uniquely suited to meet government IT needs.


Before an agency dips its toes into container technology, it’s vital for federal IT pros to gain an understanding of exactly what containers are, and what benefits they can bring.


What are Containers?


Container technology is far less complex than it sounds. Containers wrap a piece of software in a complete file system that contains everything the software needs to run, including code, runtime, system tools, and libraries. Containers guarantee that the software will always run the same, regardless of the compute environment.


Let’s say you’re building an application that handles online transactions. The user experience consists of logging in, clicking on an item to add it to the cart, walking through the checkout process, and finally submitting to complete the transaction. With containers, you can isolate these steps into loosely coupled services, aka microservices, across multiple containers. The advantage of doing it this way is that if one microservice fails, it will not take down the entire application.


In fact, a failure of a container or a system running containers will result in those services spinning up on other systems to get the work done. With non-container technology, there’s a good chance a tiered application is running on one or multiple systems to take care of that entire transaction. A failure that occurs on a tier or in a system will result in a degraded application or potential downtime as that tier restarts or fails over.


With container technology, however, each piece is separated out into its own tiny package. The login, for example, may be one container. Adding something to your cart may be another container, and so on. It’s like a distributed assembly line. Each container is responsible for its own small, unique task, which it does expertly, as opposed to one large monolithic application tier that’s responsible for many, often vastly different tasks, and carries much overhead.


How Can I Get Started Using Containers?


As with any new technology, the first thing to do is become familiar with that technology by learning about it. Because containers are typically open source, there is a wealth of publicly available information and source materials that can be used for education and replication. and are great places to start.


The next step is to ramp up on skill sets. IT teams should dedicate some resources and time and start building experience around containers and microservices. Spend time testing to understand where these services might be implemented throughout the agency to increase efficiency. Again, Docker® provides installable platforms, such as Docker for Mac®, Linux®, and Windows® that you can leverage to level up your container experience.


Once there is a baseline understanding of containers, how they work, and how they can be used, apply that to your own environment and start mapping out a strategy for implementation.


Find the full article on Federal Technology Insider.

There’s no question that trends in IT change on a dime and have done so for as long as technology has been around. The hallmark of a truly talented IT professional is the ability to adapt to those ever-present changes and remain relevant, regardless of the direction that the winds of hype are pushing us this week. It’s challenging and daunting at times, but adaptation is just part of the gig in IT engineering.


Where are we headed?


Cloud (Public) - Organizations are adopting public cloud services in greater numbers than ever. Whether it be Platform, Software, or Infrastructure as a Service, the operational requirements within enterprises are being reduced by relying on third parties to run critical components of the infrastructure. To realize cost savings in this model, operational (aka employee) and capital (aka equipment) costs must be reduced for on-premises services.


Cloud (Private) - Due to the popularity of public cloud options, and the normalization of the dynamic/flexible infrastructure that they provide, organizations are demanding that their on-premises infrastructure operate in a similar fashion. Or in the case of hybrid cloud, operate in a coordinated fashion with public cloud resources. This means automation and orchestration are playing much larger roles in enterprise architectures. This also means that the traditional organizational structures of highly segmented skill specialties (systems, database, networking, etc.) are giving way to engineers who have experience in multiple disciplines.


Commoditization - When I reference commoditization here, it isn’t about the ubiquity and standardization of hardware platforms. Instead, I’m talking about the way that enterprise C-level leadership is looking at technology within the organization. Fewer organizations are investing in true engineering/architecture resources, and instead are bringing those services in either via utilization of cloud infrastructure, or bringing this skill set on through consultation. The days of working your way from a help desk position up to a network architecture position within one organization are slowly fading away.


So what does all of this mean for you?

It’s time to skill up. Focusing on one specialty and mastering only that isn’t going to be as viable a career path as it once was. Breadth of knowledge across disciplines is going to help you stand out because organizations are starting to look for people who can help them manage their cloud initiatives. Take some time to learn how the large public cloud providers like AWS, Azure, and Google Compute operate and how to integrate organizations into them. Spend some time learning how hyperconverged platforms work and integrate into legacy infrastructures. Finally, learn how to script in an interpreted (non-compiled) programming language. Don’t take that as advice to change career paths and become a programmer.  That line of thinking is a bit overhyped in my opinion. However, you should be able to do simple automation tasks on your own, and modify other people’s code to do what you need. All of these skills are going to be highly sought after as enterprises move into more cloud-centric infrastructures.


Don’t forget a specialty. While a broad level of knowledge is going to be a prerequisite as we go forward, I still believe having a specialty in one or two specific areas will help from a career standpoint. We still need experts; we just need those experts to know more than just their one little area of the infrastructure. Pick something you are good at and enjoy, and then learn it as deeply as you possibly can, all while keeping up with the infrastructure that touches/uses your specialty. Sounds easy, right?


Consider what your role will look like in 5-10 years. This speaks to the commoditization component of the trends listed above. If your aspiration is to work your way into an engineering or architecture-style role, the enterprise may not be the best place to do that as we move forward. My prediction is that we are going to see many of those types of roles move to cloud infrastructure companies, web scale organizations, resellers/consultants, and the technology vendors themselves. It’s going to get harder to find organizations that want to custom-design their infrastructure to match and enhance their business objectives, instead opting to keep administrative-level technicians on staff and leave the really fun work to outside entities. Keep this in mind when plotting your career trajectory.


Do nothing. This is bad advice, and not at all what I would recommend, but it is an equally viable path. Organizations don’t turn on a dime (even though our tech likes to), so you probably have 5 to 10 years of coasting ahead. You might be able to eke out 15 if you can find an organization that is really change-averse and stubbornly attached to its own hardware. It won’t last forever, though, and if you aren’t retiring before the end of that coasting period, you’re likely going to find yourself in a very bad spot.


Final thoughts


I believe the general trend of enterprises viewing technology as a commodity, rather than a potential competitive advantage, is foolish and shortsighted. Technology has the ability to streamline, augment, and enhance the business processes that directly face a business’ customers. That being said, ignoring business trends is a good way to find yourself behind the curve, and recognizing reality doesn’t necessarily indicate that you agree with the direction. Be cognizant of the way that businesses are employing technology and craft a personal growth strategy that allows you to remain relevant, regardless of what those future decisions may be. Cloud skills are king in the new technology economy, so don’t be left without them. Focusing on automation and orchestration will help you stay relevant in the future, as well. Whatever it is that you choose to do, continue learning and challenging yourself and you should do just fine.

When SolarWinds hosted its first virtual event, THWACKcamp in 2012, about 250 very active THWACK® community members attended, along with technology managers from a few large customers. There were a handful of sessions, with topics concentrated largely around network monitoring best practices, with a nod to IT systems management. THWACKcamp returns this October 18-19, and will mark the sixth year in what has grown to become a live, multi-track event for thousands of skilled IT professionals. It now spans expert advice in everything from networking, to automation, to hybrid IT, to cloud-native APM, DevOps, security, and even MSP operations. And, again this year, IT professionals will be at THWACKcamp’s core, collaborating (and occasionally commiserating), but learning and sharing ideas that make IT more reliable, innovative, and perhaps even fun.


Moon Landing


Being voluntold that you’re supporting a physical tradeshow booth can be nerve-wracking. First, the whole endeavor is, at its heart, a marketing thing. You must specify and configure demo gear that must somehow be squeezed into impossibly designed sets without overheating. You also become a Cord Master, asked to improvise never-before-seen cabling and connectivity, like HDMI to ½” pipe-thread. On top of this, add Layer 8 configurations, live code that attendees can actually see and touch that’s also interesting. Finally, throw the whole mess into crates months before the event, aware that forgetting even something small might mean five days of blank screens. Event tech is not IT’s comfort zone. I know I certainly prefer to have the safety of a hardware lab and dev team nearby.


While THWACKcamp has the advantage of being a virtual event (more than a few admins have said that attending in shorts and a T-shirt while working from home is the way to go), it is nonetheless a live event. And this year, as more technologies and topics are included than ever before, the Q&A and open chat conversations will be wider-ranging and more technical than ever. It’s not limited to what we can fit into a few crates. It’s an opportunity to interact with IT of all types, from very small businesses that rely on Managed Service Providers, to midsized businesses managing the complexity of hybrid IT on a budget, to the largest enterprises with hundreds of IT professionals. It’s an open congress of some of the sharpest admins in IT, just as eager to attend and engage as the presenters are to share and learn something new.


Over-provisioned Geek Prize Closet


IT professionals attend technical conferences to learn, talk, and network, but they also certainly enjoy swag. Awesome geek giveaways return in 2017, along with THWACK community status points and bragging rights for those attending live. And for 2017, THWACKcamp attendees may earn up to 20,000 THWACK points for participating in activities, mini-missions, and, of course, attending sessions.

So, whether you’ve never missed a session of THWACKcamp, or you’ve never even been to a technical learning event, be sure to check out the registration page when it goes live in August. Maybe even set a reminder to register, because you can’t attend, chat with others, win prizes, or earn THWACK points if you don’t register. We look forward to seeing you live at THWACKcamp 2017, October 18 and 19!


Automating the Cloud

Posted by scuff Jun 8, 2017

Let’s stick our heads in the cloud for a moment. With your very first test account to play with a SaaS product or an Infrastructure as a Service environment, it’s natural to set up users and servers manually. That’s how we learn. That’s not sustainable on an ongoing basis for a production environment unless you want to screenshot every box you ticked and you know that the next tech will follow that documentation to the letter.


Decisions, decisions
Server builds and user account creation are two SysAdmin processes that are perfect for automating, even when they’re in the cloud. Your biggest challenge will be deciding what tool to use. Do you have a single vendor approach, so a native tool from that vendor will suffice? Are you splitting your risk between AWS and Azure, and looking for one tool that supports both environments? Are you running a hybrid model where there’s still a requirement for internal user accounts that you want to integrate with cloud SaaS products?


The single vendor approach
I’m going to pick on Azure and AWS because they are the two I’m most familiar with and I also have a word count to (roughly) stick to. If you’re a Rackspace or Google Cloud fan, or prefer some other IaaS flavor, add your thoughts in the comments.


Azure: It will be no surprise that Azure’s own automation service is based on PowerShell. PowerShell scripts and workflows (known as runbooks) to be exact. Learn more about Azure Automation here:


AWS: AWS CloudFormation uses JSON or YAML text files. You can choose from a library of templates or use the designer to create your own.


The multi-vendor approach
I’ve briefly mentioned before the powerhouses of Chef and Ansible. Both have tools that integrate with both Azure and AWS.


Chef and Azure:


Chef and AWS:


Ansible and Azure:


Ansible and AWS:


DevOps also caught my eye, but it integrates with AWS, Digital Ocean, and Linode:


Usage and Billing
The "pay as you use" subscription model for SaaS products can lead to some large, unexpected bills. If the business loads a ton of new content (data) or places a significant amount of new traffic on one particular cloud server, you won’t see it until you get the monthly invoice. There are a few vendors jumping on board to help solve this problem.


Cloud Ctrl shows usage trends, compares spending between business units and allows you to set usage thresholds and alerts. It is compatible with Azure, AWS, Google Cloud, Soft Layer, and Office 365.


Startup Meta SaaS has just come out of stealth mode after a seed investment of around $1.5 million. Their product helps you analyze your spend and usage of SaaS products, including alerting on renewal dates. It will also tell you when accounts are being left dormant, which is handy if people have left your organization and their SaaS accounts haven’t been canceled. Meta SaaS currently supports 224 SaaS vendors and is adding new integrations at a rate of 20 per week.


Over to you!

I've offered just a taste of what you can automate in the cloud. We haven’t covered the automation of account provisioning when you run a hybrid environment (with tools like Azure AD Connect in the Microsoft world), but see my previous comment regarding word count.

Would a move to the cloud make you more open to investigating automation tools? Are they a necessity in the cloud world, or just another thing that will sit on your to-do list? Do you find it easy or hard to wrap your head around things like JSON scripts, to move to a world of cloud infrastructure as code?  Let me know what you think.



I remember the largest outage of my career. Late in the evening on a Friday night, I received a call from my incident center saying that the entire development side of my VMware environment was down and that there seemed to be a potential for a rolling outage including, quite possibly, my production environment.


What followed was a weekend of finger pointing and root cause analysis between my team, the virtual data center group, and the storage group. Our org had hired IBM as the first line of defense on these Sev-1 calls. IBM included EMC and VMware in the problem resolution process as issues went higher up the call chain, and still the finger pointing continued. By 7 am on Monday, we’d gotten the environment back up and running for our user community, and we’d been able to isolate the root cause and ensure that this issue would never come again. Others, certainly, but this one was not to recur.


Have you experienced similar circumstances at work? I imagine that most of you have.


So, what do you do? What may seem obvious to one may not be obvious to others. Of course, you can troubleshoot the way I do. Occam’s Razor or Parsimony are my courses of action. Try to apply logic, and force yourself to choose the easiest and least painful solutions first. Once you’ve exhausted those, you move on to the more illogical, and less obvious.


Early in my career, I was asked what I’d do as my first troubleshooting maneuver for a Windows workstation having difficulty connecting to the network. My response was to save the work that was open on the machine locally, then reboot. If that didn’t solve the connectivity issue, I’d check the cabling on the desktop, then the cross-connect before even looking at driver issues.


Simple parsimony, aka economy in the use of means to an end, is often the ideal approach.


Today’s data centers have complex architectures. Often, they’ve grown up over long periods of time, with many hands in the architectural mix. As a result, the logic behind why things were done the way they were has been lost, and troubleshooting application or infrastructure issues can be just as complex.


Understanding recent changes, patching, etc., can be an excellent way to focus your efforts. For example, patching Windows servers has been known to break applications. A firewall rule implementation can certainly break the ways in which application stacks can interact. Again, these are important things to know when you approach troubleshooting issues.


But what do you do if there is no guidance on these changes? There are a great number of monitoring applications out there that can track key changes in the environment and point the troubleshooter toward potential issues. I am an advocate for integrating change management software into help desk software, and I would like to add to that a feed into operations from some SIEM collection element. The issue here is the number of these components already in place at an organization. With that in mind, would the company want to replace these tools with an all-in-one solution, or try to cobble the pieces together? Of course, given the nature of enterprise architectural choices, it is hard to find a single overall component that incorporates all of the choices made throughout the history of an organization.


Again, this is a caveat emptor situation. Do the research and find a solution that best solves your issues, determines an appropriate course of action, and comes closest to providing an overall solution to the problem at hand.

Data security and privacy links take center stage this week. I didn't intend for that to happen, it just did. I'm guessing we are going to see an uptick in incidents being reported, which is different than saying there is an uptick in incidents as a whole. I believe people are more cognizant of data security and privacy matters and as a result we are seeing increased reporting.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Ransomware: Best Practices for Prevention and Response

A nice summary for you to convert into a checklist in an effort to minimize your risk from being a victim of ransomware.


Fireball Malware Infects 20% of Corporate Networks Worldwide
Interesting note here: Adware can spread just as malware would, but it isn’t considered illegal. And the result of not treating adware as a virus is things like Fireball.


The seven deadly sins of statistical misinterpretation, and how to avoid them
Because the future for data professionals is data analytics, and I want you to know about these simple mistakes that are all too common.


Building a Slack bot for channel topic detection using word embeddings
And I thought I was impressed when Outlook tells me that I forgot an attachment to an email, this looks like a real value-add.


OneLogin: Breach Exposed Ability to Decrypt Data
This is why we can’t have nice things. It’s time to move away from the use of passwords.


International data privacy laws create inconsistent rules
It’s almost as if the lawyers are passing contradictory laws to make certain they have billable hours for the next ten years.


The next time you are frustrated with some piece of code, I want you to stop and think about how lucky you are that you never needed to look up the 10th character of a VIN many times a day:


SAP® recently held their annual Americas’ SAP Users’ Group (ASUG) Sapphire Now celebration in Orlando, which attracted more than 35,000 executives, subject matter experts, sales and public relations personnel, as well as a whole bunch of SAP customers. They all converged on the Orlando Convention Center for four days to celebrate, collaborate, network, and innovate. Yours truly was a speaker for the “Using the Right SAP Support Services and Tools at the Right Time” session.


The Monday afternoon event, “A Call to Lead,” kicked off the conference with special guests former First Lady Michelle Obama and former President George W. Bush leading a discussion about diversity and equality in the workplace. (George Bush is hilarious, and, like the former first lady, a wonderful and charismatic speaker.) Tuesday morning’s keynote was delivered by SAP CEO Bill McDermott, who was joined on stage by Dell® Technologies founder and CEO Michael Dell. Bert and John Jacobs, brothers and co-founders of the Life is Good® clothing line, spoke in the afternoon, ending their presentation by throwing frisbees into the crowd. On Wednesday morning, Hasso Plattner, co-founder and chair of SAP’s supervisory committee, presented, followed by appearances by Derek Jeter and Kobe Bryant on Thursday morning. That night, the British band Muse wrapped up the conference with a special performance.


When Hasso spoke, nearly everyone in the vast conference center stopped to listen. Hasso shared his thoughts on the future of IT, technology, and business, where it is all going, and how the driving forces behind these progressions are being shaped. While SAP ERP software runs on well-recognized technology, the conference did not focus solely on technology. SAPPHIRE targeted opportunities provided by the latest technological trends that drive businesses forward. SAP, vendors, and customers heard from people in human resources, finance, operations, supply chain, IT, and more. The vendor space was immense, and crackling with energy. A wide range of vendors was on hand, representing SAP HANA, cloud, integration services, managed services, and pretty much everything else you are pitched at every conference, including many names that you THWACKsters will recognize: Microsoft®, VMware®, Dell, Cisco®, AWS®, Google® Cloud, and many more.


So why am I blogging about this? Like most THWACK® followers and contributors, I work in IT. I care about bits and bytes and blinking lights. Apparently, there are 22-25 fellow THWACKsters who have SAP running in their environment. And while the technology is critical to the IT professional, the experienced pros have learned that it is equally important to understand the company’s goals and how their work is aligned with them. Oddly enough, I sat in on several customer presentations, and SolarWinds® was featured on more than one slide deck (spelled incorrectly, as Solarwinds, every time! Grr!). SAP and SolarWinds share a common trait. They thrive on inspiring their customers to innovate and lead. SAP’s user group, ASUG, like THWACK, is a force to be reckoned with.


So will we see SolarWinds at Sapphire next year? Or maybe SAP’s more tech-y conference, TechEd? Here’s hoping!






Hey, guys! This week I’d like to share a very recent experience. I was troubleshooting, and the information I was receiving was great, but it was the context that saved the day! What I want to share is similar to the content in my previous post, Root Cause, When You're Neither the Root nor the Cause, but different enough that I thought I'd pass it along.


This tale of woe begins as they all do, with a relatively obscure description of the problem and little foundational evidence. In this particular case it was, “The internet wasn't working on the wireless, but once we rebooted, it worked fine.” How many of us have had to deal with that kind of problem before? Obviously, all answers lead to, “Just reboot and it’ll be fine." While that’s all fine and dandy, it is not acceptable, especially at the enterprise level, because it offers no real solution. Therefore, the digging began.


The first step was to figure out if I could reproduce the problem.


I had heard that it happened with some arbitrary mobile device, so I set up shop with my MacBook, an iPad, my iPhone and my Surface tablet. Once I was all connected, I started streaming content, particularly the live YouTube stream of The Earth From Space. It had mild audio and continuous video streaming that could not buffer much or for long.


The strangest thing happened in this initial wave of troubleshooting. I was able to REPRODUCE THE PROBLEM! That frankly was pretty awesome. I mean, who could ask for more than the ability to reproduce a problem! Though the symptoms were some of the stranger parts, if you want to play along at home, maybe you can try to solve this as I go. Feel free to chime in with something like, “Ha ha! You didn’t know that?" It's okay. I’m all for a resolution.


The weirdest part was that for devices connecting on the older wireless standards, 802.11a and 802.11n, things were working like a champ, or seemingly working like a champ. They didn’t skip a beat and were working perfectly fine. I was able to reproduce it best with the MacBook connected at 802.11ac with the highest speeds available. But seemingly, when it would transfer from one AP’s channel to another AP on another channel, poof, I would lose internet access for five minutes. Later, it was proven to be EXACTLY five minutes (hint).


At the time though, like any problem in need of troubleshooting, there were other issues I needed to resolve because they could have been symptoms of this problem. Support even noted that these symptoms relate to a particular problem that was all fine and dandy when adjusted in the direction I preferred.  Alas, they didn’t solve my overwhelming problem of, “Sometimes, I lose the internet for EXACTLY five minutes.” Strange, right?


So, I tuned up channel overlap, modified how frequently devices will roam to a new access point and find their new neighbor, cleaned up how much interference there was in the area, and got it working like a dream. I could walk through zones transferring from AP to AP over and over again, and life seemed like it was going great. But then, poof, it happened again. The problem would resurface, with its signature registering an EXACT five-minute timeout.


This is one of those situations where others might say, “Hey, did you check the logs?” That's the strange part. This problem was not in the logs. This problem transcended mere logs.


It wasn’t until I was having a conversation one day that things clicked. I said, “It’s the weirdest thing. The connection with a full wireless signal, with minimal to no interference and nothing erroneous showing in the logs, would just die for exactly five minutes.” My friend chimed in, “I experienced something similar once at an industrial yard. The problem would surface when transferring from one closet-stack to another closet-stack, and the tables for MAC refresh were set to five minutes. You could shorten the MAC refresh timeout, or simply tunnel these particular connections back to the controller."


That prompted an A-ha moment (not the band) and I realized, "OMG! That is exactly it." And it made sense. In the earlier phases of troubleshooting, I had noted that this was a condition of the problem occurring, but I had not put all of my stock in that because I had other things to resolve that seemed out of place. It’s not like I didn’t lean on first instincts, but it’s like when there’s a leak in a flooded basement. You see the flooding and tackle that because it’s a huge issue. THEN you start cleaning up the leak because the leak is easily a hidden signal within the noise.


In the end, not only did I take care of the major flooding damage, but I also took care of the leaks. It felt like a good day!


What makes this story particularly helpful is that not all answers are to be found within an organization and their tribal knowledge. Sometimes you need to run ideas past others, engineers within the same industry, and even people outside the industry. I can’t tell you the number of times I've talked through some arbitrary PBX problem with family members. Just talking about it out loud and explaining why I did certain things caused the resolution to suddenly jump to the surface.


What about you guys? Do you have any stories of woe, sacrifice, or success that made you reach deep within yourself to find an answer? Have you had the experience of answers bubbling to the surface while talking with others? Maybe you have other issues to share, or cat photos to share. That would be cool, too.

I look forward to reading your stories!

In this post, part of a miniseries on coding for non-coders, I thought it might be interesting to look at a real-world example of breaking a task down for automation. I won't be digging hard into the actual code but instead looking at how the task could be approached and turned into a sequence of events that will take a sad task and transform it into a happy one.


The Task - Deploying a New VLAN


Deploying a new VLAN is simple enough, but in my environment it means connecting to around 20 fabric switches to build the VLAN. I suppose one solution would be to use an Ethernet fabric that had its own unified control plane, but ripping out my Cisco FabricPath™ switches would take a while, so let's just put that aside for the moment.


When a new VLAN is deployed, it almost always also requires that a layer 3 (IP) gateway with HSRP is created on the routers and that VLAN needs to be trunked from the fabric edge to the routers. If I can automate this process, for every VLAN I deploy, I can avoid logging in to 22 devices by hand, and I can also hopefully complete the task significantly faster.


Putting this together, I now have a list of three main steps I need to accomplish:


  1. Create the VLAN on every FabricPath switch
  2. Trunk the VLAN from the edge switches to the router
  3. Create the L3 interface on the routers, and configure HSRP


Don't Reinvent the Wheel


Much in the same way that one uses modules when coding to avoid rewriting something that has been created already, I believe that the same logic applies to automation. For example, I run Cisco Data Center Network Manager (DCNM) to manage my Ethernet fabric. DCNM has the capability to deploy changes (it calls them Templates) to the fabric on demand. The implementation of this feature involves DCNM creating an SSH session to the device and configuring it just like a real user would. I could, of course, implement the same functionality for myself in my language of choice, but why would I? Cisco has spent time making the deployment process as bulletproof as possible; DCNM recognizes error messages and can deal with them. DCNM also has the logic built in to configure all the switches in parallel, and in the event of an error on one switch, to either roll back that switch alone or all switches in the change. I don't want to have to figure all that out for myself when DCNM already does it.


For the moment, therefore, I will use DCNM to deploy the VLAN configurations to my 20 switches. Ultimately it might be better if I had full control and no dependency on a third-party product, but in terms of achieving the goal rapidly, this works for me. To assist with trunking VLANs toward the routers, in my environment the edge switches facing the routers have a unique name structure, so I was also able to tweak the DCNM template so that if it detects that it is configuring one of those switches, it also adds the VLANs to the trunked list on the relevant router uplinks. Again, that's one less task I'll have to do in my code.


Similarly, to configure the routers (IOS XR-based), I could write a Python script based on the Paramiko SSH library, or use the Pexpect library to launch ssh and control the program's actions based on what it sees in the session. Alternatively, I could use Netmiko, which already understands how to connect to an IOS XR router and interact with it. The latter choice seems preferable, if for no other reason than to speed up development.


Creating the VLAN


DCNM has a REST API through which I can trigger a template deployment. All I need is a VLAN number and an optional description, and I can feed that information to DCNM and let it run. First, though, I need the list of devices on which to apply the configuration template. This information can be retrieved using another REST API call. I can then process the list, apply the VLAN/Description to each item and submit the configuration "job." After submitting the request, assuming success, DCNM will return the JobID that was created. That's handy because it will be necessary to keep checking the status of that JobID afterward to see if it succeeded. So here are the steps so far:


  • Get VLAN ID and VLAN Description from user
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)


Sound good? Wait; the script needs to log in as well. In the DCNM REST API that means authenticating to a particular URL, receiving a token (a string of characters), then using that token as a cookie in all future requests within that session. Also, as a good citizen, the script should log out after completing its requests, so the list now reads:

  • Get VLAN ID and VLAN Description from user
  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM


That should work for the VLAN creation but I'm also missing a crucial step which is to sanitize and validate the inputs provided to the script. I need to ensure, for example, that:


  • VLAN ID is in the range 1-4094, but for legacy Cisco purposes perhaps, does not include 1002-1005
  • VLAN Description must be 63 characters or less, and the rules I want to apply will only allow [a-z], [A-Z], [0-9], dash [-] and underscore [_]; no spaces and odd characters


Maybe the final list looks like this then:


  • Get VLAN ID and VLAN Description from user
  • Confirm that VLAN ID and VLAN Description are valid
  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM
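
The "request job status until it has some kind of resolution" step in the list above is essentially a polling loop. Here is a minimal sketch in Python of what that might look like. Note the assumptions: get_status is a hypothetical function standing in for whatever REST call reports the job state, and the status strings and timings are illustrative, not DCNM's actual values.

```python
import time

# Hypothetical terminal states; the real values depend on what DCNM returns.
FINISHED_STATES = {"Success", "Failed"}


def wait_for_job(get_status, job_id, interval=5.0, max_checks=60, sleep=time.sleep):
    """Poll get_status(job_id) until it reports a finished state.

    get_status is a placeholder for the function that queries the job status.
    Returns the final status, or raises TimeoutError if the job never settles.
    """
    for _ in range(max_checks):
        status = get_status(job_id)
        if status in FINISHED_STATES:
            return status
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} still unresolved after {max_checks} checks")
```

Capping the number of checks means a job that never finishes can't leave the script hanging forever, which is worth deciding on at the planning stage rather than discovering in production.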


Configuring IOS XR


In this example, I'll use Python and Netmiko to do the hard work for me. My inputs are going to be:


  • IPv4 Subnet and prefix length
  • IPv6 Subnet and prefix length
  • L3 Interface Description


As before, I will sanity check the data provided to ensure that the IPs are valid. I have found that IOS XR's configuration for HSRP, while totally logical and elegantly hierarchical, is a bit of a mouthful to type out, so to speak. As such, it is great to have a script take basic information like a subnet and apply some standard rules to it: the second IP in the subnet (e.g. .1 on a /24 subnet) is the HSRP gateway, the next address up (e.g. .2) is on the A router, and .3 is on the B router. For my HSRP group number, I use the VLAN ID. The subinterface number where I'll be configuring layer 3 will match the VLAN ID also, and with that information I can also configure the HSRP BFD peer between the routers. By applying some simple standardized templating of the configuration, I can take a bare minimum of information from the user and create configurations which would take much longer to create manually and quite often (based on my own experience) would have mistakes in them.
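
To make the addressing rule concrete, here is a small sketch using Python's standard ipaddress module. The function name and the keys in the returned dictionary are my own choices for illustration; the rule itself (gateway, then A router, then B router) is the one described above.

```python
import ipaddress


def hsrp_addresses(subnet):
    """Derive the gateway and router addresses from a subnet string.

    Follows the convention described above: the address after the network
    address is the HSRP gateway, the next is router A, then router B.
    Works for both IPv4 and IPv6 subnets.
    """
    # strict=True rejects input with host bits set, e.g. 10.1.20.5/24
    network = ipaddress.ip_network(subnet, strict=True)
    if network.num_addresses < 5:
        raise ValueError(f"{subnet} is too small for a gateway plus two routers")
    return {
        "gateway": network[1],
        "router_a": network[2],
        "router_b": network[3],
        "prefix_length": network.prefixlen,
    }
```

Because ip_network raises ValueError on anything it can't parse, the same call covers the "are the IPs valid?" sanity check as well as the address calculation.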


The process then might look like this:


  • Get IPv4 subnet, IPv6 subnet, VLAN ID and L3 interface description from user
  • Confirm that IPv4 subnet, IPv6 subnet, VLAN ID and interface description are valid
  • Generate templated configuration for the A and B routers
  • Create session to A router and authenticate
  • Take a snapshot of the configuration
  • Apply changes (check for errors)
  • Assuming success, log out
  • Rinse and repeat for B router


Breaking Up is Easy


Note that the sequences of actions above have been created without requiring any coding. Implementation can come next, in the preferred language, but if we don't have an idea of where we're going, especially as a new coder, it's likely that the project will go wrong very quickly.


For implementation, I now have a list of tasks which I can attack, to some degree, separately from one another; each one is a kind of milestone. Looking at the DCNM process again:


  • Get VLAN ID and VLAN Description from user


Perhaps this data comes from a web page but for the purposes of my script, I will assume that these values are provided as arguments to the script. For reference, an argument is anything that comes after the name of the script when you type it on the command line, e.g. in the command, John the program would see one argument, with a value of John.
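As a small sketch of what reading those arguments could look like in Python, the standard `argparse` module will collect them and convert the VLAN ID to a number. The function name `parse_args` and the argument names `vlan_id` and `description` are assumptions I've chosen for the example.

```python
import argparse


def parse_args(argv=None):
    """Read the VLAN ID and description from the command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Collect the values needed to create a VLAN (sketch only)."
    )
    # type=int makes argparse reject non-numeric input with a usage error.
    parser.add_argument("vlan_id", type=int, help="numeric VLAN ID")
    parser.add_argument("description", help="VLAN description")
    # When argv is None, argparse reads sys.argv[1:], i.e. everything
    # typed after the script name.
    return parser.parse_args(argv)


# Passing a list here stands in for what would be typed on the command line.
args = parse_args(["100", "Web servers"])
print(args.vlan_id, args.description)
# 100 Web servers
```

In a real script the call would simply be `parse_args()`, and the values would come from whatever follows the script name on the command line.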


  • Confirm that VLAN ID and VLAN Description are valid


This sounds like a perfect opportunity to write a function/subroutine which can take a VLAN ID as its own argument, and will return a boolean (true/false) value indicating whether or not the VLAN ID is valid. Similarly, a function could be written for the description, either to enforce the allowed characters by removing anything that doesn't match, or by simply validating whether what's provided meets the criteria or not. These may be useful in other scripts later too, so writing a simple function now may save time later on.
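A minimal sketch of such functions in Python might look like the following. The 1 to 4094 range is the usable range of 802.1Q VLAN IDs, though a given platform may reserve some of those for itself. The description rule (letters, digits, underscore and hyphen, up to 32 characters) is purely an assumption for illustration, as are all three function names.

```python
import re

# Assumed policy for descriptions; adjust to whatever the platform allows.
MAX_DESCRIPTION_LENGTH = 32
ALLOWED_DESCRIPTION = re.compile(r"[A-Za-z0-9_-]{1,%d}" % MAX_DESCRIPTION_LENGTH)


def is_valid_vlan_id(vlan_id):
    """Return True if vlan_id is a whole number in the range 1-4094."""
    # Only accept integers or strings; bool is excluded because True == 1.
    if isinstance(vlan_id, bool) or not isinstance(vlan_id, (int, str)):
        return False
    try:
        value = int(vlan_id)
    except ValueError:
        return False
    return 1 <= value <= 4094


def is_valid_description(text):
    """Return True if text already meets the assumed description policy."""
    return isinstance(text, str) and ALLOWED_DESCRIPTION.fullmatch(text) is not None


def sanitize_description(text):
    """Enforce the policy instead: strip disallowed characters and truncate."""
    return re.sub(r"[^A-Za-z0-9_-]", "", text)[:MAX_DESCRIPTION_LENGTH]


print(is_valid_vlan_id("100"), is_valid_vlan_id(4095))   # True False
print(is_valid_description("Web servers"))               # False
print(sanitize_description("Web servers!"))              # Webservers
```

Whether to reject a bad description or quietly clean it up is a design choice. Rejecting is safer when a human is waiting for feedback, while sanitizing suits unattended runs.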


  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM


These five actions are all really the same kind of thing. For each one, some data will be sent to a REST API, and something will be returned to the script by the REST API. The process of submitting to the REST API only requires a few pieces of information:


  • What kind of HTTP request is it (GET, POST, etc.)?
  • What is the URL?
  • What data needs to be sent, if any, to the URL?
  • How should the returned data be processed? (What format is it in?)
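Those four questions map fairly directly onto the parameters of a small helper. Here is a rough sketch using only Python's standard library. The names `build_request`, `send`, `rest_get` and `rest_post` are hypothetical, the URL is a placeholder rather than a real DCNM endpoint, and I am assuming the API sends and receives JSON, which would need checking against the real API documentation.

```python
import json
import urllib.request


def build_request(method, url, payload=None, headers=None):
    """Describe one REST call: request type, URL, and any data to send."""
    all_headers = {"Accept": "application/json"}
    all_headers.update(headers or {})
    data = None
    if payload is not None:
        # Assumption: the API expects a JSON-encoded request body.
        data = json.dumps(payload).encode("utf-8")
        all_headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=all_headers, method=method)


def send(request, timeout=10):
    """Send the request and decode the reply, assumed to be JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None


def rest_get(url, headers=None):
    return send(build_request("GET", url, headers=headers))


def rest_post(url, payload, headers=None):
    return send(build_request("POST", url, payload=payload, headers=headers))


# Building a request does not touch the network, so it is safe to inspect.
example = build_request("POST", "https://dcnm.example.com/rest/example", {"vlan": 100})
print(example.get_method(), example.full_url)
# POST https://dcnm.example.com/rest/example
```

Keeping `build_request` separate from `send` means the part that decides what to send can be tested without a live server.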


It should be possible to write some functions to handle GET and POST requests so that it's not necessary to repeat the HTTP request code every time it's needed. The idea is not to repeat code multiple times if it can be more simply put in a single function and called from many places. This also means that fixing a bug in that code only requires it to be fixed in one place.
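The job-status step from the DCNM list is a natural place for this kind of reuse, since it makes the same request over and over until the job resolves. The loop below is only a sketch. `wait_for_job` and `fetch_status` are names I have made up, the timings are arbitrary defaults, and the finished states are taken from the "Success, Failed, etc" wording in the task list rather than from any real API.

```python
import time


def wait_for_job(fetch_status, finished=("Success", "Failed"),
                 interval=5.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until it returns a finished state.

    fetch_status is any zero-argument callable returning the current
    status string; in practice it would wrap a single GET request.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in finished:
            return status
        # Give up rather than loop forever if the job never resolves.
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


# Stand-in for the real API call: reports two pending polls, then success.
fake_replies = iter(["Running", "Running", "Success"])
result = wait_for_job(lambda: next(fake_replies), interval=0)
print(result)
# Success
```

Passing the status check in as a function keeps the loop independent of how the request is made, so the same loop can be exercised with canned replies as shown here.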


For the IOS XR configuration, each step can be processed in a similar fashion, creating what are hopefully more manageable chunks of code to create and test.


Achieving Coding Goals


I really do believe that sometimes coders want to jump right into the coding itself before taking the time to think through how the code might actually work, and what the needs will be. In the example above, I've run through taking a single large task (Create a VLAN on 20 devices and configure two attached routers with an L3 interface and HSRP) which might seem rather daunting at first, and breaking it down into smaller functional pieces so that a) it's clearer how the code will work, and in what order; and b) each small piece of code is now a more achievable task. I'd be interested to know if you as a reader feel that the task lists, while daunting in terms of length, perhaps, seemed more accomplishable from a coding perspective than just the project headline. To me, at least, they absolutely are.


I said I wouldn't dig into the actual code, and I'll keep that promise. Before I end, though, here's a thought to consider: when is it right to code a solution, and when is it not? I'll be taking a look at that in the next, and final, article in this miniseries.

By Joe Kim, SolarWinds Chief Technology Officer


Because of the Internet of Things (IoT) we're seeing an explosion of devices, from smartphones and tablets to connected planes and Humvee® vehicles. So many, in fact, that IT administrators are left wondering how to manage the deluge, particularly when it comes to ensuring that their networks and data remain secure.


The challenge is significantly more formidable than the one posed by bring-your-own-device issues, when administrators only had to worry about a few mobile operating systems. That handful pales in comparison to the potentially thousands of IoT-related operating systems that are part of an increasingly complex ecosystem that includes devices, cloud providers, data, and more.


How does one manage such a monumental task? Here are five recommendations that should help.


1. Turn to automation


Getting a grasp on the IoT and its impact on defense networks is not a job that can be done manually, which is what makes automation so important. The goal is to create self-healing networks that can automatically and immediately remediate themselves if a problem arises. A self-healing, automated network can detect threats, keep data from being compromised, and reduce response time and downtime.


2. Get a handle on information and events


DoD administrators should complement their automation solutions with security information and event management processes. These are monitoring solutions designed to alert administrators to suspicious activity and to security and operational events that may compromise their networks. Administrators can use these tools to monitor real-time data and gain insight into forensic data that can be critical to identifying the cause of network issues.


3. Monitor devices and access points


Device monitoring is also extremely important. Network administrators will want to make sure that the only devices that are hitting their networks are those deemed secure. Administrators will want to be able to track and monitor all connected devices by MAC and IP address, as well as access points. They should set up user and device watch lists to help them detect rogue users and devices in order to maintain control over who and what is using their networks.


4. Get everyone on board


Everyone in the agency must commit to complying with privacy policies and security regulations. All devices must be in compliance with high-grade security standards, particularly personal devices that are used outside of the agency. The bottom line is that it’s everyone’s responsibility to ensure that DoD information stays within its network.


5. Buckle up


Understand that while IoT is getting a lot of hype, we’re only at the beginning of that cycle. Analyst firm Gartner® once predicted that there would be 13 billion connected devices by 2020, but some are beginning to wonder if that’s actually a conservative number. Certainly, the military will continue to do its part to drive IoT adoption and push that number even higher.


In other words, when it comes to connected devices, this is only the beginning of the long road ahead. DoD administrators must prepare today for whatever tomorrow might bring.


Find the full article on Defense Systems.


Firewall Logs - Part Two

Posted by Dez, Jun 1, 2017

In Part One of this series, I dove into the issue of security and compliance. In case you don't remember, I'm reviewing this wonderful webcast series to stress the importance of the information presented in each. This week, I'm focusing on the firewall logs webcast.


I chose the Firewall Logs webcast for this week because it is a known and very useful way to prevent attacks. Now, my takeaway from this session is that SIEMs are fantastic ways to normalize your logs from a firewall and also your infrastructure. You guys don't need me to preach on that, I know. However, I feel like when you use health performance and network configuration management tools, you really have a better solution all the way around.


Everyone (I think) knows that I'm not one to tell you to buy or purchase just SolarWinds products! So please do NOT take this that way. I will preach about having some type of SIEM, network performance monitor (NPM), patch manager (PaM), and a solid network configuration change management (NCM) within your environment. Let me give you some information to go along with this webcast on how I would personally tie these together. 


  1. Knowing the health of your infrastructure allows you to see anomalies. When this session was discussing the mean time to detection I couldn't help but think about a performance monitor. You have to know what normal is and have a clear baseline before an attack.
  2. Think about the ACLs along with your VLANs and allowed traffic on your network devices. NCM allows you to use real-time change notification to track whether any outside changes are being made, and it shows you what was changed. Using this with the approval system also allows you to verify outside access and stop it in its tracks, because those are not approved network config changes. This is a huge win for security. When you add in the compliance reports and scheduled email send-outs, you are able to verify your ACLs and access based on patterns you customize to your company's needs. This is vital for documentation, and also for validation if you have any type of change request ticketing.
  3. We all know we need to be more compliant and patch our stuff! Not only to be aware of vulnerabilities but also to protect our vested interests in our environment.


Okay, so the stage is laid out and I hope you see why you need more than just a great SIEM like LEM to back, plan, and implement any type of security policies you may need. This webcast brings up great points to think about on how to secure and think about those firewalls. IMHO, if you have LEM, Jamie's demo should help you guys strengthen your installation.  Also, the way he presents this helps you to strengthen or validate any SIEM you may have in place currently.


I hope you guys are enjoying this series as much as I am. I think we should all at least listen to security ideas to help us strengthen our knowledge and skill sets. Trust me, I'm no expert or I would abolish these attacks, lol! What I am is a passionate security IT person who wants to engage different IT silos to have a simple conversation about security.


Thanks for your valuable time! Let me know what you think by posting a comment below, and remember to follow me @Dez_Sayz!



Data is a commodity.


Don’t believe me? Let’s see how the Oxford dictionary defines “commodity.”


“A thing that is useful or has a useful quality.”


No good researcher would stop at just one source. Just for fun, let’s check out this definition from Merriam-Webster:


“Something useful or valued.”


Or, this one from


“An article of trade or commerce, especially a product as distinguished from a service.”


There’s a lot of data on the definition of the word “commodity.” And that’s the point, really. Data itself is a commodity, something to be bought and sold.


And data, like commodities, comes in various forms.


For example, data can be structured or unstructured. Structured data is data that we associate with being stored in a database, either relational or non-relational. Unstructured data is data that has no pre-defined data model, or is not organized in any pre-defined way. Examples of unstructured data include things like images, audio files, instant messages, and even this Word document I am writing now.


Data can be relational or non-relational. Relational data is structured in such a way that data entities have relationships, often in the form of primary and foreign keys. This is the nature of traditional relational database management systems such as Microsoft SQL Server. Non-relational data is more akin to distinct entities that have no relationships to any other entity. The key-value pairs found in many NoSQL database platforms are examples of non-relational data.
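To make the distinction concrete, here is a small sketch in Python. It uses the built-in `sqlite3` module as a stand-in for a relational database and a plain dictionary as a stand-in for a key-value store. The table names, column names, and keys are all invented for the example.

```python
import sqlite3

# Relational: two entities linked by a primary key and a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (id, customer_id, item) VALUES (10, 1, 'keyboard')")

# The relationship lets one query combine both entities.
rows = conn.execute(
    "SELECT customer.name, purchase.item"
    " FROM purchase JOIN customer ON customer.id = purchase.customer_id"
).fetchall()
print(rows)
# [('Ada', 'keyboard')]

# Non-relational: each key-value pair stands alone, with no link to any other.
kv_store = {"session:42": "logged-in", "theme:ada": "dark"}
print(kv_store["theme:ada"])
# dark
```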


And while data can come in a variety of forms, not all data is equal. If there is one thing I want you to remember from this article it is this: data lasts longer than code. Treat it right.


To do that, we now have Azure CosmosDB.


Introduced at Microsoft Build™, CosmosDB is an attempt to make data the primary focus for everything you do, no matter where you are. (Microsoft has even tagged CosmosDB as “planet-scale,” which makes me think they need to go back and think about what “cosmos” means to most people. But I digress.)


I want you to understand the effort Microsoft is putting into the NewSQL space here. CosmosDB is a database platform as a service that can store any data that you want: key-value pair, graph, document, relational, non-relational, structured, unstructured…you get the idea.


CosmosDB is a platform as a service, meaning the admin tasks that most DBAs would be doing (backups, tuning, etc.) are done for you. Microsoft will guarantee performance, transactional consistency, high availability, and recovery.


In short, CosmosDB makes storing your data easier than ever before. Data is a commodity and Microsoft wants as big a market share as possible.


I can’t predict the future and tell you CosmosDB is going to be the killer app for cloud database platforms. But I can understand why it was built.


It was built for the data. It was built for all the data.
