
Geek Speak

2,164 posts

The title of this post raises an important question and one that seems to be on the mind of everyone who works in an infrastructure role these days. How are automation and orchestration going to transform my role as an infrastructure engineer? APIs seem to be all the rage, and vendors are tripping over themselves to integrate northbound APIs, southbound APIs, dynamic/distributed workloads, and abstraction layers anywhere they can. What does it all mean for you and the way you run your infrastructure?


My guess is that it probably won’t impact your role all that much.


I can see the wheels turning already. Some of you are vehemently disagreeing with me and want to stop reading now, because you see every infrastructure engineer only interacting with an IDE, scripting all changes/deployments. Others of you are looking for validation for holding on to the familiar processes and procedures that have been developed over the years. Unfortunately, I think both of those approaches are flawed. Here’s why:


Do you need to learn to code? To some degree, yes! You need to learn to script and automate the repeatable tasks where running a script will save you time. The thing is, this isn’t anything new. If you want to be an excellent infrastructure engineer, you’ve always needed to know how to script and automate tasks. If anything, this newly minted attention being placed on automation should make it less of an effort to achieve (anyone who’s had to write expect scripts for multiple platforms should be nodding their head at this point). A focus on automation doesn’t mean that you just now need to learn how to use these tools. It means that vendors are finally realizing the value and making this process easier for the end user. If you don’t know how to script, you should pick a commonly used language and start learning it. I might suggest Python or PowerShell if you aren’t familiar with any languages just yet.


Do I need to re-tool and become a programmer? Absolutely not! Programming is a skill in and of itself, and infrastructure engineers will not need to be full-fledged programmers as we move forward. By all means, if you want to shift careers, go for it. We need full-time programmers who understand how infrastructure really works. But automation and orchestration aren’t going to demand that every engineer learn how to write their own compilers, optimize their code for obscure processors, or make their code operate across multiple platforms. If you are managing infrastructure through scripting, and you aren’t the size of Google, that level of optimization and reusability isn’t going to be necessary to see significant improvement in your processes. You won’t be building the platforms, just tweaking them to do your will.


Speaking of platforms, this is the main reason why I don’t think your job is really going to change that much. We’re in the early days of serious infrastructure automation. As the market matures, vendors are going to be offering more and more advanced orchestration platforms as part of their product catalog. You are likely going to interface with these platforms via a web front end or a CLI, not necessarily through scripts or APIs. Platforms will have easy-to-use front ends with an engine on the back end that does the scripting and API calls for you. Think about this in terms of Amazon Web Services (AWS). Its IaaS products are highly automated and orchestrated, but you primarily control that automation from a web control panel. Sure, you can dig in and start automating some of your own calls, but that isn’t really required by the large majority of organizations. This is going to be true for on-premises equipment moving forward as well.


Final Thoughts


Is life for the infrastructure engineer going to drastically change because of a push for automation? I don’t think so. That being said, scripting is a skill that you need in your toolbox if you want to be a serious infrastructure engineer. The nice thing about automation and scripting is that they require predictability and standardization of your configurations, which leads to stable and predictable systems. On the other hand, if scripting and automation sound like something you would enjoy doing as the primary function of your job, the market has never been better or had more opportunities to do it full time. We need people writing code who have infrastructure management experience.


Of course, I could be completely wrong about all of this, and I would love to hear your thoughts in the comments either way.

You may be wondering why, after creating four blog posts encouraging non-coders to give it a shot, select a language, and break down a problem into manageable pieces, I would now say to stop. The answer is simple, really: not everything is worth automating (unless, perhaps, you are operating at a similar scale to somebody like Amazon).


The 80-20 Rule


Here's my guideline: figure out what tasks take up the majority (i.e. 80%) of your time in a given time period (in a typical week, perhaps). Those are the tasks where making the time investment to develop an automated solution is most likely to see a payback. The other 20% are usually much worse candidates for automation, where the cost of automating them likely outweighs the time savings.


As a side note, the tasks that take up the time may not necessarily be related to a specific work request type. For example, I may spend 40% of my week processing firewall requests, and another 20% processing routing requests, and another 20% troubleshooting connectivity issues. In all of these activities, I spend time identifying what device, firewall zone, or VRF various IP addresses are in, so that I can write the correct firewall rule, or add routing in the right places, or track next-hops in a traceroute where DNS is missing. In this case, I would gain the most immediate benefits if I could automate IP address research.
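
As a rough illustration of what automating IP address research might look like, here is a minimal Python sketch. The inventory table, device names, zones, and VRFs are all invented for the example; in practice the data would probably come from an IPAM export or parsed device configurations. The sketch simply returns the most specific prefix that contains the address.

```python
import ipaddress

# Hypothetical inventory: prefix -> (device, firewall zone, VRF).
# These entries are made up; real data would come from IPAM or configs.
INVENTORY = {
    "10.0.0.0/8": ("core-rtr-01", "inside", "CORP"),
    "10.20.0.0/16": ("dc-fw-01", "datacenter", "DC"),
    "10.20.30.0/24": ("dc-fw-01", "dmz", "DMZ"),
    "192.0.2.0/24": ("edge-fw-01", "outside", "INTERNET"),
}

# Parse the prefixes once so each lookup only does membership tests.
NETWORKS = [(ipaddress.ip_network(p), info) for p, info in INVENTORY.items()]


def lookup(ip_text):
    """Return (prefix, device, zone, vrf) for the most specific match, or None."""
    ip = ipaddress.ip_address(ip_text)
    matches = [(net, info) for net, info in NETWORKS if ip in net]
    if not matches:
        return None
    # Longest prefix wins, mirroring how a routing table would choose.
    net, info = max(matches, key=lambda m: m[0].prefixlen)
    return (str(net),) + info


if __name__ == "__main__":
    for address in ("10.20.30.40", "10.99.1.1", "198.51.100.7"):
        print(address, "->", lookup(address))
```

Even something this small could pay for itself if the same lookup is repeated across firewall, routing, and troubleshooting work every week.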


I don't want to be misunderstood; there is value in creating process and automation around how a firewall request comes into the queue, for example, but the value overall is lower than for a tool that can tell me lots of information about an IP address.


That Seems Obvious


You'd think that it was intuitive that we would do the right thing, but sometimes things don't go according to plan:


Feeping Creatures!


Once you write a helpful tool or an automation, somebody will come back and say, "Ah, what if I need to know X information too? I need that once a month when I do the Y report." As a helpful person, it's tempting to immediately try to adapt the code to cover every conceivable corner case and usage example, but having been down that path, I counsel against doing so. It typically makes the code unmanageably complex due to all the conditions being evaluated and, worse, it goes firmly against the 80-20 rule above. Feeping Creatures is a Spoonerism referring to Creeping Features, i.e. an ever-expanding feature list for a product.


A Desire to Automate Everything


There's a great story in Surely You're Joking, Mr. Feynman! (Richard Feynman) that talks about Mr. Frankel, who had developed a system using a suite of IBM machines to run the calculations for the atomic bomb that was being developed at Los Alamos.


"Well, Mr. Frankel, who started this program, began to suffer from the computer disease that anybody who works with computers now knows about. [...] Frankel wasn't paying any attention; he wasn't supervising anybody. [...] (H)e was sitting in a room figuring out how to make one tabulator automatically print arctangent X, and then it would start and it would print columns and then bitsi, bitsi, bitsi, and calculate the arc-tangent automatically by integrating as it went along and make a whole table in one operation.


Absolutely useless. We had tables of arc-tangents. But if you've ever worked with computers, you understand the disease -- the delight in being able to see how much you can do. But he got the disease for the first time, the poor fellow who invented the thing."


It's exciting to automate things or to take a task that previously took minutes, and turn it into a task that takes seconds. It's amazing to watch the 80% shrink down and down and see productivity go up. It's addictive. And so, inevitably, once one task is automated, we begin looking for the next task we can feel good about, or we start thinking of ways we could make what we already did even better. Sometimes the coder is the source of creeping features.


It's very easy to lose touch with the larger picture and to stop focusing on the tasks that will generate measurable gains. I've fallen foul of this myself in the past, and have been delighted, for example, with a script I spent four days writing, which pulled apart log entries from a firewall and ran all kinds of analyses on them, allowing you to slice the data any which way and generate statistics. Truly amazing! The problem is, I didn't have a use for most of the stats I was able to produce, and actually I could have fairly easily worked out the most useful ones in Excel in about 30 minutes. I got caught up in being able to do something, rather than actually needing to do it.


And So...


Solve A Real Problem


Despite my cautions above, I maintain that the best way to learn to code is to find a real problem that you want to solve and try to write code to do it. Okay, there are some cautions to add here, not the least of which is to run tests and confirm the output. More than once, I've written code that seemed great when I ran it on a couple of lines of test data, but then when I ran it on thousands of lines of actual data, I discovered oddities in the input data, or found that the loop processing all the data was carelessly reusing variables, or something similar. Just like I tell my kids with their math homework, sanity check the output. If a script claims that a 10Gbps link was running at 30Gbps, maybe there's a problem with how that figure is being calculated.
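
To make the sanity check concrete, here is a small Python sketch. It assumes the rate is derived from two samples of a byte counter; the function names are mine and not from any particular monitoring tool. The point is only that the script refuses to report a figure the link could not physically carry.

```python
def link_rate_bps(octets_start, octets_end, seconds):
    """Average rate in bits per second between two byte-counter samples."""
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    return (octets_end - octets_start) * 8 / seconds


def check_rate(rate_bps, link_speed_bps):
    """Return the rate if it is plausible, otherwise raise ValueError."""
    if rate_bps < 0 or rate_bps > link_speed_bps:
        raise ValueError(
            f"implausible rate {rate_bps:.0f} bps on a {link_speed_bps:.0f} bps link; "
            "check counter wrap, units, or the sample interval"
        )
    return rate_bps


if __name__ == "__main__":
    ten_gig = 10e9
    # 1.25 GB transferred in 10 seconds is 1 Gbps, which is plausible.
    print(check_rate(link_rate_bps(0, 1_250_000_000, 10), ten_gig))
    # 37.5 GB in 10 seconds would be 30 Gbps on a 10 Gbps link.
    try:
        check_rate(link_rate_bps(0, 37_500_000_000, 10), ten_gig)
    except ValueError as err:
        print("sanity check failed:", err)
```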


Don't Be Afraid to Start Small


Writing a Hello World! script may feel like one of the most pointless activities you may ever undertake, but for a total beginner, it means something was achieved and, if nothing else, you learned how to output text to the screen. The phrase, "Don't try to boil the ocean," speaks to this concept quite nicely, too.


Be Safe!


If your ultimate aim is to automate production device configurations or orchestrate various APIs to dance to your will, that's great, but don't start off by testing your scripts in production. Use device VMs where possible to develop interactions with different pieces of software. I also recommend starting by working with read commands before jumping right in to the potentially destructive stuff. After all, after writing a change to a device, it's important to know how to verify that the change was successful. Developing those skills first will prove useful later on.


Learn how to test for, detect, and safely handle errors that arise along the way, particularly the responses from the devices you are trying to control. Sanitize your inputs! If your script expects an IPv4 address as an input, validate that what you were given is actually a valid IPv4 address. Add your own business rules to that validation if required (e.g. a script might only work with 10.x.x.x addresses, and all other IPs require human input). The phrase "Garbage in, garbage out" is all too true when humans provide the garbage.
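
Here is one way that input validation might look in Python, using the standard library's ipaddress module. The 10.0.0.0/8 rule is the example business rule from the paragraph above, and the function name validate_target is just something I made up for the sketch.

```python
import ipaddress

# Example business rule: this script only handles 10.x.x.x addresses.
ALLOWED = ipaddress.ip_network("10.0.0.0/8")


def validate_target(text):
    """Return an IPv4Address if the input is valid and in scope, else raise ValueError."""
    try:
        ip = ipaddress.IPv4Address(text.strip())
    except ipaddress.AddressValueError as exc:
        raise ValueError(f"not a valid IPv4 address: {text!r}") from exc
    if ip not in ALLOWED:
        # Out-of-scope addresses are handed back to a human rather than guessed at.
        raise ValueError(f"{ip} is outside {ALLOWED}; needs human review")
    return ip


if __name__ == "__main__":
    for candidate in ("10.1.2.3", "999.1.1.1", "192.168.1.1"):
        try:
            print("ok:", validate_target(candidate))
        except ValueError as err:
            print("rejected:", err)
```

Raising an error instead of quietly skipping bad input is a deliberate choice here: the script should stop and say why, not carry on with garbage.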


Scale Out Carefully


To paraphrase a common saying, automation allows you to make mistakes on hundreds of devices much faster than you could possibly do it by hand. Start small with a proof of concept, and demonstrate that the code is solid. Once there's confidence that the code is reliable, it's more likely to be accepted for use on a wider scale. That leads neatly into the last point:
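
One way to start small in code is to roll a change out in growing batches and stop at the first sign of trouble. The Python sketch below is only an illustration: apply_change and verify are placeholders for whatever your own tooling does, and the batch sizes are arbitrary.

```python
def growing_batches(devices, sizes=(1, 5, 25)):
    """Yield batches of devices; the last size repeats until none remain."""
    if not sizes or min(sizes) < 1:
        raise ValueError("batch sizes must be positive")
    remaining = list(devices)
    index = 0
    while remaining:
        size = sizes[min(index, len(sizes) - 1)]
        yield remaining[:size]
        remaining = remaining[size:]
        index += 1


def staged_rollout(devices, apply_change, verify, sizes=(1, 5, 25)):
    """Change one batch at a time and stop once a batch has a failure.

    Returns (verified, failed): devices confirmed good so far, and the
    devices that failed verification in the batch where the rollout stopped.
    """
    verified = []
    for batch in growing_batches(devices, sizes):
        for device in batch:
            apply_change(device)
        ok = [d for d in batch if verify(d)]
        verified.extend(ok)
        if len(ok) < len(batch):
            return verified, [d for d in batch if d not in ok]
    return verified, []


if __name__ == "__main__":
    switches = [f"sw{i}" for i in range(40)]
    touched = []
    # Pretend sw3 fails its post-change check.
    good, bad = staged_rollout(switches, touched.append, lambda d: d != "sw3")
    print(len(touched), "devices touched; failed:", bad)
```

In this example the damage is limited to the six devices in the first two batches, rather than all forty.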


Good Luck Convincing People


It seems to me that everybody loves scripting and automation right up to the point where it needs to be allowed to run autonomously. Think of it like the Google autonomous car: for sure, the engineering team was pretty confident that the code was fairly solid, but they wouldn't let that car out on the highway without human supervision. And so it is with automation; when the results of some kind of process automation can be reviewed by a human before deployment, that appears to be an acceptable risk from a management team's perspective. Now suggest that the human intervention is no longer required, and that the software can be trusted, and see what response you get.


A coder I respect quite a bit used to talk about blast radius, or what's the impact of a change beyond the box on which the change is taking place? Or what's the potential impact of this change as a whole? We do this all the time when evaluating change risk categories (is it low, medium, or high?) by considering what happens if a change goes wrong. Scripts are no different. A change that adds an SNMP string to every device in the network, for example, is probably fairly harmless. A change that creates a new SSH access-list, on the other hand, could end up locking everybody out of every device if it is implemented incorrectly. What impact would that have on device management and operations?




I really recommend giving programming a shot. It isn't necessary to be a hotshot coder to have success (trust me, I am not a hotshot coder), but having an understanding of coding will, I believe, positively impact other areas of your work. Sometimes a programming mindset can reveal ways to approach problems that didn't show themselves before. And while you're learning to code, if you don't already know how to work in a UNIX (Linux, BSD, macOS, etc.) shell, that would be a great stretch goal to add to your list!


I hope that this mini-series of posts has been useful. If you do decide to start coding, I would love to hear back from you on how you got on, what challenges you faced and, ultimately, if you were able to code something (no matter how small) that helped you with your job!


The Actuator - June 28th

Posted by sqlrockstar Employee Jun 28, 2017

I'm in Vancouver, BC this week, and jet lag has given me some extra hours to get this edition of the Actuator done. Maybe while I am here I can ask Canadians how they feel about Amazon taking over the world. My guess is they won't care as long as hockey still exists, and they can get equipment with Prime delivery.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Windows 10 Source Code Leak is a Minor Incident at Best

Seems to be some inconsistent information regarding the Windows "leak" last week.


Recycled Falcon 9 rocket survives one of SpaceX's most challenging landings yet

I enjoy watching SpaceX make the possibility of regular space travel a real thing to happen in my lifetime.


The RNC Files: Inside the Largest US Voter Data Leak

Including this link not for the politics, but for the shoddy data security. Reminder: data is valuable; treat it right.


Is Continuing to Patch Windows XP a Mistake?

This is a great question. I really believe that it *is* a mistake, but I also understand that Microsoft wants to help do what they can to keep users safe.


Amazon Is Trying to Control the Underlying Infrastructure of Our Economy

I'm not trying to alarm you or anything with this headline, but in related news, Walmart can be a jerk, too.


This Company Is Designing Drones To Knock Out Other Drones

Because they read the other article and know that this is the only way to save humanity.


Harry Potter: How the boy wizard enchanted the world

Because it's the 20th anniversary and I've enjoyed watching my kids' faces light up when they watch the movies.


Vancouver is one of my favorite cities to visit and jog along the water:


[image]

By Joe Kim, SolarWinds EVP, Engineering & Global CTO


There continues to be pressure on government IT to optimize and modernize, and I wanted to share a blog written in 2016 by my SolarWinds colleague, Mav Turner.


Federal IT professionals are in the midst of significant network modernization initiatives that are fraught with peril. Modernizing for the cloud and achieving greater agility are attractive goals, but security vulnerabilities can all too easily spring up during the modernization phase.


A path paved with danger, ending in riches


Last year, my company, SolarWinds, released the results of a Federal Cybersecurity Survey showing that the road to modernization is marked with risk. Forty-eight percent of respondents reported that IT consolidation and modernization efforts have led to an increase in IT security issues. These primarily stem from incomplete transitions (according to 48 percent of respondents), overly complex management tools (46 percent), and a lack of training (44 percent).


The road to modernization can potentially lead to great rewards. Twenty-two percent of respondents actually felt that modernization can ultimately decrease security challenges. Among those, 55 percent cited the benefits of replacing old, legacy software, while another 52 percent felt that updated equipment offered a security advantage. Still more (42 percent) felt that newer software was easier to use and manage.


The challenge is getting there. As respondents indicated, issues are more likely to occur in the transitional period between out-with-the-old and in-with-the-new. During this precarious time, federal administrators need to be hyperaware of the dangers lurking just around the corner.


Here are a few strategies that can help.


Invest in training


Federal IT professionals should not trust their legacy systems or modern IT tools to someone without the proper skill sets or knowledge.


Workers who do not understand how to use, manage, and implement new systems can be security threats in themselves. Their inexperience can put networks and data at risk. Agencies must invest in training programs to help ensure that their administrators, both new and seasoned, are familiar with the deployment and management of modern solutions.


Maximize the budget


If the money is there, it’s up to federal CIOs to spend it wisely. Some funds may go to the aforementioned training, while others may go to onboarding new staff. Yet another portion could go to investing in new technologies that can help ease the transition from legacy to modernized systems.


Avoid doing too much at once


That transition should be gradual, as a successful modernization strategy is built one win at a time.


Administrators should start upgrades with a smaller set of applications or systems, rather than an entire infrastructure. As upgrades are completed, retrospective analyses should be performed to help ensure that any security vulnerabilities that were opened during the transition are now closed. Connected systems should be upgraded simultaneously. Further analyses should focus on length of time for the transition, number of staff required, and impact on operations, followed by moving on to the next incremental upgrade.


Find the full article on Federal News Radio.


I realized that I do so many of these "travelog" type posts that, like Tom's "Actuator" newsletter, I might as well have a snappy name to go with it. So here we are.


Next week I'll be heading out to Las Vegas with 20 of my SolarWinds peeps (including patrick.hubbard and Dez) for a week of madness called CiscoLive. So what are my goals for this trip, besides claiming my very own NetVet badge and avoiding bursting into flame whenever I walk outside? (The forecast calls for temperatures between 105 and 111 °F.)


First off, CiscoLive kicks off once again with the unofficial event known as #KiltedMonday. While I will NOT be sporting a kilt, I have acquired a scarf woven with the official Jewish tartan and will be wearing it proudly...


...along with socks. Because #SocksOfCLUS are also A Thing, and we at SolarWinds are very proud to be offering one of three new patterns for anyone who shows up and registers for a THWACK account.


Jumping ahead, a lot of my focus is going to be on Wednesday morning, when Destiny and I will sit for Cisco exams. I'm renewing my CCNA and Destiny is going for the CCNA+Security cert - because of course she is! While I cannot wait to talk to people in the booth, I admit that whenever there's a quiet moment I will probably be huddled in a corner reviewing ACLs, OSPF routing, and IPv6.


Finally, I'm looking forward to renewing old friendships that have become a yearly tradition. I hope to spend time on the dCloudCouch with Silvia Spiva (@silviakspiva) and Anne Robotti (@arobotti), to get in a podcast with both Lauren Friedman (@lauren) and Amy Lewis (@CommsNinja), and to find a minute to chill with Roddie Hasan (@eiddor).


And, of course, that ignores the fact that 20 SolarWinds folks will all be in the same place at the same time for more than 30 minutes in a row. So you can expect jokes, videos, tweets, interviews, and more!


Next week I'll let you know what I actually DID see, as well as anything unexpected that I happened across.


User, Help Thyself

Posted by scuff Jun 23, 2017

I’m going to switch gears on the automation topic now. It’s natural to think of scripts, packages, images, tools, triggers, and actions when you think of automating IT tasks. We automate our technical things with more tech. But what if we removed ourselves from the equation some other way? Are there tasks we are holding onto that we could empower other, non-technical humans to do instead?


Don’t rush off to outsource your monitoring checks to someone on Fiverr. Instead, we’re going to talk about what we can get our users to do for themselves, without breaking anything.


Self-service password reset - In a cloud SaaS world, we’re used to a lovely little “Forgot your password?” link that will send a reset link to our recorded email address. For better security, you’d want 2FA or some secret questions as well. Microsoft’s Azure Active Directory lets you enable this for your users and, for a change, it’s not on by default. If you have directory synchronization and password write-back turned on, presto! Your users have just reset their own on-premises AD password, too. If your AD isn’t connected to the cloud, a ton of third-party vendors have jumped on this need and created paid tools of their own to achieve this for you. Might be worth a look if you have high help desk stats for password resets.


AD automation – While we’re on the subject of AD, how manual is your process for creating new user accounts? Have you played with CSVDE or PowerShell as a scripted input method? Could you take that to the next level and wrap a workflow around it that gets HR to enter the correct data (first name, last name, role, department, etc.) that could then feed into your script and run an automated user creation process? There are third-party tools that handle this, including the workflow and an approval step.
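
To give a feel for the first step of such a workflow, here is a hedged Python sketch that checks HR-supplied rows before anything touches the directory. The column names and the first-initial-plus-surname username convention are assumptions for illustration, and the actual account creation (via CSVDE, PowerShell, or a third-party tool) is deliberately left out.

```python
import csv
import io

# Assumed column names for the HR form export; adjust to match your own.
REQUIRED = ("first_name", "last_name", "role", "department")


def parse_new_hires(csv_text):
    """Split HR rows into (accepted, rejected) based on required fields.

    Rejected entries are (line_number, missing_fields) so they can be sent
    back to HR. Line numbers assume one record per line after the header.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        clean = {key: (row.get(key) or "").strip() for key in REQUIRED}
        missing = [key for key, value in clean.items() if not value]
        if missing:
            rejected.append((line_no, missing))
            continue
        # Hypothetical naming convention: first initial plus surname.
        clean["username"] = (clean["first_name"][0] + clean["last_name"]).lower()
        accepted.append(clean)
    return accepted, rejected


if __name__ == "__main__":
    sample = (
        "first_name,last_name,role,department\n"
        "Ada,Lovelace,Engineer,R&D\n"
        "Grace,,Admiral,Navy\n"
    )
    ok, bad = parse_new_hires(sample)
    print("ready to create:", [user["username"] for user in ok])
    print("send back to HR:", bad)
```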


Azure Active Directory Premium also offers dynamic group membership. You can set attributes on a user object (such as, Department) and have a group that queries AAD and automatically adds/removes members based on that attribute, for access to resources. Now, if you could automate HR submitting a web form that changes the Department value in AD, you are now hands-off. Sounds good in theory, but is anyone using it?


Chatbots as help desks – We've previously talked about chatbots saving the world (or not). They are good at providing answers to FAQ-style queries, for sure. Facebook for Work and Microsoft Teams certainly think it’s important to support bots in their collaboration tools. Has anyone replaced their help desk with a bot, yet? Are users helping themselves with this modern day Clippy replacement? Or is this tech gone mad?


Aside from the bots, are we seeing collaboration tools enabling users to help each other with questions they may otherwise call the help desk for? Are we using those tools to communicate current IT known issues, to reduce incoming call volumes? Is it working?


Let me know if you’ve managed to automate yourself out of a process by enabling someone else to do it, instead.


IT Right Equals Might?

Posted by kong.yang Employee Jun 23, 2017

If I learned anything from Tetris, it’s that errors pile up and accomplishments disappear.

– Andrew Clay Shafer (@littleidea on Twitter).


In IT, we make our money and maintain our job by being right. And we have to be right more often than not because the one time we are wrong might cost us our job. This kind of pressure can lead to a defensive, siloed mentality. If might equals right, then look for an IT working environment that is conducive to hostilities and sniping.


I’ve witnessed firsthand the destructive nature of a dysfunctional IT organization. Instead of working as a cohesive team, that team was one in which team members would swoop in to fix issues only after a colleague made a mistake. It was the ultimate representation of trying to rise to the top over the corpses of colleagues. Where did it all go wrong? Unfortunately, that IT organization incentivized team members to outdo one another for the sake of excellent performance reviews and to get ahead in the organization. It was a form of constant hazing. There were no mentors to help guide young IT professionals to differentiate between right and wrong.


Ultimately, it starts and ends with leadership and leaders. If leaders allow it, bad behaviors will remain pervasive in the organization’s culture. Likewise, leaders can nip such troubling behavior in the bud if they are fair, firm, and consistent. That IT team’s individual contributors were eventually re-organized and re-assigned once their leaders were dismissed and replaced.


Rewards and recognition come and go. Sometimes they’re well-deserved and other times we don’t get the credit that’s due. Errors, failures, and mistakes do happen. Don’t dwell on them. Continue to learn and move forward. A career in IT is a journey and a long one at that. Mentees do have fond memories of mentors taking the time to help them become a professional. Lastly, remember that kindness is not weakness, but rather an unparalleled kind of strength.


I’ve attempted to locate a manager or company willing to speak to the premise of corporate pushback against a hybrid mentality. I’ve had many conversations with customers who’ve had this struggle within their organizations, but few were willing to go on record.


As a result, I’m going to relate a couple of personal experiences, but I’m not going to be able to commit customer references to this.


Let’s start with an organization I’ve worked with a lot lately. They have a lot of unstructured data, and our goal was to arrive at an inexpensive “SMB 3.0+” storage format that would satisfy this need. We recommended a number of cloud providers, both hybrid and public, to help them out. The pushback came from their security team, who’d decided that compliance issues were a barrier to going hybrid. Obviously, most compliance issues have been addressed. In the case of this company, we, as a consultative organization, were able to make a good case for the storage of the data, the costs, and an object-based model for access from their disparate domains. As it turned out, this particular customer chose a solution that placed a compliant piece of storage on-premises that could satisfy its needs, but as a result of the research we’d submitted for them, their security team agreed to be more open in the future to these types of solutions.


Another customer wanted to launch a new MRP application and was evaluating hosting the application in a hybrid mode. In this case, the customer had a particular issue with relying on the application being hosted entirely offsite. As a result, we architected a solution wherein the off-prem components were designed to augment the physical/virtual architecture built for them onsite. This was a robust solution that ensured a guarantee of uptime for the environment with a highly available approach toward application uptime and failover. It was, in this case, just what the customer had requested. The pushback on this solution wasn’t one of compliance, because the hosted portion of the application would lean on our highly available and compliant data center for hosting. They objected to the cost, which seemed to us to be a reversal of their original mandate. We’d provided a solution based on their requests, but they changed that request drastically. In their ultimate approach, they chose to host the entire application suite in a hosted environment. Their choice to move toward a cloudy environment for the application, in this case, was an objection to the costs of their original desired approach.


Most of these objections were real-world, and decisions that our customers had sought. They’d faced issues they had not been entirely sure were achievable. In these cases, pushback came in the form of either security or cost concerns. I hoped we had delivered solutions that met their objections and helped the customers achieve their goals.


It’s clear that the pushback we’d received was due to known or unknown real-world issues facing their business. In the case of the first customer, we’d been engaged to solve their issues regardless of objections, and we found them the best on-premises storage solution for their needs. But in the latter, by promoting a solution that was geared toward satisfying all they’d requested, we were bypassed in favor of a lesser solution provided by the application vendor. You truly win some and lose some.


Have you experienced pushback in similar situations? I'd love to hear all about it.


The Actuator - June 21st

Posted by sqlrockstar Employee Jun 21, 2017

Hope everyone enjoyed Father's Day this past weekend, and that your day was filled with good food and good times with family. This week's Actuator is timed with the summer solstice, the longest day of the year. But as any SysAdmin knows, the longest day of the year is any day you are working with XML.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Amazon to buy Whole Foods for $13.7 billion, wielding online might in brick-and-mortar world

In the biggest news story last week, Amazon agreed to purchase Whole Foods. I am cautiously optimistic for what this could mean with regards to world hunger. By purchasing Whole Foods, Amazon gets a brand name AND a distribution channel they need not build themselves. Combined with drone delivery, Amazon could find a way to provide food to remote locations. Heck, if Amazon partners with a real estate company such as McDonald's (who already feeds 1% of the world each day), Amazon could be feeding 5% of the global population within ten years.


Divide and Conquer: How Microsoft Researchers Used AI to Master Ms. Pac-Man

Good news for SkyNet fans, we've now created AI smart enough to defeat video games. It won't be long now before the AI decides that the best way to win is to not play and instead eliminate the game creators.


Complete list of wifi routers from WikiLeaks' Cherry Blossom release detailing CIA hacking tools

If your home router is on this list, you might want to make sure you've protected yourself against the exploits that have been publicly released.


Forget Autonomous Cars—Autonomous Ships Are Almost Here

And now I have something else to write about other than just autonomous cars. Autonomous ships!


Marissa Mayer Bids Adieu to Yahoo

Only in America can someone be given the opportunity to run an already failing corporation into the ground and then walk away with a quarter of a billion dollars.


Block Untrusted Apps Using AppLocker

For anyone looking to add an extra layer of protection against malware. As much as I know users are a large security surface area to control, I also know that a lot of SysAdmins take and run scripts they find from internet help forums. Running random scripts you find on blogs is also a risk. Be careful out there, folks.


20 Percent of Users Still Don’t Know about Phishing or Ransomware, Reveals Survey

That 20% seems like a low estimate, IMO.


For all the fathers out there:


[Image: Screen Shot 2017-06-20 at 12.47.16 AM.png]


IT professionals are a hardworking group. We carry a lot of weight on our shoulders, a testament to our past and future successes. Yet, sometimes we have to distribute that weight evenly across the backs of others. No, this is not because we don’t want to do something. I’m sure that any of you, while capable of performing a task, would never ask another person to do something you wouldn’t willingly do yourself. No. Delegating activities to someone else is actually something we all struggle with.


Trust is a huge part of delegating. You're not only passing the baton of what needs to be done to someone else, but you’re also trusting that they’ll do it as well as you would, as quickly as you would, and -- this is the hard part -- that they'll actually do it.


As the world continues to evolve, transition, and hybridize, we are faced with this challenge more often. I’ve found there are some cases where delegation works REALLY well, and other cases where I’ve found myself banging my head against the wall, desk, spiked mace, etc. You know the drill.


One particular success story that comes to mind involves the adoption of Office 365. Wow! My internal support staff jumped for joy the day that was adopted. They went from having to deal with weird, awkward, and ridiculous Exchange or Windows server problems on a regular basis to... crickets. Sure, there were and still are some things that have to be dealt with, but it went from daily activity to monthly activity. Obviously, any full-time Exchange admin doesn't want to be replaced by Robot365, but if it's just a small portion of your administrative burden that regularly overwhelms, it's a good bet that delegating is a good idea. In this particular use-case, trust and delegation led to great success.


On the other hand, I’ve seen catastrophes rivaled only by the setting of a forest fire just for the experience of putting it out. I won’t name names, but I've had rather lengthy conversations with executives from several cloud service providers we all know and (possibly) love. Because I’m discussing trust and delegation, let’s briefly talk about what we end up trusting and delegating in clouds.


  • I trust that you won’t deprecate the binaries, libraries, and capabilities that you offer me
  • I trust that you won’t just up and change the features that I use and my business depends on
  • I trust that when I call and open a support case, you’ll delegate activities responsibly and provide me with regular updates, especially if the ticket is a P1


This is where delegating responsibility and trusting someone to act in your best interest versus the interests of themselves or some greater need beyond you can be eye-opening.


I’m not saying that all cloud service providers are actively seeking to ruin our lives, but if you talk to some of the folks I do and hear their stories, THEY might be the ones to say that. This frightful tale is less about fear and doubt over what providers will offer you, and more about being educated about what could happen, especially if you aren’t aware of the bad things that happen on the regular.


In terms of trust and delegation, cloud services should provide you with the following guarantees:

  • Trust that they will do EXACTLY what they say they will do, and nothing less. Make sure you are hearing contractual language around that guarantee versus marketing speak. Marketing messages can change, but contracts last until they expire.
  • Trust that things DO and WILL change, so be aware of any deprecation schedules, downtime activities, impacts, overlaps of changes, and dependencies that may lie within your business.
  • Delegate to cloud services only those tasks and support that may not matter to your production business applications. You want to gauge how well they can perform and conform to an SLA. It’s better to be disappointed early on when things don’t matter than to be in a fire-fight and go looking for support that may never come to fruition.


This shouldn't be read as an attack or assault on cloud services. Instead, view this as being more about enlightenment. If we don’t help make them better support organizations, they won’t know they need to improve, and they won’t. They currently function on a build-it-and-they-will-come support model, and if we don’t demand quality support, they have no incentive to give it to us.


Wow! I went from an OMG Happy365 scenario to cloudy downer!


But what about you? What kinds of experiences with trust and delegation have you had? Successes? Failures? I’ll offer up some more of my failures in the comments if you’re interested. I would love to hear your stories, especially if you've had contrary experiences with cloud service providers. Have they gone to bat for you, or left you longing for more?


IT Is Everywhere

Posted by shidoshi1000 Employee Jun 20, 2017

By Joe Kim, SolarWinds Chief Technology Officer


This is evident from two surveys we conducted last year. First, we asked more than 800 employed, non-IT adult end-users in North America a series of questions about how they use technology at work, and the types of technologies being used within their organizations. We also asked more than 200 IT professionals to give their impressions on these end-users’ expectations. Here’s a sample of what we found:


Users are taking IT everywhere. Forty-seven percent of end-user respondents said they connect more electronic devices, whether personally or company-owned, to their employers’ networks than they did 10 years ago. In fact, they connect an average of three more devices than they did a decade ago, two of which they own themselves.


The cloud has taken IT outside the agency. Most organizations allow some form of cloud-based applications, such as Google® Drive or Dropbox®, and 53 percent of respondents said they use these applications at work. Forty-nine percent said they regularly use work-related applications outside the office, on either personally or company-owned devices. Our survey also found that end-users will occasionally use non-IT-sanctioned cloud applications, such as iTunes® or something similar, while at work.


IT professionals must manage technology that may be outside their comfort zones. They must be versed in cloud-driven applications, mobile devices, open source software, and, increasingly, hybrid IT environments that incorporate aspects of on-premises and outsourced components. They must also continually be aware of and monitor the security risks that these solutions – and the actions of end-users – can present, adding one more layer of complexity to an already intricate set of concerns.


Eighty-seven percent of end-user respondents said they expect their organizations’ IT professionals to help ensure the performance of the cloud-based applications they use at work. Further, 68 percent blamed their IT professionals if these applications did not work correctly (“Dropbox isn’t working! Someone call IT!”).


According to the IT is Everywhere survey, 62 percent of IT professional respondents felt that the expectation to support users’ personally-owned devices on their networks is significantly greater than it was 10 years ago. Meanwhile, 64 percent of IT professionals said that end-users expect the same time to resolution for issues with both cloud-based and local applications. The inference is that users do not draw a distinction between cloud and on-premises infrastructures, despite the many differences between the two, and the fact that hybrid IT operations can be exceedingly complex and difficult to manage.


All of this is to say that IT is indeed everywhere. It’s in our offices and homes. It’s on our desktops and smartphones. It’s onsite and in the cloud.


IT professionals are constantly on deck to help ensure always-on availability and optimal performance, regardless of device, platform, application, or infrastructure. The end-users don’t care, as long as things are working.


Find the full article on GovLoop.

As technology professionals, we live in an interruption-driven world; responding to incidents is part of the job. All of our other job duties go out the window when a new issue hits the desk. Having the right information and understanding the part it plays in the organization is key to handling these incidents with speed and accuracy. This is why it's critical to have the ability to compare apples-to-apples when it comes to the all-important troubleshooting process.


What is our job as IT professionals?

Simply put, our job is to deliver services to end-users. It doesn't matter if those end-users are employees, customers, local, remote, or some combination of these. This may encompass things as simple as making sure a network link is running without errors, a server is online and responding, a website is handling requests, or a database is processing transactions. Of course, for most of us, it's not a single thing, it's a combination of them. And considering the fact that 95 percent of organizations report having migrated critical applications and IT infrastructure to the cloud over the past year, according to the SolarWinds IT Trends Report 2017, visibility into our infrastructure is getting increasingly murky.


So, why does this matter? Isn't it the responsibility of each application owner to make sure their portion of the environment is healthy? Yes and no. Ultimately, everyone is responsible for making sure that the services necessary for organizational success are met. Getting mean time to resolution (MTTR) down requires cooperation, not hostility. Blaming any one individual or team will invariably lead to a room full of people pointing fingers. This is counterproductive and must be avoided. There is a better way: prevention via comprehensive IT monitoring.


Solution silos

Monitoring solutions come in all shapes and sizes. Furthermore, they come with all manner of targets. We can use solutions specific to vendors or specific to infrastructure layers. A storage administrator may use one solution while a virtualization and server administrator may use another and the team handling website performance a third solution. And, of course, none of these tools may be applicable to the database administrators.

At best, monitoring infrastructure with disparate systems can be confusing; at worst, it can be downright dangerous. Consider the simple example of a network monitoring solution seeing traffic moving to a server at 50 megs/second, but the server monitoring solution sees incoming traffic at 400 megs/second. Which one is right? Maybe both of them, if one means 50 MBps (megabytes per second) and the other means 400 Mbps (megabits per second), since 50 x 8 = 400. This is just the start of the confusion. What happens if your virtualization monitoring tool reports in Kb/sec and your storage solution reports in MB/sec? Also, when talking about kilos, does it mean 1,000 or 1,024?
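To make the ambiguity concrete, here is a minimal sketch that converts throughput readings to one common unit, bits per second. The `to_bits_per_second` helper and the unit labels it accepts are hypothetical and not taken from any monitoring product; the sketch assumes a capital "B" means bytes, a lowercase "b" means bits, and prefixes are decimal unless you say otherwise.

```python
# Hypothetical helper: convert a throughput reading to bits per second.
# The accepted unit labels are assumptions; real tools label units differently.

BITS_PER_BYTE = 8

# How many prefix steps each letter represents (kilo = 1, mega = 2, giga = 3).
_PREFIX_POWER = {"": 0, "k": 1, "K": 1, "M": 2, "G": 3}


def to_bits_per_second(value, unit, binary=False):
    """Normalize `value` given in `unit` (e.g. "MBps", "Mbps", "Kbps").

    A capital "B" means bytes and a lowercase "b" means bits.
    binary=True treats each prefix step as 1,024 instead of 1,000.
    """
    if not unit.endswith("ps") or len(unit) < 3:
        raise ValueError(f"unrecognized unit: {unit!r}")
    core = unit[:-2]                    # strip the trailing "ps"
    prefix, kind = core[:-1], core[-1]  # e.g. "M" and "B"
    if kind not in ("b", "B") or prefix not in _PREFIX_POWER:
        raise ValueError(f"unrecognized unit: {unit!r}")
    base = 1024 if binary else 1000
    bits = value * base ** _PREFIX_POWER[prefix]
    return bits * BITS_PER_BYTE if kind == "B" else bits


network_view = to_bits_per_second(50, "MBps")   # 50 megabytes per second
server_view = to_bits_per_second(400, "Mbps")   # 400 megabits per second
print(network_view == server_view)              # both readings describe the same traffic
```

The `binary` flag is there because of the 1,000 versus 1,024 question: the same "Kb" label yields two different numbers depending on which convention the tool uses.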


You can see how the complexity of analyzing disparate metrics can very quickly grow out of hand. In the age of hybrid IT, this gets even more complex since cloud monitoring is inherently different than monitoring on-premises resources. You shouldn't have to massage the monitoring data you receive when troubleshooting a problem. That only serves to lengthen MTTR.


Data normalization

In the past, I've worked in environments with multiple monitoring solutions in place. During multi-team troubleshooting sessions, we've had to handle the above calculations on the fly. Was it successful? Yes, we were able to get the issue remedied. Was it as quick as it should have been? No, because we were moving data into spreadsheets, trying to align timestamps, and calculating differences in scale (MB, Mb, KB, Kb, etc.). This is what I mean by data normalization: making sure everyone is on the same page with regard to time and scale.
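The timestamp half of that spreadsheet work can be sketched in a few lines. This is only an illustration under assumptions: the sample data is made up, both tools are assumed to report timezone-aware timestamps, and one-minute buckets are an arbitrary choice. The `bucket_utc` and `align` helpers are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def bucket_utc(ts, seconds=60):
    """Round a timezone-aware datetime down to a UTC bucket boundary."""
    epoch = int(ts.astimezone(timezone.utc).timestamp())
    return datetime.fromtimestamp(epoch - epoch % seconds, tz=timezone.utc)


def align(series_a, series_b, seconds=60):
    """Pair samples from two tools that fall into the same UTC bucket.

    Each series is a list of (aware datetime, value) tuples, with values
    already converted to the same scale.
    """
    a = {bucket_utc(ts, seconds): value for ts, value in series_a}
    b = {bucket_utc(ts, seconds): value for ts, value in series_b}
    return [(t, a[t], b[t]) for t in sorted(a.keys() & b.keys())]


# Made-up samples: one tool logs in UTC, the other in a fixed UTC-5 offset.
utc_minus_5 = timezone(timedelta(hours=-5))
network_tool = [(datetime(2017, 6, 20, 14, 0, 12, tzinfo=timezone.utc), 400_000_000)]
server_tool = [(datetime(2017, 6, 20, 9, 0, 47, tzinfo=utc_minus_5), 400_000_000)]

for bucket, net_bps, srv_bps in align(network_tool, server_tool):
    print(bucket.isoformat(), net_bps, srv_bps)
```

Bucketing is a deliberate simplification: it hides sub-minute skew between tools, which is usually acceptable for a first pass but not for correlating events that are seconds apart.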


Single pane of glass

Having everything you need in one place with the timestamps lined up and everything reporting with the same scale — a single pane of glass through which you see your entire environment — is critical to effective troubleshooting. Remember, our job is to provide services to our end-users and resolve issues as quickly as possible. If we spend the first half of our troubleshooting time trying to line up data, are we really addressing the problem?


About the Post

This is a cross-post from my personal blog @ [Link].



At least once a week I read or hear the familiar refrain, “SQL Server® is a memory hog,” or “SQL Server uses all the memory.” If you, or anyone you know, are saying these things, I am here today to tell you something.




Just no.


Stop. Saying. This.


It’s like hearing fingernails on a chalkboard when people say such things. It’s time to put an end to this myth.


And, as always, you’re welcome.


SQL Server Is a Software Program

That’s right. SQL Server is a piece of software. And software programs are good at doing what they have been programmed to do. Typically, software programs are programmed and configured by humans.


That’s where you come in, my fellow humans.


SQL Server will, by design, read data pages from disk into memory. SQL Server will store as many pages as you tell it to store, and will only evict them from memory as needed. My conclusion, for which I’ve done no research, is that 95% of the people complaining about SQL Server using all the memory on a server are 100% responsible for not configuring SQL Server memory properly.


And there’s the crux of the problem. The people that don’t understand how SQL Server uses memory also don’t understand that it is up to them to decide how much memory SQL Server will use.


That’s the #hardtruth folks. It’s been you all along.


(Editor’s note: I think Oracle®/UNIX® folks don’t have these complaints about memory because Windows® makes it easier to see memory consumption in Task Manager. Perhaps this myth would have died a long time ago if it weren’t for giving RDP access to people who don’t understand how SQL Server works, or that Task Manager is a dirty, filthy liar. But I digress.)


95 < 100

For those Generation Next-ers out there (people who install software by clicking Next-Next-Finish), you should know that SQL Server will not try to use all the available memory for data pages. The default setting allows for SQL Server to dynamically manage the memory consumption and it will not allocate more than 95% of the total physical memory.


For those of us old (er, experienced) enough to remember database servers with 8 GB of RAM, that 95% is close enough to “all” for appearance's sake. And SQL Server has other memory needs than just database pages. Over the years, we have seen different data objects share the buffer cache with data pages. These days we can query the sys.dm_os_memory_clerks dynamic management view to find out how much of our memory is assigned to the various memory clerks.


The bottom line is that 95% is not 100%. SQL Server will not try to use all the memory by default. And setting the minimum memory will not cause SQL Server to start allocating memory, either. SQL Server will not allocate pages without being asked to do so.


By a human, most likely.


Deciding the Max Memory Setting

Assuming you have gotten this far, you understand that you are responsible for how much memory SQL Server will use. The next logical question becomes: What should the max memory be set to by default?


I have no idea. And neither does anyone else. If someone tells you they know exactly how much memory your SQL Server needs, they are either (1) lying, (2) trying to sell you something, or (3) both.


There is no shortage of formulas out there for prognosticating the initial amount of memory to set as a max value for SQL Server. I’ve even seen suggestions that you use the size of a database to guess at your max memory setting. That’s absurd. It’s not the size of the database that determines the amount of memory needed, it’s the workload that matters.


Here is the formula I offer to clients and customers that ask for help with finding a max memory setting. This formula assumes you are trying to right-size the memory for a dedicated database server (engine only, no SSAS, SSRS, SSIS, etc., or any other significant applications), and this is for physical servers (but holds mostly true for virtualized servers, too):


• Take physical memory (say, 128 GB RAM)

• Subtract memory for the O/S itself (1 GB for every 8 GB of RAM; 16 GB in this example)

• Subtract memory needed for thread stack size (the number of worker threads multiplied by thread size; a typical example for an x64, 4 CPU system would be 512 threads * 2 MB = 1,024 MB, or 1 GB in our example here)


That would give us a max memory setting of about 111 GB in this example. Again, this doesn’t consider any other applications that might be running. The formula also does not consider the use of features such as Columnstore indexes or In-Memory OLTP. These features will require you to adjust your settings further.
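The arithmetic above can be written out as a small function. This is a sketch of the formula as described in this post, not an official sizing tool; the function name and its default arguments (512 worker threads at 2 MB each) are taken from the example and are assumptions for any other server.

```python
def starting_max_memory_gb(physical_gb, worker_threads=512, thread_stack_mb=2):
    """Starting point for max memory on a dedicated, engine-only server.

    Defaults match the x64, 4 CPU example; adjust them for other hardware.
    """
    os_reserve_gb = physical_gb / 8                            # 1 GB per 8 GB of RAM
    thread_stack_gb = worker_threads * thread_stack_mb / 1024  # MB to GB
    return physical_gb - os_reserve_gb - thread_stack_gb


max_gb = starting_max_memory_gb(128)
print(max_gb)              # 111.0
print(int(max_gb * 1024))  # 113664, the same figure expressed in MB
```

The MB figure is printed because the max server memory option is set in megabytes. As with the formula itself, treat the output as a starting value to monitor and adjust.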


Once you arrive at your number, you set your max memory and then monitor memory consumption, adjusting the settings as necessary. The 111 GB figure is not an absolute; it is meant as a decent starting point in the absence of any other information regarding the specific workload for the server.


Do You Need More Memory?

This is a common question that comes up in SQL Server and memory discussions. How do you know if you need more?


The first thing you need to know is if memory is the resource constraint you are facing. If so, then yeah, maybe you need more memory. One way to know is if your instance is using all the memory you assigned following the formula above. Another is if you are seeing memory errors in the SQL error logs. Still another way to know is if you are seeing a lot of disk activity (because SQL Server is not able to keep pages in memory). Any one of those items could mean that you need to allocate more memory to your instance.


However, it could be the case that by adding more memory you end up hurting performance. For example, in a virtualized environment it could be possible that the additional memory is spread across physical NUMA nodes, resulting in slower performance than if the entire memory could fit inside one NUMA node.


I advocate that you monitor memory consumption over time, noting if you are trending upwards. Measuring and monitoring memory consumption is the best way to understand if your database server needs more memory.
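One simple way to check for an upward trend is a least-squares line through periodic samples. The sketch below uses made-up weekly readings, and the helper names are hypothetical; how you collect the samples (performance counters, a monitoring tool, and so on) is left out.

```python
def trend_gb_per_day(samples):
    """Least-squares slope of (day, used_gb) samples, in GB per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one day")
    return num / den


def days_until(limit_gb, samples):
    """Rough days until the latest reading reaches limit_gb, or None if flat or falling."""
    slope = trend_gb_per_day(samples)
    if slope <= 0:
        return None
    _, latest_gb = samples[-1]
    return (limit_gb - latest_gb) / slope


# Made-up weekly readings: (day number, GB in use)
readings = [(0, 96.0), (7, 98.0), (14, 100.0), (21, 102.0)]
print(round(trend_gb_per_day(readings), 3))  # GB per day
print(round(days_until(111, readings), 1))   # days until the 111 GB setting is reached
```

A straight line is a crude model of memory growth, so the projection is best read as a prompt to look closer, not as a forecast.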


Anything else is just a wild guess.



It’s not SQL Server, it’s you.


You have been in control all along.


It’s about time you understand, and accept, that you are responsible for what SQL Server is doing.


Blaming SQL Server for using all the memory that you have allowed it to access is like blaming a coffee maker for using all the water you placed inside.


The Actuator - June 14th

Posted by sqlrockstar Employee Jun 14, 2017

Home again after a trip to Austin where I was filming sessions for THWACKcamp 2017. Excited yet? You should be because you can register now by going here. It happens in October, and although it's four months away, it feels like it will be here in a week. I can't wait for you to see all the buttery goodness we have in store for you this year!


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Ex-Admin Deletes All Customer Data and Wipes Servers of Dutch Hosting Provider

Remember this the next time someone asks you for elevated permissions. Insider threats are a real thing, folks. (HT to Radioteacher for pointing me to this story over the weekend.)


Microsoft realigns its cloud, AI, data organizations

If only there was a sign telling us that traditional data storage technologies were shifting toward the cloud and integrating with new technologies, like machine learning and artificial intelligence. Then we might be able to better prepare ourselves for this shift.


It's so windy in Britain the electricity price went negative

Because every now and then I see someone comment about how renewable energy sources aren't able to produce enough energy to meet demand. I believe they can produce enough if we are willing to invest enough.


UK cops arrest man picked out by automatic facial recognition software

We are just one step away from arresting people because we *think* they are about to commit a crime.


Microsoft buys security-automation vendor Hexadite

Interesting acquisition in the wake of WannaCry, although I am certain the wheels were in motion for many months prior. I believe this is yet another example of how Microsoft is taking data security seriously, and being as proactive as possible to minimize risks as Azure continues to gobble up data.


Why You Shouldn’t Use SMS for Two-Factor Authentication (and What to Use Instead)

I was somewhat aware of the risks with using SMS, and I liked how this article was able to explain the issue and possible workarounds.


Gamestop hacked. Financial data of online shoppers accessed by crooks

Yep. We got a letter last week about this matter. The letter, however, didn't specify that it was for online purchases only, as this article indicates.


As soon as this Evil-Clown-as-a-Service (ECaaS) becomes available in the US, I know what I'm getting some folks for their birthday:




Want to know a secret?


I'm going to start at the end.


If your environment collects syslog and trap messages, no matter what vendor solution you are using, create a filtration layer that will take all those messages, process them, and forward just the useful ones along.


Now, moving from the end back to the beginning, here's what you want to do: Get some copies of Kiwi Syslog Server, set up a load balancer like an F5 to do UDP round robin between all those servers, and set rules on the first server to filter out everything but the alerts you want to keep. For the messages you want to keep, set up rules to transparently forward them to the system(s) that will process and act on them. Export that rule set and import it to the other servers sitting behind the load balancer. Finally, update all of the devices in your enterprise to send their trap and syslog messages to the VIP presented by the load balancer.
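The filtering idea itself is simple enough to sketch. The following is not Kiwi Syslog's rule format or API; it is a stand-in that shows the logic of a keep-list, with hypothetical patterns and sample messages. Note that the sender's address travels with each message and is never rewritten.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source_ip: str  # address of the device that originally sent the message
    text: str       # raw syslog or trap text


# Hypothetical keep-list: anything matching none of these patterns is dropped.
KEEP_PATTERNS = [
    re.compile(r"LINK-3-UPDOWN"),
    re.compile(r"authentication failure", re.IGNORECASE),
]


def should_forward(message):
    """Return True when the message matches at least one keep rule."""
    return any(pattern.search(message.text) for pattern in KEEP_PATTERNS)


def filter_messages(messages):
    """Keep only the useful messages, leaving source_ip untouched."""
    return [m for m in messages if should_forward(m)]


incoming = [
    Message("10.0.0.1", "%LINK-3-UPDOWN: Interface Gi0/1, changed state to down"),
    Message("10.0.0.2", "%SYS-5-CONFIG_I: Configured from console"),
    Message("10.0.0.3", "sshd: Authentication failure for admin"),
]

for kept in filter_messages(incoming):
    print(kept.source_ip, kept.text)
```

Because every server behind the load balancer applies the same rule set, any one of them can make the keep-or-drop decision for a given message.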


That's the secret! Now that I've given you the trick, the bottom line, are you curious to know WHY I am telling you all this?


This is why: I've seen the following scenario a half-dozen times. I'm brought in to consult on a monitoring project and someone announces, "My monitoring sucks! It's dog slow and just doesn't work. Find me something else!" So, I poke around and realize that all of their traps and syslog messages are going to a single system, which also happens to be the monitoring system. In SolarWinds terms, that's the primary poller.


In my experience, network devices generate a metric buttload (yes, that's a scientifically accurate measurement) of messages per hour. In more boring terms, we're talking about roughly 4,000 messages per hour per machine.


If you have a server that is trying to manage pinging a set of devices (and collecting and storing those metrics) along with pulling SNMP or WMI data from that same set of devices (again, and storing that data), along with presenting that information in the form of views and reports, and checking the database for exceeded thresholds to create alerts, and analyzing that data to provide baselines, and... Well, you get the point. Polling engines have a lot of work to do. And one of the ways they stay on top of that work is having a finely tuned scheduler that manages all those polling cycles.


If you then start throwing a few million spontaneous messages at it, which must be processed in real time, what you have is a VERY unhappy system. What you have is a monitoring solution that "sucks" through no fault of its own.
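To see how quickly "a few million" arrives, here is the arithmetic using the roughly 4,000 messages per hour per machine quoted earlier. The device count of 1,000 is an assumption chosen for illustration.

```python
MESSAGES_PER_HOUR_PER_DEVICE = 4_000  # rough figure quoted earlier in this post


def message_load(device_count):
    """Return (messages per hour, messages per second) for a device count."""
    per_hour = device_count * MESSAGES_PER_HOUR_PER_DEVICE
    per_second = per_hour / 3600
    return per_hour, per_second


per_hour, per_second = message_load(1_000)  # 1,000 devices is an assumption
print(per_hour)           # 4000000
print(round(per_second))  # 1111
```

That is more than a thousand unscheduled messages every second landing on a server whose scheduler is already busy with polling.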


Once I am able to point this out to clients, the next question is, "Should we turn off syslog or traps?" Of course not. That is a rich and vital source of information. What you need is to put something in front of the monitoring system to filter those messages.


Which brings me back to the "filtration system."


BUT... there's a catch! The catch is that most syslog and trap receivers expect to also process those messages themselves - to create alerts, to store the data, etc. What is needed in my example is to be able to ignore the messages that are unimportant, but then FORWARD the ones that matter to another system that is able to act upon them. The challenge here is to forward them without changing the source machine.


Many trap and syslog handlers can forward messages, but they replace the original machine with themselves as the source. That's not helpful when you want to correlate a syslog message with data collected another way, such as SNMP polling. To do that, you need to perform what is called "transparent" forwarding, which keeps the original source machine information intact.


Kiwi Syslog has done this for years. But not so with SNMP traps. For a variety of reasons, which I won't get into now, that capability hasn't existed until 9.6, the latest version.


Now that this essential function within your monitoring infrastructure is available (not to mention really, REALLY affordable) you can impact the performance of your monitoring system in a great big, positive way.


So, take a minute and check out the new version. Forwarding traps transparently isn't the only new feature, by the way. There's also IPv6 support, SNMP v3 support, use of VarBinds in output, logging to Papertrail, and more! Try it and let me know what you think in the comments below.

