
Geek Speak


Master of Your Virtual IT Universe: Trust but Verify at Any Scale

A Never Ending IT Journey around Optimizing, Automating and Reporting on Your Virtual Data Center



Automation is a skill that requires detailed knowledge of, and comprehensive experience with, a specific task, because that task must be fully encapsulated in a workflow script, template, or blueprint. Automation, much like optimization, focuses on understanding the interactions of the IT ecosystem, the behavior of the application stack, and the interdependencies of systems in order to deliver economies of scale and efficiency in support of the overall business objectives. It also embraces the do-more-with-less edict that IT professionals have to abide by.


Automation is the culmination of a series of brain dumps covering the steps that an IT professional takes to complete a single task. These are steps that the IT pro is expected to complete multiple times with regularity and consistency. The singularity of regularity is a common thread in deciding to automate an IT process.


Excerpted from Skillz To Master Your Virtual Universe SOAR Framework


Automation in the virtual data center spans workflows. These workflows can encompass management actions such as provisioning or reclaiming virtual resources, setting up profiles and configurations in a one-to-many manner, and reflecting best practices in policies across the virtual data center in a consistent and scalable way.


Embodiment of automation

Scripts, templates, and blueprints embody IT automation. They are created from an IT professional's best-practice methodology: tried-and-true IT methods and processes. Unfortunately, automation itself cannot differentiate between good and bad. Therefore, automating bad IT practice will lead to unbelievable pain at scale across your virtual data centers.


To keep that from happening, keep automation stupid simple. First, automate at a controlled scale, following the mantra, “Do no harm to your production data center environment.” Next, monitor the automation process from start to finish to ensure that every step executes as expected. Finally, analyze the results and use your findings to make the adjustments needed to optimize the automation process.
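As an illustration of that monitor-from-start-to-finish idea, here is a minimal, hypothetical Python sketch (the steps are placeholders, and this is not a SolarWinds feature) of a workflow runner that verifies each step before moving on and stops at the first failure, so a bad practice never gets repeated at scale.

```python
# Minimal "trust but verify" workflow runner sketch.
# The steps and checks below are hypothetical placeholders.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("automation")

def provision_vm(ctx):
    ctx["vm_name"] = "app01-test"            # placeholder action
    return ctx

def verify_vm_exists(ctx):
    return ctx.get("vm_name") is not None    # placeholder check

def apply_profile(ctx):
    ctx["profile"] = "baseline-v1"           # placeholder action
    return ctx

def verify_profile(ctx):
    return ctx.get("profile") == "baseline-v1"

# Each workflow step pairs an action with an explicit verification.
WORKFLOW = [
    ("provision VM", provision_vm, verify_vm_exists),
    ("apply profile", apply_profile, verify_profile),
]

def run(workflow):
    ctx = {}
    for name, action, check in workflow:
        log.info("starting step: %s", name)
        ctx = action(ctx)
        if not check(ctx):
            log.error("verification failed at step: %s -- stopping", name)
            return False
        log.info("verified step: %s", name)
    log.info("workflow completed and verified")
    return True

if __name__ == "__main__":
    run(WORKFLOW)
```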


Automate with purpose

Start with an end goal in mind. What problems are you solving with your automation work? If you can't answer this question, you're not ready to automate any solution.


This post is a shortened version of the eventual eBook chapter. Stay tuned for the expanded version in the eBook. Next week, I will cover reporting in the virtual data center.

I'm in Barcelona this week. It's been ten years since I've been here so it's hard for me to tell what has changed, but everything seems new. It still has a wonderful feel, and wonderful food. Yesterday I presented my session with David Klee to a packed room. I loved the energy of the session, and VMworld as a whole. I hope I get a chance to come back again.


Anyway, here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!


You've been hacked. What are you liable for?

A nice reminder for those that aren't concerned about being hacked. You may have some liability you were not expecting.


SQL Server 2016 Express Edition in Windows containers

This may seem like a minor blip on the radar for some, but for those of us in the Microsoft Fanbois circles this is a huge step forward to what we believe will be total assimilation of all things data.


Amazon Partners With VMware to Extend Its Computing Cloud

Of course, SQL Server Express in a container pales in comparison to the big news of the week, and that is VMware partnering with Amazon to go head on with Microsoft for the enterprise hybrid cloud market.


How an IT Pro Kicked Hackers Off Surveillance Cameras

Score one for the good guys, I guess. Maybe we can get Rich a job working for the companies that continue to deploy unsecured IoT devices.


S.A.R.A. Seeks To Give Artificial Intelligence People Skills

Something tells me that maybe letting college students decide what is appropriate behavior for AI isn't the best idea.


Work Counts

Ever wonder how many people are just like you? Well, wonder no more!


Because I'll be away on Election day, I already filled out my absentee ballot and wanted to show you who I voted for:


bacon - 1.jpg

Whether seeking solutions to pressing problems, networking with like-minded professionals or researching products, professionals across nearly all industries benefit from participating in communities every day, and it’s no different for federal IT professionals.


Actively engaging in online IT communities can make all the difference by enabling connections with other federal IT pros who are encountering similar issues, providing educational content, offering insights from experts and providing a channel to share valuable feedback.


Peer-to-Peer Collaboration


Online IT communities offer diverse feedback on creative ideas, and productive discussions for problem-solving. You may find your “unique” problem is actually more common than originally thought.


For example, GovLoop features educational blogs, forums about everything from career to citizen engagement, training both online and in person, and group spaces to engage with content around topics such as technology, levels of government and occupations.


Another popular community is Reddit, which often generates reams of answers from users who have experienced similar problems or are able to recommend resources to fix the issue. Reddit also features up-voting and down-voting on replies, making it easy to identify trusted answers from top-ranked users.


This sort of organic, peer-to-peer collaboration can help you solve problems quickly and with the confidence of having your answers come from trusted sources—professionals just like you from a wide variety of backgrounds and with vast ranges of expertise.


Direct Line to Vendors


Online communities can also connect you with the vendors whose products you rely on to get your job done. As you become more involved, you might transition from solely an information-seeker to an information-provider.


For example, the highly engaged end users who participate in SolarWinds’ more than 130,000-member-strong IT community, THWACK, have influenced product direction and even go-to-market strategy through their direct feedback and general discussion of industry trends and the company’s products. THWACK also features a Federal and Government space, which caters specifically to challenges that are unique to federal IT pros.


Career Development


Online IT communities also enable education that lends itself to career development. For example, many consider the SpiceWorks How-to forum a reliable place to develop IT skills and learn about the industry. Forums such as this one, and those found on THWACK, provide a venue for community members to learn best practices, get advice from experts in various fields, and research developing trends that could impact their careers, all in an environment where they can ask questions and engage in conversations.


The Power of the Masses at Your Fingertips


In summary, the access to a wide audience of peers that online IT communities provide can be invaluable to you as a federal IT pro. Membership and active participation within such communities can provide quick answers to problems, foster collaboration to ensure vendors are creating products that meet your needs and create new opportunities for your career development.


Find the full article on our partner DLT’s blog, TechnicallySpeaking.


The process of expanding operations or adopting new technology within an IT organization is sometimes met with caution. Rightfully so, given what’s at stake. Even the smallest configuration error, which can happen when you introduce new software or systems into a network, can spell disaster. Whether it leads to downtime, loss of data, the advent of security vulnerabilities, or compliance violations, the costs can be great for businesses of all sizes.


It’s not surprising, then, that some IT pros are hesitant to try out new software or test the latest SolarWinds offerings. But what it really boils down to is the fact that some IT pros don’t have the resources available to test a solution effectively without fearing these negative consequences.


What if I told you it was possible to test a solution like SolarWinds® Log & Event Manager (LEM) in a manner that was both safe and free for your business? Would you consider adding a powerful SIEM solution to your arsenal that tackles IT security and compliance? Well, you can! Introducing the LEM + GNS3® Integration Guide.


What is GNS3?


GNS3 is a multi-vendor tool that allows you to build, design, and test network configurations and software in a risk-free virtual environment. This technology eliminates the need for expensive physical testing by offering a network-attached or stand-alone virtual test bed, free of charge. With real-time network emulation, users can conduct proof-of-concept testing and troubleshooting on dynamic network configurations.


Download the GNS3 and SolarWinds LEM Integration Guide


Whether you’re a seasoned GNS3 pro who’s new to LEM, or a LEM user who’s interested in building a lab to experience the full functionality of the product within a safe and secure virtualized instance for testing or troubleshooting, this guide has something for you. In addition to showing you how to get started with VMware®, Hyper-V®, GNS3, and LEM, this guide will help you understand some of the LEM basics so you can hit the ground running with this advanced security solution.


Click here for access to the guide and free 30-day trial of LEM!


To learn more about the partnership we’ve formed with GNS3, check out the GNS3 Group on THWACK.

Master of Your Virtual IT Universe: Trust but Verify at Any Scale

A Never Ending IT Journey around Optimizing, Automating and Reporting on Your Virtual Data Center


Optimization is a skill that requires a clear end-goal in mind. Optimization focuses on understanding the interactions of the IT ecosystem, the behavior of the application stack, and the interdependencies of systems inside and outside their sphere of influence in order to deliver success in business objectives.


If one were to look at optimization from a theoretical perspective, each instantiation of optimization would be a mathematical equation with multi-variables. Think multivariate calculus as an IT pro tries to find the maxima as other variables change with respect to one another.


Excerpted from Skillz To Master Your Virtual Universe SOAR Framework


Optimization in the virtual data center spans data center health across resource utilization and saturation, while also encompassing resource capacity planning and resource elasticity. Utilization, saturation, and errors play key roles in the optimization skill. The key question is: what needs to be optimized in the virtual data center?


Resource scalability

Similar to other IT disciplines, optimization in the virtual environment boils down to optimizing resources, i.e., doing more with less. This oftentimes produces an over-commitment of resources and the contention issues that eventually follow the saturated state. If the contention persists over an extended period of time or comes too fast and too furious, errors usually crop up. And that’s when the “no-fun” time begins.


Resource optimization starts with tuning compute (vCPUs), memory (vRAM), network and storage. It extends to the application and its tunable properties through the hypervisor to the host and cluster.


Sub-optimal scale


vCPU and vRAM penalties manifest as saturation and errors, which lead to slow application performance and tickets being opened. There are definite costs to oversizing and undersizing virtual machines (VMs). Optimization seeks to find the fine line with respect to the entire virtual data center environment.


To optimize compute cycles, look at vCPU utilization counters as well as processor queue length. For instance, in VMware, the CPU counters to examine are %USED, %RDY, and %CSTP. %USED shows how much time the VM spent executing CPU cycles on the physical CPU. %RDY is the percentage of time a VM wanted to execute but had to wait to be scheduled by the VMkernel. %CSTP is the percentage of time that an SMP VM was ready to run but incurred delay because of co-vCPU scheduling contention. The equivalent Windows performance counters are System\Processor Queue Length, Process\% Processor Time, Processor\% Processor Time, and Thread\% Processor Time.
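As a rough illustration of how you might act on those counters, here is a small Python sketch that flags VMs whose scheduling counters exceed commonly cited rule-of-thumb thresholds. The sample values and thresholds are assumptions for the example, not output from any particular tool.

```python
# Illustrative sketch: flag VMs whose CPU scheduling counters look unhealthy.
# Sample data and thresholds are assumptions (common rules of thumb), not
# values pulled from any specific product's API.
CPU_THRESHOLDS = {"%RDY": 10.0, "%CSTP": 3.0}   # per-VM percentages

vm_samples = {
    "web01": {"%USED": 85.0, "%RDY": 12.5, "%CSTP": 0.5},
    "db01":  {"%USED": 40.0, "%RDY": 2.0,  "%CSTP": 4.2},
    "app01": {"%USED": 30.0, "%RDY": 1.0,  "%CSTP": 0.0},
}

def flag_cpu_contention(samples, thresholds):
    findings = []
    for vm, counters in samples.items():
        for counter, limit in thresholds.items():
            value = counters.get(counter, 0.0)
            if value > limit:
                findings.append((vm, counter, value, limit))
    return findings

for vm, counter, value, limit in flag_cpu_contention(vm_samples, CPU_THRESHOLDS):
    print(f"{vm}: {counter}={value:.1f}% exceeds {limit:.1f}% -- investigate scheduling contention")
```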


To optimize memory, look for memory swapping, guest-level paging, and overall memory utilization. For VMware, the esxtop counters are SWR/s and SWW/s, while for Microsoft Windows the counter is Memory\Pages/sec. For Linux VMs, leverage vmstat and the swap counters si and so (swap in and swap out, respectively).
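If you want to check a Linux guest by hand, a quick way is to sample vmstat and watch the si/so columns. The sketch below assumes the standard procps vmstat output layout, where the second header line names the columns.

```python
# Sketch: sample vmstat on a Linux guest and report swap-in/swap-out activity.
# Assumes the standard procps vmstat layout (second header line contains the
# column names r, b, swpd, free, ..., si, so, ...).
import subprocess

def swap_activity(samples=5, interval=1):
    out = subprocess.run(
        ["vmstat", str(interval), str(samples)],
        capture_output=True, text=True, check=True
    ).stdout.splitlines()
    headers = out[1].split()
    si_idx, so_idx = headers.index("si"), headers.index("so")
    rows = [line.split() for line in out[2:] if line.strip()]
    swapped_in = sum(int(r[si_idx]) for r in rows)
    swapped_out = sum(int(r[so_idx]) for r in rows)
    return swapped_in, swapped_out

if __name__ == "__main__":
    si, so = swap_activity()
    if si or so:
        print(f"Swap activity detected (si={si}, so={so}) -- the guest is likely memory constrained.")
    else:
        print("No swap activity in the sample window.")
```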


Of course, a virtualization maestro needs to factor in hypervisor kernel optimization/reclamation techniques as well as the application stack and the layout of their virtual data center infrastructure into their optimization process. 


This post is a shortened version of the eventual eBook chapter. For a longer treatment, stay tuned for the eBook. Next week, I will cover automation in the virtual data center.

If you didn’t have a chance to join some 350+ of your fellow IT and Security Pros at our Shields Up Panel: Network Security Fundamentals, Fight! THWACKcamp session – you’re in luck, we took some notes.


Our panel was made up of Eric Hodeen, Byron Anderson, our moderator Patrick Hubbard, and me, c1ph3r_qu33n.


Compliance v Security was the theme this year, and we tackled 4 big questions:


  • Have security practitioners and business owners figured out how to work with compliance schemes instead of fighting them? 
  • Are you more or less secure when you put compliance first?
  • What benefits (or harms) do compliance schemes and checklists offer?
  • If you are new to compliance, where do you start first? 


Our panelists felt that security and compliance teams are generally getting along better. However, there are still times when a business owner looks only at the penalties or risks of non-compliance and doesn’t consider the impact to the business of following a standard blindly. This can be especially true of highly prescriptive standards like DISA STIGs (Defense Information Systems Agency Security Technical Implementation Guides)[1] or NERC CIP (North American Electric Reliability Corporation Critical Infrastructure Protection)[2]. The challenge for IT and security pros is to effectively communicate the potential business impacts and to give the business owner the ammunition to argue for a waiver or request a compensating control. This way your organization can reach an optimum balance of compliance risk vs. business needs.


One of the misconceptions that business owners may have is that a compliance scheme covers all of the organization’s security risk, so nothing further needs to be considered. As practitioners, we know that compliance schemes are negotiated or promulgated standards that take time to change. Adjusting for changes to the threat landscape and addressing new technology innovations in a timely fashion are challenges for compliance schemes. Furthermore, no compliance standard considers every nuance of every IT environment.


So that is one of the risks of taking a compliance-only approach. But no one on the panel felt compliance schemes lack value. Like other good guidelines and checklists, such as the OWASP Top Ten[3] or the SANS Critical Security Controls[4], compliance checklists can add value to an organization, especially as assurance. The panel was divided, however, on whether you start with a checklist or end with one. The answer may depend on your organization’s maturity. If you’ve been doing security for a while, using a checklist to validate your approach may add an extra layer of assurance. If you are new to security, however, a good checklist can be a great asset as you get started in this new IT discipline.


Speaking of getting started, we all had different ideas about what the most important first step is. One of us said default passwords, which insidiously have a way of creeping back into the organization; whether from a new install or a reset of an existing device, default passwords still haunt us. Another panelist thought end users were the biggest challenge, and that maintaining good security requires strong user participation. Anyone who has dealt with ransomware or phishing knows how important it is to keep users informed of likely risks and good security hygiene.


VIDEO: Shields Up Panel: Network Security Fundamentals, Fight!


We all agreed that THWACKcamp was great fun and we hope to see you all next year. If you’ve got an issue you’d like to see the experts take a stab at, post your questions and we’ll put them in the idea basket for next year.







I'm heading to VMworld in Barcelona next week, so if you are there let me know as I would love to talk data and databases with you. I'm co-presenting with David Klee and we are going to be talking about virtualizing your database server. I have not been to Barcelona in 10 years, I'm looking forward to seeing the city again, even briefly.


Here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!


Cloud by the Megawatt: Inside IBM’s Cloud Data Center Strategy

If you are like me you will read this article and think to yourself "Wait, IBM has a Cloud?"


VMware, AWS Join Forces in Battle for Enterprise Cloud Market

This partnership marks an important shift in the market and should cause some concern for Microsoft. That being said, I also see this partnership as a last-ditch effort to keep VMware relevant before being absorbed completely by Dell.


Here are the 61 passwords that powered the Mirai IoT botnet

Proving once again that the cobbler's children have no shoes, we have an army of devices built by people that should know better, but don't put into practice the basics of security.


Twitter, Microsoft, Google and others say they haven’t scanned messages like Yahoo

I feel I have heard this story before, and I think I know how it ends.


Are microservices for you? You might be asking the wrong question

"Change for the sake of change is rarely a sensible use of time." If only they taught this at all the charm schools known as the MBA.


Latency numbers every programmer should know

Not a bad start at a complete list, and I would argue that more than just programmers should know these numbers. I've had to explain the speed of light to managers before.


7 Times Technology Almost Destroyed The World

Here's hoping the robots can act a bit more like humans when it counts the most.


Autumn has arrived here in New England, and that means apple picking is in full swing:

apple - 1.jpg

One of the questions that comes up during the great debate on the security of Internet of Things (IoT) is the responsibility of device manufacturers to support those devices. When we buy a refrigerator or a toaster, we expect that device to last through the warranty date and well beyond. Assuming it is a well-made unit it may last for a long time. But what about devices that only live as long as someone else wants them to?

Time's Up!

Remember Revolv? The smart hub for your home that was purchased by Nest? They stopped selling the device in October 2014 after being purchased, but a year and a half later they killed the service entirely. The Internet was noticeably upset and cried out that Google and Nest had done a huge disservice to their customers by killing the product. The outcry was so fierce that Nest ended up offering refunds for devices.

The drive to create new devices for IoT consumption is huge. Since most of them require some kind of app or service to operate correctly, it also stands to reason that these devices are reliant on the app to work properly. In the case of Revolv, once the app was shut down the device was no longer able to coordinate services and essentially became a huge paperweight. A few companies have chosen to create a software load that allows devices to function in isolated mode, but those are few and far between.

The biggest concern for security here is what happens when those devices that are abandoned but still function are left to their own ends. A fair number of the devices used in the recent IoT botnet attacks were abandonware cameras running their last software update. Those devices aren't going to have security holes patched or get new features. The fact that they work at all owes more to them being IP-based devices than anything else.

Killing In The Name Of IoT

However, if those manufacturers had installed a kill switch instead of allowing the cameras to keep working, it would have prevented some of the chaos of the attack. Yes, the buyers of those cameras would have been irritated that the functionality was lost. But it could have made a massive security issue easier to deal with.

Should manufacturers be responsible for installing a software cut-out that allows a device to be remotely disabled when the support period expires? That's a thorny legal question. It opens the manufacturer up to lawsuits and class action filings about creating products with known defects. But it also raises the question of whether or not these same manufacturers should have a greater duty to the safety of the Internet.

And this isn't taking into account the huge issues with industrial IoT devices. Could you imagine what might happen if an insulin pump or an electrical smart meter was compromised and used as an attack vector? The damage could be catastrophic. Worse yet, even with a kill switch or cut-out to prevent transmission, neutering those devices renders them non-functional and potentially hazardous. Medical devices that stop working cause harm and possibly death. Electrical meters that go offline create hazards for people living in homes.

Decisions, Decisions

There's no easy answer to all these problems. Someone is going to be mad no matter what we decide. Either these devices live on in their last known configuration and can be exploited, or they get neutered when they shut down. The third option, having manufacturers support devices forever, isn't feasible either. So we have to make some choices here. We have to stand up for what we think is right and make it happen.

Make sure your IoT policy spells out what happens to out-of-support devices. Make sure your users know that you are going to implement a traffic kill switch if your policy spells it out. Knowledge is key here. Users will understand your reasons if they are communicated ahead of time. And using tools from SolarWinds to track those devices and keep tabs on them helps you figure out when it's time to implement those policies. Better to have it sorted out now than to have to deal with a problem after it happens.
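For example, a policy check along these lines can be as simple as comparing a device inventory against vendor end-of-support dates. The device names and dates below are hypothetical.

```python
# Sketch: flag IoT devices that are past their vendor end-of-support date so
# the traffic kill-switch policy can be applied. Names and dates are made up.
from datetime import date

inventory = [
    {"name": "lobby-camera-01", "ip": "10.20.30.41", "end_of_support": date(2016, 6, 30)},
    {"name": "hvac-sensor-07",  "ip": "10.20.30.88", "end_of_support": date(2021, 1, 1)},
]

def out_of_support(devices, today=None):
    today = today or date.today()
    return [d for d in devices if d["end_of_support"] < today]

for device in out_of_support(inventory):
    # In a real environment this is where you would open a ticket, notify the
    # owner, and schedule the traffic block your IoT policy calls for.
    print(f"{device['name']} ({device['ip']}) out of support since "
          f"{device['end_of_support']} -- apply kill-switch policy")
```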


Image courtesy of Spirit-Fire on Flickr


I think I'd like to mirror a session title from the recent THWACKcamp and subtitle this particular post "Don't Hate Your Monitoring." We all face an enormous challenge in monitoring our systems and infrastructure, and in part that's caused by an underlying conflict:


monitor_all.jpg Color_Overload.jpg

Image Courtesy D Sharon Pruitt


This is a serious problem for everybody. We want to monitor everything we possibly can. We NEED to monitor everything we can, because heaven help us if we miss something important because we don't have the data available. At the same time, we cannot possibly cope with the volume of information coming into our monitoring systems; it's overwhelming, and manually sifting through it to find the alerts or data that actually matter to the business is a losing battle. And then we wonder why people are stressed, and why we have a love/hate relationship with our monitoring systems!


How can the chaos be minimized? Well, some manual labor is required up front, and after that it will be an iterative process that's never truly complete.


Decide what actually needs to be monitored

It's tempting to monitor every port on every device, but do you really need to monitor every access switch port? Even if you want to maintain logs for those ports for other reasons, you'll want to filter alerts for those ports so that they don't show up in your day-to-day monitoring. If somebody complains about bad performance, then digging into the monitoring and alerting is a good next step (maybe the port is fully utilized, or spewing errors constantly), but that's not business critical, unless, perhaps, it's your CEO's switch port.


Focus on which alerts you generate in the first place

  • Use Custom Properties to label related systems so that alerts can be generated intelligently based on those labels.
  • Before diving into the Alert Suppression tab to keep things quiet, look carefully at Trigger Conditions and try to add intelligent queries that minimize the generation of alerts in the first place. Trigger conditions allow for some quite complex nested logic, which can really help ensure that only the most critical alerts hit the top of your list.
  • Use trigger conditions to suppress downstream alerts (e.g., if a site router is down, don't trigger alerts from devices behind that router that are now inaccessible). A generic sketch of this logic follows the list.
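Here is that downstream-suppression idea as a generic Python illustration (it is not Orion trigger syntax): alerts from nodes behind a down router are dropped so that only the root cause pages anyone.

```python
# Generic illustration of downstream alert suppression -- not Orion trigger
# syntax. If a site's edge router is down, alerts from nodes behind it are
# suppressed so only the root cause generates an alert.
node_status = {"router-nyc": "Down", "switch-nyc-1": "Down", "server-nyc-3": "Down",
               "router-bos": "Up", "server-bos-1": "Down"}

# Map each node to the upstream device it depends on (None = no parent).
parent_of = {"switch-nyc-1": "router-nyc", "server-nyc-3": "router-nyc",
             "server-bos-1": "router-bos", "router-nyc": None, "router-bos": None}

def actionable_alerts(status, parents):
    alerts = []
    for node, state in status.items():
        if state == "Up":
            continue
        parent = parents.get(node)
        # Suppress the child alert whenever its parent is anything other than "Up".
        if parent is not None and status.get(parent) != "Up":
            continue
        alerts.append(node)
    return alerts

print(actionable_alerts(node_status, parent_of))   # ['router-nyc', 'server-bos-1']
```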


Suppress Alerts!

I know I just said not to dive into Alert Suppression, but it's still useful as the cherry on top of the cream that is carefully managed triggers.

  • It's better in general to create appropriate rules governing when an alert is triggered than to suppress it afterwards. Alert suppression is in some ways rather a blunt tool; if the condition is true, all alerts are suppressed.
  • One way to achieve downstream alert suppression is to add a suppression condition to devices on a given site that queries for the status of that site's edge router; if the router status is not "Up", the condition becomes true, and it should suppress the triggered alerts from that end device. This could also be achieved using Trigger Conditions, but it's cleaner in my opinion to do it in the Alert suppression tab. Note that I said "not Up" for the node status rather than "Down"; that means that the condition will evaluate to true for any status except Up, rather than explicitly requiring it to be only "Down". The more you know, etc...


Other features that may be helpful

  • Use dependencies! Orion is smart enough to know the implicit dependencies of, say, CPU and Memory on the Host in which they are found, but site or application-level dependencies are just a little bit trickier for Orion to guess. The Dependencies feature allows you to create relationships between groups of devices so that if the 'parent' group is down, alerts from the 'child' group can be automatically suppressed. This is another way to achieve downstream alert suppression at a fairly granular level.
  • Time-based monitoring may help for sites where the cleaner unplugs the server every night (or the system has a scheduled reboot), for example.
  • Where appropriate, consider using the "Condition must exist for more than <x> minutes" option within Trigger Conditions to avoid getting an alert for every little blip in a system. This theoretically slows down your notification time, but can help clear out transient problems before they disturb you (see the hold-down sketch after this list).
  • Think carefully about where each alert type should be sent. Which ones are pager-worthy, for example, versus ones that should just be sent to a file for historical record keeping?
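Here is the hold-down idea from the list above as a small, generic Python sketch: an alert fires only once its condition has been continuously true for the whole hold-down window, so transient blips never page anyone.

```python
# Generic "condition must exist for more than X minutes" illustration.
from datetime import datetime, timedelta

class DebouncedAlert:
    def __init__(self, hold_down=timedelta(minutes=5)):
        self.hold_down = hold_down
        self.first_seen = None

    def evaluate(self, condition_true, now=None):
        now = now or datetime.now()
        if not condition_true:
            self.first_seen = None          # condition cleared; reset the timer
            return False
        if self.first_seen is None:
            self.first_seen = now           # condition just started
        return (now - self.first_seen) >= self.hold_down

# Example: CPU is high for two polls, clears, then stays high long enough to fire.
alert = DebouncedAlert(hold_down=timedelta(minutes=5))
t0 = datetime(2016, 10, 20, 9, 0)
for minute, high_cpu in [(0, True), (2, True), (4, False), (6, True), (12, True)]:
    fired = alert.evaluate(high_cpu, now=t0 + timedelta(minutes=minute))
    print(f"t+{minute:02d}m high_cpu={high_cpu} -> alert={fired}")
```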


Performance and Capacity Monitoring

  • Baselining. As I discussed in a previous post, if you don't know what the infrastructure is doing when things are working correctly, it's even harder to figure out what's wrong when there's a problem. This might apply to element utilization, network routing issues, and more. This information doesn't have to be in your face all the time, but having it to hand is very valuable. A small sketch of the idea follows.
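A minimal version of that baselining idea, with made-up sample data, might look like this: learn what "normal" looks like for a given hour from history, then flag measurements that fall well outside it.

```python
# Minimal baselining sketch: learn a per-hour "normal" from history and flag
# measurements that sit well outside it. The sample data is illustrative.
from statistics import mean, stdev

# Historical interface utilization (%) observed at 14:00 on previous weekdays.
history_2pm = [34, 36, 31, 38, 35, 33, 37, 36, 32, 35]

def is_anomalous(value, history, sigmas=3.0):
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > sigmas * sigma, mu, sigma

for observed in (37, 72):
    anomalous, mu, sigma = is_anomalous(observed, history_2pm)
    verdict = "outside baseline -- investigate" if anomalous else "within baseline"
    print(f"observed {observed}% vs baseline {mu:.1f}±{sigma:.1f}%: {verdict}")
```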




Everything so far talks about how to handle alerting when events occur. This is "reactive" monitoring, and it's what most of us end up doing. However, to achieve true inner peace we need to look beyond the triggers and prevent the event from happening in the first place. Obviously there's not much that can be done about power outages or hardware failures, but in other ways we can help ourselves by being proactive.


Proactive monitoring basically means preempting avoidable alerts. SolarWinds software offers a number of features to forecast and plan for capacity issues before they become alerts. For example, Virtualization Manager can warn of impending doom for VMs and their hosts; Storage Resource Monitor tracks capacity trends for storage devices; Network Performance Monitor can forecast exhaustion dates on the network; User Device Tracker can monitor switch port utilization. Basically, we need to use the forecasting/trending tools provided to look for any measurement that looks like it's going to hit a threshold, check with the business to determine any additional growth expected, then make plans to mitigate the issue before it becomes one.
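To make the forecasting idea concrete, here is a generic sketch that fits a simple linear trend to capacity samples and estimates how long until a threshold is crossed. It is an illustration only, not the algorithm any SolarWinds product actually uses.

```python
# Threshold forecasting with a simple linear trend (generic illustration).
# Given periodic capacity samples, estimate days remaining before a resource
# crosses its threshold.
def days_until_threshold(samples_pct, sample_interval_days, threshold_pct=90.0):
    n = len(samples_pct)
    xs = [i * sample_interval_days for i in range(n)]
    x_mean = sum(xs) / n
    y_mean = sum(samples_pct) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples_pct)) / \
            sum((x - x_mean) ** 2 for x in xs)
    if slope <= 0:
        return None                      # flat or shrinking usage: no exhaustion date
    intercept = y_mean - slope * x_mean
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - xs[-1])

# Example: datastore usage sampled weekly, currently at 76% and growing.
weekly_usage = [62, 65, 67, 70, 73, 76]
remaining = days_until_threshold(weekly_usage, sample_interval_days=7)
print(f"~{remaining:.0f} days until the 90% threshold at the current growth rate")
```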


Hating Our Monitoring


We don't have to hate our monitoring. Sadly, the tools tend to do exactly what we tell them to, and we sometimes expect a little too much from them in terms of having the intelligence to know which alerts are important, and which are not. However, we have the technology at our fingertips, and we can make our infrastructure monitoring dance, if not to our tune (because sometimes we need something that just isn't possible at the moment), then at least to the same musical genre. With careful tuning, alerting can largely be mastered and minimized. With proactive monitoring and forecasting, we can avoid some of those alerts in the first place. After all -- and without wishing to sound too cheesy -- the best alert is the one that never triggers.

For the unprepared, managing your agency’s modern IT infrastructure with all its complexity can be a little scary. Evolving mandates, the constant threat of a cyber-attack, and a connected workforce that demands access to information when they want it, where they want it, place more pressure on the government’s IT professionals than ever. And at the heart of it all is still the network.


At SolarWinds we know today’s government IT pro is a Bear Grylls-style survival expert. And in true Man vs. Wild fashion, the modern IT pro needs a network survival guide to be prepared for everything.


Assess the Network


Every explorer needs a map. IT pros are no different, and the map you need is of your network. Understanding your network’s capabilities, needs, and resources is the first step of network survival.


Ask yourself the following questions:


  • How many sites need to communicate?
  • Are they located on the intranet, or externally and accessed via a datacenter?
  • Is the bulk of my traffic internal, or is it all bound for the Internet? How about any key partners and contractors?
  • Which are the key interfaces to monitor?
  • Where should deep packet inspection (DPI) agents go?
  • What is the scope and scale of what needs to be monitored?


Acknowledge that Wireless is the Way


What’s needed are tools like wireless heat maps to manage over-subscribed access points, and user device tracking tools that allow agencies to track rogue devices and enforce their BYOD policies. The problem is that many of these tools have traditionally been cost-prohibitive, but newer options you might not be aware of open the door to implementing these technologies.


Prepare for the Internet of Things


The government can sometimes be slower to adopt new technology, but agencies are increasingly experimenting with the Internet of Things. How do you overcome these challenges? True application firewalls can untangle the sneakiest device conversations, get IP address management under control, and get gear ready for IPv6. They can also classify and segment your device traffic; implement effective quality of service to ensure that critical business traffic has headroom; and, of course, monitor flow.


Understand that Scalability is Inevitable


It is important to leverage capacity forecasting tools, configuration management, and web-based reporting to be able to predict and document scalability and growth needs so you can justify your budget requests and stay ahead of infrastructure demands.


Just admit it already—it’s All About the Application


Everything we do is because of and for the end-users. The whole point of having a network is to achieve your mission and support your stakeholders. Seek a holistic view of the entire infrastructure, including the impact of the network on application issues and don’t silo network management anymore.


A Man is Only as Good as His Tools


Having sophisticated network monitoring and management tools is an important part of arming IT professionals for survival, but let’s not overlook the need for certifications and training, so the tools can be used to effectively manage the network.


Revisit, Review, Revise


What’s needed to keep your network running at its peak will change, so your plans need to adapt to keep up. Constantly reexamine your network to be sure that you’re addressing changes as they arise. Successful network management is a cyclical process, not a one-way journey.


Find the full article on Federal Technology Insider.

A Neverending IT Journey around Optimizing, Automating, and Reporting on Your Virtual Data Center



The journey of one begins with a single virtual machine (VM) on a host. The solitary instance in a virtual universe with the vastness of the data center as a mere dream in the background. By itself, the VM is just a one-to-one representation of its physical instantiation. But virtualized, it has evolved, becoming software defined and abstracted. It’s able to draw upon a larger pool of resources should its host be added to a cluster. With that transformation, it becomes more available, more scalable, and more adaptable for the application that it is supporting.


The software abstraction enabled by virtualization provides the ability to quickly scale across many axes without scaling the overall physical footprint. The skills required to do this efficiently and effectively are optimization, automation, and reporting. The last skill is key because IT professionals cannot save their virtual data center if no one listens to and seeks to understand them. Moreover, the former two skills are complementary. And as always, actions speak louder than words.




In the following weeks, I will cover practical examples of optimization, automation, and reporting in the virtual data center. Next week will cover optimization in the virtual data center. The week after will follow with automation. And the final week will discuss reporting. In this case, order does matter. Automation without consideration for optimization will lead to work being done that serves no business-justified purpose. Optimization and automation without reporting will lead to insufficient credit for the work done right, as well as misinforming decision makers about the proper course of action to take.


I hope you’ll join me for this journey into the virtual IT universe.

Last week was Microsoft Ignite in Atlanta. I had the privilege of giving two presentations, one of which was titled "Performance Tuning Essentials for the Cloud DBA." I was thinking of sharing the slides, but the slides are just there to enhance the story I was telling. So I've decided instead to share the narrative here in this post today, including the relevant images. As always, you're welcome.


I started with two images from the RightScale 2016 State of the Cloud Report:



The results of that survey help to show that hybrid IT is real, it's here, and it is growing. Using that information, combined with the rapid advances we see in the technology field with each passing year, I pointed out how we won't recognize IT departments in five years.


For a DBA today, and also the DBA in five years, it shouldn't matter where the data resides. The data can be either down the hall or in the cloud. That's the hybrid part, noted already. But how does one become a DBA? Many of us start out as accidental DBAs, or accidental whatevers, and in five years there will be accidental cloud DBAs. And those accidental cloud DBAs will need help. Overwhelmed at first, the cloud DBA will soon learn to focus on their core mission:




Once the cloud DBA learns to focus on his or her core mission (recovery), they can start learning how to do performance troubleshooting (because bacon ain't free). I believe that when it comes to troubleshooting, it is best to think in buckets. For example, if you are troubleshooting a virtualized database server workload, the first question you should be asking yourself is, "Is the issue inside the database engine or is it external, possibly within the virtual environment?" In time, the cloud DBA learns to think about all kinds of buckets: virtual layers, memory, CPU, disk, network, locking, and blocking. Existing DBAs already have these skills. But as we transition to being cloud DBAs, we must acknowledge that there is a gap in our knowledge and experience.


That gap is the network.


Most DBAs have little to no knowledge of how networks work, or how network traffic is utilized. A database engine, such as SQL Server, has little knowledge of any network activity. There is no DMV to expose such details, and a DBA would need to collect O/S level details on all the machines involved. That's not something a DBA currently does; we take networks for granted. To a DBA, networks are like the plumbing in your house. It's there, and it works, and sometimes it gets clogged.
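One way a DBA could start collecting those O/S-level details is to sample interface counters on each machine involved. The sketch below uses the third-party psutil package, which is an assumption about available tooling rather than anything the database engine provides.

```python
# Sample local network interface counters over a short window and report
# throughput and errors. Run on each box involved in the data path.
# Requires the third-party psutil package (pip install psutil).
import time
import psutil

def sample_network(interval_seconds=10):
    before = psutil.net_io_counters()
    time.sleep(interval_seconds)
    after = psutil.net_io_counters()
    sent_mbps = (after.bytes_sent - before.bytes_sent) * 8 / interval_seconds / 1_000_000
    recv_mbps = (after.bytes_recv - before.bytes_recv) * 8 / interval_seconds / 1_000_000
    errors = (after.errin - before.errin) + (after.errout - before.errout)
    return sent_mbps, recv_mbps, errors

if __name__ == "__main__":
    sent, recv, errs = sample_network()
    print(f"sent {sent:.1f} Mbps, received {recv:.1f} Mbps, interface errors: {errs}")
```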


But the cloud demands that you understand networks. Once you go cloud, you become dependent upon networks working perfectly, all the time. One little disruption, because someone didn't call 1-800-DIG-SAFE before carving out some earth in front of your office building, and you are in trouble. And it's more than just the outage that may happen from time to time. No. You need to know about your network as a cloud DBA for the following reasons: RPO, RTO, SLA, and MTTI. I've talked before about RPO and RTO here, and I think anyone reading this would know what SLA means. MTTI might be unfamiliar, though. I borrowed that from adatole. It stands for Mean Time To Innocence, and it is something you want to keep as short as possible, no matter where your data resides.


You may have your RPO and RTO well-defined right now, but do you know if you can meet those metrics right now? Turns out the internet is a complicated place:




Given all that complexity, it is possible that data recovery may take a bit longer than expected. When you are a cloud DBA, the network is a HUGE part of your recovery process. The network becomes THE bottleneck that you must focus on first and foremost in any situation. In fact, when you go cloud, the network becomes the first bucket you need to consider. The cloud DBA needs to be able to determine, in five minutes or less, whether the network is the issue before spending any time trying to tune a query. And that means the cloud DBA is going to have to understand what is clogging that pipe:




Because when your phone rings, and the users are yelling at you that the system is slow, you will want to know that the bulk of the traffic in that pipe is Pokemon Go, and not the data traffic you were expecting.


Here's a quick list of tips and tricks to follow as a cloud DBA.


  1. Use the Azure Express! Azure ExpressRoute is a dedicated link to Azure, and you can get it from Microsoft or a managed service provider that partners with Microsoft. It's a great way to bypass the complex web known as the internet and get better throughput. Yes, it costs extra, but only because it is worth the price.
  2. Consider Alt-RPO, Alt-RTO. For those times when your preferred RPO and RTO won't work, you will want an alternative. For example, you have an RPO of 15 minutes and an RTO of five minutes. But the network is down, so you fall back to an Alt-RPO of an hour and an Alt-RTO of 30 minutes, and you store backups locally instead of in the cloud. The business would rather be back online, even to the last hour, as opposed to waiting for the original RPO/RTO to be met. (A rough feasibility check is sketched after this list.)
  3. Use the right tools. DBAs have no idea about networks because they don't have any tools to get them the details they need. That's where a company like SolarWinds comes in to be the plumber and help you unclog those pipes.
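To put some numbers behind tip #2, here is a back-of-the-envelope check, with illustrative figures, of whether a cloud restore can meet your RTO at the throughput you actually measure, or whether the Alt-RPO/Alt-RTO plan with local backups should kick in.

```python
# Back-of-the-envelope RTO feasibility check. All numbers are illustrative.
def restore_minutes(backup_gb, throughput_mbps):
    backup_megabits = backup_gb * 1024 * 8        # GB -> megabits
    return backup_megabits / throughput_mbps / 60

backup_gb = 20           # size of the backup to pull back from the cloud
rto_minutes = 5          # primary RTO
alt_rto_minutes = 30     # fallback RTO using local copies

for label, mbps in [("healthy WAN link", 1000), ("degraded WAN link", 100)]:
    minutes = restore_minutes(backup_gb, mbps)
    plan = "primary RTO is feasible" if minutes <= rto_minutes else \
           "fall back to Alt-RPO/Alt-RTO and local backups"
    print(f"{label}: ~{minutes:.0f} min to restore {backup_gb} GB -> {plan}")
```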


Thanks to everyone that attended the session last week, and especially to those that followed me back to the booth to talk data and databases.


In previous posts, I've written about making the best of your accidental DBA situation.  Today I'm going to give you my advice on the things you should focus on if you want to move from accidental DBA to full data professional and DBA.


As you read through this list, I know you'll be thinking, "But my company won't support this, that's why I'm an accidental DBA." You are 100% correct.  Most companies that use accidental DBAs don't understand the difference between developer and DBA, so many of these items will require you to take your own initiative. But I know since you are reading this you are already a great candidate to be that DBA.




Your path to becoming a DBA has many forks, but I'm a huge fan of formal training. This can be virtual or in-person. By virtual I mean a formal distance-learning experience, with presentations, instructor Q&A, hands-on labs, exams and assignments. I don't mean watching videos of presentations. Those offerings are covered later.


Formal training gives you greater confidence and evidence that you learned a skill, not that you only understand it. Both are important, but when it comes to that middle-of-the-night call alerting you that databases are down, you want to know that you have personal experience in bringing systems back online.



Conferences

Conferences are a great way to learn, and not just from invited speakers. Speaking with fellow attendees, via the hallway conferences that happen in conjunction with the formal event, gives you the opportunity to network with people who boast a range of skill levels. Sharing resource tips with these folks is worth the price of admission.


User Groups and Meetups

I run the Toronto Data Platform and SQL Server Usergroup and Meetup, so I'm a bit biased on this point. However, these opportunities to network and learn from local speakers are often free to attend. Such a great value! Plus, there is usually free pizza. Just saying. You will never regret having met other data professionals in your local area when you are looking for your next project.


Online Resources

Online videos are a great way to supplement your formal training. I like Pluralsight because it's subscription-based, not per video. They offer a free trial, and the annual subscription is affordable, given the breadth of content offered.


Webinars given by experts in the field are also a great way to get real-world experts to show and tell you about topics you'll need to know. Some experts host their own, but many provide content via software vendor webinars, like these from SolarWinds.



Blogs are a great way to read tips, tricks, and how-tos. It's especially important to validate the tips you read about. My recommendation is that you validate any rules of thumb or recommendations you find by going directly to the source: vendor documentation and guidelines, other experts, and people you trust who can verify them. This is especially true if the post you are reading is more than three months old.


But another great way to become a professional DBA is to write content yourself. As you learn something and get hands-on experience using it, write a quick blog post. Nothing makes you understand a topic better than trying to explain it to someone else.



I've learned a great deal more about databases by using tools that are designed to work with them. This can be because the tools offer guidance on configuration, do validations and/or give you error messages when you are about to do something stupid.  If you want to be a professional DBA, you should be giving Database Performance Analyzer a test drive.  Then when you see how much better it is at monitoring and alerting, you can get training on it and be better at databasing than an accidental DBA with just native database tools.



Hands-On Labs

The most important thing you can do to enhance your DBA career is to get hands-on with the actual technologies you will need to support. I highly recommend you host your labs in the cloud. You can get a free trial from most providers. I recommend Microsoft Azure cloud VMs because you likely already have free credits if you have an MSDN subscription. There's also a generous 30-day trial available.

I recommend you set up VMs with various technologies and versions of databases, then turn them off.  With most cloud providers, such as Azure, a VM that is turned off has no charge except for storage, which is very inexpensive.  Then when you want to work with that version of software, you turn on your VM, wait a few minutes, start working, then turn it off when you need to move on to another activity.


The other great thing about working with Azure is that you aren't limited to Microsoft technologies.  There are VMs and services available for other relational database offerings, plus NoSQL solutions. And, of course, you can run these on both Windows and Linux.  It's a new Microsoft world.


The next best thing about having these VMs ready at your fingertips is that you can use them to test new automation you have developed, test new features you are hoping to deploy, and develop scripts for your production environments.


Think Like a DBA, Be a DBA

The last step is to realize that a DBA must approach issues differently than a developer, data architect, or project manager would. A DBA's job is to keep the database up and running, with correct and timely data.  That goal requires different thinking and different methods.  If you don't alter your problem-management thinking, you will likely come to different cost, benefit, and risk decisions.  So think like a DBA, be a DBA, and you'll get fewer middle-of-the-night calls.

Thanks to everyone who stopped by the booth at Microsoft Ignite last week; it was great talking data and databases. I'm working on a summary recap of the event, so look for that as a separate post in Geek Speak later this week.


In the meantime, here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!


Will IoT become too much for IT?

The IoT is made up of billions of unpatched and unmonitored devices, what could possibly go wrong?


Largest ever DDoS attack: Hacker makes Mirai IoT botnet source code freely available on HackForums

This. This is what could go wrong.


Clinton Vows To Retaliate Against Foreign Hackers

I don't care who you vote for, there is no way you can tell me you think either candidate has any idea what to do about the Cyber.


Marissa Mayer Forced To Admit That She Let Your Mom’s Email Account Get Hacked

For example, here is one of the largest tech companies making horrible decisions about 500 MILLION accounts being hacked. I have little faith in anyone when it comes to data security except for Troy Hunt. Is it too late to elect him Data Security Czar?


California OKs Self-Driving Vehicles Without Human Backup

Because it seemed weird to not include yet another link about self-driving cars. 


BlackBerry To Stop Making Smartphones

The biggest shock I had while reading this was learning that BlackBerry was still making smartphones. 


Fake island Hy-Brasil printed for 500 years

I'm going with the theory that this island was put there in order to catch people making copies of the original work, but this article is a nice reminder why crowd-sourced pieces of work (hello Wikipedia) are often filled with inaccurate data.


At Microsoft Ignite last week patrick.hubbard found this documentation gem:


ports - 1.jpg

I’ve come to a crossroads. Regular SolarWinds Lab viewers and new THWACKcamp attendees might have noticed my fondness for all things programmable. I can’t help smiling when I talk about it; I came to enterprise IT after a decade as a developer. But if you run across my spoor outside of SolarWinds, you may notice a thinly-veiled, mild but growing despair. On the flight back from Microsoft® Ignite last week, I reluctantly accepted reality: IT, as we know it, will die.


Origin of a Bummer


On one hand, this should be good news because a lot of what we deal with in IT is, quite frankly, horrible and completely unnecessary. I’m not referring to managers who schedule weekly all-hands that last an hour because that’s the default meeting period in Outlook®. Also not included are 3:00am crisis alerts that prompt you to stumble to the car with a baseball hat because the issue is severe enough to take out the VPN, too. Sometimes it’s okay to be heroic, especially if that means you get to knock off early at 2:00pm.


The perennial horror of IT is boredom. Tedium. Repetitive, mindless, soul-crushing tasks that we desperately want to remediate, delegate, or automate, but can’t because there’s no time, management won’t let us, or we don’t have the right tools.


All of this might be okay, except for two things: accelerating complexity and the move to the cloud. The skinny jeans-clad new kids can’t imagine any other way, and many traditional large enterprise IT shops also hit the cloud hookah and discovered real benefits. Both groups recognized dev as a critical component, and their confidence comes from knowing that they can and will create whatever their IT requires to adapt and realize the benefits of new technology.


No, the reason this is a bummer – if only for five or so years – is that it’s going to hit the people I have the greatest affinity for the hardest: small to medium business IT, isolated independent department IT in large organizations, and superstar admins with deep knowledge in highly specialized IT technology. In short, those of you who’ve worn all the hats I have at one point or another over the last two decades.


I totally understand the reasonable urge to stand in front of a gleaming Exchange rack and tell all the SaaS kids to get off your lawn. But that’s a short-term solution that won’t help your career. In fact, if you’re nearing or over 50, this is an especially critical time to be thinking about the next five years. I watched some outstanding rack-and-stack app infrastructure admins gray out and fade away because they resisted the virtualization revolution. Back then, I had a few years to get on board, gain the skills to tame VMs, and accelerate my career.


This time, however, I’m actively looking ahead, transitioning my education and certification, and working in production at least a little every week with cloud and PaaS technology. I’m also talking to management about significant team restructuring to embrace new techniques.


Renewed Mission


Somewhere over Louisiana I accepted the macro solution that we’ll each inevitably face, but also my personal role in it. We must tear down IT as we know it, and rebuild something better suited to a data center-less reality. We’ll abandon procedural ticket-driven change processes, learn Sprints, Teaming, Agile, and, if we’re lucky, get management on board with a little Kanban, perhaps at a stand-up meeting.


And if any or all of that sounds like New Age, ridiculous mumbo jumbo, that’s perfectly okay. That is a natural and understandable reaction of pragmatic professionals who need to get tish done. My role is to help my peers discover, demystify, and develop these new skills. Further, it’s to help management stop thinking of you as rigidly siloed and ultimately replicable when new technology forces late-adopting organizations into abrupt shifts and spasms of panicked change.


But more than that, if these principles are even partially adopted to enable DevOps-driven IT, life is better. The grass really is greener. I’ve seen it, lived it, and, most surprising to this skeptical, logical, secret introvert, I’ve come to believe it. My job now is to combine my fervor for the tools we’ll use with a career of hard-won IT lessons and do everything I can to help. Don’t worry. This. Is. Gonna. Be. Awesome.
