
Geek Speak


When there are application performance issues, most IT teams focus on the hardware, after blaming and ruling out the network, of course. If an application is slow, the first thought is to add hardware to combat the problem. Agencies have spent millions throwing hardware at performance issues without a good understanding of the true bottlenecks slowing down an application.


But a recent survey on application performance management by research firm Gleanster LLC reveals that the database is the No. 1 source of issues with performance. In fact, 88 percent of respondents cite the database as the most common challenge or issue with application performance.


Understanding that the database is often the cause of application performance issues is just the beginning; knowing where to look and what to look for is the next step. There are two main challenges to trying to identify database performance issues:


There are a limited number of tools that assess database performance. Tools normally assess the health of a database (is it working, or is it broken?), but don’t identify and help remediate specific database performance issues.


Database monitoring tools that do provide more information don’t go much deeper. Most tools send information in and collect information from the database, with little to no insight about what happens inside the database that can impact performance.


To successfully assess database performance and uncover the root cause of application performance issues, IT pros must look at database performance from an end-to-end perspective.


The application performance team should be performing wait-time analysis as part of regular application and database maintenance. This is a method that determines how long the database engine takes to receive, process, fulfill and return a request for information. A thorough wait-time analysis looks at every level of the database and breaks down each step to the millisecond.


The next step is to look at the results, then correlate the information and compare. Maybe the database spends the most time writing to disk; maybe it spends more time reading memory. Understanding the breakdown of each step helps determine where there may be a slowdown and, more importantly, where to look to identify and fix the problem.
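To make that breakdown concrete, here is a minimal Python sketch that aggregates hypothetical wait samples into a per-category breakdown, largest contributor first. The category names and timings are invented for illustration; real figures would come from your database engine's wait statistics.

```python
from collections import defaultdict

def wait_time_breakdown(samples):
    """Aggregate wait time per category and return (category, ms, pct)
    rows, sorted with the largest contributor first."""
    totals = defaultdict(int)
    for category, ms in samples:
        totals[category] += ms
    grand_total = sum(totals.values()) or 1  # avoid division by zero
    rows = [(cat, ms, round(100.0 * ms / grand_total, 1))
            for cat, ms in totals.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical (wait_category, milliseconds) samples from a monitoring tool
samples = [("disk_write", 420), ("memory_read", 130),
           ("disk_write", 380), ("lock_wait", 70)]
for cat, ms, pct in wait_time_breakdown(samples):
    print(f"{cat:<12} {ms:>5} ms  {pct:>5}%")
```

With these made-up numbers, disk writes dominate at 80% of total wait time, which is exactly the kind of signal that tells you where to look first.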


We suggest that federal IT shops implement regular wait-time analysis as a baseline of optimized performance. The baseline can help with change management. If a change has been implemented, and there is a sudden slowdown in an application or in the database itself, a fresh analysis can help quickly pinpoint the location of the performance change, leading to a much quicker fix.
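The baseline comparison can be sketched the same way: given baseline wait times per category and a fresh measurement taken after a change, flag the categories that grew sharply. The 20% threshold and the numbers are arbitrary illustrations, not recommendations.

```python
def regressions(baseline, current, threshold_pct=20.0):
    """Return wait categories whose time grew more than threshold_pct
    over the baseline -- likely places to start looking after a change."""
    flagged = []
    for category, base_ms in baseline.items():
        cur_ms = current.get(category, 0)
        if base_ms and 100.0 * (cur_ms - base_ms) / base_ms > threshold_pct:
            flagged.append((category, base_ms, cur_ms))
    return flagged

baseline = {"disk_write": 800, "memory_read": 130, "lock_wait": 70}
current  = {"disk_write": 820, "memory_read": 135, "lock_wait": 240}
print(regressions(baseline, current))  # lock_wait more than tripled
```

Here only the lock waits are flagged, pointing the investigation straight at contention introduced by the change rather than at disk or memory.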


Our nearly insatiable need for faster performance may seem like a double-edged sword. On one hand, optimized application performance means greater efficiency; on the other hand, getting to that optimized state can seem like an expensive, unattainable goal.


Knowing how to optimize performance is a great first step toward staying ahead of the growing need for instantaneous access to information.


Find the full article on Government Computer News.

What is VM sprawl?

VM sprawl is the waste of compute resources (CPU cycles and RAM) and storage capacity caused by a lack of oversight and control over VM resource provisioning. Because of its uncontrolled nature, VM sprawl degrades your environment’s performance at best, and can lead to more serious complications (including downtime) in constrained environments.


VM Sprawl and its consequences

Lack of management and control over the environment causes VMs to be created in an uncontrolled way. Sprawl concerns not only the total number of VMs in a given environment, but also how resources are allocated to those VMs: you could have a large environment with minimal sprawl, and a smaller environment with considerable sprawl.


Here are some of the factors that cause VM sprawl:


  • Oversized VMs: VMs that were allocated more resources than they really need. Consequences:
    • Waste of compute and/or storage resources
    • Over-allocation of RAM will cause ballooning and swapping to disk if the environment falls under memory pressure, resulting in performance degradation
    • Over-allocation of virtual CPU will cause high co-stop: a VM must wait for CPU cycles to be available on as many physical cores as it has vCPUs at the same moment, and the more vCPUs a VM has, the less likely it is that all those cores will be free at once
    • The more RAM and vCPUs a VM has, the higher the RAM overhead required by the hypervisor.


  • Idle VMs: VMs that are up and running, not necessarily oversized, but unused and showing no activity. Consequences:
    • Waste of compute and/or storage resources, plus RAM overhead at the hypervisor level
    • Resources wasted by idle VMs may impact CPU scheduling and RAM allocation while the environment is under contention
    • Powered-off VMs and orphaned VMDKs eat up storage capacity
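As a rough illustration of how these symptoms might be flagged programmatically, here is a Python sketch that classifies a hypothetical VM inventory. The field names, VM names, and thresholds are assumptions; a real report would draw on your virtualization manager's metrics.

```python
def sprawl_report(vms, cpu_idle_threshold=5.0):
    """Classify VMs by sprawl symptom: powered off, idle (near-zero CPU),
    or oversized (far more vCPUs allocated than ever used)."""
    report = {"oversized": [], "idle": [], "powered_off": []}
    for vm in vms:
        if not vm["powered_on"]:
            report["powered_off"].append(vm["name"])
        elif vm["avg_cpu_pct"] < cpu_idle_threshold:
            report["idle"].append(vm["name"])
        elif vm["vcpus"] > 2 * vm["peak_vcpus_used"]:
            report["oversized"].append(vm["name"])
    return report

# Invented inventory data for illustration
vms = [
    {"name": "app01", "powered_on": True, "avg_cpu_pct": 35.0,
     "vcpus": 8, "peak_vcpus_used": 2},
    {"name": "test07", "powered_on": True, "avg_cpu_pct": 0.4,
     "vcpus": 2, "peak_vcpus_used": 1},
    {"name": "old-db", "powered_on": False, "avg_cpu_pct": 0.0,
     "vcpus": 4, "peak_vcpus_used": 1},
]
print(sprawl_report(vms))
# {'oversized': ['app01'], 'idle': ['test07'], 'powered_off': ['old-db']}
```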



How to Manage VM sprawl

Controlling and containing VM sprawl relies on process and operational aspects. The former covers how one prevents VM sprawl from happening, while the latter covers how to tackle sprawl that happens regardless of controls set up at the process level.



On the process side, IT should define standards and implement policies:


  • Role-Based Access Control, which defines roles & permissions on who can do what. This will greatly help reduce the creation of rogue VMs and snapshots.
  • Define VM categories and acceptable maximums: while not all VMs can fit in one box, standardizing on several VM categories (application, database, etc.) will help filter out bizarre or oversized requests. Advanced companies with self-service portals may want to restrict/categorize which VMs can be created by which users or business units
  • Challenge any oversized VM request and demand justification for potentially oversized VMs
  • Allocate resources based on real utilization. You can propose a policy where a VM's resources are monitored for 90 days, after which IT can adjust the allocation if the VM is undersized or oversized.
  • Implement policies on snapshot lifetimes, and track snapshot creation requests if possible

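The 90-day review above could be sketched like this: given observed daily RAM peaks over the monitoring window, suggest an adjusted allocation. The 25% headroom figure and the sample values are illustrative assumptions only, not a recommendation.

```python
import math

def rightsize(allocated_gb, daily_peak_gb, headroom=1.25):
    """Suggest a RAM allocation from observed daily peaks over the review
    period: size to the highest peak plus headroom, rounded up to whole GB."""
    needed = math.ceil(max(daily_peak_gb) * headroom)
    if needed < allocated_gb:
        return f"oversized: shrink from {allocated_gb} GB to {needed} GB"
    if needed > allocated_gb:
        return f"undersized: grow from {allocated_gb} GB to {needed} GB"
    return "allocation looks right"

peaks = [6.1, 5.8, 7.0, 6.4]  # pretend these cover the 90-day window
print(rightsize(16, peaks))   # oversized: shrink from 16 GB to 9 GB
```

Sizing to the observed peak rather than the average keeps the adjustment conservative, which makes the conversation with the VM owner much easier.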

In certain environments where VMs and their allocated resources are chargeable, you should contact your customers to let them know that a VM needs to be resized or was already resized (based on your policies and rules of engagement) to ensure they are not billed incorrectly. It is worthwhile to formalize your procedures for how VM sprawl management activities will be covered, and to agree with stakeholders on pre-defined downtime windows that will allow you to seamlessly carry out any right-sizing activities.



Even with the controls above, sprawl can still happen, for a variety of reasons. For example, a batch of VMs provisioned for one project may pass all the process controls, yet sit idle for months eating up resources because the project was delayed or cancelled and no one informed the IT team.


In VMware environments where storage is thin provisioned at the array level, and where Storage DRS is enabled on datastore clusters it’s also important to monitor the storage consumption at the array level. While storage capacity will appear to be freed up at the datastore level after a VM is moved around or deleted, it will not be released on the array and this can lead to out-of-storage conditions. A manual triggering of the VAAI Unmap primitive will be required, ideally outside of business hours, to reclaim unallocated space. It’s thus important to have, as a part of your operational procedures, a capacity reclamation process that is triggered regularly.


Using virtual infrastructure management tools with built-in resource analysis & reclamation capabilities, such as SolarWinds Virtualization Manager, is a must. By leveraging software, these tedious analysis and reconciliation tasks are automated, and dashboards present IT teams with immediately actionable results.



Even with all the good will in the world, VM sprawl will happen. You may have the best policies in place, but your environment is dynamic, and in the rush of day-to-day IT operations you just can't keep an eye on everything. And this is coming from a guy whose team successfully recovered 22 TB of space previously occupied by orphaned VMDKs earlier this year.

(to those who saw the earlier post, I apologize for the confusion. This is the correct link to use!)


Our first foray into Wiley Brands' Dummies series - "Network Monitoring for Dummies" - has been a runaway success at conventions and trade shows, with copies literally disappearing off our display when our backs are turned.


But we realize that not everyone has $3k to drop to visit us at CiscoLive, MS:Ignite, VMWorld, and the rest. So we're publishing the link here. Feel free to download and share the monitoring glory with colleagues, or even pass a copy to management!


I BEAT THEM TO FIRING ME! (Part Two) Fight Back

Why network configuration, change and compliance management (NCCCM) is a must

Inspired by former Citibank employee sentencing

(Part Two)


We've all heard horror stories about the disgruntled employee who pillages the office supply closet and leaves the building waving an obscene gesture, security badge skittering across the parking lot in his wake. Rage-quit is a thing, folks, and it's perfectly reasonable to be afraid that someone with high-level access, someone who could make changes to a network, might do so if they get mad enough. This happens more often than anyone would like to think about, and it's something that needs to be addressed in every organization. I felt like we should talk about this and discuss ways to help control and slow the damage of said employees and their bad will. Bottom line: we need to be aware of these situations and have a plan for recovery when things like this happen.



The gist of the story is simple: an employee wiped out critical network configurations on about 90% of his former company's infrastructure. On Monday he was sentenced on charges of criminal vandalism. I realize the article is technically old news, but it starts a great conversation about how IT organizations can stop criminal vandalism by actually using NCCCM products to protect ourselves and others from these types of disastrous events. Sometimes you need that brief pause or slight inconvenience to help you think straight and not go over the edge. This post can also help keep your butt out of, well, jail.


Today, we are going to talk about some of the risks of not having NCCCM software:



  1. Real-time change notification not enabled.
    • There is no record of when changes are being made, whether via maintenance plans, change requests, or malicious intent.
      • Being able to see network changes and know their timing helps you be proactive, and lets you take immediate remediation action on your network.
    • Who's on first base, and did someone slide into home?
      • When you have more than a couple of network engineers, documentation can be lacking and, well, you're busy, right? Being able to track when changes happen and who made them lets you discover who changed what, and when, even a week later.
      • Being able to compare the change that was made against the existing config is key to correlating issues after a change. All of a sudden, traffic is not flowing, or it's restricted, and you find out it was an error in the config change.
    • Someone is on your network changing your critical devices and wiping them clean.
      • Receive alerts so you don't find out when it's too late; after receiving the alert, you can log in and restore the previous config.
  2. Approval process not in use.
    • No change auditing.
      • Being able to make changes without approval or a process sets you up for human error or worse: attacks.
      • Implementing an approval process allows you to have an auditing system that shows that more than one person approved a change.
      • Use this with real-time change notification to see if anyone outside your team is making changes. Either allow them into your NCCCM, or delete or lock out their login info to the devices.
    • No one can verify that you are making the change, or even what that change was.
      • When you have a larger team, you delegate changes or areas of functionality. Having an approval process verifies that the correct changes are being made. That gives you an extra set of eyes on the changes that are being made, which adds another level of detection to human error.
    • One person has complete access to your devices at a control level.
      • When you give people straight access to network devices there is a single point of failure. Taking an extra step creates a safe zone of recognition, training, and the ability to track changes and implementations on your network.
  3. Advanced change alert not enabled.
    • Not having an escalation alert set up can leave you with no configurations on your devices when you come into work the next day.
      • Set up escalation alerts based on more than one action.
        • Create a mass change alert: if X syslog change events happen within five minutes, alert the manager NOW.
        • Mute these alerts when implementing maintenance plans (adatole has more info on this).
  4. Backups you are saving to your desktop or network drive (when you remember).
    • If a crisis happens, the great news is that network devices just need to be told what to do. But if you are like me and don't remember every line of code for hundreds of devices, then you better implement a backup system NOW.
      • If you have backups being stored, recovery is a click away with an NCCCM.
      • Compare the startup config to the running config to make sure a reboot won't cancel your changes.
      • Keep backups in secure locations so downtime is minimized and quickly averted.
        • I generally implement both server-side and network-share-drive backups. Lock down access to the backup server with security verification in case someone tries to delete the backups (this happens, because they don't want you to recover).
  5. Recovery procedures not in place.
    • Can your team recover from an emergency without you being on site?
      • Have a plan and practice with your team. You have to have a plan to be able to recover from maintenance plans gone wrong all the way to disaster recovery.  This takes practice, and should be something the whole team discusses so that you are better engaged. It helps to have an open mind to see how others may offer solutions to each potential problem suggested.
    • Set up an automatic password change template that can be used quickly in case of a potential issue within or outside your organization.
    • Use your NCCCM to monitor your configurations for potential issues or open back doors within your network.
      • Sometimes people will quietly open access within your network. Watching your configurations with a compliance reporting service allows you to detect and remediate quickly, stopping these types of security breaches in their tracks.
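As a small illustration of the "compare startup to running" idea, here's a Python sketch that diffs two config texts using the standard difflib module. The sample configs are invented; a real NCCCM does this across hundreds of devices automatically.

```python
import difflib

def config_drift(startup, running):
    """Return unified-diff lines between startup and running configs;
    an empty result means a reboot won't silently undo anything."""
    return list(difflib.unified_diff(
        startup.splitlines(), running.splitlines(),
        fromfile="startup-config", tofile="running-config", lineterm=""))

startup = "hostname edge01\ninterface Gi0/1\n ip address 10.0.0.1 255.255.255.0\n"
running = "hostname edge01\ninterface Gi0/1\n ip address 10.0.0.2 255.255.255.0\n"
for line in config_drift(startup, running):
    print(line)
```

The diff immediately shows the changed IP address, which is the kind of detail you want surfaced before a reboot, not after.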


If you're curious about setup, check this out: Security and SolarWinds NCM


Stay tuned for the next part, where I'll showcase how each one of these can be used in response to a security incident!


Those are a few things you should be able to use within any NCCCM software package. Revisit them consistently to reevaluate your situation and how to better protect yourself.

Let's dive into the mindset and standard methodologies around the security aspect:


This isn't just about technology; these are general practices to be aware of and to implement on your own. Look at them with a non-judging eye, simply as ways to hold off malicious attacks or ill will.


  1. There needs to be a clear exit strategy for anyone who is going to be fired or removed from a position where they could do harm.
    • "But he's such a nice guy!" Nice guys can turn bad.
    • When this information is being circulated, you need to do what's best for your career as well as the company you work for, and go on the defense.
      • Bring in specialized organizations that can come in, assess, and prevent issues before the employee is terminated or moved
      • Verify all traffic and every location they were involved in
        • Any passwords, etc., that were globally known NEED TO BE CHANGED NOW, not LATER
        • Check all management software and reduce their rights to view-only for the remaining days, then delete access immediately after termination
        • Verify all company technology is accounted for (accounting and inventory within your NCCCM is vital to maintaining awareness of property and access to your network)
  2. Monitoring of the team
    • Some may not be happy with the decision to terminate an employee and may feel betrayed
    • Monitor their access and increase awareness of their actions
      • If you see them logging in to more routers and switches than ever before, it might be time to set up a meeting...
      • If you see them going outside of their area and digging into things they should not, it's meeting time
      • Awareness is key, and an approval process plus change detection are essential to preventing damage
  3. Security policies
    • You're only as good as the policy in place
      • Dig into your policies and make sure they are current and relevant
      • If you seriously have security measures like "if they call from a desk phone, reset the password over the phone," please REVISIT these.
        • Re-read that last statement
    • Make sure your team is signing acknowledgement of what they can and cannot do
      • It's easier to prosecute when they have signed and agreed
    • Verify your security policies against your network devices
      • NCCCM compliance reporting set up for your needs is a great way to stay ahead of these items
      • You can find back doors on your network that people have set up to get around security policies this way.
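For the automatic password change template mentioned above, here's a minimal Python sketch that generates a fresh, unique credential per device using the standard secrets module. The device names are invented, and how the new credentials get pushed to devices and stored (an NCCCM job, a vault) is outside this sketch.

```python
import secrets
import string

def rotation_plan(devices, length=20):
    """Generate a fresh random credential per device for an emergency
    rotation, e.g. right after a risky termination."""
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
    return {dev: "".join(secrets.choice(alphabet) for _ in range(length))
            for dev in devices}

plan = rotation_plan(["core-rtr1", "core-rtr2", "dist-sw1"])
for dev, pw in plan.items():
    print(dev, "->", len(pw), "chars")
```

Using `secrets` rather than `random` matters here: it draws from a cryptographically secure source, which is the point of an emergency rotation.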


Obviously, I cannot solve every issue, but I can at least point you toward some good directions and processes. If any of you want to jump in and add to this, please do; I'm always interested in other people's security methods. The main point is to be aware of these situations, have a plan, and recover when things like this happen.


Thank you,







The Private cloud

Posted by arjantim Jul 30, 2016

In a private cloud model, your IT department controls a secure, dedicated cloud environment in which you manage your resources. The difference from public cloud is that the pool of resources is accessible only by you, which makes management much easier and more secure.


So, if you require a dedicated resource, based on performance, control, security, compliance or any other business aspect, the private cloud solution might just be the right solution for you.


More and more organisations are looking for the flexibility and scalability of cloud solutions. But many of these organisations struggle with business and regulatory requirements that, they think, keep them from being the right candidate for public or private cloud offerings.


It can be that you work within a highly regulated environment that is not suitable for public cloud, and you don't have the internal resources to set up or administer suitable private cloud infrastructure. On the other hand, it might just be that you have specific industry requirements for performance that aren't yet available in the public cloud.


In those cases, the private cloud can be a great alternative to the public cloud. A private cloud enables the IT department, as well as the applications themselves, to access IT resources as they are required, while the datacentre itself runs in the background. All services and resources used in a private cloud are defined in systems that are only accessible to the user and are secured against external access. The private cloud offers many of the advantages of the public cloud while minimising the risks. As opposed to many public clouds, the criteria for performance and availability in a private cloud can be customised, and compliance with these criteria can be monitored to ensure that they are achieved.


For a cloud or enterprise architect, a couple of things are very important in the cloud era. You should know your application (stack) and the way it behaves. By knowing what your application needs, you can determine which parts of the application could be placed where: private or public. A good way to make sure you know your application is the DART principle:


  • Discover: Show me what is going on
  • Alert: Tell me when it breaks or is going bad
  • Remediate: Fix the problem
  • Troubleshoot: Find the root cause



If you run the right tools within your environment, it should be easy to discover what is going on, where the bottlenecks are, how your application is behaving, and what its requirements are. That makes the step to hybrid much easier to take, but that is for another post; next time I'll first dive a little further into the public cloud.

Hybrid IT covers all manner of IT arrangements, especially those that span services an IT organization delivers itself and services delivered by someone outside the IT organization. The technology constructs present in the current IT state, where services are continually delivered, integrated, and consumed on any device at any given time, are giving rise to hybrid IT adoption. The challenge for IT professionals is to unlock the potential of hybrid IT without getting caught up in the churn-and-burn cycle of tech hype and tech debt. IT rigor and discipline must be part of the equation, and this is where monitoring as a discipline comes into play.


At BrightTALK’s Cloud and Virtualization Summit, I presented on monitoring as a discipline as the key to unlocking hybrid IT’s potential. The recording is available to view on BrightTALK’s website and it's hyperlinked below.




Let me know what you think of it in the comment section below.


(Zen Stones by Undeadstawa on DeviantArt)


Over the years, I've observed that despite running multiple element and performance management systems, most organizations still don't truly understand their IT infrastructure. In this post I'll examine how it's possible to have so much information on hand yet still have a large blind spot.




What does discovery mean to you? For most of us I'm guessing that it involves ICMP pings, SNMP community strings, WMI, login credentials and perhaps more in an attempt to find all the manageable devices that make up our infrastructure: servers, hypervisors, storage devices, switches, routers and so forth. We spin up network management software, perhaps a storage manager, virtualization management, performance management, and finally we can sleep safely knowing that we have full visibility and alerting for our compute, storage and networking infrastructure.


At this point I'd argue that the infrastructure discovery is actually only about 50% complete. Why? Because the information gathered so far provides little or no data that can be used to generate a correlation between the elements. By way of an analogy you could say that at this point all of the trees have been identified, labeled and documented, but we've yet to realize that we're standing in the middle of a forest. To explain better, let's look at an example.


Geographical Correlation

Imagine a remote site at which we are monitoring servers, storage, printers and network equipment. The site is connected back to the corporate network using a single WAN link, and—horrifyingly—that link is about to die. What do the monitoring systems tell us?


  • Network Management: I lost touch with the edge router and six switches.
  • Storage Management: I lost touch with the storage array.
  • Virtualization Management: I lost touch with these 15 VMs.
  • Performance Management: These elements (big list) are unresponsive.


Who monitors those systems? Do the alerts all appear in the same place, to be viewed by the same person? If not, that's the first issue, as spotting the (perhaps obvious) relationship between these events requires a meat-bag (human) to realize that if storage, compute and network all suddenly go down, there's likely a common cause. If this set of alerts went in different directions, in all likelihood the virtualization team, for example, might not be sure whether their hypervisor went down, a switch died, or something else, and they may waste time investigating all those options in an attempt to access their systems.

Centralize your alert feeds

Suppressing Alerts

If all the alerts are coming into a single place, the next problem is that in all likelihood the router failure event led to the generation of a lot of alerts at the same time. Looking at it holistically, it's pretty obvious that the real alert should be the loss of a WAN link; everything else is a consequence of losing the site's only link to the corporate network. Personally in that situation, I'd ideally like the alert to look like this:


2016/07/28 01:02:03.123 CRITICAL: WAN Node <a.b.c.d> is down. Other affected downstream elements include (list of everything else).


This isn't a new idea by any means; alert suppression based on site association is something that we should all strive to achieve, yet so many of us fail to do so. One of the biggest challenges with alert monitoring is being overwhelmed by a large number of messages, and the signal to noise ratio makes it impossible to see the important information. This is a topic I will come back to, but let's assume it's a necessary evil.
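The suppression logic described above can be sketched in a few lines of Python: group down-element alerts by site, and when a site's WAN link is among them, collapse everything else at that site into a single critical alert. The topology maps and element names are invented for illustration.

```python
def collapse_alerts(alerts, site_of, wan_link_of):
    """Group element-down alerts by site; if a site's WAN link is among
    them, emit one critical alert listing the rest as downstream."""
    by_site = {}
    for element in alerts:
        by_site.setdefault(site_of[element], []).append(element)
    summary = []
    for site, elements in by_site.items():
        link = wan_link_of[site]
        if link in elements:
            downstream = sorted(e for e in elements if e != link)
            summary.append(f"CRITICAL: WAN node {link} down at {site}; "
                           f"downstream: {', '.join(downstream)}")
        else:
            summary.extend(f"ALERT: {e} down" for e in elements)
    return summary

# Invented topology: one branch site behind a single edge router
site_of = {"rtr-edge": "branch7", "sw1": "branch7", "san1": "branch7"}
wan_link_of = {"branch7": "rtr-edge"}
print(collapse_alerts(["sw1", "rtr-edge", "san1"], site_of, wan_link_of))
```

Three raw alerts become one actionable message, which is exactly the signal-to-noise improvement the text argues for. The hard part in practice is maintaining the site/WAN-link mapping, which is the correlation data most discovery processes never collect.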

Suppress unnecessary alert noise

Always On The Move

In addition to receiving several hundred alerts from the devices impacted by the WAN failure, now it seems the application team is troubleshooting an issue with the e-commerce servers. The servers themselves seem fine, but the user-facing web site is generating an error when trying to populate shipping costs during the checkout process. For some reason the call to the server calculating shipping costs isn't able to connect, which is odd because it's based in the same datacenter as the web servers.


The security team is called in and begins running a trace on the firewall, only to confirm that the firewall is correctly permitting a session from the e-commerce server to an internal address on port tcp/5432 (postgres).


The network team is called in to find out why the TCP session to shipsrv01.ecomm.myco.corp is not establishing through the firewall, and they confirm that the server doesn't seem to respond to ping. Twenty minutes later, somebody finally notices that the IP returned for shipsrv01.ecomm.myco.corp is not in the local datacenter. Another five minutes later, the new IP is identified as being in the site that just went down; it looks like somebody had moved the VM to a hypervisor in the remote site, presumably by mistake, when trying to balance resources across the servers in the data center. Nobody realized that the e-commerce site had a dependency on a shipping service that was now located in a remote site, so nobody associated the WAN outage with the e-commerce issue. Crazy. How was anybody supposed to have known that?
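A simple sanity check that could have caught this sooner: map the resolved IP back to the site that owns its subnet, using Python's standard ipaddress module. The subnets, sites, and addresses here are invented for illustration.

```python
import ipaddress

def locate(ip, site_subnets):
    """Map a resolved IP to the site owning its subnet -- a quick check
    that a service actually lives where you think it does."""
    addr = ipaddress.ip_address(ip)
    for site, cidr in site_subnets.items():
        if addr in ipaddress.ip_network(cidr):
            return site
    return "unknown"

site_subnets = {"dc-main": "10.10.0.0/16", "branch7": "10.99.0.0/16"}
# The shipping service was expected in dc-main...
print(locate("10.99.4.20", site_subnets))  # branch7 -- moved by mistake
```

Run against DNS answers for your critical service names, a check like this turns a twenty-minute head-scratcher into an instant "hang on, why is that in branch7?"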


It seems that despite having all those management systems I'm still a way from having true knowledge of my infrastructure. When I post next, I'll look at some of the things I'd want to do in order to get a better and more holistic view of my network so that I can embrace the inner peace I desire so much.


The Actuator - July 27th

Posted by sqlrockstar Employee Jul 27, 2016

Just when you thought 2016 couldn't get any crazier, you wake up to find that Verizon has bought Yahoo and that you're more interested in reading about the drone that delivered a Slurpee. Welcome to my world.


Here are the items I found most amusing from around the Internet. Enjoy!


Verizon to Purchase Yahoo’s Core Business for $4.8 Billion

I'm shocked Yahoo is worth even that much. I'm also hoping that someone will give me $57 million to do nothing.


Canadian Football League Becomes First Pro Football Organization To Use Sideline Video During Games

As our technology advances at an ever-increasing pace and is applied in new situations, it is up to someone in IT to make it all work. It's all about the data, folks; data is the most valuable asset any company (or team) can own.


Nearly Half of All Corporate Data is Out of IT Department’s Control

Honestly, I think that number is much higher.


GOP delegates suckered into connecting to insecure Wi-Fi hotspots

I am certain the GOP leaders were tech savvy enough not to fall for this trick, right?


Snowden Designs a Device to Warn if Your iPhone’s Radios Are Snitching

Showing what he's been doing with his free time while living in exile, Snowden reveals how our phones have been betraying us for years.


Status Report: 7 'Star Trek' Technologies Under Development

With the release of the new Star Trek movie last week I felt the need to share at least one Star Trek link. But don't get your hopes up for warp drive or transporters anytime soon.


I wanna go fast: HTTPS' massive speed advantage

"If you wanna go fast, serve content over HTTPS using HTTP/2."


Watch The First Slurpee Delivery By Drone

Because who doesn't love a Slurpee in the summertime?


Meanwhile, in Redmond:


I need to deep fry a turbaconducken.


This isn't a want, no. This is a primal need of mine.


I feel so strongly about this that it's on my bucket list. It is positioned right below hiring two private investigators to follow each other, and right above building an igloo with the Inuit.


Deep frying a turkey is a dangerous task. You can burn your house down if you are not careful. Why take the risk? Because the end result, a crispy-juicy turkey bathed in hot oil for 45 minutes, is worth the effort. Or so I've been told. Like I said, it's on my bucket list.


Being the good data professional that I am I started planning out how to prepare for the day that I do, indeed, deep fry my own turkey. As I laid out my plans it struck me that there was a lot of similarity between both an exploding turkey and the typical "database is on fire" emergency many of us know all too well.


So here's my list for you to follow for any emergency, from exploding turkeys to databases catching fire and everything in between. You're welcome.


Don't Panic


People who panic are the same people who are not prepared. A little bit of planning and preparation go a long way to helping you avoid "panic mode" in any emergency situation. Whenever I see someone panicking (like ripping out all their network cables just because their mouse isn't working) it is a sure sign that they have little to no practical experience with the situation at hand.


Planning will help you from feeling the need to panic. If your database is on fire you can recover from backups, because you prepared for such a need. And if your turkey explodes you can always go to a restaurant for a meal.


Rely on all your practice and training (you have practiced this before, right?). Emergency response people often train in close-to-real-life situations. In fact, firefighters even pay people to burn down their spare barns.


Go to your checklist. You do have a checklist, right? And a process to follow? If not, you may find yourself in a pile of rubble, covered in glitter.


Assess the Situation


Since you aren't panicking you are able to calmly assess the situation. A turkey on fire inside your oven would require a different response than a turkey that explodes in a fireball on your deck and is currently burning the side of your house. Likewise, an issue with your database that affects all users will require a different set of troubleshooting steps than an issue affecting only some users or queries.


In order to do a proper assessment of the situation you will be actively gathering data. For database servers you are likely employing some type of monitoring and logging tools. For turkeys, it's likely a thermometer to make certain it has completely thawed before you drop it into the hot oil.


You also need to know your final goal. Perhaps your goal is to stop your house from being engulfed in flames. Perhaps your goal is to get the systems back up and running, even if it means you may have some data loss.


Not every situation is the same. That's why a proper assessment is necessary when dealing with emergencies...and you can't do that while in a panic.


Know Your Options


Your turkey just exploded after you dropped it into a deep fryer. Do you pour water on the fire quickly? Or do you use a fire extinguisher?


Likewise, if you are having an issue with a database server should you just start rebooting it in the hopes that it clears itself up?


After your initial assessment is done, you should have a handful of viable options to explore. You need to know the pros and cons of each of these options. That's where the initial planning comes in handy, too. Proper planning reduces panic and allows you to assess the situation, and then you can weigh all your viable options along with their pros and cons. See how all this works together?


It may help for you to phone a friend here. Sometimes talking through things can help, especially when the other person has been practicing and helping all along.


Don't Make Things Worse


Pouring water on the grease fire on your deck is going to make the fire spread more quickly. And running 17 different DBCC commands isn't likely to make your database issue any better, either.


Don't be the person who makes things worse. If you can calmly assess the situation, and you know your options well, then you should be able to make an informed decision that doesn't compound the problem. Also, don't focus on blame; now isn't the time for that. It will come later. If you focus on fault, you aren't working on putting out the fire right now. You might as well grab a stick and some marshmallows for making s'mores while your house burns to the ground.


Also, a common mistake here is made by people who try to do many things at once, specifically for database issues. If you make multiple changes, you may never know what worked, or the changes may cancel each other out, leaving you with a system that is still offline. Know the order of the actions you want to take and do them one at a time.


And it wouldn't hurt you to take a backup now, before you start making changes, if you can.


Learn From Your Mistakes


Everyone makes mistakes, I don't care what their marketing department may tell you. Making mistakes isn't as big of a concern as not learning from your mistakes. If you burned your house down the past two Thanksgivings, don't expect a lot of people showing up for dinner this year.


Document what you’ve done, even if it is just a voice recording.  You might not remember all the details afterwards, so take time to document events while they are still fresh in your memory.


Review the events with others and gather feedback along the way as to how things could have been better or avoided. Be open to criticism, too. There's a chance the blame could be yours. If that's the case, accept that you are human and lay out a training plan that will help you to avoid making the same mistake in the future.


I'm thankful that my database server isn't on fire. But if it was, I know I'd be prepared.



Many agencies are already practicing excellent cyber hygiene; others are still in implementation phases. Regardless of where you are in the process, it is critical to understand that security is not a one-product solution. Having a solid security posture requires a broad range of products, processes and procedures.


Networks, for example, are a critical piece of the security picture; agencies must identify and react to vulnerabilities and threats in real time. You can implement automated, proactive security strategies that will increase network stability and have a profound impact on the efficiency and effectiveness of the overall security of the agency.


How can agencies leverage their networks to enhance security? Below are several practices you can begin to implement today, as well as some areas of caution.


Standardization. Standardizing network infrastructure is an often-overlooked method of enhancing network performance and security.


Start by reviewing all network devices and ensure consistency across the board. Next, make sure you’ve got multiple, well-defined networks. Greater segmentation will provide two benefits: greater security, as access will not necessarily be granted across each unique segment, and greater ability to standardize, as segments can mimic one another to provide enhanced control.


Change management. Good change management practices go a long way toward enhanced security. Specifically, software that requires a minimum of two unique approvals before changes can be implemented can prevent unauthorized changes. In addition, make sure you fully understand the effect changes will have across the infrastructure before granting approval.


Configuration database. It’s important to have a configuration database for backups, disaster recovery, etc. If you have a device failure, being able to recover quickly can be critical; implementing a software setup that can do this automatically can dramatically reduce security risks. Another security advantage of a configuration database is the ability to scan for security-policy compliance.
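As a sketch of that idea, here is a minimal config-snapshot-and-drift check in Python. The `fetch_running_config()` helper, the device name, and the stored strings are hypothetical stand-ins for whatever your configuration management tool or device API actually provides:

```python
import difflib
from datetime import datetime, timezone

# Hypothetical stand-in: in practice this would be an SSH/API call to the
# device or a pull from your network configuration management tool.
SIMULATED_CONFIGS = {
    "core-sw-01": "hostname core-sw-01\nsnmp-server community public\n",
}

def fetch_running_config(device):
    return SIMULATED_CONFIGS[device]

config_db = {}  # in-memory stand-in for the configuration database

def snapshot(device):
    """Store a timestamped copy of the device's running config."""
    stamp = datetime.now(timezone.utc).isoformat()
    config_db.setdefault(device, []).append((stamp, fetch_running_config(device)))

def drift(device):
    """Unified diff between the two most recent snapshots of a device."""
    history = config_db.get(device, [])
    if len(history) < 2:
        return []
    (_, old), (_, new) = history[-2], history[-1]
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))

snapshot("core-sw-01")
# Simulate an unauthorized change appearing on the device.
SIMULATED_CONFIGS["core-sw-01"] = "hostname core-sw-01\nsnmp-server community S3cret9\n"
snapshot("core-sw-01")
for line in drift("core-sw-01"):
    print(line)
```

The same snapshot history that powers disaster recovery also gives you something to scan for policy violations.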


Compliance awareness. Compliance can be a complicated business. Consider using a tool that automates vulnerability scanning and FISMA/DISA STIG compliance assessments. Even better? A tool that also automatically sends alerts of new risks by tying into the NIST NVD, then checking that information against your own configuration database.
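To illustrate the matching step, here is a toy version in Python. The feed entries, field names, and version strings are invented for the example; real NVD records and STIG checks are far richer:

```python
# Toy compliance check: match a simplified, locally cached vulnerability
# feed against the device inventory. All data here is illustrative.
cached_vuln_feed = [
    {"cve": "CVE-2016-0001", "product": "ios",   "affected_versions": {"15.2(4)"}},
    {"cve": "CVE-2016-0002", "product": "nx-os", "affected_versions": {"7.0(3)"}},
]

inventory = [
    {"device": "core-sw-01",  "product": "ios", "version": "15.2(4)"},
    {"device": "edge-rtr-01", "product": "ios", "version": "15.5(3)"},
]

def new_risks(feed, devices):
    """Return (device, cve) pairs where an inventoried version is affected."""
    return [(dev["device"], vuln["cve"])
            for vuln in feed
            for dev in devices
            if dev["product"] == vuln["product"]
            and dev["version"] in vuln["affected_versions"]]

print(new_risks(cached_vuln_feed, inventory))  # [('core-sw-01', 'CVE-2016-0001')]
```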


Areas of caution:

Most security holes are related to inattention to infrastructure. In other words, inaction can be a dangerous choice. Some examples are:


Old inventory. Older network devices inherently have outdated security. Invest in a solution that will inventory network devices and include end-of-life and end-of-support information. This also helps forecast costs for new devices before they quit or become a security liability.
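A minimal sketch of the forecasting idea, with invented device names and dates:

```python
from datetime import date

# Illustrative only: flag inventoried devices at or approaching
# end-of-support so replacements can be budgeted before they become
# a security liability.
inventory = [
    {"device": "asa-5505",   "end_of_support": date(2017, 6, 30)},
    {"device": "cat-3750",   "end_of_support": date(2016, 1, 31)},
    {"device": "nexus-9300", "end_of_support": date(2022, 12, 31)},
]

def needs_replacement(devices, today, warn_days=365):
    """Devices already past end-of-support, or within the warning window."""
    return [d["device"] for d in devices
            if (d["end_of_support"] - today).days < warn_days]

print(needs_replacement(inventory, date(2016, 7, 20)))  # ['asa-5505', 'cat-3750']
```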


Not patching. Patching and patch management is critical to security. Choose an automated patching tool to be sure you’re staying on top of this important task.


Unrestricted bring-your-own-device policies. Allow BYOD, but with restrictions. Separate the unsecure mobile devices on the network and closely monitor bandwidth usage so you can make changes on the fly as necessary.


There is no quick-and-easy solution, but tuning network security through best practices will not only enhance performance, but will also go a long way toward reducing risks and vulnerabilities.


Find the full article on Government Computer News.

In my previous post, I listed some best practices for help desk IT pros to follow to save time resolving issues. The responses I received from that post made me realize that the best solution for one IT organization may not necessarily be the same for another. An organization’s size, business model, functional goals, organizational structure, etc. create unique challenges for those charged with running the help desk function, and these factors directly affect IT support priorities.


With this knowledge in mind, I decided to take a different approach for this post. Below, I have listed some of the easy ways that help desk organizations – irrespective of their differences – can improve their help desk operations through automation to create a chaos-free IT support environment.


  1. Switch to centralized help desk ticketing
    Receiving help desk requests from multiple channels (email, phone, chat, etc.), and manually transferring them onto a spreadsheet creates a dispersed and haphazard help desk environment. Switching to a centralized help desk ticketing system will help you step up your game and automate the inflow of incidents and service requests.
  2. Automate ticket assignment and routing
    Managing help desk operations manually can lead to needless delays in assigning tickets to the right technician, and potential redundancy if you happen to send the same requests to multiple technicians. To avoid this, use a ticketing system that helps you assign tickets to technicians automatically, based on their skill level, location, availability, etc.
  3. Integrate remote support with help desk
    With more people working remotely, traditional help desk technicians have to adapt and begin to resolve issues without face-to-face interactions. Even in office settings, IT pros tend to spend about 30% of their valuable time visiting desks to work on issues. By integrating a remote support tool into your help desk, you can resolve issues remotely, taking care of on- and off-site problems with ease.
  4. Resolve issues remotely without leaving your desk
    A recent survey by TechValidate states that 77% of surveyed help desk technicians feel that using remote support decreased their time-to-resolution of trouble tickets. Using the right remote support tool helps you easily troubleshoot performance issues and resolve complex IT glitches without even leaving your desk.


These are some of the simple yet powerful ways that organizations can create a user-friendly help desk.

Are you managing your help desk the hard way or the easy way?

[Infographic: Four Easy Ways to Create a Chaos-free Help Desk]


To download this infographic, click here. Share your thoughts on how you reduce workload and simplify help desk support in the comments section.


"Sore throat from talking, sore feet from walking, sore face from smiling. Must be @CiscoLive."


Before I dig in to what I saw, as well as what I think about what I saw, I have to take a moment to shout out my gratitude and thanks to the amazing SolarWinds team that assembled for this convention. In fact, I had so much shouting to do that I wrote a whole separate post about it that you can read here. I hope you'll take a moment to share in the sheer joy of being part of this team.


But remember to come back here when you're done.


Okay, you're back now? Let's dive in!


CLUS is About Connections

As much as you may think Cisco Live is about networking (the IT kind), technology, trends, and techniques, the reality is that much of the attraction for a show like this is in making personal connections with groups of like-minded folks. While that's true of most conventions, Cisco Live, with 27,000 attendees this year, offers a larger quantity and wider variety of people with the same range of experience, focus, and background as you. You can sit at a table full of 33-year-old voice specialists who started off as Linux® server admins. It might be a bit of a trick to find them in the sea of humanity, but they are there. Usually they are attending the same sessions you are; you just have to look around.


Beyond the birds-of-a-feather aspect, Cisco Live gives you a chance to gather with people who share your particular passion - whether it's for a brand, technology, or technique - and learn what's in store in the coming months.


And what would geek-based IT culture be if all of that social interaction didn't include some completely off-the-wall goofiness? An offhand joke last year blossomed into the full-fledged #KiltedMonday, with dozens (if not hundreds) of attendees sporting clan colors and unshaved legs to great effect.


Speaking of legs, many people's legs were also festooned with a riot of colors as #SocksOfCLUS also started to take hold. You might even say it got a leg up on the convention this year. (I'll be here all week, folks.)



The Rise of DevNet

During the show, news broke that cloud-based development environment Cloud9 had been acquired by Amazon Web Services® (AWS), which prompted my fellow Head Geek Patrick Hubbard to tweet:

Next, @awscloud grabs @Cloud9. One day all will be #AWS and #Azure. Learn #DevOps fellow geeks.


That truth was already clearly being embraced by the folks at Cisco®.


Over the last three Cisco Live events I've attended, the area devoted to DevNet - the network engineer-flavored DevOps - has grown substantially. It's expanded from a booth or two at Cisco Live 2015, to a whole section of the floor in Berlin 2016, to a huge swath of non-vendor floor space last week. Two dozen workstations were arranged around a model train, and attendees were encouraged to figure out ways to code the environment to change the speed and direction of the train. Fun!


Meanwhile, three separate theaters ran classes every hour on everything from programming best practices to Python deep dive tutorials.


I found this much more engaging and effective than the usual statements that network engineers need to learn to code due to the pressure of a SPECIFIC technology (say, SDN). While that might be true, I much prefer that the code be presented FIRST, and then let IT pros figure out what cool things we want to do with it.




Still Figuring IT Out

It is clear that the convention is both emotionally and financially invested in the evolving trends of SDN, IoT, and cloud/hybrid IT. Vast swaths of the show floor are dedicated to them, and to showing off the various ways they might materialize in real, actual data centers and corporate networks.


But the fact is that none of those things are settled yet. Settling, perhaps. Which is fine. I don't need a convention (much less a single vendor) to announce, "And lo, it was good" about any technology that could change the landscape of infrastructures everywhere.


For things like SDN and IoT, Cisco Live is the place you go once a year to check in, see how the narrative has changed, and elbow the person next to you at the session or display and say, "So, are you doing anything with this? No? Me, neither.”


The View from the Booth

Back at Starship SolarWinds (aka our booth), the story was undeniably NetPath™. People sought us out to see it for themselves, or were sent by others (either back at the office or on the show floor) to come check it out. The entire staff demonstrated the belle of the NPM 12 ball constantly throughout the day, until the second day, when we had booth visitors start demo-ing it THEMSELVES to show OTHER visitors (who sometimes turned out to be people they didn't even know). The excitement about NetPath was that infectious.


We also witnessed several interactions where one visitor would assure another that the upgrade was painless. We know that hasn’t always been the case, but seeing customers brag to other customers told us that all of our due diligence on NPM 12, NCM 7.5, NTA 4.2, and SRM 6.3 (all of which came out at the beginning of June) was worth all the effort.


Not that this was the only conversation we had. The new monitoring for stacked switches was the feature many visitors didn't know they couldn't live without; they left texting their staff to schedule the upgrade. The same goes for Network Insight - the AppStack-like view that gives a holistic perspective on load balancers like F5®s and the pools, pool members, and services they provide.


We also had a fair number of visitors who were eager to see how we could help them solve issues with automated topology mapping, methods for monitoring VDI environments, and techniques to manage the huge volume of trap and syslog that larger networks generate.


And, yes, those are all very network-centric tools, but this is Cisco Live, after all. That said, many of us did our fair share of showing off the server and application side of the house, including SAM, WPM, and the beauty that is the AppStack view.


Even more thrilling for the SolarWinds staff were the people who came back a second time, to tell us they had upgraded THAT NIGHT after visiting us and seeing the new features. They didn’t upgrade to fix bugs, either. They couldn’t live another minute without NetPath, Network Insight (F5 views), switch stack monitoring, NBAR2 support, binary config backup, and more.


We all took this as evidence that this was one of the best releases in SolarWinds history.


CLUS vs SWUG, the Battle Beneath the Deep

In the middle of all the pandemonium, several of us ran off to the Shark Reef to host our first ever mini-SWUG, a scaled-down version of the full-day event we've helped kick off in Columbus, Dallas, Seattle, Atlanta, and, of course, Austin.


Despite the shortened time frame, the group had a chance to get the behind-the-scenes story about NetPath from Chris O'Brien; to find out how to think outside the box when using SolarWinds tools from Destiny Bertucci (giving them a chance to give a hearty SWUG welcome to Destiny in her new role as SolarWinds Head Geek); and hear a detailed description of the impact NetPath has had in an actual corporate environment from guest speaker Chris Goode.


The SolarWinds staff welcomed the chance to have conversations in a space that didn't require top-of-our-lungs shouting, and to have some in-depth and often challenging conversations with folks that had more than a passing interest in monitoring.


And the attendees welcomed the chance to get the inside scoop on our new features, as well as throw out curveballs to the SolarWinds team and see if they could stump us.



Cisco Live was a whirlwind three days of faces, laughter, ah-ha moments, and (SRSLY!) the longest walk from my room to the show floor INSIDE THE SAME HOTEL that I have ever experienced. I returned home completely turned around and unprepared to get back to work.


Which I do not regret even a little. I met so many amazing people, including grizzled veterans who’d earned their healthy skepticism, newcomers who were blown away by what SolarWinds (and the convention) had to offer, and the faces behind the Twitter personas who have, over time, become legitimate friends and colleagues. All of that was worth every minute of sleep I lost while I was there.


But despite the hashtag above, of course I have regrets. Nobody can be everywhere at once, and a show like Cisco Live practically requires attendees to achieve a quantum state to catch everything they want to see.

  • I regret not getting out of the booth more.
  • Of course, THEN I'd regret meeting all the amazing people who stopped in to talk about our tools.
  • I regret not catching Josh Kittle's win in his Engineering Deathmatch battle.
  • I regret not making it over to Lauren Friedman's section to record my second Engineers Unplugged session.
  • And I regret not hearing every word of the incredible keynotes.


Some of these regrets I plan to resolve in the future. Others may be an unavoidable result of my lack of mutant super-powers allowing me to split myself into multiple copies. Which is regrettable, but Nightcrawler was always my favorite X-Man, anyway.


#SquadGoals for CLUS17:

Even before we departed, several of us were talking about what we intended to do at (or before) Cisco Live 2017. Top of the list for several of us was to be ready to sit for at least one certification exam at Cisco Live.


Of course, right after that our top goal was to learn how to pace ourselves at the next show.


Somehow I think one of those goals isn't going to make it.




Congratulations! You are our new DBA!


Bad news: You are our new DBA!


I'm betting you got here by being really great at what you do in another part of IT.  Likely you are a fantastic developer. Or data modeler.  Or sysadmin. Or networking guy (okay, maybe not likely you are one of those…but that's another post).  Maybe you knew a bit about databases having worked with data in them, or you knew a bit because you had to install and deploy DBMSs.  Then the regular DBA left. Or he is overwhelmed with exploding databases and needs help. Or got sent to prison (true story for one of my accidental DBA roles). I like to say that the previous DBA "won the lottery" because that's more positive than LEFT THIS WONDERFUL JOB BEHIND FOR A LIFE OF CRIME.  Right?


I love writing about this topic because it's a role I have to play from time to time, too.  I know about designing databases, but can I help with installing, managing, and supporting them?  Yes. For a while.


Anyway, now you have a lot more responsibility than just writing queries or installing Oracle a hundred times a week.  So what sorts of things does a new accidental DBA need to know to be a great data professional?  Most people want to get right into performance tuning all those slow databases, right?  Well, that's not what you should focus on first.


The Minimalist DBA


  1. Inventory: Know what you are supposed to be managing.  Often when I step in to fill this role, I have to support more servers and instances than anyone realized were being used.  I need to know what's out there to understand what I'm going to get a 3 AM call for.  And I want to know that before that 3 AM call. 
  2. Recovery: Know where the backups are, how to get to them, and how to do test restores. You don't want that 3 AM call to result in you having to call others to find out where the backups are. Or to find out that there are no backups, really.  Or that they are actually backups of the same corrupt database you are trying to fix.  Test that restore process.  Script it.  Test the script.  Often.  I'd likely find one backup and attempt to restore it on my first day on the job.  I want to know about any issues with backups right away.
  3. Monitor and Baseline: You need to know BEFORE 3 AM that a database is having a problem. In fact, you just don't want any 3 AM notifications.  The way you do that is by ensuring you know not only what is happening right now, but also what was happening last week and last month.  You'll want to know about performance trends, downtime, deadlocks, slow queries, etc.  You'll want to set up the right types of alerts, too.
  4. Security: Everyone knows that ROI stands for return on investment.  But it also stands for risk of incarceration.  I bet you think your only job is to keep that database humming.  Well, your other job is to keep your CIO out of jail.  And the CEO.  Your job is to love and protect the data.  You'll want to check to see how sensitive data is encrypted, where the keys are managed and how other security features are managed.  You'll want to check to see who and what has access to the data and how that access is implemented.  While you are at it, check to see how the backups are secured.  Then check to see if the databases in Development and Test environments are secured as well.
  5. Write stuff down: I know, I know.  You're thinking "but that's not AGILE!"  Actually, it is.  That inventory you did is something you don't want to have to repeat.  Knowing how to get to backups and how to restore them is not something you want to be tackling at 3 AM.  Even if your shop is a "we just wing it" shop, having just the right amount of modeling and documentation is critical to responding to a crisis.  We need the blueprints for more than just building something. 
  6. Manage expectations: If you are new to being a DBA, you have plenty to learn, plenty of things to put in place, plenty of work to do.  Be certain you have communicated what things need to be done to make sure that you are spending time on the things that make the most sense.  You'll want everyone to love their data and not even have to worry that it won't be accessible or that it will be wrong.
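As a small example of putting step 2 into practice, here is a sketch of a backup freshness check. The database names, timestamps, and 24-hour window are invented for illustration; how you enumerate backups depends on your platform:

```python
from datetime import datetime, timedelta

# Given backup timestamps (however your platform exposes them), flag
# databases whose most recent backup is older than the window of data
# loss you can afford.
def stale_backups(backups, now, max_age=timedelta(hours=24)):
    """backups: {db_name: [datetime, ...]} -> sorted list of dbs needing attention."""
    stale = []
    for db, stamps in backups.items():
        if not stamps or now - max(stamps) > max_age:
            stale.append(db)
    return sorted(stale)

now = datetime(2016, 7, 20, 3, 0)   # the dreaded 3 AM
backups = {
    "sales":   [datetime(2016, 7, 19, 23, 0)],   # 4 hours old: fine
    "hr":      [datetime(2016, 7, 17, 23, 0)],   # days old: stale
    "scratch": [],                                # never backed up
}
print(stale_backups(backups, now))  # ['hr', 'scratch']
```

A check like this only tells you a backup file exists and is recent; the test restore is still the only proof it works.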


These are the minimal things one needs to do right off the bat.  In my next post, I'll be talking about how to prioritize these and other tasks.  I'd love to hear about what other tasks you think should be the first things to tackle when one has to jump into an accidental DBA role.


The Actuator - July 20th

Posted by sqlrockstar Employee Jul 20, 2016

I'm back from the family London vacation and ready to get to work. I was unplugged for much of last week, focusing on the world around me. What I found was a LOT of healthy discussions about #Brexit, David Cameron leaving, and if you can eat a Scotch egg cold (HINT: You can, but you shouldn't, no matter what the clerk at Harrod's tells you.)


With almost 2,000 unread items in my RSS reader, I had lots of material to root through while looking for links to share with you this week. Here are the ones I found most amusing from around the Internet. Enjoy!


PokemonGO and Star Trek TNG's 'The Game'

Happy to see I wasn't the only one who made this connection, but what I'd like next is for someone to make an augmented reality game for finding bottlenecks in the data center.


Security Is from Mars, Application Delivery Is from Venus

I liked the spin this article took on the original book theme. Looking forward to the follow-up post where the author applies the business concepts of cost, benefits, and risk to their marriage.


Microsoft Wins Landmark Email Privacy Case

Reversing a decision from 2014 where the US government thought it was OK to get at data stored outside our borders, this ruling is more in line with current technology advances and, let's be honest, common sense.


How boobytrapped printers have been able to infect Windows PCs for over 20 years

I seem to recall this being a known issue for some time now, so I was shocked to see that the patch was only just released.


6 Workplace Rules that Drive Everyone Crazy

"Underwear must be worn at all times." Good to know, I guess, but is this really a problem for an office somewhere?


Microsoft Apologizes for Inviting "Bae" Interns to Night of "Getting Lit" in Lame Letter

Another public misstep for Microsoft with regards to social values. This letter is more than just poorly worded, it underlines what must be considered acceptable behavior for Microsoft employees.


High traffic hits the operations team

The Olympics are just around the corner!


Wonderful view for King Charles I last week, looking from Trafalgar Square down to Big Ben glowing in the setting sun.


As network engineers, administrators, architects, and enthusiasts, we are seeing a trend of relatively complicated devices that all strive to provide unparalleled visibility into the inner workings of applications or security. Inherent in these solutions is a level of complexity that challenges network monitoring tools; it seems that in many cases vendors are pitching proprietary tools that are capable of extracting the maximum amount of data out of a specific box. Just this afternoon I sat on a vendor call in which we were doing a technical deep dive, with a customer, of a next-generation firewall with a very robust feature set. Inevitably the pitch was made to consider a manager of managers that could consolidate all of this data into one location. While valuable in its own right for visibility, this perpetuates the problem of many “single panes of glass”.


I couldn’t help but think that what we really need is the ability to follow certain threads of information across many boxes, regardless of manufacturer—these threads could be things like application performance or flows, security policies, etc. Standards-based protocols and vendors that are open to working with others are ideal, as they foster the creation of ecosystems. Automation and orchestration tools offer this promise, but add additional layers of intricacy in the requirements of knowing scripting languages, a willingness to work with open source platforms, etc.


Additionally, any time we abstract a layer or simplify it, we lose something in the process; this is known as generation loss. Compounding that loss across many devices or layers of management tends to result in data that is incomplete or, worse, inaccurate, yet this is the data we intend to use to make our decisions.
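A toy model makes the compounding effect concrete. The 95% retention figure is invented purely for illustration:

```python
# Toy model of generation loss: assume each layer of abstraction keeps
# only some fraction of the detail it receives. The 95% figure is made
# up; the point is how quickly the compounding erodes what's left.
def remaining_fidelity(layers, kept_per_layer=0.95):
    return kept_per_layer ** layers

for layers in (1, 3, 5, 8):
    print(f"{layers} layers -> {remaining_fidelity(layers):.0%} of original detail")
# e.g., 5 layers -> 77% of original detail
```

Even a small per-layer loss leaves you deciding from a fraction of the picture after a handful of hops.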


Is it really too much to ask for simple and accurate? I believe this is where the art of simplicity comes into play. The challenge of creating an environment in which the simple is useful and obtainable requires creativity, attention to detail, and an understanding that no two environments are identical. In creating this environment, it is important to address what exactly will be made simple and by what means. With a clear understanding of the goals in mind, I believe it is possible to achieve these goals, but the decisions on equipment, management systems, vendors, partners, etc. need to be well thought through and the right amount of time and effort must be dedicated to it.
