
Geek Speak

2,105 posts

Whether providing support to end-users or fellow employees, there are countless benefits to leveraging an IT Help Desk solution. From ticketing and service management, to IT asset management and more, these solutions have a lot to offer in the way of optimizing the efficiency of the support function of a business.


IT help desk admins are busy people. They are constantly juggling service requests and trouble tickets, either bouncing from workstation to workstation to provide hands-on support, or offering assistance via phone or web.


Though help desk solutions offer the advantage of tracking and managing all aspects involved in battling support issues, sometimes there’s nothing like a face-to-face, or rather screen-to-screen, visit to improve this interaction.


In this blog, I’ll detail five benefits of the combined use of an IT help desk solution and remote support tools, a power punch combo that is sure to have an impact on the delivery of IT support.

  1. Simplifying IT Service Management (ITSM) – By integrating remote support tools into a help desk solution, help desk technicians can manage ticketing automation and SLAs, and seamlessly interact with end-users to resolve their issues. Establishing remote connections to the end-user’s desktop directly from within help desk tickets and chat windows provides a gateway to greater transparency. It also eliminates the need to visit users personally, simplifying the ITSM process from end-to-end.

  2. Increasing operational efficiency – As mentioned above, help desk solutions remove the need for manually managing tickets, creating performance reports, assigning tickets to technicians, etc. Using such a tool allows IT support pros to focus on the task at hand: remediating support issues. Integrating remote support capabilities enhances their performance by providing direct access to problematic end-user machines and IT systems, enabling them to diagnose performance problems and troubleshoot immediately from anywhere.
  3. Automating IT support – An ideal help desk system automates IT support operations, including ticketing management, IT asset discovery and management, escalations, change/approval management, and more. The addition of remote support capabilities helps to streamline IT support from ticket creation to resolution, all within a single console.
  4. Lowering time to resolution – Enabling remote support tools within a help desk solution gives IT help desk admins the ability to explore problems right from the source, rather than shuffling through screen shots or spending time deciphering written or verbal details, which are often provided by the technical layman. This empowers them to accelerate the resolution process.
  5. Improving customer satisfaction – IT issues can bog down business, causing trouble for the end-users and the company as a whole. Having these tools at their disposal allows help desk technicians to quickly diagnose and resolve issues, so end-users and fellow employees can get back to business. When coupled with the ability to track and measure the performance of help desk technicians, this strengthens a company’s ability to provide excellent support. And this, in turn, translates to greater customer satisfaction over time.


Interested in learning more about the benefits of incorporating remote support capabilities into your help desk solution? Watch this quick two-minute video to discover the value SolarWinds® Web Help Desk customers receive from integrating DameWare® Remote Support into their solution.


Do you have other examples of how remote support has helped your help desk achieve greater levels of success? Share a comment to fill us in!

About a year ago, we published the "Monitoring 101" ebook on THWACK and the response was incredible. At the time, I said:


"Despite the relative maturity of monitoring and systems management as a discrete IT discipline, I am asked - year after year and job after job - to give an overview of what monitoring is."


The page has been viewed over 7,000 times and the document has over 350 downloads by people who want to get up to speed on monitoring themselves, or to share with new members to their team, or to help educate management or external consumer teams about what is possible to achieve with a little bit of knowledge and the right tools. Best of all, the information is completely tool-agnostic so it "works" whether you use SolarWinds solutions or... one of those other guys.


The response to this material online, at trade shows, and at SolarWinds User Groups was so great that we started to present it interactively in SolarWinds Lab episodes (Lab 37, Lab 41, and Lab 42) and webinars. We even turned part of it into a "Dummies" series guide!


We recognize that not everyone wants to spend 40 minutes watching a video or reading a 20+ page PDF, so we're offering you another option. Following the success of our "Self Paced Packet Inspection Training" email-based course, we've taken some of the essential concepts from Monitoring 101 and turned them into a FREE seven-day email course. (Join the THWACK page and register here.)


Lessons will explain not only how to perform various monitoring tasks, but why and when you should use them. The lessons are self-contained, meaning that there are no cliffhangers or please-sign-up-for-our-next-course-to-find-out-more ridiculousness. We have broken them into manageable chunks of information. Delivery to your inbox means you don't have to remember to go to some website and open a course, and you can work on each lesson at your pace and on your schedule.


Like the Monitoring 101 guide, the email course provides an introduction to monitoring for someone who is familiar with computers and IT in general, but not with monitoring as a discipline. As such, (almost) no prior knowledge or experience is required.


Having the right tool for the job is more than half the battle. But, it’s not the whole battle, and it’s not even where the skirmish started. To build an effective monitoring solution, you must first learn the underlying concepts. You have to know what monitoring is before you can set up what monitoring does.


Use this link to find out more and sign up!

In my last post on this topic, I described some scenarios where an outage was significantly extended, primarily because although the infrastructure had been discovered and was being managed, a true understanding remained elusive. In this post, I will meditate on what else we might need in order to find peace with our network or, as network defender so eloquently put it, "Know the network. Be the network."


Managing Your Assets


Virtualization is both the best and the worst thing to have happened to our data centers. The flexibility and efficiency of virtualization have changed the way we work with our compute resources, but from a management perspective it's a bit of a nightmare. If you have been around a baby at some point in your life, you may recognize that when a baby is very small you can put it down on its back and it will lie there in the same spot; you can leave the room for five minutes, and when you come back, the baby will be exactly where you put it. At some point though, it all changes. If you put the baby down, you'd better watch the poor thing constantly because it has no interest in staying put; leave the room for five minutes and you may spend the next half an hour trying to find out exactly where it got to. And so it is - or at least it feels - with virtualization. A service that in the past would have had a fixed location on a specific server in a particular data center is now simply an itinerant workload looking for a compute resource on which to execute, and as a result it can move around the data center at the will of the server administrators, and it can even migrate between data centers. This is the parental equivalent of leaving the room for five minutes and coming back to find that your baby is now in next door's back yard.


We should care about this because understanding the infrastructure properly means understanding dependencies.


Know Your Dependencies


Let's look at some virtual machines (VMs) and see where there might be dependencies:




In this case, there are users on the Internet connecting to a load balancer, which will send their sessions onward to, say, VM1. In a typical environment with element management systems, we can identify if any of the systems in this diagram fail. However, what's less clear is what the impact of that failure will be. Let's say the SERVER fails; who is affected? We need to know immediately that VM1, VM2, and VM3 will be unavailable, and from an application perspective, I need to know that those virtual machines are my web farm, so web services will be unavailable. I can identify service-level problems because I know what my VMs do, and I know where they are currently active. If a VM moves, I need to know about it so that I can keep my dependencies accurate.


The hypervisor shown is using NFS mounts to populate what the VMs will see as attached storage. It should be clear from the diagram that if anything in the path between the hypervisor and the SAN fails, while the hypervisor will initially be the one complaining, it won't take too long before the VMs complain as well.


From an alerting perspective, knowing these dependencies means that:

  • I could try to suppress or otherwise categorize alerts that are downstream from the main error (e.g. I don't want to see 100 NFS alerts per second when I already know that the SAN has failed);
  • I know what impact a particular failure will have on the services provided by the infrastructure.
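
Both points can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular monitoring product works: the `DEPENDS_ON` map is a hand-written assumption based on the diagram described above (the `web-farm` name in particular is made up for the example), and the function names are hypothetical.

```python
# Hypothetical dependency map: each key depends on the components listed.
# The names mirror the example above; "web-farm" is an invented service name.
DEPENDS_ON = {
    "web-farm": ["VM1", "VM2", "VM3"],
    "VM1": ["SERVER"],
    "VM2": ["SERVER"],
    "VM3": ["SERVER"],
    "SERVER": ["SAN"],  # the hypervisor mounts its VM storage from the SAN
}


def upstream(node, depends_on=DEPENDS_ON):
    """Return everything `node` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return seen


def impacted_by(failed, depends_on=DEPENDS_ON):
    """Return every component or service that depends on `failed`."""
    return {n for n in depends_on if failed in upstream(n, depends_on)}


def root_causes(alerting, depends_on=DEPENDS_ON):
    """Keep only the alerts that have no alerting component upstream."""
    alerting = set(alerting)
    return {n for n in alerting if not (upstream(n, depends_on) & alerting)}


if __name__ == "__main__":
    # Which services and VMs are affected if the SERVER fails?
    print(sorted(impacted_by("SERVER")))
    # The SAN, the SERVER, and VM1 are all alerting: which alert matters?
    print(sorted(root_causes({"SAN", "SERVER", "VM1"})))
```

The map only stays useful if it is updated whenever a VM moves, which is exactly the point made above about keeping dependencies accurate.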


Application Level Dependencies


It may also be important to dig inside the servers themselves and monitor the server's performance, the applications on that server, and the server's view of the infrastructure. For example, if it is reported that MS SQL Server has a problem on a particular node, we can infer that applications dependent on that database service will also be impacted. It's possible that everything in the infrastructure is nominally OK, but there is an application or process problem on the server itself, or perhaps the VM is simply running at capacity. I will say that tools like SolarWinds Server & Application Monitor are very helpful when it comes to getting visibility beyond the system level. When used with knowledge of an application's service role, they can make a huge difference in pre-empting problems, quickly identifying the root cause of emergent problems, and using that information to ignore the downstream errors and focus on the real issue.


Long Distance Relationships


Let's take this a step further. Imagine that VM3 in the local datacenter runs a database that is used by many internal applications. In another location there's a web front end which accesses that database. An alert comes in that there is a high level of dropped packets on the SAN's connection to the network. Shortly after, there are alerts from VM3 complaining about read/write failures. It stands to reason that if there's a problem with the SAN, there will be problems with the VMs because the hypervisor uses NFS mounts for the VM hard drives.  We should therefore fully anticipate that there will be problems with the web front end even though it's located somewhere else, and when those alerts come in, we don't want to waste any time checking for errors with the web server farm or the WAN link. In fact, it might be wise to proactively alert the helpdesk about the issue so that when a call comes in, no time will be wasted trying to replicate the problem or track down the issue. Maybe we can update a status page with a notice that there is potential impact to some specific services, and thus avoid further calls and emails. Suddenly, the IT group is looking pretty good in the face of a frustrating problem.


Service Oriented Management


One of the biggest challenges with element management systems is in the name; they are managing technologies, not services. Each one, reasonably enough, is focused on managing a specific technology to the best of its abilities, but without some contextual information and an understanding of dependencies, the information being gathered cannot be used to its full potential. That's not to say that element management systems have no value; far from it, and for the engineers responsible for that technology, they are invaluable. Businesses, however, don't typically care about the elements; they care about services. When there's a major outage and you call the CTO to let them know, it's one thing to tell them that "SAN-17 has failed!" but the first question a CTO should be asking in response is "What services are impacted by this?" If your answer to that kind of question would be "I don't know," then no matter how many elements you monitor and how frequently you poll them, you don't fully know your infrastructure and you'll never reach a state of inner peace.


I'm curious to know whether THWACK users feel like they have a full grip on the services using the infrastructure and the dependencies in place, not just a view of the infrastructure itself.



In my next post, I'll be looking at more knowledge challenges: baselining the infrastructure and identifying abandoned compute resources.

It’s a problem many have faced, myself included, in this industry. You spend years honing your craft on a particular skill, narrowed focus, and always learning. Then your organization decides to do a 180, and you’re either out of a job or low person on the totem pole. Think Novell!


How can one dedicate oneself to the task at hand, while attempting to remain relevant to an industry constantly in flux?


Having had this happen to me, I vowed that I’d not allow it to happen again. Of course, that’s easier said than done.


I’ve not had the experience of a developer, having always been an infrastructure guy, though I have supported many. I think that it may be more difficult to maintain or build a skillset in programming, while simultaneously trying to build skills in entirely new platforms, but I do imagine that this is what it’d take to not get phased out.


Following is a list of things that have come naturally to me over the years, and some ways in which to accomplish them.


  • Be Curious


In my case, there is always so much going on; new startups in software tools, storage, and orchestration are being launched practically every day. To choose some particular piece of this tech and learn it, if only to be able to speak intelligently about it, takes a bit of effort, but can be accomplished by reading white papers or attending webinars. Even more beneficial would be to attend trade shows, accept meeting requests from sales teams who’ve targeted you, and most importantly, talk. User Groups, Meetups, and the like have proven to me to be highly effective entrées into newer technologies. Seek out the technologies that interest you and that appear to be ideal solutions to the problem at hand.


  • Pursue your passions


Be aware that often, your time will be limited, and thus your productivity in launching into technology that you’ve not seen previously, will likely be hampered, if not thwarted entirely, but don’t despair. If you’re truly passionate about a given piece of tech, or the solution to a given issue, that passion will drive you forward to truly learn what you need on it.


Look at the big picture within your organization, and take a look at potential needs that are not being fulfilled, or not fulfilled well. Evaluate, and pursue the viable candidate(s). Once you’ve fully researched them, you can present the findings to management. By establishing your interest and willingness to go above and beyond the call of your day-to-day job, you’ll then get to pursue this technology and have the opportunity to keep your skills fresh.


  • Look at market trends, and key players


When evaluating the next tool (a piece of hardware or software) you’d like to learn, acknowledge that dying technology is probably not a great place to go. You’ll want to know the pros and cons of yesterday’s stuff, as it relates to the solution of a problem, but more relevant are the newer, more elegant solutions to a problem. For example, look how far remote access into an organization has come since the introduction of a VPN into the IT landscape. Your solution will need to be secure, manageable, and scalable, as well as easy to maintain. Some older remote access technologies require huge amounts of maintenance, and might even have difficulty keeping up with threats. There are new ways of accomplishing this, which seem almost revolutionary in comparison. Once you’ve narrowed your choices, you’ll want to make a deeper dive into them.


But to be sure, if you’re evaluating a product or solution set toward a particular goal, it’ll be far more enjoyable to do this when the problem you’re hoping to solve is compelling to you.


Also, nothing can make your star shine more than taking on and implementing a highly beneficial solution that the organization never even considered. That would involve selling your solution to management, and negotiating with the workforce, vendors, and contractors as well to make it happen. A successful rollout is highly satisfying.


Always remember, complacency is the enemy.

Heading to Vegas on Sunday for VMworld next week. I've got one session, a panel discussion, and I will also be hosting a Meet the Experts session. Oh, and I've also got an Experts and Espresso session as well as an in-booth session. So, yeah, it's a fairly busy week for me. That being said, if you are heading to VMworld please stop by and say hello. I'd love the opportunity to connect with you while we are there.


Anyway, here are the items I found most amusing from around the Internet. Enjoy!


Microsoft open sources PowerShell; brings it to Linux and Mac OS X

In related news, Hell has frozen over. From what I can tell, Hell must be full of Linux admins because those are the folks complaining the most about this announcement.


Usain Bolt and the Fastest Men in the World Since 1896 – on the Same Track

Because I love it when someone takes data and presents it in unique ways such as this post.


All the National Food Days

Did you know there are 214 "food" days during the year? Well, now you do. And don't forget September 5th is fast approaching this year!


"Daddy is Working" Light

I would get this, except my kids would knock on the door to ask me if I forgot to turn it off because they needed something anyway.


All Public Cloud roads lead to Hybrid Infrastructure

Some very interesting thoughts in this article. To me, the idea that a developer spins up a prototype in AWS isn't surprising, as I liken that to the enabling of Shadow IT. What surprises me is how most don't discuss the security and privacy concerns of developers taking such action. That's probably because if you do raise such questions you end up being labeled a "roadblock" to progress. 


Pentagon Has a New Data Center Consolidation Plan

If your company isn't yet in the Hybrid IT world then I just want you to know that the US entity that moves slower than molasses is already there.


65 Business Jargon Phrases to Stop Using and What to Use Instead

This isn't even a comprehensive list, but if I were allowed an ask it would be that we don't punch a puppy by creating our own list.


Remember when Apple stores sold software? Well, this was the only software I could find the other day at my local Apple store (sorry, it's just Apple now). If Steve Jobs weren't already dead, this would have killed him for certain:



The Hybrid Cloud

Posted by arjantim Aug 23, 2016

Keeping it easy this post. It'll be just a small recap to build some tension for the next few posts. In the last two posts, I talked a little about the private and public cloud, and it is always difficult to write everything with the right words. So, I totally agree with most of the comments made, and I wanted to make sure a couple of them were addressed in this post. Let’s start with the cloud in general:




A lot of you said that the cloud is just a buzzword (or even just someone else’s computer).


I know it’s funny, and I know people are still trying to figure out what cloud is exactly, but for now we (our companies and customers) are calling it cloud. And I know we techies want to set things straight, but for now let’s all agree on calling it cloud, and just be done with it (for the sake of all people that still see the computer as a magical box with stardust as its internal parts, and unicorns blasting rainbows as their administrators.)


The thing is, I like the comments because I think posts should always be written as conversation starters. We are here to learn from each other, and that’s why we need these comments so badly.


The private cloud (or infrastructure) is a big asset for many of the companies we work for. But they pay a lot of money to just set up and maintain the environment, where the public cloud just gives them all these assets and sends a monthly bill. Less server cost, less resource cost, less everything, at least that’s what a lot of managers think. But as a couple of the comments already mentioned, what if things go south? What if the provider goes bankrupt and you can't access your data anymore?


In the last couple of years, we've seen more and more companies in the tech space come up with solutions, even for these kinds of troubles. With the right tools, you could make sure your data is accessible, even if your provider goes broke and the lights go out. Companies like Zerto, Veeam, SolarWinds, VMware and many more are handing you tools to use the clouds as you want them, while still being in control and able to see what is going on. We talked about DART and SOAR, and these are very important in this era and the future ahead. We tend to look at the marketing buzz and forget that it's their way of saying that they often don't understand half of the things we do or say, and the same goes for a lot of people outside the IT department. In the end they just want it KISS, and that's where a word like "cloud" comes from. But let's go back to hybrid.


So what is hybrid exactly? A lot of people I talk to are always very outspoken about what they see as hybrid cloud. They see the hybrid cloud as the best of both worlds, as private and public clouds combined. For me, the hybrid cloud is much more than that. For me, it can be any combination, even all public, but shared among multiple providers (multi-cloud anybody?!?), or private and public clouds on-premises, and so on. In the end, the cloud shouldn't matter; it should just be usable.


For me, the hybrid solution is what everybody is looking for, the one ring to rule them all. But we need something software-defined to manage it all.


That's why my next post will be about the software-defined data center. It's another buzzword, I know, but let's see if we can learn a bit more from each other on where the IT world is going, and how we can help our companies leverage the right tools to build the ultimate someone else’s computer.


See you in Vegas next week?!?

With the federal government’s Cloud First Policy nearly four years old, most agencies already have a clear understanding of the promised values of cloud computing.


That said, there is still plenty of uncertainty and concerns about moving to a cloud environment. How will you secure your data and monitor your applications? Will a cloud environment make your job obsolete? How will your agency manage the changes?


In reality, however, moving to the cloud can have less of an impact than one might imagine. Data will continue to be secure, applications will continue to perform and job security will not change.  You don’t have to lose control.


Today’s environment


Today, you’re encrypting your data, using performance monitoring tools, tracking resource usage and evolving requirements (memory, CPU, etc.), tracking service-level agreements (SLAs) and much more – all considered best practices.


The key is to understand the differences between application requirements and deployment practices.


Protecting data in the cloud simply entails knowing what requirements you must meet, and learning how to do that in the cloud. So the more clarity you have on how your applications work today, the easier your migration will be.


If you understand your application resource contentions, you will know how much memory and CPU your database has been using, but you also need a clear understanding of the source of bottlenecks. This knowledge will ensure you get the capacity you need while meeting your performance requirements.
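
As a rough illustration of what "knowing how much CPU and memory your database has been using" can look like in practice, here is a small Python sketch that summarizes utilization samples into mean, 95th percentile, and peak. The function name and the sample values are hypothetical, and which statistic you size against is a judgment call for your own workload.

```python
import statistics


def sizing_summary(samples):
    """Summarize utilization samples (in percent) for capacity planning."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p95": cuts[-1],
        "peak": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical CPU utilization readings from an existing database server.
    cpu_percent = [20, 25, 30, 35, 90]
    print(sizing_summary(cpu_percent))
```

A large gap between the mean and the peak is a hint that the bottleneck is bursty, which matters when choosing between sizing for the average and sizing for the worst case.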


Your cloud environment


Your cloud environment might actually look quite similar to your data center-hosted environment.


From a security perspective, there are many options available in the cloud. Remember, meeting strict federally mandated security requirements is a cloud provider's bread and butter. All cloud providers that are compliant with the Federal Risk and Authorization Management Program (FedRAMP) meet FISMA-moderate requirements.


It is likely you will end up in a hybrid environment, so you should find a set of monitoring tools that allow you to monitor applications both in the cloud and in your own data center. The key metrics you already track – application performance, memory usage, CPU utilization – should continue to be tracked in the cloud.


Look for tools that allow you to see both sides through a single pane of glass, providing complete visibility across the entire environment. These types of tools provide stability throughout the transition and ease migration.


As for job security, remember that most of the work you do today will continue. You will still be responsible for application performance optimization, for example, but the applications will simply be in a different location. The performance metrics you’ll be tracking relate directly to potential cost savings for your agency. Tuning, enhancing efficiency, optimizing resources (cost) and evaluating current practices may also become a larger part of many federal IT jobs.


Focus on data security and optimizing performance, and continue to track resource usage, evolving requirements and SLAs. And remember, the more rigorously you monitor and manage your applications today, the easier – and more cost effective – your migration will be.


Find the full article on Government Computer News.


What is the difference in the guest OS measurement of CPU and memory in a virtual machine (VM) versus the CPU and memory utilization of the host server? Why does each matter? Each serves its own purpose. Perspective is key to properly utilizing them as you optimize your virtual environment.


At a high level, the CPU and memory utilization reported by the host server takes into account all the VMs on that system, their scheduling, and any privileged instructions as consumption takes place against that host’s resources. CPU and memory reported from this perspective is at a system level. Optimization requires having a holistic view of the system, the VMs on it, and their applications since over-commitment of system resources can cause bottlenecks.


The guest OS measurement of CPU and memory is taken from the perspective of the VM with respect to those resources that have been provisioned for it. These metrics have no awareness of the VMkernel and its scheduling, nor of the physical system’s overall metrics, though they can definitely be impacted by what happens on the system. Case in point: noisy neighbor VMs. Optimization focuses on understanding the behavior of the VM and its application. For instance, the threadedness of the application can come into play. A single-threaded application does not benefit from additional vCPUs because the app won’t be able to take advantage of them. Plus, those unused vCPUs waste pCPU and can hurt overall system performance, which in turn can affect that VM's performance.
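
To make the two perspectives concrete, here is a minimal Python sketch. It assumes you already have, from your own inventory, the vCPU count per VM, the number of physical cores on the host, and a rough idea of how many threads each application can actually use. All names and numbers are hypothetical, and nothing here calls a VMware API.

```python
def vcpu_overcommit_ratio(vm_vcpus, physical_cores):
    """Host view: provisioned vCPUs divided by physical cores on one host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus.values()) / physical_cores


def oversized_vms(vm_vcpus, app_threads):
    """Guest view: VMs given more vCPUs than their application can use.

    VMs with no entry in `app_threads` are skipped rather than guessed at.
    """
    return sorted(
        vm for vm, vcpus in vm_vcpus.items()
        if vm in app_threads and vcpus > app_threads[vm]
    )


if __name__ == "__main__":
    # Hypothetical host with 8 physical cores and three VMs.
    vm_vcpus = {"vm-a": 4, "vm-b": 8, "vm-c": 4}
    # Hypothetical usable thread counts; vm-a runs a single-threaded app.
    app_threads = {"vm-a": 1, "vm-b": 8}

    print(vcpu_overcommit_ratio(vm_vcpus, physical_cores=8))
    print(oversized_vms(vm_vcpus, app_threads))
```

The ratio says nothing about whether a given level of over-commitment is acceptable. That depends on the workloads, which is why both points of view are needed.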


The end-goal remains delivering application Quality-of-Service that is acceptable to end-users. Both points-of-view are important in how one designs and implements their virtualization infrastructure as well as proper resource allocation to VMs and across clusters of host systems. They both serve purposes in root-causing bottlenecks and proper remediation of those issues.


Share your thoughts in the comments section. Below are additional materials for reference.



VMware’s CPU scheduler white paper:


VMware Knowledge Base Article:

How much storage do you have? What storage are you using? When will you need more?


These seem to be simple questions, and that's often how they're approached. But if you're a systems administrator "on the hook" for storage, you know they're actually devilishly complex to answer. Storage utilization metrics have been a challenge from the beginning, and it's only getting more complex in our world of storage virtualization, shared storage, and cloud storage! Let's get back to the basics and think about the root questions here and how we can solve them.


Storage Utilization: Why Do You Care?


Here's a radical statement coming from a "storage guy" like me: Don't obsess about storage capacity for its own sake, since it's basically free these days.


Now that I've given that a moment to sink in, let's consider the value of a byte. I recently bought 10 TB of storage at retail for under $250, including tax. That's a whole lot of storage! In fact, it's enough capacity to hold the core financial and operational data of a midsize company! 5 TB, 6 TB, 8 TB, and even 10 TB hard disk drives are readily available today, and they're available for under $1,000!


With prices like these, why do we even care about storage capacity anymore?

  1. Storage performance remains very expensive, even though widespread use of flash and SSDs has radically opened up what's possible in terms of performance
  2. Advanced storage features are expensive, too, and they're what we're really paying for when we buy enterprise storage
  3. IT budgets, approvals, and group ownership remain confounding factors in making efficient use of storage


In other words, even though storage capacity is free, everything else still costs a whole lot of money! Storage management has never really been about the bytes themselves. It's all about making data available to the business on demand, every time. And that's a much bigger issue!


Yet all of those things (performance, features, and bureaucracy) are linked to those core questions at the top of the page. When people ask how much storage they have, how much they're using, and how much is available, what they're really asking is, "can the storage environment support my needs?" Therefore, answering capacity questions requires thought and consideration. And the answer is never a simple number!


How Much Storage Do You Have?


You have more than enough storage today. Otherwise, you'd be in "crisis mode" installing more storage, not reading blog posts! But it's painfully difficult to answer even this basic question!


Most businesses purchase storage on a per-project or per-department basis, and this is probably the worst possible way to do it. It encourages different groups to be misers, hiding storage from each other lest someone else take it. After all, if engineering paid for a storage array, why should sales be able to use some of it?


Even if people have the best intentions, "orphan storage" is common. As servers and applications come off-line, "shared" storage systems are the last to go. It's typical for companies to have many such storage arrays still soldiering on, supporting a few leftover servers in a corner somewhere. Some storage is "orphaned" even before it's ever used, having been purchased and saved for an application that never needed it.


A few years back, I used to do audits of enterprise storage environments, and I always started with an in-person tour of the data center. My most-common question was, "what's that storage array over there?" And the answer was always illuminating and a little embarrassing for someone. But that wasn't my goal; I just wanted to get a baseline of what was in place today.


What Storage Are You Using?


Even storage systems in active use might not meet your needs. Outdated and over-burdened arrays are just as common as orphans. And duplicate data is everywhere in a modern datacenter!


Companies often purchase systems that match their budgets rather than their needs, leading to odd mis-matches between application criticality and system capability. And vendor sales representatives are as much to blame, pushing inappropriate systems on companies.


The key question in terms of storage in use is suitability to task, not just capacity used. Are applications I/O constrained? There's plenty of storage performance to be had, thanks to SSDs and flash storage arrays! Do you need to actively share and move data? Today's arrays have fantastic snapshot, cloning, and replication features just waiting to be exploited! And there's specialized software to manage storage, too!


My storage audits also included a census of data in active use. I had the systems and database administrators tell me how much data was actively used in their applications. Once again, the answer was shocking to jaded managers used to talking about petabytes and usually exabytes. Indeed, most large companies had only a small amount of active data: I often recorded single-digit percentages! But this too is misleading, since modern systems need test and development, backup and archiving, and space to grow.


When Will You Need More Storage?


A critical question is, "what kind of storage will you need in the future?" Applications are going in all directions today: Some need scalable, API-driven cloud storage, others need maximum performance, while more need integration and data management features. The old standby storage arrays aren't going to cut it anymore.


Don't just look to expand existing infrastructure, which is likely out of date and inappropriately used anyway. Consider instead what you want to do with data. There are lots of wonderful solutions out there, from large and small companies, if you're willing to look beyond what you have.


Encourage your company to move to a "storage utility" model, where a general storage budget goes to IT instead of being doled out on a per-project basis. Then you can develop a multi-product storage service with SAN, NAS, Object/Cloud, and even Software-Defined Storage. And maybe you can stay ahead of the question in the future.


Another option is to purchase per-application but be very conservative when selecting. Try to keep variety to a minimum and don't over-purchase. Over-buying leads either to orphans or "hop-ons", and neither is good for business.


Regardless, make sure the storage meets the needs of the application. Isn't that better than just worrying about capacity?


I am Stephen Foskett and I love storage. You can find more writing like this at, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.



How to use network configuration, change, and compliance management (NCCCM) and other monitoring software in response to an actual security breach.


If you have not read part one, I would suggest that you give that an overview, so you can understand fully how and why this comes into play. For those that are ready for part two, welcome back!  I'll attempt to share some assessments of an internal sabotage and how to use things like monitoring and management software to see it and recover.  The best way to respond is by thinking ahead and having clear steps to prevent and halt further damage.


Today, we are going to dive into a couple of scenarios, and directly assess ways to be alerted to and address situations that may be taking place within your organization.  Now, should we all live like we have a monkey on our back/shoulder?  No, but it doesn't hurt to have a little healthy "skepticism" about unusual things that are happening around you.  Being aware of your surroundings allows you to fight back and take back control of hiccups along the way.



Internal Planning Possible Sabotage:

Things to look for visually as well as with monitoring and management software.


  • Unusual behavior (after a confrontation or write-up has happened) - thank you sparda963 I forgot to place when to look for this
    • This can be obviously aggressive, but the one often overlooked is "overly" nice and helpful.
      • Yes, this sounds condescending, and I understand that concern, but think of this as out of character.  They now want to help higher levels with mission critical information or configurations.  They want to "watch" your command-line interface session on a device.  They are "contributing" to get to know where key points are.  These are things that are outside of their scope.
    • Aggressive: well, the writing is on the wall at that point, and if secretive behavior comes into play, then watch out and plan accordingly.
    • Use real-time change notifications, approval systems, and compliance reports to help you see changes made, and users added, to devices or to monitoring and management software.
      • Make sure that you have a script to remove access to devices ahead of time.  One that you can fill in the blank for the user ID and take permissions away quickly.
      • Verify you have alerts set up to notify you with quick access to the devices through a management software so you can cancel access levels and revert changes quickly.
  • Logons found on unusual servers by said person
    • Use a Log Event Monitor to alert you to strange login attempts and locations.
    • Know your monitoring software and have pages ready so you can deny access to accounts quickly
  • New Users
    • Use a Log Event Monitor to alert you to new account creations.  You need to know when these were created and have a trail on them so you can remove them.
  • Job creation for mass configuration changes
    • Verify through an approval system all changes on your network.  An excellent way to do this is with an NCCCM product and enable the approval system to be fully active.  You will want at least a 2 level approval system to help prevent issues and possible changes.
    • Real-time change notification with segmented emails for critical devices. 
    • Backups to be quickly accessible and found in multiple locations to ensure access during a breach.
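The "fill in the blank" access-removal script mentioned in the list above could look something like the following minimal sketch. Everything in it is an assumption for illustration: the device inventory, the platform labels, and the command templates are hypothetical placeholders, and the sketch only builds the commands rather than pushing them to devices. Swap in the syntax and delivery mechanism your own environment uses.

```python
from string import Template

# Hypothetical command templates keyed by platform label.
# Replace these with the syntax your own devices expect.
REVOKE_TEMPLATES = {
    "ios": Template("no username $user"),
    "linux": Template("usermod --lock $user"),
}


def build_revoke_plan(user_id, devices):
    """Return (device_name, command) pairs that would revoke user_id everywhere.

    devices is a list of (device_name, platform_label) tuples.
    """
    # Reject blank or odd-looking IDs so a typo cannot produce a broken command.
    if not user_id or not user_id.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"refusing suspicious user ID: {user_id!r}")
    plan = []
    for name, platform in devices:
        template = REVOKE_TEMPLATES.get(platform)
        if template is None:
            # Better to discover a missing template now than during an incident.
            raise KeyError(f"no revoke template for platform {platform!r} ({name})")
        plan.append((name, template.substitute(user=user_id)))
    return plan


if __name__ == "__main__":
    inventory = [("core-rtr-01", "ios"), ("jump-host-01", "linux")]  # made-up inventory
    for device, command in build_revoke_plan("jdoe", inventory):
        print(f"{device}: {command}")
```

Keeping the plan-building step separate from execution means the output can be reviewed, or fed into whatever management software you already use to run commands.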


Internal Execution of Sabotage:

Things to do if you find yourself under attack

(Network Side)

  • First things first
    • Log Event Monitoring - should be alerting you to access violations, additions of accounts, or deleting of accounts
    • TACACS - should be enabled and in full use for auditing within your monitoring and management software choices
    • Real-Time change notifications should be sending emails immediately to the correct people with an escalation of higher up network engineers on your team.
  • Now to fight back!
    • If they are opening firewalls to gain access, you need to shut these down and stop traffic immediately.  You will need a plan, or a script, for a "shut all", or use something like Firewall Security Manager or Network Configuration Manager to implement commands from a stored location.
      • This allows time to figure out the user and what is going on while you have the floodgate closed.
      • This should be addressed in a security protocol that gives you the authority to act, saving you and your company a lot of money when you are trying to prevent a massive break-in.
    • If they are deleting router configs
      • Real-time change notification (RTN) alerts should be sent out to you to bring you up to speed.
        • Use a script to deny access to the user that made the change shown in the RTN email.
        • Revert configurations from within your NCCCM software and get these back online
      • Verify users that have access
        • Use a compliance report to check access levels and remove where needed.
        • CONTINUE to monitor these reports
      • Check your approval system
        • Verify who has access
        • Change passwords to all monitoring and management software logins.
          • I had a customer who, in a crisis, would reset all of these logins to a single password that he created.  This allowed a quick shutdown of software usage to regain control while an attack was under way.
    • Verify critical application status
      • Log event monitor - check logs to see if access has been happening outside of usual
      • NetPath or something similar for pathways to check accessibility or changes
      • NCCCM - Verify all changes that have occurred within the past seven days minimum as this could only be the first wave of intrusion.
      • Network performance monitor to verify any malware or trojans that could be lingering and sending data on your network.
        • Volumes filling up and being alerted to this
        • Interface utilization skyrocketing
        • NetFlow monitor showing high amounts of unusual traffic, or NO traffic.  History is essential here.
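Reviewing every change from the past seven days, as suggested in the list above, is easier if the change log can be filtered and grouped quickly. Here is a minimal sketch; the record layout (`when`, `user`, `device`) and the sample entries are assumptions, not the export format of any particular NCCCM product.

```python
from collections import Counter
from datetime import datetime, timedelta


def recent_changes(changes, now, days=7):
    """Keep change records whose timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [change for change in changes if change["when"] >= cutoff]


def changes_per_user(changes):
    """Count changes by user so unusually busy accounts stand out."""
    return Counter(change["user"] for change in changes)


if __name__ == "__main__":
    now = datetime(2016, 8, 17, 12, 0)  # made-up "current" time
    log = [  # made-up change records
        {"when": datetime(2016, 8, 16, 23, 5), "user": "jdoe", "device": "fw-01"},
        {"when": datetime(2016, 8, 16, 23, 9), "user": "jdoe", "device": "rtr-02"},
        {"when": datetime(2016, 8, 12, 10, 0), "user": "asmith", "device": "sw-07"},
        {"when": datetime(2016, 7, 30, 9, 0), "user": "asmith", "device": "sw-07"},
    ]
    window = recent_changes(log, now)
    for user, count in changes_per_user(window).most_common():
        print(f"{user}: {count} change(s) in the last 7 days")
```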


Security gut check:

Things to go over with yourself and team to make sure your security and plans for recovery are current.



  • Understand and know what is critical information within your organization
  • Where are your system boundaries?
  • Pinpoint your security documentation


  • Set up a meeting with your team over the above pre-assessment
  • Review your security information
  • Practice scenarios that "could" happen within your networks
  • Set up session controls
  • Verify maintenance plans
  • Ensure mapping of your critical networking connections with critical applications
  • Ensure your policies are as relevant today as they were when first created
  • Verify entry points of concerns
    • Internal/External
  • System and Network Exposures


Team Analysis

  • Where are your vulnerabilities?
  • What are your Countermeasures?
  • What is the impact if breached?
  • Who can segment and take on sections of security recommendations?



  • Implement new security plans as defined and found above.
  • Set up a meeting review for at least three months later to make sure all vulnerabilities are known and addressed.
  • Verify that the plan is accessible for your team to review so they are aware of actions to take.
  • Sign an agreement within your team to follow these protocols.



Well, that is a lot to cover, whew!  Once again, everyone's networks and infrastructures are different.  You and I understand that.  The main point is how to use tools to help you stay ahead and be able to fight back with minimal damage.  Having a recovery plan and consistently updating it for new vulnerabilities is vital to staying ahead.  You can adapt these steps and use them for outside attacks as well.  Security is a fluid dance and ever changing, so don't be stuck sitting on the outside looking in.



Thank you,





In my previous post, I wrote about becoming an Accidental DBA whether or not you had that title formally.  I described the things a Minimalist DBA should focus on before jumping into performance tuning or renaming that table with the horribly long name (RETAIL_TRANSACTION_LINE_ITEM_MODIFIER_EVENT_REASON I <3 you.)   In today's post I want to cover how you, personally, should go about prioritizing your work as a new Accidental DBA.  You


  Most accidental DBAs perform firefighter-like roles: find the fire, put it out, rush off to the next fire and try to fight it as well. Often without the tools and help they need to prevent fires. Firefighting jobs are tough and exhausting.  Even in IT.  But they never allocate time to prevent fires, to maintain their shiny red fire trucks, or to practice sliding down that fire pole.


How to Prioritize your Accidental DBA Work


  1. Establish a good rule of thumb on how decisions are going to be made.  On a recent project of mine, due to business priorities and the unique type of business, we settled on Customer retention, legal and application flexibility as our priorities.  Keep our customers, keep our CIO out of jail, and keep in business. Those may sound very generic, but I've worked in businesses where customer retention was not a number one priority. In this business, which was membership and subscription based, we could not afford to lose customers over system issues.  Legal was there to keep our CIO and CEO out of jail (that's what ROI stands for: Risk of Incarceration).  Application flexibility was third because the whole reason for the project was to enable business innovation to save the company.

    Once you have these business priorities, you can make technical and architectural decisions in that context.  Customer retention sounds like a customer service issue, but it's a technical one as well.  If the system is down, customers can not be customers.  If their data is wrong, they can't be customers.  If their data is lost, they can't be customers. And so on.  Every decision we made first reflected back to those priorities.

  2. Prioritize the databases and systems.  Sure, all systems are important.  But they have a priority based on business needs. Your core selling systems, whatever they might be, are usually very high priority.  As are things like payroll and accounting.  But maybe that system that keeps track of whether employees want to receive a free processed meat ham or a chunk of processed cheese over the holidays isn't that high on the list.  This list should already exist,  at least in someone's head.   There might even be an auditor's report that says if System ABC security and reliability issues aren't fixed, someone is going to go to jail.  So I've heard.  And experienced. 

  3. Automate away the pain…and the stupid.  The best way to help honor those priorities is to automate all the things. In most cases, when an organization doesn't have experienced or dedicated DBAs, their data processes are mostly manual, mostly reactive, and mostly painful.  This is the effect of not having enough knowledge or time to develop, test, and deploy enterprise-class tools and scripts.  I understand that this is the most difficult set of tasks to put at a higher priority if all the databases are burning down around you. Yes, you must fight the fires, but you must put a priority on fire reductions.  Otherwise you'll just be fighting bigger and more painful fires.

    Recovery is the most important way we fight data fires.  No amount of performance tuning, index optimization, or wizard running will bring back lost data.  If backups are manual, or automated and never tested, or restores are only manual, you have a fire waiting to happen. Head Geek Tom LaRock sqlrockstar says that "recovery is Job #1 for a DBA".  It certainly is important. A great DBA automates all backups and recovery. If you are recovering manually, you are doing it wrong.

      Other places where you want automation is in monitoring and alerting.  You want to know something is going on even before someone smells smoke, not when users are telling you the database is missing.  If your hard drive is running out of space, it is generally much faster to provision more resources or free up space than it is to recover a completely down system.  Eventually you'll want to get to the point where many of these issues are taken care of automatically.  In fact, that's why they invented cloud computing.
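As one small illustration of the monitoring and alerting described above, here is a minimal sketch that flags a volume before it fills up. The 15 percent threshold and the function names are assumptions chosen for the example; a real setup would run something like this on a schedule and send the message to whatever alerting channel you already use.

```python
import shutil


def is_low_on_space(total_bytes, free_bytes, min_free_fraction=0.15):
    """Return True when the free fraction drops below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_volume(path, min_free_fraction=0.15):
    """Return an alert message for `path`, or None if there is enough room."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if is_low_on_space(usage.total, usage.free, min_free_fraction):
        return f"LOW SPACE on {path}: {usage.free / usage.total:.0%} free"
    return None


if __name__ == "__main__":
    message = check_volume(".")
    print(message or "Plenty of space, no smoke yet.")
```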

Get Going, the Alarm Bell is Ringing!


Become the Best DBA: A Lazy DBA. Lazy DBAs automate the stuff out of everything.  And lazy DBAs know that automating keeps databases from burning. They automate away the dumb mistakes that happen when the system is down, they automate test restores,  they automate away the pain of not knowing they missed setting a parameter when they hit ENTER.  They know how to get to the fire, they know what to do and they fix it.

The Best DBAs know when databases are getting into trouble, long before they start burning down.

The Best DBAs don't panic.  They have a plan, they have tools, they have scripts.  When smoke starts coming out of the database, they are there.  Ready to fight that fire.  They are ready because they've written stuff down. They've trained.  They've practiced.  How many clicks would it take you to restore 10 databases?  Would you need to hit up Boogle first to find out how to do a point-in-time restore? Do you know the syntax, the order in which systems have to be restored? Who are the other people you have to work with to fix this fire?


As a new DBA, you should be working on automation every day, until all that work frees up so much of your time you can work on performance tuning, proper database design, and keeping your database fire truck shiny.

I'm in Austin this week to film more SolarWinds Lab episodes and conduct general mischief. Lucky for me Delta had a MUCH better Monday this week than last week.


Anyway, here are the items I found most amusing from around the Internet. Enjoy!


Bungling Microsoft singlehandedly proves that golden backdoor keys are a terrible idea

If you have ever wanted to run Linux on your Surface, now is your chance! I'm looking at you, adatole.


Millions of Cars Vulnerable to Remote Unlocking Hack

This is why I drive a 2008 Jeep. It's like the Battlestar Galactica: too old to be hacked by new technology.


Scammers sneak into customer support conversations on Twitter

I've noticed a handful of fake support accounts popping up lately, so this is a good reminder to be careful out there.


Best Fighter Jet In History Grounded By Bees

No, seriously. And I dig the part where they decided not to just kill the bees, but to relocate them. Nice touch.


Walmart and the Multichannel Trap

Wonderful analysis of where Walmart is headed, and why Amazon is likely to run all retail in the future. After living through what Walmart did to Vlasic 20 years ago, this seems fitting to me.


Networking Needs Information, Not Data

Nice post that echoes what I've been saying to data professionals for a few years now. There is a dearth of data analysts in the world right now. You could learn enough about data analytics in a weekend that will impact your career for the next 20 years, but only if you get started.


Even Michael Phelps knows it's football season, right kong.yang ?


Data is an incredibly important asset. In fact, data is the MOST important asset for any company, anywhere. Unfortunately, many continue to treat data as an easily replaced commodity.


But we’re not talking about a database administrator’s (DBA) iTunes library. We’re talking highly sensitive and important data that can be lost or compromised.


It’s time to stop treating data as a commodity. We need to create a secure and reliable data recovery plan. And we can get that done by following a few core strategies.


Here are the six easy steps you can take to prevent data loss.


Build a Recovery Plan

Novice DBAs think about backups as the starting point for data loss. It is the experienced senior DBAs that know the starting point is building the recovery plan.


The first thing to do here is to establish a Recovery Point Objective (RPO) that determines how much data loss is acceptable. Understanding acceptable risk levels can help establish a baseline understanding of where DBAs should focus their recovery efforts. Then, work on a Recovery Time Objective (RTO) that shows how long the business can afford to be without its data. Is a two-day restore period acceptable, or does it have to be 15 minutes?
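To make the two objectives concrete, here is a minimal sketch that compares them with what you actually observe: the age of the most recent good backup against the RPO, and a measured restore duration against the RTO. The function names and the sample figures are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def rpo_exposure(last_good_backup, now):
    """Data written since the last good backup is what you stand to lose."""
    return now - last_good_backup


def meets_rpo(last_good_backup, now, rpo):
    """True when the current exposure is within the agreed RPO."""
    return rpo_exposure(last_good_backup, now) <= rpo


def meets_rto(measured_restore_time, rto):
    """True when a timed test restore finished within the agreed RTO."""
    return measured_restore_time <= rto


if __name__ == "__main__":
    now = datetime(2016, 8, 17, 12, 0)          # made-up timestamps
    last_backup = datetime(2016, 8, 17, 11, 40)
    print("RPO ok:", meets_rpo(last_backup, now, rpo=timedelta(minutes=15)))
    print("RTO ok:", meets_rto(timedelta(minutes=50), rto=timedelta(hours=1)))
```

In this made-up case the last backup is 20 minutes old, so a 15-minute RPO is already being missed even though the restore time fits the RTO.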


Finally, remember that “high availability” and “disaster recovery” are different. A DBA managing three nodes with data flowing between each may assume that if something happens to one node the other two will still be available. But an error in one node will undoubtedly get replicated across all of them. You better have a recovery plan in place when this happens.


If not, then you should consider having an updated resume.


Understand That Snapshots != Database Backups

There’s a surprising amount of confusion about the differences between database backups, server tape backups, and snapshots. Many administrators have a misperception that a storage area network (SAN) snapshot is good enough as a database backup, but that snapshot is only a set of data reference markers. The same issue exists with VM snapshots as well. Remember that a true backup is one that allows you to recover your data to a transactionally consistent view at a specific point in time.


Also consider the backup rule of three, where you save three copies of everything, in two different formats, and with one off-site backup. Does this contain hints of paranoia? Perhaps. But it also perfectly illustrates what constitutes a backup, and how it should be done.
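The rule of three described above is simple enough to check mechanically. The sketch below assumes a hypothetical inventory of backup copies, each with a location, a format, and an off-site flag; none of these names come from a real backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # where the copy lives (hypothetical label)
    format: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool   # True if stored away from the primary site


def satisfies_rule_of_three(copies):
    """Three copies, at least two different formats, at least one off-site."""
    enough_copies = len(copies) >= 3
    enough_formats = len({copy.format for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_formats and has_offsite


if __name__ == "__main__":
    copies = [
        BackupCopy("primary-san", "disk", offsite=False),
        BackupCopy("backup-server", "disk", offsite=False),
        BackupCopy("vault", "tape", offsite=True),
    ]
    print("Rule of three satisfied:", satisfies_rule_of_three(copies))
```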


Make Sure the Backups Are Working

There is only one way to know if your backups are working properly, and that is to try doing a restore. This will provide assurance that backups are running -- not failing -- and highly available. This also gives you a way to verify if your recovery plan is working and meeting your RPO and RTO objectives.
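Here is a minimal sketch of an automated restore test. It uses SQLite purely as a stand-in because it ships with Python; the table name and the checks (an integrity check plus a row-count comparison) are assumptions, and a production database would need its own restore commands and deeper validation.

```python
import os
import sqlite3
import tempfile


def restore_test(source, backup_path, table):
    """Back up `source` to backup_path, reopen the copy, and verify it.

    `table` must be a trusted, known table name (it is placed in the SQL text).
    """
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)  # SQLite online backup API (Python 3.7+)
    finally:
        dest.close()

    # "Restore": open the backup as its own database and inspect it.
    restored = sqlite3.connect(backup_path)
    try:
        integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
        count_sql = f'SELECT COUNT(*) FROM "{table}"'
        expected = source.execute(count_sql).fetchone()[0]
        actual = restored.execute(count_sql).fetchone()[0]
    finally:
        restored.close()
    return integrity == "ok" and expected == actual


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    db.executemany("INSERT INTO orders (total) VALUES (?)", [(1.0,), (2.5,), (9.9,)])
    db.commit()
    with tempfile.TemporaryDirectory() as workdir:
        ok = restore_test(db, os.path.join(workdir, "orders.bak"), "orders")
    print("Restore test passed:", ok)
    db.close()
```

Timing the call is a cheap way to start collecting the restore durations you need to compare against your RTO.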


Use Encryption

Data at rest on the server should always be encrypted, and database backups should be encrypted as well. There are a couple of options for this. DBAs can either encrypt the database backup file itself, or encrypt the entire database. That way, if someone takes a backup, they won’t be able to access the information without a key.


DBAs must also ensure that if a backup device is lost or stolen, the data stored on the device remains inaccessible to users without proper keys. Drive-level encryption tools like BitLocker can be useful in this capacity.


Monitor and Collect Data

Real-time data collection and real-time monitoring should also be used to help protect data. Combined with network monitoring and other analysis software, data collection and monitoring will improve performance, reduce outages, and maintain network and data availability.


Collection of data in real-time allows administrators to perform proper data analysis and forensics, making it easier to track down the cause of an intrusion, which can also be detected through monitoring. Together with log and event management, DBAs have the visibility to identify potential threats through unusual queries or suspected anomalies. They can then compare the queries to their historical information to gauge whether or not the requests represent potential intrusions.


Test, Test, Test

This is assuming a DBA has already tested backups, but let’s make it a little more interesting. Let’s say a DBA is managing an environment with 3,000 databases. It’s impossible to restore them every night; there’s simply not enough space or time.


In this case, DBAs should take a random sampling of their databases to test. Shoot for a sample size that gives at least a 95 percent confidence level across the 3,000 databases in deployment, while leaving a small margin of error (much like a political poll). From this information DBAs can gain confidence that they will be able to recover any database they administer, even if that database is in a large pool. If you’re interested in learning more, check out this post, which gets into further detail on database sampling.
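One way to read the polling analogy is as a standard survey sample-size calculation. The sketch below assumes a 95 percent confidence level, a 5 percent margin of error, and the most conservative proportion of 0.5, then applies a finite population correction for the 3,000 databases. Those parameter choices are assumptions for illustration, not figures from the post referenced above.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's sample size with a finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def pick_databases(names, seed=None, **kwargs):
    """Randomly choose which databases get a test restore this cycle."""
    rng = random.Random(seed)
    return rng.sample(names, sample_size(len(names), **kwargs))


if __name__ == "__main__":
    databases = [f"db{i:04d}" for i in range(3000)]  # made-up names
    chosen = pick_databases(databases, seed=42)
    print(f"Test-restore {len(chosen)} of {len(databases)} databases")  # 341 of 3000
```

Under these assumptions, 341 test restores stand in for all 3,000 databases, which is far more realistic than restoring everything every night.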



Data is your most precious asset. Don’t treat it like it’s anything but that. Make sure no one is leaving server tapes lying around cubicles, practice the backup rule of three, and, above all, develop a sound data recovery plan.

The term "double bind" refers to an instance where a person receives two or more conflicting messages, each of them negating the other. Addressing one message creates a failure in the other, and vice-versa. A double bind is an unsolvable puzzle resulting in a no-win situation.


As a federal IT professional, you’ve probably come across a double bind or two in your career, especially in regard to your network and applications. The two depend on each other but, often, when something fails, it’s hard to identify which one is at fault.


Unlike a true double bind, though, this puzzle actually has a two-step solution:


Step 1: Check out your network


Throughout history, whenever things slow down, the first reaction of IT pros and end-users alike has been to blame the network. So let’s start there – even though any problems you might be experiencing may not be the poor network’s fault.


You need to monitor the overall performance of your network. Employ application-aware monitoring and deep-packet inspection to identify mission-critical applications that might be creating network issues. This can help you figure out if the issue is a network or application problem. If it’s a network problem, you’ll be able to quickly identify and resolve it.


What if it’s not the network? That’s where step two comes in.


Step 2: Monitor your application stack


Federal agencies have become reliant upon hundreds of applications. Each of these applications is responsible for different functions, but they also work together to form a central nervous system that, collectively, keeps things running. Coupled with a backend infrastructure, this application stack forms a critical yet complex system in which it can be difficult to identify a malfunctioning application.


Solving this challenge requires cultivating an application-centric view of your entire application stack, which includes not just the applications themselves, but all the components that help them operate efficiently, including the systems, storage, hypervisors and databases that make up the infrastructure.


Consolidating management of your infrastructure internally and maintaining control of your application stack can help. Maintaining an internal level of involvement and oversight, even over cloud-based resources, is important and this approach gives you the control you need to more easily pinpoint and quickly address problems.


Of course, the best way to remediate issues is to never have them at all, but that’s not entirely possible. What you can do is mitigate the chances for problems by weighing performance against financial considerations before making changes to your network or applications.


For example, for many Defense Department agencies, the move to the cloud model is driven primarily by the desire for cost savings. While that’s certainly a benefit, you cannot discount the importance of performance when it comes to compute, storage and networking technologies, which are just as important.


These early considerations – combined with a commitment to network monitoring and a complete application stack view – can save you tons of money, time and trouble. Not to mention keeping you out of some serious binds.


Find the full article on Defense Systems.

Capacity Planning 101

The objective of Capacity Planning is to adequately anticipate current and future capacity demand (resource consumption requirements) for a given environment. This helps to accurately evaluate demand growth, identify growth drivers and proactively trigger any procurement activities (purchase, extension, upgrade etc.).


Capacity planning is based primarily on two items. The first one is analyzing historical data to obtain organic consumption and growth trends. The second one is predicting the future by analyzing the pipeline of upcoming projects, also taking into consideration migrations and hardware refreshes. IT and Business must work hand-in-hand to ensure that any upcoming projects are well-known in advance.
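Those two inputs can be combined in a very small model: fit a straight line to historical usage for the organic trend, then add the known project pipeline on top. The sketch below assumes evenly spaced samples (for example, one per month) and purely linear growth, and the sample figures are made up.

```python
def linear_fit(usage):
    """Least-squares line through evenly spaced samples: returns (slope, intercept)."""
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(usage, capacity, pipeline=0.0):
    """Periods left until trend + pipeline demand reaches capacity (None if flat)."""
    slope, intercept = linear_fit(usage)
    if slope <= 0:
        return None  # no organic growth, so the trend alone never fills the pool
    period_full = (capacity - pipeline - intercept) / slope
    return max(0.0, period_full - (len(usage) - 1))


if __name__ == "__main__":
    monthly_used_tb = [10, 12, 14, 16]  # made-up history
    print(periods_until_full(monthly_used_tb, capacity=30))              # 7.0
    print(periods_until_full(monthly_used_tb, capacity=30, pipeline=4))  # 5.0
```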


The Challenges with Capacity Planning or “the way we’ve always done it”


Manual capacity planning by running scripts here and there, exporting data, compiling data and leveraging Excel formulas can work. However, there are limits to one’s available time, and that time comes at the expense of focusing on higher priority issues.


The time spent on manually parsing data, reconciling and reviewing can be nothing short of a huge challenge, if not a waste of time. The larger an environment grows, the larger the dataset will be, and the longer it will take to prepare capacity reports. And the more manual the work is, the more it is prone to human errors.  While it’s safe to assume that any person with Excel skills and a decent set of instructions can generate capacity reports, the question remains about their accuracy. It’s also important to point out that new challenges have emerged for those who like manual work.


Space saving technologies like deduplication and compression have complicated things. What used to be a fairly simple calculation of linear growth based on growth trends and YoY estimates is now complicated by non-linear aspects such as compression and dedupe savings. Since both compression and deduplication ratios are dictated by the type of data as well as the specifics of the technology (see in-line vs. at-rest deduplication, as well as block size), it becomes extremely complicated to factor this into a manual calculation process. Of course, you could “guesstimate” compression and/or deduplication factors for each of your servers. But the expected savings can also fail to materialize for a variety of reasons.
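To see why a "guesstimated" reduction ratio is risky, here is a minimal sketch that converts projected logical growth into physical capacity under different data reduction ratios. The single combined ratio, the growth rate, and the sample numbers are all simplifying assumptions; real ratios vary by data type and technology, as noted above.

```python
def physical_needed(logical_tb, reduction_ratio):
    """Physical capacity required for logical_tb at a given reduction ratio (e.g. 3.0 for 3:1)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


def project_physical(logical_tb, yoy_growth, years, reduction_ratio):
    """Physical capacity needed now and at the end of each following year."""
    return [
        physical_needed(logical_tb * (1 + yoy_growth) ** year, reduction_ratio)
        for year in range(years + 1)
    ]


if __name__ == "__main__":
    # Made-up figures: 90 TB logical, 20% year-over-year growth, 3 years out.
    for label, ratio in (("planned 3:1", 3.0), ("actual 1.5:1", 1.5)):
        plan = project_physical(90, 0.20, 3, ratio)
        print(label, [round(tb, 1) for tb in plan])
```

If the savings fail to materialize and the ratio is half of what was planned, the physical capacity required doubles at every point in the projection.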


Typical mistakes in capacity management and capacity planning involve space reclamation activities at the storage array level or, rather, the lack of awareness of and activity on the matter. Monitoring storage consumption at the array level without relating it to the way storage has been provisioned at the hypervisor level may result in discrepancies. For example, not running Thin Provisioning Block Space Reclamation (through the VMware VAAI UNMAP primitive) on VMware environments may lead some individuals to believe that a storage array is reaching critical capacity levels while in fact a large portion of the allocated blocks is no longer active and can be reclaimed.


Finally, in manual capacity planning, any attempt to run “What-If” scenarios (adding n number of VMs with a given usage profile for a new project) is a wild guess at best. Even with the best intentions and focus, you are likely to end up either with an under-provisioned environment and resource pressure, or with an over-provisioned environment with idle resources. While the latter is preferable, this is still a waste of money that might’ve been invested anywhere else.
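Even a very small model makes a "What-If" question less of a guess. The sketch below projects utilization per resource after adding a number of VMs with a given profile and flags anything above a chosen ceiling. The resource names, the 80 percent ceiling, and the figures are assumptions for illustration, and the model ignores overcommit, peaks, and data reduction.

```python
def what_if(capacity, used, vm_profile, vm_count):
    """Projected utilization (0..1+) per resource after adding vm_count VMs."""
    return {
        resource: (used[resource] + vm_count * vm_profile.get(resource, 0.0)) / total
        for resource, total in capacity.items()
    }


def over_limit(utilization, max_utilization=0.8):
    """Resources whose projected utilization exceeds the chosen ceiling."""
    return sorted(name for name, value in utilization.items() if value > max_utilization)


if __name__ == "__main__":
    capacity = {"cpu_ghz": 200, "mem_gb": 1024, "storage_tb": 50}  # made-up cluster
    used = {"cpu_ghz": 100, "mem_gb": 512, "storage_tb": 30}
    profile = {"cpu_ghz": 2, "mem_gb": 16, "storage_tb": 0.5}      # per-VM demand
    projected = what_if(capacity, used, profile, vm_count=20)
    for name, value in projected.items():
        print(f"{name}: {value:.1%}")
    print("Over ceiling:", over_limit(projected))
```

In this made-up case memory is the constraint, which is exactly the kind of detail a per-resource view surfaces before the project lands.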


Capacity Planning – Doing It Right


As we’ve seen above, the following factors can cause incorrect capacity planning:

  • Multiple sources of data collected in different ways
  • Extremely large datasets to be processed/aggregated manually
  • Manual, simplistic data analysis
  • Key technological improvements not taken into account
  • No simple way to determine effects of a new project on infrastructure expansion plans


Additionally, all of the factors above are also prone to human errors.


Because the task of processing data manually is nearly impossible and also highly inefficient, precious allies such as SolarWinds Virtualization Manager are required to identify real-time issues, bottlenecks, potential noisy neighbors as well as wasted resources. Once these wasted resources are reclaimed, capacity planning can provide a better evaluation of the actual estimated growth in your environment.


Capacity planning activities are not just about looking into the future, but also about managing the environment as it is now. The link between Capacity Planning and Capacity Reclamation activities is crucial. Just as you want to keep your house tidy before planning an extension or improving it with new furniture, the same needs to be done with your virtual infrastructure.


Proper capacity planning should factor in the following items:

  • Central, authoritative data source (all the data is collected by a single platform)
  • Automated data aggregation and processing through software engine
  • Advanced data analysis based on historical trends and usage patterns
  • What-If scenarios engine for proper measurement of upcoming projects
  • Capacity reclamation capabilities (Managing VM sprawl)




Enterprises must consider whether capacity planning done “the way we’ve always done it” is adding any value to their business or is instead the Achilles heel of their IT strategy. Because of its criticality, capacity planning should not be considered a recurring manual data collection/aggregation chore that is assigned to “people who know Excel”. Instead, it should be run as a central, authoritative function that measures current usage, informs about potential issues and provides key insights to plan future investments in time.
