
In my time at Tech Field Day, I've heard a lot of discussion about monitoring products. Sometimes these talks get contentious, with folks pointing out that a certain feature is "useless" while another (usually missing) one is absolutely critical for a product to be taken seriously. Then one day it clicked: Monitoring isn't one thing; there are lots of different tasks that can be called "monitoring"!

Diff'rent Strokes

Let's consider storage. Administrators are concerned with configuration and capacity. IT management is worried about service levels and cost. Operations worries about failures and errors. The vendor has a whole set of parameters they're tracking. Yet all of these things could be considered "monitoring"!

Realizing that monitoring means different things to different people, I've come to look at monitoring tools differently too. And it's really brought them into focus for me. Some tools are clearly designed for IT, with troubleshooting and capacity planning as the focus. Others are obviously management-focused, with lots of pretty charts about service level objectives and chargeback. And so on.

Given the diversity of these tools, it's no wonder that they appear controversial when viewed by people with different perspectives. Systems administrators have a long-standing disdain for cost accounting tools, so it's no wonder they flinch when presented with a tool that focuses on that area. And they'd be sorely disappointed if such a monitoring package didn't show hardware errors, even though these wouldn't be appropriate for an IT management audience.

So What Cha Want?

Without getting too "zen," I think I can safely say that one must look inside oneself before evaluating monitoring products. What features do you really need? What will help you get your job done and look like a superstar? What insight are you lacking? You must know these points before you even consider looking at a flashy new product.

And some products sure are flashy! I'll admit, I've often been sucked in by a sleek, responsive HTML5 interface with lots of pretty graphics and fonts. But sometimes a simple text-mode tool can be much more useful! Never underestimate the power of "du -sk * | sort -n", "iostat -x 60", and "df -h"! But high-level tools can be incredibly valuable, too. I'd rather have a tool surface the critical errors than try to "awk" my way through a log file...
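
For anyone who would rather script that kind of quick check than remember the exact flags, here is a minimal Python sketch of the "du -sk * | sort -n" idea: sum up each top-level directory and print the biggest consumers last. It's my own illustration, not tied to any product mentioned here.

```python
import os
import sys

def dir_size(path):
    """Walk a directory tree and return its total size in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "."
    dirs = [entry for entry in os.scandir(base) if entry.is_dir(follow_symlinks=False)]
    for size, name in sorted((dir_size(d.path), d.name) for d in dirs):
        print(f"{size // 1024:>12} KB  {name}")  # smallest first, like `sort -n`
```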

Consider too whether you're ready to take advantage of the features offered. No amount of SLO automation will help you develop the SLAs in the first place. And does any company really have an effective cost model? The best tools will help you build understanding even if you don't have any starting inputs.

Stephen's Stance

I can't tell you how often I've tried out a new monitoring tool only to never return to look at the output. Monitoring isn't one thing, and the tools you use should reflect what you need to accomplish. Consider your objectives and look for tools that advance them rather than trying out every cool tool on the market.

 

I am Stephen Foskett and I love storage. You can find more writing like this at blog.fosketts.net, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.

Hello from Las Vegas and VMWorld! I'd love to tell you more about all that is happening here but, you know what they say, right?

 

Anyway, here are the items I found most amusing from around the Internet. Enjoy!

 

4 Reasons to Run Your Own Email Server

No, nope, never, and not going to happen. Running your own email server is not a smart idea.

 

Windows 10 update breaks PowerShell and Microsoft won't fix it until next week

So, PowerShell runs on Linux, but it can't run on Windows 10. Good to know.

 

This is Parenthood

Seems about right.

 

The self-driving car is old enough to drink and drive

Interesting article on the evolution of automobile autonomy. I had no idea that we were doing research on this 60 years ago.

 

Pokémon Go loses its luster, sheds more than 10 million users

Oh noes! Has the bubble burst already for this? But I'm only on level 6!

 

If you don't pay attention, data can drive you off a cliff

There is a dearth of people in the workforce that can analyze data properly. This article helps point out a handful of ways people make mistakes every day.

 

Google Explains What Went Wrong to Cause PaaS Outage

Remember kids, even the Cloud can go down. Don't listen to anyone that tries to tell you different, or dismisses outages as "small events".

 

My new shoes arrived in time for VMWorld:

 

[Photo: bacon-design shoes]

Blog based on my "knee jerk" response to an article on an NSA breach

 

When you first read this article, you will notice that there are groups of hackers auctioning off exploits for network devices. That may seem like no big deal, but think about this: you have a group of people preying on your first line of defense and profiting from making these exploits available. My irritation is set to the highest level for one simple reason: NOT EVERYONE HAS A SECURITY TEAM. OK, now that I feel better, let's commence the discussion of how they did this and why you may be concerned.

 

By exploiting firewalls, these sellers put out into the world the factory defaults and settings that people may overlook or not think about when protecting their networks. That creates a gateway for script kiddies and ill-willed individuals to try to do harm just because the day ends in "Y." This is an example of why I constantly preach about compliance reports and their ability to help you protect your network and not forget the little things.

 

Some of the vulnerabilities listed were things like:

  • Buffer overflow in OpenLDAP
  • SNMP exploits on devices
  • Scripting guidance to wreak more havoc
  • And much more…

 

So how do we guard against these untimely and devastating breaches? One answer: stop ignoring security needs. There are several free resources that help you protect yourself. I realize a lot of people may not know about these, so I thought I would put together a few.

 

Common Vulnerabilities and Exposures

https://cve.mitre.org/

National Vulnerability Database

https://web.nvd.nist.gov
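
If you want to fold those resources into your own tooling rather than checking them by hand, here is a rough sketch of how you might query the NVD from a script. It assumes the NVD JSON REST API version 2.0, its keywordSearch parameter, and its response layout; verify all of that against the current NIST documentation (and mind the rate limits) before relying on it.

```python
import requests

# Assumed endpoint and parameters for the NVD CVE API 2.0 -- check the
# current NIST documentation before using this for real work.
NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def search_cves(keyword, limit=5):
    """Return (CVE id, description) pairs for entries matching a keyword."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": keyword, "resultsPerPage": limit},
        timeout=30,
    )
    resp.raise_for_status()
    found = []
    for item in resp.json().get("vulnerabilities", []):
        cve = item.get("cve", {})
        descriptions = cve.get("descriptions", [])
        summary = descriptions[0].get("value", "") if descriptions else ""
        found.append((cve.get("id", "unknown"), summary))
    return found

if __name__ == "__main__":
    for cve_id, summary in search_cves("openldap"):
        print(f"{cve_id}: {summary[:100]}")
```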

 

If you have read any of my NCM blogs, you know that it includes firmware vulnerability data. It checks NIST and advises you of security holes on your Cisco devices. It's not a "catch-all" by any means, but it helps you stay aware and proactively run security checks every day by default. Then, as always, there are compliance reports, including federal compliance reports right out of the box, allowing you to lean on what others have created to ensure that you are crossing your T's and dotting your I's with your security needs.

 

These are all ways we can use products to help us every day and give ourselves a direction to head in, instead of ignoring security or simply not making the time to address it. Monitoring and management software needs to be an everyday defensive tool, one that offers guidance on your security needs and lets you work on security today and tomorrow. Security teams can lean on monitoring and management solutions, too. This isn't just for people who lack the funding for a security team; it's for everyone to stand together and help stand up to people exploiting for hire.

 

Circling back to my last opinion on this article: for-hire exploits are just as bad as ransomware. The sellers are merely saying, "Hey, pay me and I'll tell you how you can do some damage," where ransomware is more, "Hey, I encrypted or stole your data; give me $$$ to (maybe) get it back." Is there a difference in the level of punishment if these people are ever caught? I think there is not, and we need better ways to track down and prosecute these criminals. What are your thoughts? I'm always open to opinions and love hearing all of your comments!

 

~Dez~

Follow me on Twitter @dez_sayz

You’ve probably heard a lot about how government agencies need to move on from legacy technology in order to become more agile and built for the future, but what about moving on from a legacy culture?

 

The traditional approach consisted of development and operations teams working in separate silos, each with their own roles and responsibilities. This mindset is in direct contrast to the DOD’s modernization initiative, which seeks streamlined operations and open collaboration.

 

A cultural movement

 

DevOps can be described as a cultural movement that breaks down silos by combining development and operations teams into a cohesive whole. It gives developers a seat at the operations table and lets operations managers actively participate in application development. It fosters greater agility and the ability to develop and deploy IT initiatives faster than ever before and, as such, helps support the agile and iterative practices outlined in the U.S. Digital Services Playbook and technology initiatives like the Joint Information Environment.

 

There are still some significant hurdles that federal agencies must clear if they’re to get on the DevOps train.

 

First, agencies must be willing to eschew decades of traditional work processes and adapt to this new approach. This is a very important starting point toward implementing DevOps.

 

Achieving success with DevOps requires a significant level of adaptability and commitment from everyone within your agency. It’s not enough to have a few forward-thinking developers and operations managers. Everyone needs to get behind the DevOps mentality and be willing to forsake walled gardens and waterfall approaches in favor of collaboration and shared responsibilities.

 

This culture must be embraced, nurtured, and maintained. You also need to be willing to modernize your technology along with your thinking, because technology certainly plays a role in enabling the culture shift. After all, what good is changing the culture if it doesn’t have the technology to back it?

 

As such, DevOps requires the support of highly adaptable and automated solutions that support your team's goals of continuous innovation and delivery. Solutions integral to the support of DevOps include configuration management that enables automation, a code repository, and monitoring and logging tools.

 

Monitoring tools give developers and ops managers visibility into how the code performs and how the system is running, and allow them to quickly and easily identify any faults or resource contention in the application infrastructure. Automation tools allow for rapid releases and scaling tasks as well as auto-remediation of known issues. Lastly, logging tools provide the necessary play-by-play of what's happening in the DevOps environment that is essential for troubleshooting.

 

If you and your agency are ready to move toward a DevOps culture, you can prepare yourself for some significant benefits. DevOps allows your agency to embrace a cultural framework that adapts to disruptive technology. This positions you well for both the present and the future of IT, all while giving you the chance to play an important role in that future.

 

For more information, check out the DevOps panel from thwackCamp 2015.

 

  Find the full article on our partner DLT’s blog, TechnicallySpeaking.

In my previous posts, I covered how to manage VM sprawl and how to do proper capacity planning. In this post, I would like to share my experience with SolarWinds Virtualization Manager 6.3.

 

Today's virtualized data centers are dynamic environments where a myriad of changes (provisioning, snapshotting, deletions, etc.) executed by numerous people are difficult to track. Furthermore, virtualization has lifted the traditional physical limits on resource consumption, where a workload was once bound to a physical server. For consumers, even private data centers have turned into magical clouds where unicorns graze and endlessly expand existing capacity in a flash of rainbow. Unfortunately, down-to-earth administrators know that, unlike the universe, data centers have a finite amount of resources available for consumption.

 

Given this, maintaining a healthy data center environment while attempting to satisfy consumers is a challenge for many administrators. Let's see how SolarWinds Virtualization Manager 6.3 helps tackle this challenge.

 

VM Sprawl Defeated

As highlighted in my previous article, even with the best intentions in the world, some organic sprawl will still make its way into your data center, no matter how carefully you cover your back with processes. The VM Sprawl console of VMAN 6.3 allows administrators to immediately see sprawl-related issues and address them before they start causing serious performance problems.

 

The VM sprawl dashboard covers the following sprawl issues:

  • Oversized / Undersized VMs
  • VMs with large snapshots
  • Orphaned VMDKs (leftover VMDK files not linked to any existing VM)
  • VMs suffering from high co-stops
  • Idle VMs
  • Powered Off VMs

 

While it's good to detect sprawl issues, it's even better to address them as soon as possible. What I find clever with VMAN 6.3 is that, for all of the issues enumerated above, administrators can remediate them from within VMAN, without having to jump from the monitoring tool to their vSphere client or PowerShell. The amount of information provided in each panel is adequate, so there is no ambiguity about identifying the culprits and remediating the problems.

 

I like the fact that everything is presented in a single view: there is no need to run reports here and there to determine how the VMs should be right-sized, and no treasure hunting to find orphaned VMDK files.

 

 

Doing Capacity Planning

VMAN 6.3 has a dedicated Capacity Planning dashboard that highlights current resource consumption and trends/expected depletion dates for CPU, RAM, and storage, as well as network I/O usage. Here again, it is a simple but complete view of what matters: do I still have enough capacity? Is a shortage in sight? When should I start making preparations to procure additional capacity?

 

Besides the Capacity Planning dashboard, VMAN 6.3 is equipped with a Capacity Planner function that enables administrators to simulate the outcome of a wide variety of "what-if" scenarios, with the necessary granularity. I appreciate the ability to use three options for modeling: peak, 95th percentile, and 75th percentile. Peak takes usage spikes into consideration, which can be necessary in cases where the workloads cannot tolerate any contention or resource-constraint situation. The latter two make it possible to "smooth" the data used for modeling by eliminating usage spikes from the calculation. While the benefit may not be immediately apparent in smaller environments, it can have a decisive financial impact on larger clusters.
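
To make the peak-versus-percentile distinction concrete, here is a tiny sketch (my own illustration, not VMAN's actual calculation) showing how the three modeling options treat the same CPU samples: the lone spike dominates the peak figure, while the 95th and 75th percentiles progressively smooth it away.

```python
# Made-up CPU utilization samples (%), e.g. from 5-minute polls.
cpu_samples = [22, 25, 24, 31, 28, 27, 93, 26, 30, 29, 25, 24]

def percentile(samples, pct):
    """Rough percentile: the sample at the (rounded) pct rank. Fine for illustration."""
    ordered = sorted(samples)
    rank = int(round(pct / 100 * len(ordered))) - 1
    return ordered[max(0, min(rank, len(ordered) - 1))]

peak = max(cpu_samples)            # sizes for the single worst spike
p95 = percentile(cpu_samples, 95)  # ignores the rare extreme
p75 = percentile(cpu_samples, 75)  # assumes contention during spikes is tolerable
print(f"peak: {peak}%   95th percentile: {p95}%   75th percentile: {p75}%")
```

Sizing a cluster against the peak reserves headroom for that one spike; sizing against the 75th percentile accepts occasional contention in exchange for buying noticeably less capacity.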

 

A corollary to the capacity planning activities is the Showback dashboard. Provided that you have organized your resources in folders, you are able to show users what they are actually consuming. You can also run chargeback reports where you define pricing for consumed resources. These can be helpful not only from a financial perspective but also from a political one, as they help, in most mentally stable environments, to bring back a level of awareness and accountability into how resources are consumed. If a division has successfully deployed its new analytics software, which ends up starving the entire environment, showback/chargeback will be decisive in explaining the impact of the deployment (and in obtaining, or coercing, its eventual financial contribution to expanding capacity).
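
As a trivial illustration of the chargeback idea, the sketch below just multiplies consumed resources by unit rates per department. The departments, usage figures, and rates are all made up; the point is only that once consumption is measured per folder or department, the bill itself is straightforward arithmetic.

```python
# Hypothetical unit rates -- every organization prices these differently.
RATES = {"vcpu": 15.00, "ram_gb": 4.00, "storage_gb": 0.10}  # $ per unit per month

# Hypothetical consumption per department, as a showback dashboard might report it.
usage = {
    "engineering": {"vcpu": 120, "ram_gb": 512, "storage_gb": 8_000},
    "analytics":   {"vcpu": 300, "ram_gb": 2048, "storage_gb": 50_000},
}

for dept, consumed in usage.items():
    monthly = sum(consumed[resource] * rate for resource, rate in RATES.items())
    print(f"{dept:<12} ${monthly:>10,.2f} per month")
```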

 

Going further

Time Travel, a feature that correlates alerts over time, is a powerful aid in troubleshooting and performing root cause analysis. By snapshotting metrics from the environment at regular intervals, you are able to understand what events were happening at a given point in time. The sudden performance degradation of a VM becomes easier to investigate by reviewing what happened in parallel. Now you can determine whether the issue was caused by intensive I/O on a shared storage volume or by extremely high network traffic that caused congestion problems.

 

VMAN 6.3: The Chosen One?

VMAN 6.3 provides an end-to-end virtualization management experience that covers not only analysis, correlation, and reporting but also actionable insights. It empowers administrators with the necessary tools to have a full overview of their data center health. Last but not least, the integration with the SolarWinds Orion platform and other management software from SolarWinds (Network Performance Monitor, Database Performance Analyzer, etc.) provides enterprises with a true and unique single-pane-of-glass experience (a term I use extremely rarely due to its abuse) for monitoring their entire data center infrastructure.

 

So is SolarWinds the Chosen One that will bring balance to the data center? No, you, the administrator, are the Chosen One. But you will need the help of the SolarWinds Force to make the prophecy become a reality. Use it wisely.

Whether providing support to end-users or fellow employees, there are countless benefits to leveraging an IT Help Desk solution. From ticketing and service management, to IT asset management and more, these solutions have a lot to offer in the way of optimizing the efficiency of the support function of a business.

 

IT help desk admins are busy people. They are constantly juggling service requests and trouble tickets, either bouncing from workstation to workstation to provide hands-on support, or offering assistance via phone or web.

 

Though help desk solutions offer the advantage of tracking and managing all aspects involved in battling support issues, sometimes there's nothing like a face-to-face (or rather, screen-to-screen) visit to improve this interaction.

 

In this blog, I’ll detail five benefits of the combined use of an IT help desk solution and remote support tools, a power punch combo that is sure to have an impact on the delivery of IT support.

  1. Simplifying IT Service Management (ITSM) – By integrating remote support tools into a help desk solution, help desk technicians can manage ticketing automation and SLAs, and seamlessly interact with end-users to resolve their issues. Establishing remote connections to the end-user’s desktop directly from within help desk tickets and chat windows provides a gateway to greater transparency. It also eliminates the need to visit users personally, simplifying the ITSM process from end-to-end.

  2. Increasing operational efficiency – As mentioned above, help desk solutions remove the need for manually managing tickets, creating performance reports, assigning tickets to technicians, etc. Using such a tool allows IT support pros to focus on the task at hand: remediating support issues. Integrating remote support capabilities further enhances their performance, providing direct access to problematic end-user machines and IT systems and enabling them to diagnose performance problems and troubleshoot immediately from anywhere.
  3. Automating IT support – An ideal help desk system automates IT support operations, including ticketing management, IT asset discovery and management, escalations, change/approval management, and more. The addition of remote support capabilities helps to streamline IT support from ticket creation to resolution, all within a single console.
  4. Lowering time to resolution – Enabling remote support tools within a help desk solution gives IT help desk admins the ability to explore problems right from the source, rather than shuffling through screen shots or spending time deciphering written or verbal details, which are often provided by the technical layman. This empowers them to accelerate the resolution process.
  5. Improving customer satisfaction – IT issues can bog down business, causing trouble for the end-users and the company as a whole. Having these tools at their disposal allows help desk technicians to quickly diagnose and resolve issues, so end-users and fellow employees can get back to business. When coupled with the ability to track and measure the performance of help desk technicians, this strengthens a company’s ability to provide excellent support. And this, in turn, translates to greater customer satisfaction over time.

 

Interested in learning more about the benefits of incorporating remote support capabilities into your help desk solution? Watch this quick two-minute video to discover the value SolarWinds® Web Help Desk customers receive from integrating DameWare® Remote Support into their solution.

 

Do you have other examples of how remote support has helped your help desk achieve greater levels of success? Share a comment to fill us in!

About a year ago, we published the "Monitoring 101" ebook on THWACK and the response was incredible. At the time, I said:

 

"Despite the relatively maturity of monitoring and systems management as a discrete IT discipline, I am asked - year after year and job after job - to give an overview of what monitoring is."

 

The page has been viewed over 7,000 times, and the document has been downloaded over 350 times by people who want to get up to speed on monitoring themselves, to share it with new members of their team, or to help educate management or external consumer teams about what is possible to achieve with a little bit of knowledge and the right tools. Best of all, the information is completely tool-agnostic, so it "works" whether you use SolarWinds solutions or... one of those other guys.

 

The response to this material online, at trade shows, and at SolarWinds User Groups was so great that we started to present it interactively in SolarWinds Lab episodes (Lab 37, Lab 41, and Lab 42) and webinars. We even turned part of it into a "Dummies" series guide!

 

We recognize that not everyone wants to spend 40 minutes watching a video or reading a 20+ page PDF, so we're offering you another option. Following the success of our "Self Paced Packet Inspection Training" email-based course, we've taken some of the essential concepts from Monitoring 101 and turned it into a FREE seven-day email course. (Join the THWACK page and register here.)

 

Lessons will explain not only how to perform various monitoring tasks, but why and when you should use them. The lessons are self-contained, meaning that there are no cliffhangers or please-sign-up-for-our-next-course-to-find-out-more ridiculousness. We have broken them into manageable chunks of information; delivery to your inbox means you don't have to remember to go to some website and open a course; and you can work on each lesson at your own pace and on your own schedule.

 

Like the Monitoring 101 guide, the email course provides an introduction to monitoring for someone who is familiar with computers and IT in general, but not with monitoring as a discipline. As such, (almost) no former knowledge or experience is required.

 

Having the right tool for the job is more than half the battle. But, it’s not the whole battle, and it’s not even where the skirmish started. To build an effective monitoring solution, you must first learn the underlying concepts. You have to know what monitoring is before you can set up what monitoring does.

 

Use this link to find out more and sign up!

In my last post on this topic, I described some scenarios where an outage was significantly extended primarily because, although the infrastructure had been discovered and was being managed, a true understanding was still elusive. In this post, I will meditate on what else we might need in order to find peace with our network or, as network defender so eloquently put it, "Know the network. Be the network."

 

Managing Your Assets

 

Virtualization is both the best and the worst thing to have happened to our data centers. The flexibility and efficiency of virtualization has changed the way we work with our compute resources, but from a management perspective it's a bit of a nightmare. If you have been around a baby at some point in your life, you may recognize that when a baby is very small you can put it down on its back and it will lie there in the same spot; you can leave the room for five minutes, and when you come back, the baby will be exactly where you put it. At some point though, it all changes. If you put the baby down, you'd better watch the poor thing constantly because it has no interest in staying put; leave the room for five minutes and you may spend the next half an hour trying to find out exactly where it got to. And so it is - or at least it feels - with virtualization. A service that in the past would have had a fixed location on a specific server in a particular data center is now simply an itinerant workload looking for a compute resource on which to execute, and as a result it can move around the data center at the will of the server administrators, and it can even migrate between datacenters. This is the parental equivalent of leaving the room for five minutes and coming back to find that your baby is now in next door's back yard.

 

The reason we should care about this is because understanding the infrastructure properly means understanding dependencies.

 

Know Your Dependencies

 

Let's look at some virtual machines (VMs) and see where there might be dependencies:

 

[Diagram: VM dependencies]

 

In this case, there are users on the Internet connecting to a load balancer, which will send their sessions onward to, say, VM1. In a typical environment with element management systems, we can identify if any of the systems in this diagram fail. However, what's less clear is what the impact of that failure will be. Let's say the SERVER fails; who is affected? We need to know immediately that VM1, VM2, and VM3 will be unavailable, and from an application perspective, I need to know that those virtual machines are my web farm, so web services will be unavailable. I can identify service-level problems because I know what my VMs do, and I know where they are currently active. If a VM moves, I need to know about it so that I can keep my dependencies accurate.

 

The hypervisor shown is using NFS mounts to populate what the VMs will see as attached storage. It should be clear from the diagram that if anything in the path between the hypervisor and the SAN fails, while the Hypervisor will initially be the one complaining, it won't take too long before the VMs complain as well.

 

From an alerting perspective, knowing these dependencies means that:

  • I could try to suppress or otherwise categorize alerts that are downstream from the main error (e.g. I don't want to see 100 NFS alerts per second when I already know that the SAN has failed);
  • I know what impact a particular failure will have on the services provided by the infrastructure.
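
Here is a minimal sketch of that idea, using a made-up dependency map rather than any particular product's model: given a failed element, walk the graph to find everything downstream, so those alerts can be grouped under the root failure and the affected services reported immediately.

```python
# Hypothetical dependency map: each element lists the things that depend on it,
# mirroring the diagram above (SERVER hosts VM1-3, the hypervisor mounts the SAN
# over NFS, and the three VMs make up the web farm).
DEPENDS_ON_ME = {
    "SAN":        ["HYPERVISOR"],
    "HYPERVISOR": ["VM1", "VM2", "VM3"],
    "SERVER":     ["VM1", "VM2", "VM3"],
    "VM1":        ["web-service"],
    "VM2":        ["web-service"],
    "VM3":        ["web-service"],
}
SERVICES = {"web-service"}  # the things the business actually cares about

def impacted(failed_element):
    """Breadth-first walk returning everything downstream of a failed element."""
    seen, queue = set(), [failed_element]
    while queue:
        for child in DEPENDS_ON_ME.get(queue.pop(0), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

downstream = impacted("SAN")
print("Suppress/group alerts from:", sorted(downstream - SERVICES))
print("Services at risk:", sorted(downstream & SERVICES))
```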

 

Application Level Dependencies

 

It may also be important to dig inside the servers themselves and monitor the server's performance, the applications on that server, and the server's view of the infrastructure. For example, if it is reported that MS SQL Server has a problem on a particular node, we can infer that applications dependent on that database service will also be impacted. It's possible that everything in the infrastructure is nominally OK, but there is an application or process problem on the server itself, or perhaps the VM is simply running at capacity. I will say that tools like SolarWinds' Server & Application Monitor are very helpful when it comes to getting visibility beyond the system level, and when used with knowledge of an application's service role, they can make a huge difference in pre-empting problems, quickly identifying the root cause of emergent issues, and using that information to ignore the downstream errors and focus on the real problem.

 

Long Distance Relationships

 

Let's take this a step further. Imagine that VM3 in the local datacenter runs a database that is used by many internal applications. In another location there's a web front end which accesses that database. An alert comes in that there is a high level of dropped packets on the SAN's connection to the network. Shortly after, there are alerts from VM3 complaining about read/write failures. It stands to reason that if there's a problem with the SAN, there will be problems with the VMs because the hypervisor uses NFS mounts for the VM hard drives.  We should therefore fully anticipate that there will be problems with the web front end even though it's located somewhere else, and when those alerts come in, we don't want to waste any time checking for errors with the web server farm or the WAN link. In fact, it might be wise to proactively alert the helpdesk about the issue so that when a call comes in, no time will be wasted trying to replicate the problem or track down the issue. Maybe we can update a status page with a notice that there is potential impact to some specific services, and thus avoid further calls and emails. Suddenly, the IT group is looking pretty good in the face of a frustrating problem.

 

Service Oriented Management

 

One of the biggest challenges with element management systems is in the name: they are managing technologies, not services. Each one, reasonably enough, is focused on managing a specific technology to the best of its abilities, but without some contextual information and an understanding of dependencies, the information being gathered cannot be used to its full potential. That's not to say that element management systems have no value; far from it, and for the engineers responsible for that technology, they are invaluable. Businesses, however, don't typically care about the elements; they care about services. When there's a major outage and you call the CTO to let them know, it's one thing to tell them that "SAN-17 has failed!" but the first question a CTO should be asking in response is, "What services are impacted by this?" If your answer to that kind of question would be "I don't know," then no matter how many elements you monitor and how frequently you poll them, you don't fully know your infrastructure and you'll never reach a state of inner peace.

 

So I'm curious: do Thwack users feel like they have a full grip on the services using the infrastructure and the dependencies in place, not just a view of the infrastructure itself?

 

 

In my next post, I'll be looking at more knowledge challenges: baselining the infrastructure and identifying abandoned compute resources.

It's a problem many in this industry have faced, myself included. You spend years honing your craft on a particular skill, with a narrow focus, always learning. Then your organization decides to do a 180, and you're either out of a job or the low person on the totem pole. Think Novell!

 

How can one dedicate oneself to the task at hand, while attempting to remain relevant to an industry constantly in flux?

 

Having had this happen to me, I vowed that I’d not allow it to happen again. Of course, that’s easier said than done.

 

I've not had the experience of a developer, having always been an infrastructure guy, though I have supported many. I think it may be more difficult to maintain or build a skillset in programming while simultaneously trying to build skills in entirely new platforms, but I imagine that this is what it would take to not get phased out.

 

Following is a list of things that have come naturally to me over the years, along with some ways to accomplish them.

 

  • Be Curious

 

In my case, there is always so much going on: new startups in software tools, storage, and orchestration are launched practically every day. Choosing some particular piece of this tech and learning it, if only to be able to speak intelligently about it, takes a bit of effort, but it can be accomplished by reading white papers or attending webinars. Even more beneficial is to attend trade shows, accept meeting requests from sales teams who've targeted you, and, most importantly, talk. User groups, meetups, and the like have proven to be highly effective entrées into newer technologies. Seek out the technologies that interest you and that look like ideal solutions to the problem at hand.

 

  • Pursue your passions

 

Be aware that your time will often be limited, so your productivity in launching into technology you've not seen previously will likely be hampered, if not thwarted entirely. But don't despair: if you're truly passionate about a given piece of tech, or about the solution to a given issue, that passion will drive you forward to learn what you need.

 

Look at the big picture within your organization and at potential needs that are not being fulfilled, or not fulfilled well. Evaluate and pursue the viable candidate(s). Once you've fully researched them, you can present your findings to management. By establishing your interest and willingness to go above and beyond the call of your day-to-day job, you'll get to pursue this technology and have the opportunity to keep your skills fresh.

 

  • Look at market trends, and key players

 

When evaluating the next tool (a piece of hardware or software) you'd like to learn, acknowledge that dying technology is probably not a great place to go. You'll want to know the pros and cons of yesterday's stuff as it relates to solving a problem, but the newer, more elegant solutions are more relevant. For example, look how far remote access into an organization has come since the introduction of the VPN into the IT landscape. Your needs will be security, manageability, scalability, and ease of maintenance. Some older remote access technologies require huge amounts of maintenance and might even have difficulty keeping up with threats. There are new ways of accomplishing this that seem almost revolutionary in comparison. Once you've narrowed your choices, you'll want to take a deeper dive into them.

 

To be sure, if you're evaluating a product or solution set toward a particular goal, it'll be far more enjoyable to do so when the problem you're hoping to solve is compelling to you.

 

Also, nothing can make your star shine more than taking on and implementing a highly beneficial solution that the organization never even considered. That will involve selling your solution to management and negotiating with the workforce, vendors, and contractors to make it happen. A successful rollout is highly satisfying.

 

Always remember, complacency is the enemy.

Heading to Vegas on Sunday for VMWorld next week. I've got one session and a panel discussion, and I will also be hosting a Meet the Experts session. Oh, and I've also got an Experts and Espresso session as well as an in-booth session. So, yeah, it's a fairly busy week for me. That being said, if you are heading to VMWorld, please stop by and say hello; I'd love the opportunity to connect with you while we are there.

 

Anyway, here are the items I found most amusing from around the Internet. Enjoy!

 

Microsoft open sources PowerShell; brings it to Linux and Mac OS X

In related news, Hell has frozen over. From what I can tell, Hell must be full of Linux admins because those are the folks complaining the most about this announcement.

 

Usain Bolt and the Fastest Men in the World Since 1896 – on the Same Track

Because I love it when someone takes data and presents it in unique ways such as this post.

 

All the National Food Days

Did you know there are 214 "food" days during the year? Well, now you do. And don't forget September 5th is fast approaching this year!

 

"Daddy is Working" Light

I would get this, except my kids would knock on the door to ask me if I forgot to turn it off because they needed something anyway.

 

All Public Cloud roads lead to Hybrid Infrastructure

Some very interesting thoughts in this article. To me, the idea that a developer spins up a prototype in AWS isn't surprising, as I liken that to the enabling of Shadow IT. What surprises me is how most don't discuss the security and privacy concerns of developers taking such action. That's probably because if you do raise such questions you end up being labeled a "roadblock" to progress. 

 

Pentagon Has a New Data Center Consolidation Plan

If your company isn't yet in the Hybrid IT world then I just want you to know that the US Government...an entity that moves slower than molasses in January...is already there.

 

65 Business Jargon Phrases to Stop Using and What to Use Instead

This isn't even a comprehensive list. But if I were allowed an ask, it would be that we don't punch a puppy by creating our own list.

 

Remember when Apple stores sold software? Well, this was the only software I could find the other day at my local Apple store (sorry, it's just Apple now). If Steve Jobs weren't already dead, this would have killed him for certain:



The Hybrid Cloud

Posted by arjantim Aug 23, 2016

I'm keeping it easy this post. It'll be just a small recap to build some tension for the next few posts. In the last two posts, I talked a little about the private and public cloud, and it is always difficult to write everything with the right words. So, I totally agree with most of the comments made, and I wanted to make sure a couple of them were addressed in this post. Let's start with the cloud in general:

 

[Image: "There is no cloud" sticker]

 

A lot of you said that the cloud is just a buzzword (or even just someone else’s computer).

 

I know it’s funny, and I know people are still trying to figure out what cloud is exactly, but for now we (our companies and customers) are calling it cloud. And I know we techies want to set things straight, but for now let’s all agree on calling it cloud, and just be done with it (for the sake of all people that still see the computer as a magical box with stardust as its internal parts, and unicorns blasting rainbows as their administrators.)

 

The thing is, I like the comments because I think posts should always be written as conversation starters. We are here to learn from each other, and that’s why we need these comments so badly.

 

The private cloud (or infrastructure) is a big asset for many of the companies we work for. But they pay a lot of money just to set up and maintain the environment, whereas the public cloud just gives them all these assets and sends a monthly bill. Less server cost, less resource cost, less everything; at least, that's what a lot of managers think. But as a couple of the comments already mentioned, what if things go south? What if the provider goes bankrupt and you can't access your data anymore?

 

In the last couple of years, we've seen more and more companies in the tech space come up with solutions, even for these kinds of troubles. With the right tools, you can make sure your data is accessible, even if your provider goes broke and the lights go out. Companies like Zerto, Veeam, SolarWinds, VMware, and many more are handing you tools to use the clouds as you want them, while still being in control and able to see what is going on. We talked about DART and SOAR, and these are very important in this era and the future ahead. We tend to look at the marketing buzz and forget that it's their way of saying that they often don't understand half of the things we do or say, and the same goes for a lot of people outside the IT department. In the end, they just want it KISS, and that's where a word like "cloud" comes from. But let's go back to hybrid.

 

So what is hybrid exactly? A lot of people I talk to are always very outspoken about what they see as hybrid cloud. They see the hybrid cloud as the best of both worlds, as private and public clouds combined. For me, the hybrid cloud is much more than that. For me, it can be any combination: even all public, but shared among multiple providers (multi-cloud, anybody?!), or private and public clouds on-premises, and so on. In the end, the cloud shouldn't matter; it should just be usable.

 

For me, the hybrid solution is what everybody is looking for, the one ring to rule them all. But we need something software-defined to manage it all.

 

That's why my next post will be about the software-defined data center. It's another buzzword, I know, but let's see if we can learn a bit more from each other about where the IT world is going, and how we can help our companies leverage the right tools to build the ultimate someone else's computer.

 

See you in Vegas next week?!?

With the federal government’s Cloud First Policy nearly four years old, most agencies already have a clear understanding of the promised values of cloud computing.

 

That said, there is still plenty of uncertainty and concerns about moving to a cloud environment. How will you secure your data and monitor your applications? Will a cloud environment make your job obsolete? How will your agency manage the changes?

 

In reality, however, moving to the cloud can have less of an impact than one might imagine. Data will continue to be secure, applications will continue to perform and job security will not change.  You don’t have to lose control.

 

Today’s environment

 

Today, you’re encrypting your data, using performance monitoring tools, tracking resource usage and evolving requirements (memory, CPU, etc.), tracking service-level agreements (SLAs) and much more – all considered best practices.

 

The key is to understand the differences between application requirements and deployment practices.

 

Protecting data in the cloud simply entails knowing what requirements you must meet, and learning how to do that in the cloud. So the more clarity you have on how your applications work today, the easier your migration will be.

 

If you understand your application resource contentions, you will know how much memory and CPU your database has been using, but you also need a clear understanding of the source of bottlenecks. This knowledge will ensure you get the capacity you need while meeting your performance requirements.

 

Your cloud environment

 

Your cloud environment might actually look quite similar to your data center-hosted environment.

 

From a security perspective, there are many options available in the cloud. Remember, meeting strict federally mandated security requirements is a cloud provider's bread and butter. All cloud providers that are compliant with the Federal Risk and Authorization Management Program (FedRAMP) meet FISMA-moderate requirements.

 

It is likely you will end up in a hybrid environment, so you should find a set of monitoring tools that allow you to monitor applications both in the cloud and in your own data center. The key metrics you already track – application performance, memory usage, CPU utilization – should continue to be tracked in the cloud.

 

Look for tools that allow you to see both sides through a single pane of glass, providing complete visibility across the entire environment. These types of tools provide stability throughout the transition and ease migration.

 

As for job security, remember that most of the work you do today will continue. You will still be responsible for application performance optimization, for example, but the applications will simply be in a different location. You'll be tracking performance metrics that relate directly to potential cost savings for your agency. Tuning, enhancing efficiency, optimizing resources (cost), and evaluating current practices may also become a larger part of many federal IT jobs.

 

Focus on data security and optimizing performance, and continue to track resource usage, evolving requirements and SLAs. And remember, the more rigorously you monitor and manage your applications today, the easier – and more cost effective – your migration will be.

 

Find the full article on Government Computer News.


What is the difference between the guest OS measurement of CPU and memory in a virtual machine (VM) and the CPU and memory utilization of the host server? Why does each matter? Each serves its own purpose, and perspective is key to properly using them as you optimize your virtual environment.

 

At a high level, the CPU and memory utilization reported by the host server takes into account all the VMs on that system, their scheduling, and any privileged instructions as consumption takes place against that host’s resources. CPU and memory reported from this perspective is at a system level. Optimization requires having a holistic view of the system, the VMs on it, and their applications since over-commitment of system resources can cause bottlenecks.

 

The guest OS measurement of CPU and memory is taken from the perspective of the VM, with respect to the resources that have been provisioned for it. These metrics have no awareness of the VMkernel and its scheduling, nor of the physical system's overall metrics and scheduling, though they can definitely be impacted by what happens on the system. Case in point: noisy-neighbor VMs. Optimization here focuses on understanding the behavior of the VM and its application. For instance, the threadedness of the application can come into play. A single-threaded application does not benefit from additional vCPUs because the app won't be able to take advantage of them. Plus, those unused vCPUs waste pCPU and can hurt overall system performance, which, in turn, can affect that VM's performance.
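
To make the guest-side view concrete, here is a small sketch using Python's psutil library to sample what the guest OS itself can see. The host-side numbers would have to come from the hypervisor's own tooling (vCenter, esxtop, and the like), which is exactly the point: the two views are collected in different places and answer different questions. Treat this as an illustration, not a monitoring recipe.

```python
import psutil  # pip install psutil; run this inside the guest OS

# The guest only knows about the vCPUs and RAM it was provisioned with --
# not the host's physical cores, other VMs, or the VMkernel scheduler.
vcpus = psutil.cpu_count(logical=True)
overall = psutil.cpu_percent(interval=1)            # averaged over one second
per_vcpu = psutil.cpu_percent(interval=1, percpu=True)
mem = psutil.virtual_memory()

print(f"vCPUs visible to guest : {vcpus}")
print(f"Guest CPU utilization  : {overall:.1f}%")
print(f"Per-vCPU utilization   : {per_vcpu}")
print(f"Guest memory in use    : {mem.percent:.1f}% of {mem.total // 2**20} MiB")

# Mostly idle vCPUs here hint that the VM may be oversized, but contention
# caused by noisy neighbors (e.g. CPU ready time) only shows up in the
# host-level metrics, never inside the guest.
```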

 

The end goal remains delivering application quality of service that is acceptable to end-users. Both points of view are important in how one designs and implements a virtualization infrastructure, as well as in proper resource allocation to VMs and across clusters of host systems. Both serve a purpose in root-causing bottlenecks and properly remediating those issues.

 

Share your thoughts in the comments section. Below are additional materials for reference.

 

Reference

VMware’s CPU scheduler white paper: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf

 

VMware Knowledge Base Article: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2032

How much storage do you have? What storage are you using? When will you need more?

 

These seem to be simple questions, and that's often how they're approached. But if you're a systems administrator "on the hook" for storage, you know they're actually devilishly complex to answer. Storage utilization metrics have been a challenge from the beginning, and it's only getting more complex in our world of storage virtualization, shared storage, and cloud storage! Let's get back to the basics and think about the root questions here and how we can solve them.

 

Storage Utilization: Why Do You Care?

 

Here's a radical statement coming from a "storage guy" like me: Don't obsess about storage capacity for its own sake, since it's basically free these days.

 

Now that I've given that a moment to sink in, let's consider the value of a byte. I recently bought 10 TB of storage at retail for under $250, including tax. That's a whole lot of storage! In fact, it's enough capacity to hold the core financial and operational data of a midsize company! 5 TB, 6 TB, 8 TB, and even 10 TB hard disk drives are readily available today, and they're available for under $1,000!

 

With prices like these, why do we even care about storage capacity anymore?

  1. Storage performance remains very expensive, even though the widespread use of flash and SSDs has radically opened up what's possible in terms of performance
  2. Advanced storage features are expensive, too, and they're what we're really paying for when we buy enterprise storage
  3. IT budgets, approvals, and group ownership remain confounding factors in making efficient use of storage

 

In other words, even though storage capacity is free, everything else still costs a whole lot of money! Storage management has never really been about the bytes themselves. It's all about making data available to the business on demand, every time. And that's a much bigger issue!

 

Yet all of those things (performance, features, and bureaucracy) are linked to those core questions at the top of the page. When people ask how much storage they have, how much they're using, and how much is available, what they're really asking is, "Can the storage environment support my needs?" Therefore, answering capacity questions requires thought and consideration. And the answer is never a simple number!

 

How Much Storage Do You Have?

 

You have more than enough storage today. Otherwise, you'd be in "crisis mode" installing more storage, not reading blog posts! But it's painfully difficult to answer even this basic question!

 

Most businesses purchase storage on a per-project or per-department basis, and this is probably the worst possible way to do it. It encourages different groups to be misers, hiding storage from each other lest someone else take it. After all, if engineering paid for a storage array, why should sales be able to use some of it?

 

Even if people have the best intentions, "orphan storage" is common. As servers and applications come off-line, "shared" storage systems are the last to go. It's typical for companies to have many such storage arrays still soldiering on, supporting a few leftover servers in a corner somewhere. Some storage is "orphaned" even before it's ever used, having been purchased and saved for an application that never needed it.

 

A few years back, I used to do audits of enterprise storage environments, and I always started with an in-person tour of the data center. My most-common question was, "what's that storage array over there?" And the answer was always illuminating and a little embarrassing for someone. But that wasn't my goal; I just wanted to get a baseline of what was in place today.

 

What Storage Are You Using?

 

Even storage systems in active use might not meet your needs. Outdated and over-burdened arrays are just as common as orphans. And duplicate data is everywhere in a modern datacenter!

 

Companies often purchase systems that match their budgets rather than their needs, leading to odd mis-matches between application criticality and system capability. And vendor sales representatives are as much to blame, pushing inappropriate systems on companies.

 

The key question in terms of storage in use is suitability to task, not just capacity used. Are applications I/O constrained? There's plenty of storage performance to be had, thanks to SSDs and flash storage arrays! Do you need to actively share and move data? Today's arrays have fantastic snapshot, cloning, and replication features just waiting to be exploited! And there's specialized software to manage storage, too!

 

My storage audits also included a census of data in active use. I had the systems and database administrators tell me how much data was actively used in their applications. Once again, the answer was shocking to jaded managers used to talking in petabytes and even exabytes. Indeed, most large companies had only a small amount of active data: I often recorded single-digit percentages! But this too is misleading, since modern systems need test and development, backup and archiving, and space to grow.

 

When Will You Need More Storage?

 

A critical question is, "what kind of storage will you need in the future?" Applications are going in all directions today: Some need scalable, API-driven cloud storage, others need maximum performance, while more need integration and data management features. The old standby storage arrays aren't going to cut it anymore.

 

Don't just look to expand existing infrastructure, which is likely out of date and inappropriately used anyway. Consider instead what you want to do with data. There are lots of wonderful solutions out there, from large and small companies, if you're willing to look beyond what you have.

 

Encourage your company to move to a "storage utility" model, where a general storage budget goes to IT instead of being doled out on a per-project basis. Then you can develop a multi-product storage service with SAN, NAS, Object/Cloud, and even Software-Defined Storage. And maybe you can stay ahead of the question in the future.

 

Another option is to purchase per-application but be very conservative when selecting. Try to keep variety to a minimum and don't over-purchase. Over-buying leads either to orphans or "hop-ons", and neither is good for business.

 

Regardless, make sure the storage meets the needs of the application. Isn't that better than just worrying about capacity?

 

I am Stephen Foskett and I love storage. You can find more writing like this at blog.fosketts.net, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.


 

How to use network configuration, change, and compliance management (NCCCM) and other monitoring software in response to an actual security breach.

 

If you have not read part one, I suggest you give it a look so you can fully understand how and why this comes into play. For those who are ready for part two, welcome back! I'll attempt to share some assessments of internal sabotage and how to use things like monitoring and management software to detect it and recover from it. The best way to respond is by thinking ahead and having clear steps to prevent and halt further damage.

 

Today, we are going to dive into a couple of scenarios and directly assess ways to be alerted to and address situations that may be taking place within your organization. Now, should we all live like we have a monkey on our shoulder? No, but it doesn't hurt to have a little healthy skepticism about unusual things happening around you. Being aware of your surroundings allows you to fight back and take back control when hiccups happen along the way.

 

 

Internal Planning of Possible Sabotage:

Things to look for visually as well as with monitoring and management software.

 

  • Unusual behavior (after a confrontation or write-up has happened) - thank you, sparda963; I forgot to mention when to look for this
    • This can be obviously aggressive, but the kind often overlooked is "overly" nice and helpful.
      • Yes, this sounds condescending, and I understand that concern, but think of it as out-of-character behavior. They now want to help higher levels with mission-critical information or configurations. They want to "watch" you use the command line interface to a device. They are "contributing" in order to learn where the key points are. These are things that are outside of their scope.
    • If they turn aggressive, well, the writing is on the wall at that point, and if secretiveness comes into play, then watch out and plan accordingly.
    • Use real-time change notifications, approval systems, and compliance reports to help you see changes made and users added to devices from your monitoring and management software.
      • Make sure that you have a script ready ahead of time to remove access to devices, one where you can fill in the user ID and take permissions away quickly (a sketch of such a script follows after this list).
      • Verify you have alerts set up to notify you, with quick access to the devices through your management software, so you can cancel access levels and revert changes quickly.
  • Logons to unusual servers by said person
    • Use a log event monitor to alert you to strange login attempts and locations.
    • Know your monitoring software and have quick pages ready so you can deny access to accounts quickly.
  • New users
    • Use a log event monitor to alert you to new account creations. You need to know when these were created and have a trail so you can remove them.
  • Job creation for mass configuration changes
    • Verify all changes on your network through an approval system. An excellent way to do this is with an NCCCM product with its approval system fully active. You will want at least a two-level approval system to help prevent issues and unwanted changes.
    • Real-time change notifications with segmented emails for critical devices.
    • Backups that are quickly accessible and stored in multiple locations to ensure access during a breach.
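
As a rough idea of what such a break-glass script could look like, here is a sketch that pushes a "remove this local user" command to a list of devices using the netmiko library. The inventory, credential handling, and commands are placeholders, and the syntax shown is Cisco IOS-style; adapt it to your platforms (or to whatever your NCCCM tool already automates) and test it long before you need it in a crisis.

```python
from getpass import getpass
from netmiko import ConnectHandler  # pip install netmiko

# Placeholder inventory -- in practice, pull this from your NCCCM/monitoring tool.
DEVICES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def revoke_user(target_user, admin_user, admin_pass):
    """Remove a local account from each device and report what happened."""
    for host in DEVICES:
        params = {
            "device_type": "cisco_ios",   # adjust per platform
            "host": host,
            "username": admin_user,
            "password": admin_pass,
        }
        try:
            with ConnectHandler(**params) as conn:
                # IOS-style command; other vendors need different syntax.
                conn.send_config_set([f"no username {target_user}"])
                conn.save_config()
            print(f"[OK] {host}: removed '{target_user}'")
        except Exception as exc:          # keep going; log failures for follow-up
            print(f"[FAILED] {host}: {exc}")

if __name__ == "__main__":
    revoke_user(input("User ID to revoke: "),
                input("Admin username: "),
                getpass("Admin password: "))
```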

 

Internal Execution of Sabotage:

Things to do if you find yourself under attack

(Network Side)

  • First things first
    • Log Event Monitoring - should be alerting you to access violations, additions of accounts, or deleting of accounts
    • TACACS - should be enabled and in full use for auditing within your monitoring and management software choices
    • Real-time change notifications should be sending emails immediately to the correct people, with escalation to the senior network engineers on your team (a small do-it-yourself watcher sketch follows this list).
  • Now to fight back!
    • If they are opening firewalls to gain access, you need to shut these down and stop traffic immediately.  Have a plan for a shut-all script, or use something like Firewall Security Manager or Network Configuration Manager to push commands from a stored location.
      • This buys time to figure out who the user is and what is going on while the floodgates are closed.
      • Make sure a security protocol grants you this authority in advance.  It will save you and your company a lot of money when you are trying to stop a massive break-in.
    • If they are deleting router configs
      • Real-time change notification (RTN) alerts should be sent out to you to bring you up to speed.
        • Use a script to deny access to the user that made the change shown in the RTN email.
        • Revert configurations from within your NCCCM software and get these back online
      • Verify users that have access
        • Use a compliance report to check access levels and remove where needed.
        • CONTINUE to monitor these reports
      • Check your approval system
        • Verify who has access
        • Change passwords to all monitoring and management software logins.
          • I had a customer who would set all of these to a single password that he would create in a crisis, allowing a quick lockdown of software access to regain control while an attack was underway.
    • Verify critical application status
      • Log event monitor - check logs to see if access has been happening outside of the usual pattern
      • NetPath or something similar for pathways to check accessibility or changes
      • NCCCM - verify all changes that have occurred within at least the past seven days, as this could be only the first wave of the intrusion.
      • Network performance monitor - check for any malware or trojans that could be lingering and sending data on your network.
        • Volumes filling up, with alerts to tell you so
        • Interface utilization skyrocketing
        • A NetFlow monitor showing high amounts of unusual traffic, or NO traffic at all; traffic history is essential here.
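
If you do not yet have a commercial RTN product in place, even the small do-it-yourself syslog watcher sketched below can buy you early warning. It assumes your devices already forward syslog to the host running it; the log path, the Cisco-style "%SYS-5-CONFIG_I" pattern, and the mail addresses are all placeholders to adapt to your environment.

    # rtn_watch.py - minimal sketch: email an alert when a config-change message hits syslog
    import re
    import time
    import smtplib
    from email.message import EmailMessage

    LOG_FILE = "/var/log/network.log"          # placeholder: wherever your devices log to
    PATTERN = re.compile(r"%SYS-5-CONFIG_I")   # Cisco-style "configured from" message

    def alert(line):
        msg = EmailMessage()
        msg["Subject"] = "Config change detected"
        msg["From"] = "rtn@example.com"        # placeholder addresses
        msg["To"] = "noc@example.com"
        msg.set_content(line)
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    def follow(path):
        with open(path) as f:
            f.seek(0, 2)                       # start at the end of the file, like tail -f
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1)
                    continue
                if PATTERN.search(line):
                    alert(line)

    if __name__ == "__main__":
        follow(LOG_FILE)
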

 

Security gut check:

Things to go over with yourself and your team to make sure your security and recovery plans are current.

 

Pre-Assessment

  • Understand and know what information is critical within your organization
  • Know where your system boundaries are
  • Pinpoint your security documentation

Assessment

  • Set up a meeting with your team to go over the above pre-assessment
  • Review your security information
  • Practice scenarios that "could" happen within your networks
  • Set up session controls
  • Verify maintenance plans
  • Ensure mapping of your critical networking connections with critical applications
  • Ensure your policies are as relevant today as they were when first created
  • Verify entry points of concern
    • Internal/External
  • System and Network Exposures

 

Team Analysis

  • Where are your vulnerabilities?
  • What are your Countermeasures?
  • What is the impact if breached?
  • Who can segment and take on sections of security recommendations?

 

Final

  • Implement new security plans as defined and found above.
  • Set up a meeting review for at least three months later to make sure all vulnerabilities are known and addressed.
  • Verify that the plan is accessible for your team to review so they are aware of actions to take.
  • Sign an agreement within your team to follow these protocols.

 

 

Well, that is a lot to cover, whew!  Once again, everyone's networks and infrastructures are different.  You and I understand that.  The main point is how to use tools to help you stay ahead and be able to fight back with minimal damage.  Having a recovery plan and consistently updating it against new vulnerabilities is vital to staying ahead.  You can adapt these plans for outside attacks as well.  Security is a fluid, ever-changing dance, so don't be stuck sitting on the outside looking in.

 

 

Thank you,

~Dez~


 

 

In my previous post, I wrote about becoming an Accidental DBA, whether or not you had that title formally.  I described the things a Minimalist DBA should focus on before jumping into performance tuning or renaming that table with the horribly long name (RETAIL_TRANSACTION_LINE_ITEM_MODIFIER_EVENT_REASON, I <3 you).   In today's post I want to cover how you, personally, should go about prioritizing your work as a new Accidental DBA.

 

  Most accidental DBAs perform firefighter-like roles: find the fire, put it out, rush off to the next fire and fight that one too, often without the tools and help they need to prevent fires. Firefighting jobs are tough and exhausting, even in IT.  But these DBAs never allocate time to prevent fires, to maintain their shiny red fire trucks, or to practice sliding down that fire pole.

 

How to Prioritize your Accidental DBA Work

 

  1. Establish a good rule of thumb on how decisions are going to be made.  On a recent project of mine, due to business priorities and the unique type of business, we settled on Customer retention, legal and application flexibility as our priorities.  Keep our customers, keep our CIO out of jail, and keep in business. Those may sound very generic, but I've worked in businesses where customer retention was not a number one priority. In this business, which was membership and subscription based, we could not afford to lose customers over system issues.  Legal was there to keep our CIO and CEO out of jail (that's what ROI stands for: Risk of Incarceration).  Application flexibility was third because the whole reason for the project was to enable business innovation to save the company.

    Once you have these business priorities, you can make technical and architectural decisions in that context.  Customer retention sounds like a customer service issue, but it's a technical one as well.  If the system is down, customers can not be customers.  If their data is wrong, they can't be customers.  If their data is lost, they can't be customers. And so on.  Every decision we made first reflected back to those priorities.

  2. Prioritize the databases and systems.  Sure, all systems are important.  But they have a priority based on business needs. Your core selling systems, whatever they might be, are usually very high priority.  As are things like payroll and accounting.  But maybe that system that keeps track of whether employees want to receive a free processed meat ham or a chunk of processed cheese over the holidays isn't that high on the list.  This list should already exist,  at least in someone's head.   There might even be an auditor's report that says if System ABC security and reliability issues aren't fixed, someone is going to go to jail.  So I've heard.  And experienced. 

  3. Automate away the pain…and the stupid.  The best way to help honor those priorities is to automate all the things. In most cases, when an organization doesn't have experienced or dedicated DBAs, their data processes are mostly manual, mostly reactive, and mostly painful.  This is the effect of not having enough knowledge or time to develop, test, and deploy enterprise-class tools and scripts.  I understand that this is the most difficult set of tasks to put at a higher priority if all the databases are burning down around you. Yes, you must fight the fires, but you must put a priority on fire reduction.  Otherwise you'll just be fighting bigger and more painful fires.

    Recovery is the most important way we fight data fires.  No amount of performance tuning, index optimization, or wizard running will bring back lost data.  If backups are manual, or automated and never tested, or restores are only manual, you have a fire waiting to happen. Head Geek Tom LaRock sqlrockstar says that "recovery is Job #1 for a DBA".  It certainly is important. A great DBA automates all backups and recovery (a minimal backup-automation sketch follows this list). If you are recovering manually, you are doing it wrong.

      Other places where you want automation are monitoring and alerting.  You want to know something is going on even before someone smells smoke, not when users are telling you the database is missing.  If your hard drive is running out of space, it is generally much faster to provision more resources or free up space than it is to recover a completely down system.  Eventually you'll want to get to the point where many of these issues are taken care of automatically.  In fact, that's why they invented cloud computing.
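
Here is the backup-automation sketch promised above. It is a minimal example only, assuming SQL Server, the pyodbc driver, and a scheduler (cron, Task Scheduler, or an agent job) to run it nightly; the connection string, database list, and backup path are placeholders, not anyone's production settings.

    # nightly_backup.py - minimal sketch: back up a list of databases on a schedule
    import datetime
    import pyodbc

    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=sqlprod01;Trusted_Connection=yes")    # placeholder server
    DATABASES = ["Sales", "Payroll"]                          # placeholder database list
    BACKUP_DIR = r"\\backupshare\sql"                         # placeholder backup share

    def backup_all():
        # BACKUP cannot run inside a transaction, hence autocommit=True
        conn = pyodbc.connect(CONN_STR, autocommit=True)
        cur = conn.cursor()
        stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M")
        for db in DATABASES:
            path = f"{BACKUP_DIR}\\{db}_{stamp}.bak"
            cur.execute(f"BACKUP DATABASE [{db}] TO DISK = '{path}' WITH INIT, CHECKSUM")
            while cur.nextset():     # drain informational messages so the backup completes
                pass
            print(f"{db}: backup written to {path}")
        conn.close()

    if __name__ == "__main__":
        backup_all()

The point is not this particular script; it is that the backup happens the same way every night whether or not you remembered to do it.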

Get Going, the Alarm Bell is Ringing!

 

Become the Best DBA: A Lazy DBA. Lazy DBAs automate the stuff out of everything.  And lazy DBAs know that automating keeps databases from burning. They automate away the dumb mistakes that happen when the system is down, they automate test restores,  they automate away the pain of not knowing they missed setting a parameter when they hit ENTER.  They know how to get to the fire, they know what to do and they fix it.


The Best DBAs know when databases are getting into trouble, long before they start burning down.


The Best DBAs don't panic.  They have a plan, they have tools, they have scripts.  When smoke starts coming out of the database, they are there, ready to fight that fire.  They are ready because they've written stuff down. They've trained.  They've practiced.  How many clicks would it take you to restore 10 databases?  Would you need to hit up Boogle first to find out how to do a point-in-time restore? Do you know the syntax, and the order in which systems have to be restored? Who are the other people you have to work with to fix this fire?

 

As a new DBA, you should be working on automation every day, until all that work frees up so much of your time you can work on performance tuning, proper database design, and keeping your database fire truck shiny.

I'm in Austin this week to film more SolarWinds Lab episodes and conduct general mischief. Lucky for me Delta had a MUCH better Monday this week than last week.

 

Anyway, here are the items I found most amusing from around the Internet. Enjoy!

 

Bungling Microsoft singlehandedly proves that golden backdoor keys are a terrible idea

If you have ever wanted to run Linux on your Surface, now is your chance! I'm looking at you, adatole.

 

Millions of Cars Vulnerable to Remote Unlocking Hack

This is why I drive a 2008 Jeep. It's like the Battlestar Galactica: too old to be hacked by new technology.

 

Scammers sneak into customer support conversations on Twitter

I've noticed a handful of fake support accounts popping up lately, so this is a good reminder to be careful out there.

 

Best Fighter Jet In History Grounded By Bees

No, seriously. And I dig the part where they decided not to just kill the bees, but to relocate them. Nice touch.

 

Walmart and the Multichannel Trap

Wonderful analysis of where Walmart is headed, and why Amazon is likely to run all retail in the future. After living through what Walmart did to Vlasic 20 years ago, this seems fitting to me.

 

Networking Needs Information, Not Data

Nice post that echoes what I've been saying to data professionals for a few years now. There is a dearth of data analysts in the world right now. You could learn enough about data analytics in a weekend to impact your career for the next 20 years, but only if you get started.

 

Even Michael Phelps knows it's football season, right, kong.yang?


Data is an incredibly important asset. In fact, data is the MOST important asset for any company, anywhere. Unfortunately, many continue to treat data as an easily replaced commodity.

 

But we’re not talking about a database administrator’s (DBA) iTunes library. We’re talking highly sensitive and important data that can be lost or compromised.

 

It’s time to stop treating data as a commodity. We need to create a secure and reliable data recovery plan. And we can get that done by following a few core strategies.

 

Here are the six easy steps you can take to prevent data loss.

 

Build a Recovery Plan

Novice DBAs think of backups as the starting point for handling data loss. Experienced senior DBAs know the starting point is building the recovery plan.

 

The first thing to do here is to establish a Recovery Point Objective (RPO) that determines how much data loss is acceptable. Understanding acceptable risk levels can help establish a baseline understanding of where DBAs should focus their recovery efforts. Then, work on a Recovery Time Objective (RTO) that shows how long the business can afford to be without its data. Is a two-day restore period acceptable, or does it have to be 15 minutes?

 

Finally, remember that “high availability” and “disaster recovery” are different. A DBA managing three nodes with data flowing between each may assume that if something happens to one node the other two will still be available. But an error in one node will undoubtedly get replicated across all of them. You better have a recovery plan in place when this happens.

 

If not, then you should consider having an updated resume.

 

Understand That Snapshots != Database Backups

There’s a surprising amount of confusion about the differences between database backups, server tape backups, and snapshots. Many administrators have a misperception that a storage area network (SAN) snapshot is good enough as a database backup, but that snapshot is only a set of data reference markers. The same issue exists with VM snapshots as well. Remember that a true backup is one that allows you to recover your data to a transactionally consistent view at a specific point in time.

 

Also consider the backup rule of three, where you save three copies of everything, in two different formats, and with one off-site backup. Does this contain hints of paranoia? Perhaps. But it also perfectly illustrates what constitutes a backup, and how it should be done.

 

Make Sure the Backups Are Working

There is only one way to know if your backups are working properly, and that is to try doing a restore. This will provide assurance that backups are running -- not failing -- and highly available. This also gives you a way to verify if your recovery plan is working and meeting your RPO and RTO objectives.
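
As a small illustration, here is a minimal sketch of a scheduled check, assuming SQL Server and pyodbc; RESTORE VERIFYONLY confirms the backup file is complete and readable, though a periodic full restore to a scratch database remains the ultimate proof. The server name and file pattern are placeholders.

    # verify_backup.py - minimal sketch: confirm the latest backup file is restorable
    import glob
    import pyodbc

    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=sqlprod01;Trusted_Connection=yes")    # placeholder server
    BACKUP_GLOB = r"\\backupshare\sql\Sales_*.bak"            # placeholder file pattern

    def verify_latest():
        latest = max(glob.glob(BACKUP_GLOB))   # newest file, since names carry a timestamp
        conn = pyodbc.connect(CONN_STR, autocommit=True)
        cur = conn.cursor()
        # VERIFYONLY checks that the backup set is complete and readable;
        # a full test restore to a scratch database is still the real test.
        cur.execute(f"RESTORE VERIFYONLY FROM DISK = '{latest}' WITH CHECKSUM")
        while cur.nextset():
            pass
        print(f"Verified: {latest}")
        conn.close()

    if __name__ == "__main__":
        verify_latest()
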

 

Use Encryption

Data-at-rest on the server should always be encrypted, and there should also be backup encryption for the database as well as the database backups. There are a couple of options for this. DBAs can either encrypt the database backup file itself, or encrypt the entire database. That way, if someone takes a backup, they won’t be able to access the information without a key.

 

DBAs must also ensure that if a backup device is lost or stolen, the data stored on the device remains inaccessible to users without the proper keys. Drive-level encryption tools like BitLocker can be useful in this capacity.

 

Monitor and Collect Data

Real-time data collection and real-time monitoring should also be used to help protect data. Combined with network monitoring and other analysis software, data collection and monitoring will improve performance, reduce outages, and maintain network and data availability.

 

Collection of data in real-time allows administrators to perform proper data analysis and forensics, making it easier to track down the cause of an intrusion, which can also be detected through monitoring. Together with log and event management, DBAs have the visibility to identify potential threats through unusual queries or suspected anomalies. They can then compare the queries to their historical information to gauge whether or not the requests represent potential intrusions.

 

Test, Test, Test

This is assuming a DBA has already tested backups, but let’s make it a little more interesting. Let’s say a DBA is managing an environment with 3,000 databases. It’s impossible to restore them every night; there’s simply not enough space or time.

 

In this case, DBAs should take a random sampling of their databases to test. Shoot for a sample size that gives you at least 95 percent confidence across the 3,000 databases in deployment, while allowing a small margin of error (much like a political poll). From this information DBAs can gain confidence that they will be able to recover any database they administer, even if that database is in a large pool. If you’re interested in learning more, check out this post, which gets into further detail on database sampling.
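
A back-of-the-envelope sketch of that sampling approach is below, assuming a 95 percent confidence level and a 5 percent margin of error; the database list is a placeholder, and the math is the standard finite-population sample-size formula rather than anything specific to one monitoring product.

    # sample_restores.py - minimal sketch: pick a statistically meaningful subset to test-restore
    import math
    import random

    def sample_size(population, z=1.96, margin=0.05, p=0.5):
        # Cochran's formula with finite-population correction
        n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
        return math.ceil(n0 / (1 + (n0 - 1) / population))

    databases = [f"db_{i:04d}" for i in range(3000)]   # placeholder: your 3,000 databases
    n = sample_size(len(databases))                    # roughly 341 at 95% confidence, 5% margin
    tonight = random.sample(databases, n)
    print(f"Test-restore {n} of {len(databases)} databases tonight")

Rotate the sample each cycle and, over time, every database in the pool gets exercised.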

 

Summary

Data is your most precious asset. Don’t treat it like it’s anything but that. Make sure no one is leaving server tapes lying around cubicles, practice the backup rule of three, and, above all, develop a sound data recovery plan.

The term "double bind" refers to an instance where a person receives two or more conflicting messages, each of them negating the other. Addressing one message creates a failure in the other, and vice-versa. A double bind is an unsolvable puzzle resulting in a no-win situation.

 

As a federal IT professional, you’ve probably come across a double bind or two in your career, especially in regard to your network and applications. The two depend on each other but, often, when something fails, it’s hard to identify which one is at fault.

 

Unlike a true double bind, though, this puzzle actually has a two-step solution:

 

Step 1: Check out your network

 

Throughout history, whenever things slow down, the first reaction of IT pros and end-users alike has been to blame the network. So let’s start there – even though any problems you might be experiencing may not be the poor network’s fault.

 

You need to monitor the overall performance of your network. Employ application-aware monitoring and deep-packet inspection to identify mission-critical applications that might be creating network issues. This can help you figure out if the issue is a network or application problem. If it’s a network problem, you’ll be able to identify and resolve it quickly.

 

What if it’s not the network? That’s where step two comes in.

 

Step 2: Monitor your application stack

 

Federal agencies have become reliant upon hundreds of applications. Each of these applications is responsible for different functions, but they also work together to form a central nervous system that, collectively, keeps things running. Coupled with a backend infrastructure, this application stack forms a critical yet complex system in which it can be difficult to identify a malfunctioning application.

 

Solving this challenge requires cultivating an application-centric view of your entire application stack, which includes not just the applications themselves, but all the components that help them operate efficiently, including the systems, storage, hypervisors, and databases that make up the infrastructure.

 

Consolidating management of your infrastructure internally and maintaining control of your application stack can help. Maintaining an internal level of involvement and oversight, even over cloud-based resources, is important and this approach gives you the control you need to more easily pinpoint and quickly address problems.

 

Of course, the best way to remediate issues is to never have them at all, but that’s not entirely possible. What you can do is mitigate the chances for problems by weighing performance against financial considerations before making changes to your network or applications.

 

For example, for many Defense Department agencies, the move to the cloud model is driven primarily by the desire for cost savings. While that’s certainly a benefit, you cannot discount the importance of performance when it comes to compute, storage, and networking technologies.

 

These early considerations – combined with a commitment to network monitoring and a complete application stack view – can save you tons of money, time and trouble. Not to mention keeping you out of some serious binds.

 

Find the full article on Defense Systems.

Capacity Planning 101

The objective of Capacity Planning is to adequately anticipate current and future capacity demand (resource consumption requirements) for a given environment. This helps to accurately evaluate demand growth, identify growth drivers and proactively trigger any procurement activities (purchase, extension, upgrade etc.).

 

Capacity planning is based primarily on two items. The first one is analyzing historical data to obtain organic consumption and growth trends. The second one is predicting the future by analyzing the pipeline of upcoming projects, also taking into consideration migrations and hardware refreshes. IT and the business must work hand in hand to ensure that any upcoming projects are known well in advance.
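
As a simple illustration of the first item, trend analysis on historical data can be as small as the sketch below, assuming Python with NumPy; the monthly consumption figures and the capacity ceiling are made-up placeholders, and real tooling would also fold in the project pipeline and the non-linear factors discussed later.

    # capacity_trend.py - minimal sketch: linear growth trend from historical consumption
    import numpy as np

    # Hypothetical monthly storage consumption in TB (twelve historical samples)
    months = np.arange(12)
    used_tb = np.array([40, 42, 43, 45, 48, 50, 52, 55, 57, 60, 62, 65])

    # Fit a linear trend: slope is the organic growth per month
    slope, intercept = np.polyfit(months, used_tb, 1)

    # Project six months ahead and compare against installed capacity
    horizon = np.arange(12, 18)
    forecast = slope * horizon + intercept
    capacity_tb = 80                                   # placeholder installed capacity
    months_to_full = (capacity_tb - used_tb[-1]) / slope

    print(f"Growth: {slope:.1f} TB/month, forecast at month 18: {forecast[-1]:.0f} TB")
    print(f"At this rate the array is full in about {months_to_full:.0f} months")
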

 

The Challenges with Capacity Planning or “the way we’ve always done it”

 

Manual capacity planning, running scripts here and there, exporting data, compiling it, and leveraging Excel formulas, can work. However, there are limits to one’s available time, and the effort comes at the expense of focusing on higher-priority issues.

 

The time spent manually parsing, reconciling, and reviewing data can be nothing short of a huge challenge, if not a waste of time. The larger an environment grows, the larger the dataset will be and the longer it will take to prepare capacity reports. And the more manual the work is, the more it is prone to human error.  While it’s safe to assume that any person with Excel skills and a decent set of instructions can generate capacity reports, the question remains about their accuracy. It’s also important to point out that new challenges have emerged for those who like manual work.

 

Space saving technologies like deduplication and compression have complicated things. What used to be a fairly simple calculation of linear growth based on growth trends and YoY estimates is now complicated by non-linear aspects such as compression and dedupe savings. Since both compression and deduplication ratios are dictated by the type of data as well as the specifics of the technology (see in-line vs. at-rest deduplication, as well as block size), it becomes extremely complicated to factor this into a manual calculation process. Of course, you could “guesstimate” compression and/or deduplication factors for each of your servers. But the expected savings can also fail to materialize for a variety of reasons.

 

Typical mistakes in capacity management and capacity planning involve space reclamation activities at the storage array level, or rather, the lack of awareness and activity on the matter. Monitoring storage consumption at the array level without relating it to the way storage has been provisioned at the hypervisor level may result in discrepancies. For example, not running Thin Provisioning Block Space Reclamation (through the VMware VAAI UNMAP primitive) on VMware environments may lead some individuals to believe that a storage array is reaching critical capacity levels while in fact a large portion of the allocated blocks is no longer active and can be reclaimed.

 

Finally, in manual capacity planning, any attempt to run “What-If” scenarios (adding n number of VMs with a given usage profile for a new project) is a wild guess at best. Even with the best intentions and focus, you are likely to end up either with an under-provisioned environment and resource pressure, or with an over-provisioned environment and idle resources. While the latter is preferable, it is still a waste of money that might have been invested elsewhere.

 

Capacity Planning – Doing It Right

 

As we’ve seen above, the following factors can cause incorrect capacity planning:

  • Multiple sources of data collected in different ways
  • Extremely large datasets to be processed/aggregated manually
  • Manual, simplistic data analysis
  • Key technological improvements not taken into account
  • No simple way to determine effects of a new project into infrastructure expansion plans

 

Additionally, all of the factors above are also prone to human errors.

 

Because the task of processing data manually is nearly impossible and also highly inefficient, precious allies such as SolarWinds Virtualization Manager are required to identify real-time issues, bottlenecks, and potential noisy neighbors, as well as wasted resources. Once these wasted resources are reclaimed, capacity planning can provide a better evaluation of the actual estimated growth in your environment.

 

Capacity planning activities are not just about looking into the future, but also about managing the environment as it is now. The link between Capacity Planning and Capacity Reclamation activities is crucial. Just as you want to keep your house tidy before planning an extension or improving it with new furniture, the same needs to be done with your virtual infrastructure.

 

Proper capacity planning should factor in the following items:

  • Central, authoritative data source (all the data is collected by a single platform)
  • Automated data aggregation and processing through software engine
  • Advanced data analysis based on historical trends and usage patterns
  • What-If scenarios engine for proper measurement of upcoming projects
  • Capacity reclamation capabilities (Managing VM sprawl)

 

Conclusion

 

Enterprises must consider whether capacity planning done “the way we’ve always done it” is adding any value to their business or rather being the Achilles heel of their IT strategy. Because of its criticality, capacity planning should not be considered as a recurring manual data collection/aggregation chore that is assigned to “people who know Excel”. Instead, it should be run as a central, authoritative function that measures current usage, informs about potential issues and provides key insights to plan future investments in time.

Over the course of my career since college, I have learned many lessons. I started out in computers back in 1986, at the dawn of the “clone” era. I cut my teeth, and my hands, installing chips on memory cards and motherboards, replacing 5.25” floppy discs, and dealing with the jumble of cables on dual 5MB Winchester hard drives that cost, and weighed, an arm and a leg.

 

I moved on to networking with Novell and Banyan Vines, then Microsoft, onward into Citrix Winframe/Metaframe, and moved into VMware and virtualization adding storage technologies and ultimately cloud into that paradigm. I’m always tinkering, learning, and growing in my abilities. One of my key goals is to never stop learning.

 

Along the way, I’ve had the opportunity to undertake amazing projects, meet wonderful people, and be influenced by some of the most phenomenal people in the industry. It’s been almost a master’s degree in IT. I’ve actually had the opportunity to teach a couple classes at DePaul University in Business Continuity to the Master’s program in IS. What a great experience that was!

 

I’ve been contemplating my friends on The Geek Whisperers, (@Geek_Whisperers) as I listen to their podcast regularly. One of the things that they do regularly is ask their interviewee in some manner, what advice they’d offer someone just coming up in the industry, or at other times, the question is, what lesson have you taken most to heart or what mistake have you made that you would caution against to anyone who might be willing to follow your advice. This question, phrased in whatever manner, is a wonderful exercise in introspection.

 

There are many mistakes I’ve made in my career. Most, if not all, have proven to be learning experiences. I find this to be probably the single most important lesson anyone can learn. Mistakes are inevitable. Admit to them, be honest and humble about them, and most of all, learn from them. Nobody expects you not to make mistakes. My old boss used to say, “If you’re going to screw-up, do it BIG!” By this, he meant that you should push your boundaries, try to do things of significance, and outside the box, and above all, make positive change. I would say that this is some of the best advice I could give as well.

 

Humility is huge. The fine line between knowing the answer to a question, and acting as a jerk to prove it is what I consider to be the “Credibility Line.” I’ve been in meetings wherein a participant has given simply wrong information, and has fought to stay with that point to the point of belligerence. These are people with whom you’d much rather not fight. But, if the point itself is critical, there are ways in which you can prove that you’re in possession of the right information without slamming that other person. I once took this approach: A salesperson stated that a particular replication technology was Active/Active. I knew that it was Active/Passive. When he stated his point, I simply said, OK, let’s white-board the solution as it stands in Active/Active, acting as if I believed him to be right. With the white-boarding of the solution I proved that he’d been wrong, we all apologized and moved on. Nobody was made to feel less-than, all saved face, and we all moved forward.

 

This is not to say that it was the best approach, but simply one that had value in this particular situation. I felt that I’d handled it with some adept facility, and defused what could have been a very distracting argument.

 

One thing I always try to keep in mind is that we all have our own agendas. While I may have no difficulty acknowledging when and where I’m wrong, others may find that to be highly discouraging. If you were to choose to make someone look foolish, you’d look just as foolish or worse. Often the most difficult thing to do is to size up the person with whom you’re speaking, in an effort to determine their personality type and motivations. Playing it safe and allowing them to show their cards and their personality, rather than making assumptions, is always a good policy. This goes for anyone with whom you may be speaking, from coworkers to customers, from superiors to peers. Assumption can lead to very bad things. Sometimes the best way to answer a question is with another question.

 

To summarize, my advice is this:

  • Push yourself.
  • Be humble.
  • Exploit your strengths.
  • Listen before answering.
  • Try not to assume, rather clarify.
  • Defer to others (your customers and your peers) as appropriately as possible.
  • Learn from your mistakes.
  • Above all, be kind.

 

One last piece of advice, leverage social media. Blog, Tweet, do these things as well and artfully as you can. Follow the same rules that I’ve stated above, as your public profile is how people will know you, and remember: once on the web, always on the web.


Back in April, we helped you get ready for summer by offering some suggestions for summer reading. It ended up being one of the most read posts of the year, with nearly 2,300 views and over two dozen comments.

 

We already knew that THWACK was a community of avid readers, but the level of interest and nature of the comments showed us that you had a real thirst for Geek-recommended and Geek-approved sources of information.

 

So now, with the end of summer in sight, we thought we would create a companion to the "SolarWinds Summer Fun Reading list" to help you get back in the work mindset.

 

We wanted to collect a required reading list for the IT professional and budding monitoring engineer, so our choices reflect books that have stood the test of time in terms of skills, philosophy, and ideas that have deep relevance in the world of IT.

 

These picks are designed to help you get up to speed with some of the foundational concepts and history of IT generally, and monitoring specifically, including processes, technology, tips, tricks, and more. These are the things we at SolarWinds believe any IT pro worth their salt should know about or know how to do.

 

  • The Mythical Man-Month: Essays on Software Engineering, by Frederick P. Brooks Jr.
  • The Practice of System and Network Administration; The Practice of Cloud System Administration, by Thomas Limoncelli
  • The Psychology of Computer Programming; An Introduction to General Systems Thinking, by Gerald M. Weinberg
  • Accidental Empires: How the Boys of Silicon Valley Make Their Millions, Battle Foreign Competition, and Still Can't Get a Date, by Robert X. Cringely
  • Linux for Dummies, by Emmett Dulaney
  • Design Patterns: Elements of Reusable Object-Oriented Software, by Gamma, Helm, Johnson, and Vlissides
  • Network Warrior, by Gary A. Donahue
  • The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win, by Gene Kim and Kevin Behr
  • Liars and Outliers: Enabling the Trust that Society Needs to Thrive, by Bruce Schneier
  • The Clean Coder: A Code of Conduct for Professional Programmers, by Robert Martin
  • Commodore: A Company on the Edge, by Brian Bagnall
  • The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity, by Alan Cooper
  • In Search of Stupidity: Over Twenty Years of High Tech Marketing Disasters, by Merrill R. (Rick) Chapman
  • Bricklin on Technology, by Dan Bricklin

 

Meanwhile, we would be remiss if we didn’t mention the SolarWinds-specific titles that are out in the world:

 

As your summer tan fades, we hope this list will help you feel like you are ready to brush the sand from your feet, trade your shorts for a trusty pair of cargo pants, and return to the data center with new skills and renewed passion!

I'm back from the SQL Saturday Baton Rouge event. I've been doing SQL Saturday events for over six years now, and I still enjoy them as if it were my first time. It takes a special kind of geek to be willing to give up a weekend day for extra learning, but those of us in the SQL community know the value these events have. The ability to mingle with a few hundred like-minded folks all looking to connect, share, and learn is worth every minute, even on a weekend.

 

Anyway, here are the items I found most amusing from around the Internet. Enjoy!

 

Frequent Password Changes Is a Bad Security Idea

I never thought about passwords in this way, but now that I have I can't help but think how much better we will be when the machines take over and everything is a retina scan.

 

The Mechanics of Sprinting: Play a minigame to test your reaction time

Olympic sprinters average just under 200 milliseconds of reaction time. Time to see how you rate. There are tests for rowing and the long jump, with others coming soon. I can't stop trying to set a new personal best, and neither will you.

 

Facebook Wants To Clean Up The Clickbait

And they are going to do it using this one weird trick.

 

Apple is launching an invite-only bug bounty program

Nice to see Apple finally taking security seriously. But, true to form, even this program is being done with a touch of exclusivity, as opposed to other programs that have been in existence for years.

 

Only 9% of America Chose Trump and Clinton as the Nominees

Because I love it when people can tell a story with data visualizations, here's one that I think everyone will find interesting.

 

Delta’s Tech Meltdown Causes Hundreds of Flight Delays, Cancellations

I fly Delta frequently, so the outage they experienced hit home for me not just because I'm a customer but because I'm a data professional that knows how hardware can fail when there is a power outage, or power surge. This is also a good lesson in BCP planning, as well as a reminder that HA != DR.

 

This candle is only available in stores you've never heard about:

 


This past Monday morning Delta suffered a disruption to their ticketing systems. While the exact root cause has yet to be announced, I did find mention here that the issue was related to a switchgear, a piece of equipment that allows for power failover. It's not clear to me right now if Delta or Georgia Power is responsible for maintaining the switchgear, but something tells me that right now a DBA is being blamed for it anyway.

 

The lack of facts hasn't stopped the armchair architects from taking to the internet over the past 24 hours in an effort to point out all the ways that Delta failed. I wanted to wait until facts came out about the incident before offering my opinion, but that's not how the internet works.

 

So, here's my take on where we stand right now, with little to no facts at my disposal.

 

HA != DR

I've had to correct more than one manager in my career that there is a big difference between high availability (HA) and disaster recovery (DR). Critics yesterday mentioned that Delta should have had geo-redundancy in place to avoid this outage. But without facts it's hard to say that such redundancy would have solved the issue. Once I heard about it being power related, I thought about power surges, hardware failures, and data corruption. You know what happens to highly available data that is corrupted? It becomes corrupted data everywhere, that's what. That's why we have DR planning, for those cases when you need to restore your data to the last known good point in time.

 

This Was a BCP Exercise

Delta was back online about six hours after the outage was first reported. Notice I didn't say they were "back to normal". With airlines it takes days to get everything and everyone back on schedule. But the systems were back online, in no small part due to some heroic efforts on the part of the IT staff at Delta. This was not about HA, or DR; no, this was about business continuity. At some point a decision was made on how best to move forward, on how to keep the business moving despite suffering a freak power outage event involving a highly specialized piece of equipment (the switchgear). From what I can tell, without facts, it would seem the BCP planning at Delta worked rather well, especially when you consider that Southwest recently had to wait 12 hours to reboot their systems due to a bad router.

 

Too Big To Failover

Most recovery sites are not built to handle all of the regular workload; they are designed to handle just the minimum necessary for business to continue. Even if failover is an option, many times the issue isn't with the failover (that's the easy part), the issue is with the fallback to the original primary systems. The amount of data involved may be so cumbersome that a six-hour outage is preferable to the 2-3 days it might take to fail back. It is quite possible this outage was so severe that Delta was at a point where they were too big to failover. And while it is easy to just point to the Cloud and yell "geo-redundancy" at the top of your lungs, the reality is that such a design costs money. Real money.

 

Business Decisions

If you are reading this and thinking "Delta should have foreseen everything you mentioned above and built what was needed to avoid this outage" then you are probably someone who has never sat down with the business side and worked through a budget. I have no doubt that Delta has the technical aptitude to architect a 21st century design, but the reality of legacy systems, volumes of data, and near real-time response rates on a global scale puts that price tag into the hundreds of millions of dollars. While that may be chump change to a high-roller such as yourself, for a company (and industry) that has thin margins the idea of spending that much money is not appealing. That's why things get done in stages, a little bit at a time. I bet the costs for this outage, estimated in tens of millions of dollars, are still less than the costs for the infrastructure upgrades needed to have all of their data systems rebuilt.

 

Stay Calm and Be Nice

If you've ever seen the Oscar-snubbed classic movie "Roadhouse", you know the phrase "be nice". I have read a lot of coverage of the outage since yesterday and one thing that has stood out to me is how professional the entire company has been throughout the ordeal. The CEO even made this video in an effort to help people understand that they are doing everything they can to get things back to normal. And, HE WASN'T EVEN DONE YET, as he followed up with THIS VIDEO. How many other CEOs put their face on an outage like this? Not many. With all the pressure on everyone at Delta, this attitude of staying calm and being nice is something that resonates with me.

 

The bottom line here, for me, is that everything I read about this makes me think Delta is far superior to their peers when it comes to business continuity, disaster recovery, and media relations.

 

Like everyone else, I am eager to get some facts about what happened to cause the outage, and would love the read the post-mortem on this event if it ever becomes available. I think the lessons that Delta learned this week would benefit everyone that has had to spend a night in a data center keeping their systems up and running.


The Public Cloud

Posted by arjantim Aug 9, 2016

A couple of years ago nobody really thought of the public cloud (although that might be different in the US), but things change, quickly. Since the AWS invasion of the public cloud space we’ve seen a lot of competitors try to win their share in this lucrative market. Lucrative is a well-chosen word here, because most of the businesses getting into this market take a big leap of faith, and most of them have to take losses for the first couple of years. But why should public cloud be of any interest to you, and what are the things you need to think about? Let’s take a plane and fly over to see what the public cloud has to offer, and whether it will take over the complete datacenter or just parts of it.

 

Most companies have only one purpose, and that is to make more money than they spend… And where prices are under pressure there is really only one thing to do: cut costs. A lot of companies see the public cloud as cutting cost, since you’re only paying for the resources you use and not for all the other stuff that is also needed to run your own “private cloud”. Because of this, they think the public cloud is cheaper than rebuilding their datacenters every five years or so.

 

To be honest, in a lot of ways those companies are right. Moving certain workloads to the public cloud will certainly help cut costs, and it can also make a great test/dev environment. The thing is, you need to determine the best public cloud strategy per company, and in particular cases it may even be needed per department. But saying everything will be in the public cloud is a bridge too far for many companies… at the moment.

 

A lot of companies are already running loads of workloads in the public cloud without really realizing it. Microsoft Office 365 (and in particular Outlook) is one of the examples where a lot of companies use public cloud, sometimes without really looking into the details or whether it is allowed by law. Yes, that’s right: going public means you need to think about what can and cannot be put in the cloud. Some companies are prohibited by national law from putting certain parts of their data in a public cloud, so make sure you have checked everything before telling your company or customer to go public.

 

Most companies choose a gentle path toward the public cloud and pick the right workloads to go public. This is the right way to do it if you’re an established company with your own way of working, but then again, you need to think not only about your own preferences but also about the laws your company needs to follow.

 

In my last post on Private Cloud I mentioned the DART framework, as I think it is an important tool for going cloud (private at first, but public as well). In this post on Public Cloud I want to go with the SOAR framework.

 

Security - In a public cloud environment it is really important to secure your data. IT should make sure the public part(s) as well as the private part(s) are well secured and all data is safe. Governance, compliance, and more should be thought through, and re-thought, at every step of the way.

 

Optimization - The IT infrastructure is a key component in a fast-changing world. As I already mentioned, a lot of companies are looking to do more for less to increase profit. IT should be an enabler for the business, not a team of firefighters.

 

Automation - The key to faster deployments. It’s the foundation for continuous delivery and other DevOps practices. Automation enforces consistency across your development, testing, and production environments, and ensures you can quickly orchestrate changes throughout your infrastructure: bare metal servers, virtual machines, cloud, and container deployments. In the end, automation is a key component of optimization.

 

Reporting - A misunderstood IT trade. Again, it is tightly connected with optimization as well as automation. For me, reporting is only possible with the right monitoring tools; if you want to do reporting right, you need to have a “big brother” in your environment. Getting the right reports from both public and private is important, and with those reports the company can further fine-tune the environment.

 

There is so much more to say, but I’ll leave it at this for now. I really look forward to the comments, and I know there is no “right” explanation for private, public, or hybrid cloud, but I think we need to help our companies understand the strength of cloud. Help them sort out what kind to use and how. We’re here to help them use IT as IT is meant to be, regardless of the name we give it. See you next time, and in the comments!

The Internet of Things (IoT) offers the promise of a more connected and efficient military, but Defense Department IT professionals are having a hard time turning that promise into reality. They’re deterred by the increasing demands and security vulnerabilities of more connected devices.

 

That hasn’t stopped defense agencies from exploring and investing in mobility and next-generation technology, including IoT devices. One of the points in the Defense Information Systems Agency’s 2015 – 2020 Strategic Plan specifically calls out the agency’s desire to “enable warfighter capabilities from a sovereign cyberspace domain, focused on speed, agility, and access.” The plan also notes “mobile devices…continue to transform our operational landscape and enable greater mission effectiveness through improved communication, access, information sharing, data analytics – resulting in more rapid response times.”

 

It’s a good thing the groundwork for IoT was laid a few years ago, when administrators were working on plans to fortify their networks against an onslaught of mobile devices. Perhaps unbeknownst to them, they had already begun implementing and solidifying strategies that can now serve as a good foundation for managing IoT’s unique set of challenges.

 

Tiny devices, big problems

 

The biggest challenge is the sheer number of devices that need to be considered. It’s not just a few smart phones; with IoT, there is literally an explosion of potentially thousands of tiny devices with different operating systems, all pumping vast amounts of data through already overloaded networks.

Many of these technological wonders were developed primarily for convenience, with security as an afterthought. There’s also the not insignificant matter of managing bandwidth and latency issues that the plethora of IoT devices will no doubt introduce.

 

Making the IoT dream an automated reality

 

These issues can be addressed through strategies revolving around monitoring user devices, managing logs and events, and using encrypted channels – the things that administrators hopefully began implementing in earnest when the first iPhones began hitting their networks.

 

Administrators will need to accelerate their device tracking efforts to new levels. Device tracking will help identify users and devices and create watch lists, and the challenge will be the number of new devices. And while log and event management software will still provide valuable data about potential attacks, the attack surface and potential vulnerabilities will increase exponentially with the introduction of a greater number of devices and network access points.

 

More than ever, managers will want to complement these efforts with network automation solutions, which can correct issues as they arise. This creates a much more streamlined atmosphere for administrators to manage, making it easier for them to get a handle on everything that touches the network.

 

A reluctance to automate will not work in a world where everything, from the tablets at central command to the uniforms on soldiers’ bodies, will someday soon be connected. It’s now time for federal IT administrators to build off their BYOD strategies to help the Defense Department realize DISA’s desire for a highly connected and mobilized military.

 

  Find the full article on Defense Systems.

It seems like you can't talk to anyone in IT without hearing about "software-defined" something these days. Ever since Software-Defined Networking (SDN) burst on the scene, it's the hot trend. My world of storage is just as bad: It seems as if every vendor is claiming to sell "Software-Defined Storage" without much clarity about what exactly it is. Is SDS just the latest cloudy buzzword or does it have a real meaning?

 

Wikipedia, that inerrant font of all human knowledge, defines Software-Defined Storage (SDS) to be "policy-based provisioning and management of data storage independent of the underlying hardware." It goes on to talk about abstraction, automation, and commodity hardware. I can get behind that definition. Wikipedia also contrasts SDS with mere "Software-Based Storage", which pretty much encompasses "all storage" these days!

 

I've fought quite a few battles about what is (and isn't) "Software-Defined Storage", and I've listened to more than enough marketers twisting the term, so I think I can make some informed statements.

 

#1: Software-Defined Storage Isn't Just Software-Based Storage

 

Lots of marketers are slapping the "software-defined" name on everything they sell. I recently talked to one who, quite earnestly, insisted that five totally different products were all "SDS", including a volume manager, a scale-out NAS, and an API-driven cloud storage solution! Just about the only thing all these products have in common is that they all contain software. Clearly, "software" isn't sufficient for "SDS".

 

It would be pointless to try to imagine a storage system, software-defined or otherwise, that doesn't rely on software for the majority of its functionality.

 

#2: Commodity Hardware Isn't Sufficient For SDS Either

 

Truthfully, all storage these days is primarily software. Even the big fancy arrays from big-name vendors are based on the same x86 PC hardware as my home lab. That's the rise of commodity hardware for you. And it's not just x86: SAS and SATA, PCI Express, Ethernet, and just about every other technology used in storage arrays is common to PC's and servers too.

 

Commodity hardware is a great way to improve the economics of storage, but software running on x86 isn't sufficiently differentiated to be called "SDS" either.

 

#3: Data Plane + Control Plane = Integration and Automation

 

In the world of software-defined networking, you'll hear a lot about "separating the data plane from the control plane." We can't make the exact same analogy for storage since it's a fundamentally different technology, but there's an important conceptual seed here: SDN is about programmability and centralized control, and this architecture allows such a change. Software-defined storage should similarly allow centralization of control. That's what "policy-based provisioning and management" and "independent of the underlying hardware" are all about.

 

SDS, like SDN, is about integration and automation, even if the "control plane/data plane" concept isn't exactly the same.

 

#4: SDS Is Bigger Than Hardware

 

SDN was invented to transcend micro-management of independent switches, and SDS similarly must escape from the confines of a single array. The primary challenge in storage today is scalability and flexibility, not performance or reliability. Abstraction of storage software from underlying hardware doesn't just mean being able to use different hardware; abstraction also means being able to span devices, swap components, and escape from the confines of a "box."

 

SDS ought to allow storage continuity even as hardware changes.

 

My Ideal Software-Defined Storage Solution

 

Here's what my ideal SDS solution looks like:

  1. A software platform for storage that virtualizes and abstracts underlying components
  2. A scalable solution that can grow, shrink, and change according to the needs of the application and users
  3. API-driven control, management, provisioning, and reporting that allows the array to "disappear" in an integrated application platform

 

Any storage solution that meets these requirements is truly software-defined and will deliver transformative benefits to IT. We've already seen solutions like this (VSAN, Amazon S3, Nutanix) and they are all notable more for what they deliver to applications than the similarity or differences between their underlying components. Software-defined storage really is a different animal.
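
To make the "API-driven" point concrete, here is a minimal sketch using Amazon S3 (one of the examples above) through the boto3 library; the bucket name and region are placeholders, and credentials are assumed to be configured in the environment. The point is simply that provisioning, consumption, and reporting all happen through calls, with no "box" to log into.

    # sds_api_demo.py - minimal sketch: provision and use storage purely through an API
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")         # placeholder region

    bucket = "example-sds-demo-bucket"                        # placeholder, must be globally unique
    s3.create_bucket(Bucket=bucket)                           # "provisioning" is one call

    # Writing data is another call; no LUNs, zoning, or mount points involved
    s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"software-defined storage")

    # "Reporting" is an API call, too
    for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
        print(obj["Key"], obj["Size"])
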

 

I am Stephen Foskett and I love storage. You can find more writing like this at blog.fosketts.net, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.

I’m probably going to get some heat for this, but I have to get something off my chest. At Cisco Live this year, I saw a technology that was really flexible, with amazing controllability potential, and just cool: PoE-based LED lighting. Rather than connecting light fixtures to mains power and controlling them via a separate control network, it’s all one cable. Network and power, with the efficiency of solid-state LED lighting, with only one connection. However, after several vendor conversations, I can’t escape the conclusion that the idea is inherently, well… dumb.

 

Okay, Not Dumb, Just Math

 

Before Cree®, Philips®, or any of the other great companies with clever tech in the Cisco® Digital Ceiling Pavilion get out their pitchforks, I have to offer a disclaimer: this is just my opinion. But it is the opinion of an IT engineer who also does lots of electrical work at home, automation, and, in a former life, network consulting for a commercial facilities department. I admit I may be biased, and I’m not doing justice to features like occupancy and efficiency analytics, but the problem I can’t get past is the high cost of PoE lighting. It’s a regression to copper cable and, worse, at least as shown at Cisco Live, ridiculous switch overprovisioning.

 


First, the obvious: the cost of pulling copper. We’re aggressively moving clients to ever-faster WLANs both to increase flexibility and decrease network wiring costs. With PoE lighting, each and every fixture and bulb has its own dedicated CAT-3+ cable running hub-and-spoke back to an IT closet. Ask yourself this question: do you have more workers or bulbs in your environment? Exactly. Anyone want to go back to the days of thousands of cables in dozens of thick bundles?  (Image right: The aftermath of only two dozen fixtures.)

 

Second, and I’m not picking on Cisco here, is the per-port cost of using enterprise switches as wall plugs. UPOE is a marvelous thing. A thousand-plus watts per switch is remarkable, and switch stacking makes everything harmonious and redundant. Everyone gets a different price of course, but the demo switch at Cisco Live was a Catalyst 3850 48 Port UPOE, and at ~$7,000, that’s $145/port. Even a 3650 at ~$4,000 comes to $84 to connect a single light fixture.

 

It’s not that there’s anything inherently wrong with this idea, and I would love to have more Energy Wise Catalysts in my lab, but this is overkill. Cisco access switches are about bandwidth, and PoE LEDs need little. As one vendor in the pavilion put it, “… and bandwidth for these fixtures and sensors is stupid simple. It could work over dial-up, no problem.” It’s going to be tough to sell IT budget managers enterprise-grade stackable switches with multi-100 gig backplanes for that.

 

And $84/port is just a SWAG at hardware costs. Are you going to put a rack of a dozen Catalysts directly on mains power? Of course not. You’re going to add in UPS to protect your enterprise investment. (One of the touted benefits of PoE lighting is stand-by.) The stated goal by most of the vendors was to keep costs under $100/port, and that’s going to be a challenge when you include cable runs, IT closets, switches, and UPS. Even then, $100/port?
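
Here is that back-of-the-envelope math in sketch form, with every figure a placeholder you would swap for your own quotes; it simply shows how quickly the per-fixture cost climbs once cabling and UPS are added to the raw switch price.

    # poe_lighting_cost.py - minimal sketch: rough per-fixture cost for PoE lighting
    switch_price = 4000.0      # placeholder: ~48-port UPOE access switch
    ports_per_switch = 48
    cable_run = 35.0           # placeholder: materials plus labor per pulled cable
    ups_share = 15.0           # placeholder: UPS capacity allocated per port
    fixtures = 500             # placeholder: fixture count for the floor

    per_port = switch_price / ports_per_switch + cable_run + ups_share
    switches = -(-fixtures // ports_per_switch)    # ceiling division
    print(f"Per fixture: ${per_port:,.2f}")
    print(f"{switches} switches for {fixtures} fixtures, total ~${fixtures * per_port:,.0f}")

Even with these made-up numbers, the result lands well above the $100/port target before you account for IT closet space and ongoing support.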

 

Other Considerations

 

There are a couple of other considerations, like Cat 3+ cable efficiency at high power. As you push more power over small-gauge network cable, resistive losses grow, and past a certain output per port the overall PoE system becomes less efficient than AC-powered LED fixtures. There's also an IPAM issue, with each fixture getting its own IP address. That adds DHCP scopes and more subnets to wrangle without adding much in terms of management. Regardless of how you reach each fixture, you'll still have to name, organize, and otherwise manage how it's addressed. Do you really care whether you manage it by IP or over a self-managing low-power mesh?

 

DC Bus for the Rest of Us

 

What this initiative really highlights is that, just as we're watching the last gasps of switched mobile carrier networks and of cable television delivered in RF bundles, we need to move past the basic concept of AC mains lighting to the real opportunity: DC lighting. Instead of separate Ethernet runs, or hub-and-spoke runs of 120 VAC Romex, the answer for lighting is low-voltage DC buses with an overlay control network. The low voltage and the efficiency of a common DC conversion are the real draw.

 

Lighting would evolve into universally powered, addressable nodes, daisy-chained together on a tappable cable supplying 24-48 VDC from common power supplies. In a perfect world, the lighting bus would also carry a data channel, but then you get into the kind of protectionist vendor shenanigans that stall interoperability. What seems to be working for lighting, and for IoT in general, is future-proof, replaceable control systems: wireless IPv6 networks today, and whatever comes next after that.

 

Of course, on the other hand, if a manufacturer starts shipping nearly disposable white-label PoE switches that aren't much smarter than midspans, mated to shockingly inexpensive, thin cables, then maybe PoE lighting has a brighter future.

 

What do you think? Besides “shockingly” not being the worst illumination pun in this post?

The Rio Olympics start this week, which means one thing: around-the-clock reports on the Zika virus. If we don't get at least one camera shot of an athlete freaking out after a mosquito bite then I'm going to consider this event a complete disaster.

 

Here are the items I found most amusing from around the Internet. Enjoy!

 

#nbcfail hashtag on Twitter

Because I enjoy reading about the awful broadcast coverage from NBC and I think you should, too. 

 

Apple taps BlackBerry talent for self-driving software project, report says

Since they did so well at BlackBerry, this bodes well for Apple.

 

Parenting In The Digital Age

With my children hitting their teenage years, this is the stuff that scares me the most.

 

Microsoft's Windows NT 4.0 launched 20 years ago this week

Happy Birthday! Where were you when NT 4.0 launched in 1996? I'm guessing some of you unlucky ones were on support that night. Sorry.

 

Larry Ellison Accepts the Dare: Oracle Will Purchase NetSuite

First Larry says that the cloud isn't a thing. Then he says he invented the cloud. And now he overspends for NetSuite. With that kind of background he could run for President. Seriously though, this purchase shows just how far behind Oracle is with the cloud.

 

This Guy Hates Traffic... So He's Building a Flying Car

Flying cars! I've been promised this for years! Forget the Tesla, I will line up to buy one of these.

 

ACHIEVEMENT UNLOCKED! Last weekend I found all 4 IKEA references made in the Deadpool movie! What, you didn't know this was a game?


When there are application performance issues, most IT teams focus on the hardware, after blaming and ruling out the network, of course. If an application is slow, the first thought is to add hardware to combat the problem. Agencies have spent millions throwing hardware at performance issues without a good understanding of the true bottlenecks slowing down an application.

 

But a recent survey on application performance management by research firm Gleanster LLC reveals that the database is the No. 1 source of issues with performance. In fact, 88 percent of respondents cite the database as the most common challenge or issue with application performance.

 

Understanding that the database is often the cause of application performance issues is just the beginning; knowing where to look and what to look for is the next step. There are two main challenges to trying to identify database performance issues:

 

There are a limited number of tools that assess database performance. Tools normally assess the health of a database (is it working, or is it broken?), but don’t identify and help remediate specific database performance issues.

 

Even database monitoring tools that do provide more information don't go much deeper. Most send requests to and collect information from the database, with little to no insight into what happens inside the database that can impact performance.

 

To successfully assess database performance and uncover the root cause of application performance issues, IT pros must look at database performance from an end-to-end perspective.

 

The application performance team should be performing wait-time analysis as part of regular application and database maintenance. This is a method that determines how long the database engine takes to receive, process, fulfill and return a request for information. A thorough wait-time analysis looks at every level of the database and breaks down each step to the millisecond.
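For teams that want to try this by hand before buying a tool, here's a minimal sketch of pulling a wait-time breakdown. It assumes a SQL Server instance and the pyodbc driver; the connection string is a placeholder, and other database engines expose similar wait views under different names:

    # Snapshot of top wait types, assuming SQL Server and pyodbc.
    # Connection details are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbhost;DATABASE=master;"
        "UID=monitor;PWD=secret"
    )
    query = """
        SELECT TOP 10 wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
        FROM sys.dm_os_wait_stats
        WHERE waiting_tasks_count > 0
        ORDER BY wait_time_ms DESC
    """
    for row in conn.cursor().execute(query):
        avg_ms = row.wait_time_ms / row.waiting_tasks_count
        print(f"{row.wait_type:<35} total={row.wait_time_ms} ms  avg={avg_ms:.1f} ms")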

 

The next step is to look at the results, then correlate the information and compare. Maybe the database spends the most time writing to disk; maybe it spends more time reading memory. Understanding the breakdown of each step helps determine where there may be a slowdown and, more importantly, where to look to identify and fix the problem.

 

We suggest that federal IT shops implement regular wait-time analysis as a baseline of optimized performance. The baseline can help with change management. If a change has been implemented, and there is a sudden slowdown in an application or in the database itself, a fresh analysis can help quickly pinpoint the location of the performance change, leading to a much quicker fix.
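To illustrate the baseline idea, here's a small, hypothetical sketch that compares a fresh wait-time snapshot against a saved baseline and flags wait types that have grown. The JSON file format and the 25 percent threshold are arbitrary assumptions, not a feature of any particular product:

    # Compare today's wait-time snapshot against a saved baseline.
    # File names, format, and the 25% threshold are illustrative assumptions.
    import json

    def load_snapshot(path):
        with open(path) as f:
            return json.load(f)  # e.g. {"PAGEIOLATCH_SH": 182000, "CXPACKET": 95000}

    baseline = load_snapshot("waits_baseline.json")
    current = load_snapshot("waits_today.json")

    for wait_type, now_ms in sorted(current.items(), key=lambda kv: -kv[1]):
        before_ms = baseline.get(wait_type, 0)
        if before_ms and now_ms > before_ms * 1.25:
            pct = (now_ms - before_ms) / before_ms * 100
            print(f"{wait_type}: {before_ms} ms -> {now_ms} ms (+{pct:.0f}%)")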

 

Our nearly insatiable need for faster performance may seem like a double-edged sword. On one hand, optimized application performance means greater efficiency; on the other hand, getting to that optimized state can seem like an expensive, unattainable goal.

 

Knowing how to optimize performance is a great first step toward staying ahead of the growing need for instantaneous access to information.

 

Find the full article on Government Computer News.

What is VM sprawl?

VM sprawl is the waste of compute resources (CPU cycles and RAM) and storage capacity caused by a lack of oversight and control over VM resource provisioning. Because of its uncontrolled nature, VM sprawl degrades your environment's performance at best, and can lead to more serious complications (including downtime) in constrained environments.

 

VM Sprawl and its consequences

Lack of management and control over the environment will cause VMs to be created in an uncontrolled way. Sprawl concerns not only the total number of VMs in a given environment, but also how resources are allocated to them. You could have a large environment with minimal sprawl, and a smaller environment with considerable sprawl.

 

Here are some of the factors that cause VM sprawl:

 

  • Oversized VMs: VMs that were allocated more resources than they really need. Consequences:
    • Waste of compute and/or storage resources
    • Over-allocation of RAM will cause ballooning and swapping to disk if the environment comes under memory pressure, resulting in performance degradation
    • Over-allocation of vCPUs will cause high co-stop: the VM must wait for CPU cycles to be available on all of its assigned physical cores at the same moment, and the more vCPUs it has, the less likely it is that all those cores will be free at the same time
    • The more RAM and vCPUs a VM has, the higher the RAM overhead required by the hypervisor.

 

  • Idle VMs: VMs that are up and running, not necessarily oversized, but unused and showing no activity (see the detection sketch after this list). Consequences:
    • Waste of compute and/or storage resources, plus RAM overhead at the hypervisor level
    • Resources wasted by idle VMs may impact CPU scheduling and RAM allocation while the environment is under contention
    • Powered-off VMs and orphaned VMDKs eat up storage capacity
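The detection sketch mentioned above: a rough pass over a vSphere inventory that flags powered-on VMs with very low CPU and memory usage as idle or oversized candidates. It assumes the pyVmomi library and a reachable vCenter; the host, credentials, and thresholds are placeholders, and a real assessment should look at utilization history rather than a single point-in-time sample:

    # Rough idle/oversized VM scan, assuming pyVmomi and a reachable vCenter.
    # Host, credentials, clock-speed assumption, and thresholds are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()   # lab only; use verified certs in production
    si = SmartConnect(host="vcenter.example.com", user="readonly",
                      pwd="secret", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.VirtualMachine], True)

    for vm in view.view:
        if vm.config is None or vm.runtime.powerState != "poweredOn":
            continue
        stats = vm.summary.quickStats
        hw = vm.config.hardware
        cpu_pct = stats.overallCpuUsage / (hw.numCPU * 2000.0) * 100  # assumes ~2 GHz cores
        mem_pct = stats.guestMemoryUsage / float(hw.memoryMB) * 100
        if cpu_pct < 5 and mem_pct < 20:
            print(f"{vm.name}: possible idle/oversized VM "
                  f"({hw.numCPU} vCPU at {cpu_pct:.0f}%, {hw.memoryMB} MB at {mem_pct:.0f}%)")

    Disconnect(si)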

 

 

How to Manage VM sprawl

Controlling and containing VM sprawl relies on process and operational aspects. The former covers how one prevents VM sprawl from happening, while the latter covers how to tackle sprawl that happens regardless of controls set up at the process level.

 

Process

On the process side, IT should define standards and implement policies:

 

  • Role-based access control, which defines roles and permissions for who can do what. This greatly helps reduce the creation of rogue VMs and snapshots.
  • Define VM categories and acceptable maximums: not every VM fits in one box, but standardizing on several VM categories (application, database, etc.) will help filter out bizarre or oversized requests. Advanced companies with self-service portals may want to restrict or categorize which VMs can be created by which users or business units.
  • Challenge any oversized VM request and require justification for the extra resources.
  • Allocate resources based on real utilization. For example, propose a policy where a VM's resource usage is monitored for 90 days, after which IT can adjust the allocation if the VM is undersized or oversized.
  • Implement policies on snapshot lifetime and track snapshot creation requests where possible.

 

In environments where VMs and their allocated resources are chargeable, you should contact your customers to let them know that a VM needs to be resized, or was already resized (based on your policies and rules of engagement), to ensure they are not billed incorrectly. It is worthwhile to formalize procedures for how VM sprawl management activities will be handled, and to agree with stakeholders on pre-defined downtime windows that allow you to seamlessly carry out any right-sizing activities.

 

Operational

Even with the controls above, sprawl can still happen, and for a variety of reasons. For example, a batch of VMs provisioned for a project may pass every process control and then sit idle for months, eating up resources, because the project was delayed or cancelled and no one informed the IT team.

 

In VMware environments where storage is thin provisioned at the array level and Storage DRS is enabled on datastore clusters, it's also important to monitor storage consumption at the array level. While capacity will appear to be freed up at the datastore level after a VM is moved or deleted, it will not be released on the array, and this can lead to out-of-storage conditions. A manual run of the VAAI UNMAP primitive will be required, ideally outside of business hours, to reclaim the unallocated space. It's thus important to include in your operational procedures a capacity reclamation process that is triggered regularly.
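As a rough illustration of what a scheduled reclamation job could look like, here's a minimal sketch that connects to an ESXi host over SSH with paramiko and runs the esxcli VMFS UNMAP command against a named datastore. The hostname, credentials, and datastore label are placeholders, and newer VMFS versions can reclaim space automatically, so treat this strictly as an example of the manual approach:

    # Minimal manual space-reclamation sketch, assuming SSH access to an ESXi host
    # and the paramiko library. Host, credentials, and datastore label are placeholders.
    import paramiko

    host, datastore = "esxi01.example.com", "Datastore01"

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # lab only; verify host keys in production
    client.connect(host, username="root", password="secret")

    # Reclaim dead space on the thin-provisioned array, 200 blocks per iteration.
    stdin, stdout, stderr = client.exec_command(
        f"esxcli storage vmfs unmap -l {datastore} -n 200"
    )
    print(stdout.read().decode() or "unmap completed")
    print(stderr.read().decode(), end="")

    client.close()

Run it from a scheduler outside business hours, as suggested above, since UNMAP generates extra I/O on the array.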

 

Using virtual infrastructure management tools with built-in resource analysis and reclamation capabilities, such as SolarWinds Virtualization Manager, is a must. With software handling these tedious analysis and reconciliation tasks, dashboards present IT teams with immediately actionable results.

 

Conclusion

Even with all the goodwill in the world, VM sprawl will happen. You may have the best policies in place, but your environment is dynamic and, in the rush of day-to-day IT operations, you just can't keep an eye on everything. And this is coming from a guy whose team recovered 22 TB of space occupied by orphaned VMDKs earlier this year.

(To those who saw the earlier post, I apologize for the confusion. This is the correct link to use!)

 

Our first foray into Wiley Brands' Dummies series - "Network Monitoring for Dummies" - has been a runaway success at conventions and trade shows, with copies literally disappearing off our display when our backs are turned.

 

But we realize that not everyone has $3k to drop to visit us at CiscoLive, MS:Ignite, VMWorld, and the rest. So we're publishing the link here. Feel free to download and share the monitoring glory with colleagues, or even pass a copy to management!


I BEAT THEM TO FIRING ME! (Part Two) Fight Back

Why network configuration, change, and compliance management (NCCCM) is a must

Inspired by former Citibank employee sentencing

(Part Two)

 

We've all heard horror stories about the disgruntled employee who pillages the office supply closet and leaves the building waving an obscene gesture, security badge skittering across the parking lot in his wake. Rage-quit is a thing, folks, and it's perfectly reasonable to be afraid that someone with high-level access, someone who could make changes to a network, might do so if they get mad enough. This happens more often than anyone would like to think, and it's something that needs to be addressed in every organization. I felt like we should talk about it and discuss ways to help control and limit the damage such employees can do. Bottom line: we need to be aware of these situations and have a plan for recovery when things like this happen.

 

 

The gist of the story is simple: an employee wiped out critical network configurations on about 90% of his former company's infrastructure. On Monday he was sentenced on criminal vandalism charges. I realize the article above is technically old news, but it's a great conversation starter about how IT organizations can use NCCCM products to protect themselves, and others, from this type of disastrous event. Sometimes a brief pause or slight inconvenience is all it takes to help someone think straight and not go over the edge. This post may also help keep your butt out of, well, jail.

 

Today, we are going to talk about some of the risks of not having NCCCM software:

 

 

  1. Real-time change notification not enabled.
    • There is no record of, or reference for, when changes are being made, whether through maintenance plans, change requests, or malicious intent.
      • Seeing network changes and knowing when they happened helps you be proactive and take immediate remediation action on your network.
    • Who's on first base, and did someone slide into home?
      • When you have more than a couple of network engineers, documentation can be lacking and, well, you're busy, right? Tracking when changes happen and who made them lets you discover who changed what, and when, even a week later.
      • Comparing the new configuration to the previous one is key to correlating issues after a change. All of a sudden traffic is not flowing, or it's restricted, and you find out it was an error in the config change.
    • Someone is on your network changing your critical devices and wiping them clean.
      • Receive alerts so you don't find out when it's too late; log in right after the alert and restore the previous config.
  2. Approval process not in use.
    • No change auditing.
      • Being able to make changes without approval or a defined process sets you up for human error or, worse, attacks.
      • Implementing an approval process gives you an audit trail showing that more than one person signed off on each change.
      • Use this with real-time change notification to see if anyone outside your team is making changes. Either bring them into your NCCCM, or remove or lock out their login access to the devices.
    • No one can verify that you are making the change, or even what that change was.
      • When you have a larger team, you delegate changes or areas of functionality. An approval process verifies that the correct changes are being made and puts an extra set of eyes on each change, adding another layer of defense against human error.
    • One person has complete access to your devices at a control level.
      • Giving people direct access to network devices creates a single point of failure. Adding an extra approval step creates a buffer of review, training, and the ability to track changes and implementations on your network.
  3. Advanced change alert not enabled.
    • Not having an escalation alert set up can leave you with no configurations on your devices when you come into work the next day.
      • Set up escalation alerts based on more than one action.
        • Create a mass-change alert: if X config-change syslog messages arrive within five minutes, alert the manager NOW.
        • Mute these alerts when implementing maintenance plans. (More info from adatole.)
  4. Backups saved to your desktop or a network drive (when you remember).
    • If a crisis happens, the good news is that network devices just need to be told what to do. But if you are like me and don't remember every line of configuration for hundreds of devices, then you'd better implement a backup system NOW.
      • With backups stored in an NCCCM, recovery is a click away (see the backup sketch after this list).
      • Compare the startup config to the running config to make sure a reboot won't wipe out your changes.
      • Verify your backups are kept in secure locations so downtime is minimized and quickly averted.
        • I generally implement both server-side and network-share backups. Lock down access to the backup server with security verification, in case someone tries to delete the backups (this happens, because they don't want you to recover).
  5. Recovery procedures not in place.
    • Can your team recover from an emergency without you being on site?
      • Have a plan and practice it with your team, covering everything from maintenance plans gone wrong all the way to disaster recovery. This takes practice, and the whole team should discuss it so everyone is engaged; keep an open mind to how others might solve each potential problem.
    • Set up an automatic password-change template that can be triggered easily in case of a potential issue inside or outside your organization.
    • Use your NCCCM to monitor your configurations for potential issues or open back doors within your network.
      • Sometimes people will quietly open up access within your network. Watching your configurations with a compliance reporting service lets you detect and remediate quickly, stopping these types of security breaches in their tracks.
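The backup sketch referenced in item 4: a minimal nightly job that pulls the running and startup configs from a couple of devices, saves a dated copy, and flags any drift between the two. It assumes Cisco IOS devices reachable over SSH and the netmiko library; hostnames and credentials are placeholders, and a proper NCCCM product handles all of this (plus change detection and approvals) for you:

    # Minimal config backup and startup/running drift check, assuming Cisco IOS
    # devices and the netmiko library. Hostnames and credentials are placeholders.
    import difflib
    from datetime import date
    from netmiko import ConnectHandler

    devices = ["core-sw1.example.com", "core-sw2.example.com"]

    for host in devices:
        conn = ConnectHandler(device_type="cisco_ios", host=host,
                              username="backup", password="secret")
        running = conn.send_command("show running-config")
        startup = conn.send_command("show startup-config")
        conn.disconnect()

        # Save the running config with a dated filename for point-in-time recovery.
        with open(f"{host}_{date.today()}.cfg", "w") as f:
            f.write(running)

        # Flag any drift between startup and running before a reboot surprises you.
        diff = list(difflib.unified_diff(startup.splitlines(), running.splitlines(),
                                         "startup-config", "running-config", lineterm=""))
        if diff:
            print(f"{host}: startup and running configs differ ({len(diff)} diff lines)")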

 

If you're curious about setup, check this out: More info — Security and SolarWinds NCM

 

Stay tuned for part two, where I'll showcase how each of these can be used in response to a security incident!

 

Those are a few things you should be able to use within any NCCCM software package. Revisit them regularly to reassess your situation and how to better protect yourself.

Let's dive into the mindset and standard methodologies around the security aspect:

 

This isn't just about technology; these are general things to be aware of and to implement on your own. Look at them with a non-judgmental eye and see them simply as ways to hold off malicious attacks or ill will.

 

  1. There needs to be a clear exit strategy for anyone who is going to be fired or removed from a position where they could do harm.
    • "But he's such a nice guy!" Nice guys can turn bad.
    • When word of a termination starts to circulate, you need to do what's best for your career, and for the company you work for, and go on the defensive.
      • Bring in specialized help: organizations that can come in, assess, and prevent issues before the employee is terminated or moved
      • Verify all the traffic and locations they were involved with
        • Any passwords or shared credentials that were globally known NEED TO BE CHANGED NOW, not LATER
        • Check all management software, reduce their rights to view-only for the remaining days, then delete access immediately after termination
        • Verify all company technology is accounted for (accounting and inventory within your NCCCM is vital for maintaining awareness of property and access to your network)
  2. Monitoring of the team
    • Some may not be happy with the decision to terminate an employee and may feel betrayed
    • Monitor their access and pay closer attention to their actions
      • If you see them logging in to more routers and switches than ever before, it might be time to set up a meeting...
      • If you see them going outside their area and digging into things they should not, it's meeting time
      • Awareness is key, and an approval process plus change detection is key to preventing damage
  3. Security policies
    • You're only as good as the policy in place
      • Dig into your policies and make sure they are current and relevant
      • If you seriously have security measures like "if they call from their desk phone, reset the password over the phone," please REVISIT these.
        • Re-read that last statement
    • Make sure your team signs an acknowledgement of what they can and cannot do
      • It's easier to prosecute when they have signed and agreed
    • Verify your security policies against your network devices
      • NCCCM compliance reporting tailored to your needs is a great way to stay ahead of these items
      • This is also how you can find back doors that people have set up on your network to get around security policies.

 

Obviously, I can't solve every issue, but I can at least point you toward some good directions and processes. If any of you want to jump in and add to this, please do; I'm always interested in other people's approaches to security. The main point is to be aware of these situations, have a plan, and recover when things like this happen.

 

Thank you,

 

~Dez~

 

Follow me on Twitter:

@Dez_Sayz
