
Geek Speak


The Internet of Things (IoT) offers the promise of a more connected and efficient military, but Defense Department IT professionals are having a hard time turning that promise into reality. They’re deterred by the increasing demands and security vulnerabilities of more connected devices.


That hasn’t stopped defense agencies from exploring and investing in mobility and next-generation technology, including IoT devices. One of the points in the Defense Information Systems Agency’s 2015 – 2020 Strategic Plan specifically calls out the agency’s desire to “enable warfighter capabilities from a sovereign cyberspace domain, focused on speed, agility, and access.” The plan also notes “mobile devices…continue to transform our operational landscape and enable greater mission effectiveness through improved communication, access, information sharing, data analytics – resulting in more rapid response times.”


It’s a good thing the groundwork for IoT was laid a few years ago, when administrators were working on plans to fortify their networks against an onslaught of mobile devices. Perhaps unbeknownst to them, they had already begun implementing and solidifying strategies that can now serve as a good foundation for managing IoT’s unique set of challenges.


Tiny devices, big problems


The biggest challenge is the sheer number of devices that need to be considered. It’s not just a few smartphones; IoT brings an explosion of potentially thousands of tiny devices running different operating systems, all pumping vast amounts of data through already overloaded networks.

Many of these technological wonders were developed primarily for convenience, with security as an afterthought. There’s also the not insignificant matter of managing bandwidth and latency issues that the plethora of IoT devices will no doubt introduce.


Making the IoT dream an automated reality


These issues can be addressed through strategies revolving around monitoring user devices, managing logs and events, and using encrypted channels – the things that administrators hopefully began implementing in earnest when the first iPhones began hitting their networks.


Administrators will need to accelerate their device tracking efforts to new levels. Device tracking helps identify users and devices and build watch lists; the challenge will be the sheer number of new devices. And while log and event management software will still provide valuable data about potential attacks, the attack surface and potential vulnerabilities will grow exponentially as more devices and network access points come online.
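As a sketch of the watch-list idea: at its core, device tracking is comparing what appears on the wire against a known-good inventory. Everything here (the MAC addresses, the approved set) is invented for illustration; a real deployment would feed this from a device tracking tool.

```python
# Hypothetical sketch: flag devices seen on the network that are not on an
# approved watch list. The addresses below are invented examples.

APPROVED = {"00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f"}

def find_unknown_devices(seen_macs):
    """Return MAC addresses observed on the network but not approved."""
    return sorted(set(seen_macs) - APPROVED)

observed = ["00:1a:2b:3c:4d:5e", "de:ad:be:ef:00:01"]
print(find_unknown_devices(observed))  # unknown devices to investigate
```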


More than ever, managers will want to complement these efforts with network automation solutions, which can correct issues as they arise. This creates a much more streamlined atmosphere for administrators to manage, making it easier for them to get a handle on everything that touches the network.


A reluctance to automate will not work in a world where everything, from the tablets at central command to the uniforms on soldiers’ bodies, will someday soon be connected. It’s now time for federal IT administrators to build off their BYOD strategies to help the Defense Department realize DISA’s desire for a highly connected and mobilized military.


  Find the full article on Defense Systems.

It seems like you can't talk to anyone in IT without hearing about "software-defined" something these days. Ever since Software-Defined Networking (SDN) burst on the scene, it's the hot trend. My world of storage is just as bad: It seems as if every vendor is claiming to sell "Software-Defined Storage" without much clarity about what exactly it is. Is SDS just the latest cloudy buzzword or does it have a real meaning?


Wikipedia, that inerrant font of all human knowledge, defines Software-Defined Storage (SDS) as "policy-based provisioning and management of data storage independent of the underlying hardware." It goes on to talk about abstraction, automation, and commodity hardware. I can get behind that definition. Wikipedia also contrasts SDS with mere "Software-Based Storage", which pretty much encompasses "all storage" these days!


I've fought quite a few battles about what is (and isn't) "Software-Defined Storage", and I've listened to more than enough marketers twisting the term, so I think I can make some informed statements.


#1: Software-Defined Storage Isn't Just Software-Based Storage


Lots of marketers are slapping the "software-defined" name on everything they sell. I recently talked to one who, quite earnestly, insisted that five totally different products were all "SDS", including a volume manager, a scale-out NAS, and an API-driven cloud storage solution! Just about the only thing all these products have in common is that they all contain software. Clearly, "software" isn't sufficient for "SDS".


It would be pointless to try to imagine a storage system, software-defined or otherwise, that doesn't rely on software for the majority of its functionality.


#2: Commodity Hardware Isn't Sufficient For SDS Either


Truthfully, all storage these days is primarily software. Even the big fancy arrays from big-name vendors are based on the same x86 PC hardware as my home lab. That's the rise of commodity hardware for you. And it's not just x86: SAS and SATA, PCI Express, Ethernet, and just about every other technology used in storage arrays are common to PCs and servers too.


Commodity hardware is a great way to improve the economics of storage, but software running on x86 isn't sufficiently differentiated to be called "SDS" either.


#3: Data Plane + Control Plane = Integration and Automation


In the world of software-defined networking, you'll hear a lot about "separating the data plane from the control plane." We can't draw exactly the same analogy for storage, since it's a fundamentally different technology, but there's an important conceptual seed here: SDN is about programmability and centralized control, and that architecture is what makes such a change possible. Software-defined storage should similarly allow centralization of control. That's what "policy-based provisioning and management" and "independent of the underlying hardware" are all about.


SDS, like SDN, is about integration and automation, even if the "control plane/data plane" concept isn't exactly the same.


#4: SDS Is Bigger Than Hardware


SDN was invented to transcend micro-management of independent switches, and SDS similarly must escape from the confines of a single array. The primary challenge in storage today is scalability and flexibility, not performance or reliability. Abstraction of storage software from underlying hardware doesn't just mean being able to use different hardware; abstraction also means being able to span devices, swap components, and escape from the confines of a "box."


SDS ought to allow storage continuity even as hardware changes.


My Ideal Software-Defined Storage Solution


Here's what my ideal SDS solution looks like:

  1. A software platform for storage that virtualizes and abstracts underlying components
  2. A scalable solution that can grow, shrink, and change according to the needs of the application and users
  3. API-driven control, management, provisioning, and reporting that allows the array to "disappear" in an integrated application platform


Any storage solution that meets these requirements is truly software-defined and will deliver transformative benefits to IT. We've already seen solutions like this (VSAN, Amazon S3, Nutanix) and they are all notable more for what they deliver to applications than the similarity or differences between their underlying components. Software-defined storage really is a different animal.
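The first and third requirements can be illustrated with a toy policy-based provisioning layer. Nothing here is a real vendor API; the backend names and policies are invented to show the shape of "policy in, placement out," independent of the hardware underneath.

```python
# Illustrative sketch (not any vendor's API): provisioning driven by a named
# policy rather than by a specific array. The policies and their attributes
# are invented for this example.

BACKENDS = {
    "gold":   {"media": "ssd", "replicas": 3},
    "bronze": {"media": "hdd", "replicas": 2},
}

def provision(volume_name, size_gb, policy):
    """Return a provisioning request built from policy, not hardware."""
    spec = BACKENDS[policy]
    return {"name": volume_name, "size_gb": size_gb, **spec}

print(provision("app-logs", 500, "bronze"))
```

The caller never names an array or a disk shelf; swapping the hardware behind a policy changes nothing in the request, which is the abstraction the definition above demands.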

I’m probably going to get some heat for this, but I have to get something off my chest. At Cisco Live this year, I saw a technology that was really flexible, with amazing controllability potential, and just cool: PoE-based LED lighting. Rather than connecting light fixtures to mains power and controlling them via a separate control network, it’s all one cable. Network and power, with the efficiency of solid-state LED lighting, with only one connection. However, after several vendor conversations, I can’t escape the conclusion that the idea is inherently, well… dumb.


Okay, Not Dumb, Just Math


Before Cree®, Philips®, or any of the other great companies with clever tech in the Cisco® Digital Ceiling Pavilion get out their pitchforks, I have to offer a disclaimer: this is just my opinion. But it is the opinion of an IT engineer who also does lots of electrical work at home, automation, and, in a former life, network consulting for a commercial facilities department. I admit I may be biased, and I’m not doing justice to features like occupancy and efficiency analytics, but the problem I can’t get past is the high cost of PoE lighting. It’s a regression to copper cable and, worse, at least as shown at Cisco Live, ridiculous switch overprovisioning.



First, the obvious: the cost of pulling copper. We’re aggressively moving clients to ever-faster WLANs both to increase flexibility and decrease network wiring costs. With PoE lighting, each and every fixture and bulb has its own dedicated CAT-3+ cable running hub-and-spoke back to an IT closet. Ask yourself this question: do you have more workers or bulbs in your environment? Exactly. Anyone want to go back to the days of thousands of cables in dozens of thick bundles?  (Image right: The aftermath of only two dozen fixtures.)


Second, and I’m not picking on Cisco here, is the per-port cost of using enterprise switches as wall plugs. UPOE is a marvelous thing. A thousand-plus watts per switch is remarkable, and switch stacking makes everything harmonious and redundant. Everyone gets a different price, of course, but the demo switch at Cisco Live was a Catalyst 3850 48-Port UPOE, and at ~$7,000, that’s $145/port. Even a 3650 at ~$4,000 comes to $84 to connect a single light fixture.
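The per-port arithmetic is simple enough to check, using the approximate list prices quoted above:

```python
# Rough per-port cost of using an enterprise switch as a lighting wall plug.

def cost_per_port(switch_price, ports=48):
    return switch_price / ports

print(f"3850: ${cost_per_port(7000):.2f}/port")  # ~$145.83
print(f"3650: ${cost_per_port(4000):.2f}/port")  # ~$83.33
```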


It’s not that there’s anything inherently wrong with this idea, and I would love to have more EnergyWise Catalysts in my lab, but this is overkill. Cisco access switches are about bandwidth, and PoE LEDs need almost none. As one vendor in the pavilion put it, “… and bandwidth for these fixtures and sensors is stupid simple. It could work over dial-up, no problem.” It’s going to be tough to sell IT budget managers enterprise-grade stackable switches with multi-100-gig backplanes for that.


And $84/port is just a SWAG at hardware costs. Are you going to put a rack of a dozen Catalysts directly on mains power? Of course not. You’re going to add a UPS to protect your enterprise investment. (One of the touted benefits of PoE lighting is standby power.) The stated goal of most of the vendors was to keep costs under $100/port, and that’s going to be a challenge once you include cable runs, IT closets, switches, and UPS. And even then: $100/port?


Other Considerations


There are a couple of other considerations, like Cat 3+ efficiency at high power. As you push more power over tiny network conductors, resistive losses grow, and past a certain output per port, overall PoE system efficiency drops below that of AC-powered LEDs. There’s also an IPAM issue, with each fixture getting its own IP. That adds DHCP and more subnets to wrangle without adding much in terms of management. Regardless of how you reach each fixture, you’ll still have to name, organize, and otherwise manage how fixtures are addressed. Do you really care whether you manage them by IP or through a self-managing low-power mesh?
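To put a rough number on the efficiency point: resistive (I²R) loss on a long run can be estimated from the cable's conductor resistance. The constants below (24 AWG copper at roughly 0.084 ohm per meter per conductor, a 100 m run, 54 V nominal, power carried over all four pairs) are assumptions for illustration, not measurements; real losses vary with cable quality and temperature.

```python
# Back-of-the-envelope I^2*R loss on a PoE run, with assumed constants.

def poe_cable_loss(power_w, length_m=100, volts=54.0,
                   ohms_per_m=0.084, pairs=4):
    conductor = ohms_per_m * length_m   # one conductor, one way
    pair = conductor / 2                # two conductors in parallel per pair
    loop = 2 * pair / (pairs / 2)       # out and back, pairs/2 paths each way
    current = power_w / volts
    return current ** 2 * loop

loss = poe_cable_loss(60)  # roughly 5 W on a 60 W run with these assumptions
print(f"{loss:.1f} W lost ({loss / 60:.0%} of a 60 W budget)")
```

Note how the loss scales with the square of the current: doubling the power per port more than quadruples the watts burned in the cable, which is exactly why per-port efficiency falls off at high output.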


DC Bus for the Rest of Us


What this initiative really highlights is that just as we’re in the last gasps of switched mobile carrier networks, and cable television provided in bundles via RF, we need to move past the most basic concept of AC mains lighting to the real opportunity of DC lighting. Instead of separate Ethernet runs, or hub-and-spoke routed 120VAC Romex, the solution for lighting is low voltage DC busses with an overlay control network. It’s the low voltage and efficient common DC transformation that’s the real draw.


Lighting would evolve into universally powered, addressable nodes, daisy-chained together with a tappable cable supplying 24-48VDC from common power supplies. In a perfect world, the lighting bus would also support a data channel, but then you get into the kind of protectionist vendor shenanigans that stall interoperability. What seems to work for lighting, and for IoT in general, is future-proof, replaceable control systems: wireless IPv6 networks today, then whatever comes next.


Of course, on the other hand, if a manufacturer starts shipping nearly disposable white-label PoE switches that aren’t much smarter than midspans, mated to shockingly inexpensive and thin cables, then maybe PoE lighting has a brighter future.


What do you think? Besides “shockingly” not being the worst illumination pun in this post?

The Rio Olympics start this week which means one thing: Around the clock reports on the Zika virus. If we don't get at least one camera shot of an athlete freaking out after a mosquito bite then I'm going to consider this event a complete disaster.


Here are the items I found most amusing from around the Internet. Enjoy!


#nbcfail hashtag on Twitter

Because I enjoy reading about the awful broadcast coverage from NBC and I think you should, too. 


Apple taps BlackBerry talent for self-driving software project, report says

Since they did so well at BlackBerry, this bodes well for Apple.


Parenting In The Digital Age

With my children hitting their teenage years, this is the stuff that scares me the most.


Microsoft's Windows NT 4.0 launched 20 years ago this week

Happy Birthday! Where were you when NT 4.0 launched in 1996? I'm guessing some of you unlucky ones were on support that night. Sorry.


Larry Ellison Accepts the Dare: Oracle Will Purchase NetSuite

First Larry says that the cloud isn't a thing. Then he says he invented the cloud. And now he overspends for NetSuite. With that kind of background he could run for President. Seriously though, this purchase shows just how far behind Oracle is with the cloud.


This Guy Hates Traffic... So He's Building a Flying Car

Flying cars! I've been promised this for years! Forget the Tesla, I will line up to buy one of these.


ACHIEVEMENT UNLOCKED! Last weekend I found all 4 IKEA references made in the Deadpool movie! What, you didn't know this was a game?


When there are application performance issues, most IT teams focus on the hardware, after blaming and ruling out the network, of course. If an application is slow, the first thought is to add hardware to combat the problem. Agencies have spent millions throwing hardware at performance issues without a good understanding of the true bottlenecks slowing down an application.


But a recent survey on application performance management by research firm Gleanster LLC reveals that the database is the No. 1 source of issues with performance. In fact, 88 percent of respondents cite the database as the most common challenge or issue with application performance.


Understanding that the database is often the cause of application performance issues is just the beginning; knowing where to look and what to look for is the next step. There are two main challenges to trying to identify database performance issues:


There are a limited number of tools that assess database performance. Most tools assess the health of a database (is it working, or is it broken?) but don’t identify, let alone help remediate, specific database performance issues.


Database monitoring tools that do provide more information don’t go much deeper. Most send information to and collect information from the database, with little to no insight into what happens inside the database that can impact performance.


To successfully assess database performance and uncover the root cause of application performance issues, IT pros must look at database performance from an end-to-end perspective.


The application performance team should be performing wait-time analysis as part of regular application and database maintenance. This is a method that determines how long the database engine takes to receive, process, fulfill and return a request for information. A thorough wait-time analysis looks at every level of the database and breaks down each step to the millisecond.


The next step is to look at the results, then correlate the information and compare. Maybe the database spends the most time writing to disk; maybe it spends more time reading memory. Understanding the breakdown of each step helps determine where there may be a slowdown and, more importantly, where to look to identify and fix the problem.
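As a minimal illustration of that correlation step, suppose wait-time analysis has produced per-category totals in milliseconds; ranking them by share of total time shows at a glance where the engine spends its time. The category names and numbers below are invented for the example.

```python
# Rank wait-time categories (milliseconds) by their share of total time.
# The sample data is invented; a real analysis would pull these totals
# from the database engine's wait statistics.

def rank_waits(wait_ms):
    total = sum(wait_ms.values())
    return sorted(((cat, ms, ms / total) for cat, ms in wait_ms.items()),
                  key=lambda row: row[1], reverse=True)

sample = {"disk_write": 420, "memory_read": 130, "lock_wait": 300, "network": 50}
for cat, ms, share in rank_waits(sample):
    print(f"{cat:12} {ms:5d} ms  {share:.0%}")
```

Captured regularly, the same ranking becomes the baseline mentioned below: a sudden reshuffle of the top categories after a change points straight at what the change affected.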


We suggest that federal IT shops implement regular wait-time analysis as a baseline of optimized performance. The baseline can help with change management. If a change has been implemented, and there is a sudden slowdown in an application or in the database itself, a fresh analysis can help quickly pinpoint the location of the performance change, leading to a much quicker fix.


Our nearly insatiable need for faster performance may seem like a double-edged sword. On one hand, optimized application performance means greater efficiency; on the other hand, getting to that optimized state can seem like an expensive, unattainable goal.


Knowing how to optimize performance is a great first step toward staying ahead of the growing need for instantaneous access to information.


Find the full article on Government Computer News.

What is VM sprawl?

VM sprawl is defined as the waste of resources (compute: CPU cycles and RAM) as well as storage capacity due to a lack of oversight and control over VM resource provisioning. Because of its uncontrolled nature, VM sprawl degrades your environment’s performance at best, and can lead to more serious complications (including downtime) in constrained environments.


VM Sprawl and its consequences

Lack of management and control over the environment causes VMs to be created in an uncontrolled way. Sprawl is not only about the total number of VMs in a given environment, but also about how resources are allocated to those VMs. You could have a large environment with minimal sprawl, and a smaller environment with considerable sprawl.


Here are some of the factors that cause VM sprawl:


  • Oversized VMs: VMs that were allocated more resources than they really need. Consequences:
    • Waste of compute and/or storage resources
    • Over-allocation of RAM will cause ballooning and swapping to disk if the environment comes under memory pressure, resulting in performance degradation
    • Over-allocation of virtual CPUs will cause high co-stop: the more vCPUs a VM has, the longer it waits for CPU cycles to be available on all its physical cores at the same moment, because it becomes less likely that all those cores are free at once
    • The more RAM and vCPUs a VM has, the higher the RAM overhead required by the hypervisor


  • Idle VMs: VMs that are up and running, not necessarily oversized, but unused and showing no activity. Consequences:
    • Waste of compute and/or storage resources, plus RAM overhead at the hypervisor level
    • Resources wasted by idle VMs may impact CPU scheduling and RAM allocation while the environment is under contention
    • Powered-off VMs and orphaned VMDKs eat up storage capacity
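A first-pass sprawl report along these lines is easy to sketch. The thresholds below (under 2% mean CPU for "idle," a many-vCPU VM under 25% for "oversized") are arbitrary assumptions; in practice you would feed real utilization samples from your monitoring tool and tune the cutoffs to your environment and measurement window.

```python
# Hedged sketch: classify VMs as idle, oversized, or ok from mean CPU
# utilization. Thresholds and the sample inventory are invented.

def classify_vm(vcpus, mean_cpu_pct):
    if mean_cpu_pct < 2:
        return "idle"
    if vcpus > 2 and mean_cpu_pct < 25:
        return "oversized"
    return "ok"

inventory = [("web01", 2, 40), ("db01", 8, 12), ("old-test", 4, 0.5)]
for name, vcpus, cpu in inventory:
    print(name, classify_vm(vcpus, cpu))
```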



How to Manage VM sprawl

Controlling and containing VM sprawl relies on process and operational aspects. The former covers how one prevents VM sprawl from happening, while the latter covers how to tackle sprawl that happens regardless of controls set up at the process level.



On the process side, IT should define standards and implement policies:


  • Role-based access control that defines roles and permissions for who can do what. This will greatly help reduce the creation of rogue VMs and snapshots.
  • Define VM categories and acceptable maximums: while no single template fits every VM, standardizing on several VM categories (application, database, etc.) will help filter out bizarre or oversized requests. Advanced companies with self-service portals may want to restrict which VMs can be created by which users or business units.
  • Challenge any potentially oversized VM request and demand justification.
  • Allocate resources based on real utilization. You can propose a policy where a VM’s resource usage is monitored for 90 days, after which IT can adjust the allocation if the VM is under- or oversized.
  • Implement policies on snapshot lifetimes and track snapshot creation requests where possible.


In environments where VMs and their allocated resources are chargeable, you should contact your customers to let them know that a VM needs to be resized or was already resized (based on your policies and rules of engagement) to ensure they are not billed incorrectly. It is worthwhile to formalize procedures for how VM sprawl management activities will be handled, and to agree with stakeholders on pre-defined downtime windows that let you carry out any right-sizing activities seamlessly.



Even with the controls above, sprawl can still happen, and for a variety of reasons. For example, a batch of VMs provisioned for one project might pass every process control, yet sit idle for months eating up resources because the project was delayed or cancelled and no one informed the IT team.


In VMware environments where storage is thin provisioned at the array level, and where Storage DRS is enabled on datastore clusters, it’s also important to monitor storage consumption at the array level. While capacity will appear to be freed at the datastore level after a VM is moved or deleted, it will not be released on the array, which can lead to out-of-space conditions. A manual run of the VAAI UNMAP primitive, ideally outside business hours, is required to reclaim the unallocated space. It’s thus important to include a regularly triggered capacity reclamation process in your operational procedures.
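That thin-provisioning gap can be watched with a trivial check: compare what the array still holds allocated against what the datastores report as used. The figures below are invented; the point is that a persistent gap is your cue to schedule a reclamation (UNMAP) run.

```python
# Illustrative check for the thin-provisioning gap: space freed at the
# datastore level but not yet released on the array. Numbers are invented.

def reclaimable_gb(array_allocated_gb, datastore_used_gb):
    return max(0, array_allocated_gb - datastore_used_gb)

gap = reclaimable_gb(array_allocated_gb=18_000, datastore_used_gb=14_500)
print(f"~{gap} GB potentially reclaimable via UNMAP")
```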


Using virtual infrastructure management tools with built-in resource analysis and reclamation capabilities, such as SolarWinds Virtualization Manager, is a must. Software takes over these tedious analysis and reconciliation tasks, and dashboards present IT teams with immediately actionable results.



Even with all the good will in the world, VM sprawl will happen. You may have the best policies in place, but your environment is dynamic, and in the rush of IT operations you just can’t keep an eye on everything. And this is coming from a guy whose team recovered 22 TB of space occupied by orphaned VMDKs earlier this year.

(to those who saw the earlier post, I apologize for the confusion. This is the correct link to use!)


Our first foray into Wiley Brands' Dummies series - "Network Monitoring for Dummies" - has been a runaway success at conventions and trade shows, with copies literally disappearing off our display when our backs are turned.


But we realize that not everyone has $3k to drop to visit us at CiscoLive, MS:Ignite, VMWorld, and the rest. So we're publishing the link here. Feel free to download and share the monitoring glory with colleagues, or even pass a copy to management!


I BEAT THEM TO FIRING ME! (Part Two) Fight Back

Why network configuration, change and compliance management (NCCCM) is a must

Inspired by former Citibank employee sentencing


We've all heard horror stories about the disgruntled employee who pillages the office supply closet and leaves the building waving an obscene gesture, security badge skittering across the parking lot in his wake. Rage-quit is a thing, folks, and it's perfectly reasonable to be afraid that someone with high-level access, someone who could make changes to a network, might do so if they get mad enough. This happens more often than anyone would like to think about, and it's something that needs to be addressed in every organization. I felt like we should talk about this and discuss ways to help control and slow the damage of said employees and their bad will. Bottom line: we need to be aware of these situations and have a plan for recovery when things like this happen.



The gist of the story is simple: an employee wiped out critical network configurations on about 90% of his former company's infrastructure. On Monday, he was sentenced on charges of criminal vandalism. I realize the article above is technically old news, but it starts a great conversation about how IT organizations can stop criminal vandalism by actually using NCCCM products to protect ourselves and others from disastrous events. Sometimes you need that brief pause or slight inconvenience to help you think straight and not go over the edge. This post can also help keep your **** out of, well, jail.


Today, we are going to talk about some of the risks of not having NCCCM software:



  1. Real-time change notification not enabled.
    • There is no record of when changes are made, whether via maintenance plans, change requests, or malicious intent.
      • Seeing network changes and knowing their timing helps you be proactive, and gives you an immediate remediation path for your network.
    • Who's on first base, and did someone slide into home?
      • When you have more than a couple of network engineers, documentation can be lacking and, well, you're busy, right? Tracking when changes happen and who made them lets you discover who changed what, and when, even a week later.
      • Comparing the change against the existing config is key to correlating issues after a change is made. All of a sudden, traffic is not flowing, or it's restricted, and you find out it was an error in the config change.
    • Someone is on your network changing your critical devices and wiping them clean.
      • Receive alerts so you don't find this out when it's too late. After receiving the alert, you can log in and restore the previous config.
  2. Approval process not in use.
    • No change auditing.
      • Being able to make changes without approval or a process sets you up for human error or worse: attacks.
      • Implementing an approval process allows you to have an auditing system that shows that more than one person approved a change.
      • Use this with real-time change notification to see if anyone outside your team is making changes. Either allow them into your NCCCM, or delete or lock out their login info to the devices.
    • No one can verify that you are making the change, or even what that change was.
      • When you have a larger team, you delegate changes or areas of functionality. Having an approval process verifies that the correct changes are being made. That gives you an extra set of eyes on the changes that are being made, which adds another level of detection to human error.
    • One person has complete access to your devices at a control level.
      • When you give people straight access to network devices there is a single point of failure. Taking an extra step creates a safe zone of recognition, training, and the ability to track changes and implementations on your network.
  3. Advanced change alert not enabled.
    • Not having an escalation alert set up can leave you with no configurations on your devices when you come into work the next day.
      • Set up escalation alerts based on more than one action.
        • Create a mass change alert if X amount of syslog changes happen within five minutes: Alert Manager NOW.
        • Mute these when implementing maintenance plans (more info by adatole).
  4. Backups you are saving to your desktop or network drive (when you remember).
    • If a crisis happens, the great news is that network devices just need to be told what to do. But if you are like me and don't remember every line of code for hundreds of devices, then you'd better implement a backup system NOW.
      • If you have backups being stored, recovery is a click away with an NCCCM.
      • Compare startup to running configs to make sure a reboot won't cancel your changes.
      • Verify you have backups in secure locations so downtime is minimized and quickly averted.
        • I generally implement server-side and network share drive backups. Lock your server down with security verification in case someone tries to delete the backups (this happens because they don't want you to recover).
  5. Recovery procedures not in place.
    • Can your team recover from an emergency without you being on site?
      • Have a plan and practice with your team. You have to have a plan to be able to recover from maintenance plans gone wrong all the way to disaster recovery.  This takes practice, and should be something the whole team discusses so that you are better engaged. It helps to have an open mind to see how others may offer solutions to each potential problem suggested.
    • Set up an automatic password-change template that can be run easily in case of a potential issue inside or outside your organization.
    • Use your NCCCM to monitor your configurations for potential issues or open back doors within your network.
      • Sometimes people will quietly open up access within your network. Watching your configurations with a compliance reporting service allows you to detect and remediate quickly and stop these types of security breaches in their tracks.
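The "mass change" escalation in item 3 can be sketched as a sliding-window counter over config-change events. The threshold and window below are assumptions to tune; in practice the timestamps would come from your syslog feed.

```python
# Sketch: alert when more than `threshold` config-change events arrive
# within a rolling window. Values are illustrative, not recommendations.

from collections import deque

class MassChangeAlert:
    def __init__(self, threshold=10, window_s=300):
        self.threshold, self.window_s = threshold, window_s
        self.events = deque()

    def record(self, ts):
        """Record a change event at time ts (seconds); return True to alert."""
        self.events.append(ts)
        # Drop events that have aged out of the window.
        while self.events and ts - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold

alert = MassChangeAlert(threshold=3, window_s=300)
print([alert.record(t) for t in (0, 10, 20, 30)])  # fourth event trips it
```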


If you're curious about setup, check this out: More info: Security and SolarWinds NCM


Stay tuned for part two, where I'll showcase how each of these can be used in response to a security event!


Those are a few things you should be able to use within any NCCCM software package. Revisit them consistently to reevaluate your situation and how to better protect yourself.

Let's dive into the mindset and standard methodologies around the security aspect:


This isn't just about technology; these are general practices to be aware of and implement on your own. Look at them with a non-judging eye and see them simply as ways to hold off malicious attacks or ill will.


  1. There needs to be a clear exit strategy for anyone who is going to be fired or removed from a position where they could do harm.
    • But he's such a nice guy? Nice guys can turn bad.
    • When this information is circulating, you need to do what's best for your career, as well as the company you work for, and go on the defense.
      • Bring in specialized organizations that can come in, assess, and prevent issues before the employee is terminated or moved
      • Verify all traffic and locations they were involved with
        • Any passwords, etc., that were globally known NEED TO BE CHANGED NOW, not LATER
        • Check all management software and pull their rights down to view-only for the remaining days, then delete access immediately after termination
        • Verify all company technology is accounted for (accounting and inventory within your NCCCM is vital to maintaining awareness of property and access to your network)
  2. Monitoring the team
    • Some may not be happy with a decision to terminate an employee and may feel betrayed
    • Monitor their access and increase awareness of their actions
      • If you see them logging in to more routers and switches than ever before, it might be time to set up a meeting...
      • If they're going outside their area and digging into things they shouldn't, it's meeting time
      • Awareness is key, and an approval process plus change detection is key to preventing damage
  3. Security policies
    • You're only as good as the policy in place.
      • Dig into your policies and make sure they are current and relevant.
      • If you seriously have security measures like "If they call from a desk phone, reset the password over the phone," please REVISIT them.
        • Re-read that last statement.
    • Make sure your team signs an acknowledgement of what they can and cannot do.
      • It's easier to prosecute when they have signed and agreed.
    • Verify your security policies against your network devices.
      • NCCCM compliance reporting set up for your needs is a great way to stay ahead of these items.
      • You can also find back doors that people have set up on your network to get around security policies this way.
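To make the compliance-reporting idea concrete, here's a minimal sketch of how a config-drift check works under the hood: diff the running config against a known-good baseline and flag anything that loosens access. The device name, config contents, and the `permit ip any any` pattern are hypothetical examples; a real NCCCM product does this with its own policy engine.

```python
import difflib

# Hypothetical known-good baseline and the config currently running on a device.
baseline = """\
hostname edge-rtr-01
access-list 101 deny ip any any
"""
running = """\
hostname edge-rtr-01
access-list 101 permit ip any any
"""

def config_drift(baseline: str, running: str) -> list[str]:
    """Return the lines added to the running config since the baseline."""
    diff = difflib.unified_diff(baseline.splitlines(), running.splitlines(), lineterm="")
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]

# Flag any drift that loosens access control.
for line in config_drift(baseline, running):
    if "permit ip any any" in line:
        print(f"VIOLATION: overly permissive rule added: {line.strip()}")
```

A scheduled job running a check like this against every device is essentially what a compliance report gives you out of the box.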


     Obviously, I cannot solve every issue, but I can at least point you toward some good directions and processes.  If any of you want to jump in and add to this, please do; I'm always interested in other people's methods of security.  The main point is to be aware of these situations, have a plan, and recover when things like this happen.


Thank you,




Follow me on Twitter:



The Private cloud

Posted by arjantim Jul 30, 2016

In a private cloud model, your IT department controls a secure, dedicated cloud environment in which to manage your resources. The difference with public cloud is that the pool of resources is accessible only by you, which makes management much easier and more secure.


So, if you require a dedicated resource, based on performance, control, security, compliance or any other business aspect, the private cloud might just be the right solution for you.


More and more organisations are looking for the flexibility and scalability of cloud solutions. But many of these organisations struggle with business and regulatory requirements that, they think, keep them from being the right candidate for public or private cloud offerings.


It may be that you work within a highly regulated environment that is not suitable for public cloud, and you don't have the internal resources to set up or administer a suitable private cloud infrastructure. On the other hand, it might just be that you have specific industry requirements for performance that aren't yet available in the public cloud.


In those cases, the private cloud can be a great alternative to the public cloud. A private cloud enables the IT department, as well as the applications themselves, to access IT resources as they are required, while the datacentre itself runs in the background. All services and resources used in a private cloud are defined in systems that are accessible only to the user and are secured against external access. The private cloud offers many of the advantages of the public cloud while minimising the risks. Unlike in many public clouds, the criteria for performance and availability in a private cloud can be customised, and compliance with these criteria can be monitored to ensure that they are achieved.


As a cloud or enterprise architect, a couple of things are very important in the cloud era. You should know your application (stack) and the way it behaves. By knowing what your application needs, you can determine which parts of the application could be placed where, whether private or public. A good way to make sure you know your application is to use the DART principle:


Discover          -           Show me what is going on

Alert             -           Tell me when it breaks or is going bad

Remediate         -           Fix the problem

Troubleshoot      -           Find the root cause



If you run the right tools within your environment, it should be easy to discover what is going on, where certain bottlenecks are, how your application is behaving, and what its requirements are. With that knowledge, the step to hybrid is much easier to make. But that is for another post; next time, I'll dive a little further into the public cloud.

Hybrid IT is used to cover all manner of IT-isms, especially those that span services an IT organization is delivering and services being delivered by someone outside of the IT organization. The technology constructs present in the current IT state, where services are continually delivered, integrated, and consumed on any device at any given time, are giving rise to hybrid IT adoption. The challenge for IT professionals is to unlock the potential of hybrid IT without getting caught up in the churn-and-burn scenario of tech hype and tech debt. IT rigor and discipline must be part of the equation. And this is where monitoring as a discipline comes into play.


At BrightTALK’s Cloud and Virtualization Summit, I presented on monitoring as a discipline as the key to unlocking hybrid IT’s potential. The recording is available to view on BrightTALK’s website and it's hyperlinked below.




Let me know what you think of it in the comment section below.


(Zen Stones by Undeadstawa on DeviantArt)


Over the years, I've observed that despite running multiple element and performance management systems, most organizations still don't truly understand their IT infrastructure. In this post I'll examine how it's possible to have so much information on hand yet still have a large blind spot.




What does discovery mean to you? For most of us I'm guessing that it involves ICMP pings, SNMP community strings, WMI, login credentials and perhaps more in an attempt to find all the manageable devices that make up our infrastructure: servers, hypervisors, storage devices, switches, routers and so forth. We spin up network management software, perhaps a storage manager, virtualization management, performance management, and finally we can sleep safely knowing that we have full visibility and alerting for our compute, storage and networking infrastructure.
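As a toy illustration of the very first discovery step, here is how a scanner expands a management subnet into the individual addresses it will then probe with ICMP, SNMP, WMI, and so on. The subnet is made up; Python's standard `ipaddress` module does the expansion:

```python
import ipaddress

def discovery_targets(cidr: str) -> list[str]:
    """Expand a management subnet into the host addresses a scanner would probe."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]

# A /29 management subnet yields six probe-able host addresses.
targets = discovery_targets("10.0.0.0/29")
print(targets)  # 10.0.0.1 through 10.0.0.6
```

Everything a real discovery engine does after this point (credential checks, capability detection, classification) builds on this basic sweep.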


At this point I'd argue that the infrastructure discovery is actually only about 50% complete. Why? Because the information gathered so far provides little or no data that can be used to generate a correlation between the elements. By way of an analogy you could say that at this point all of the trees have been identified, labeled and documented, but we've yet to realize that we're standing in the middle of a forest. To explain better, let's look at an example.


Geographical Correlation

Imagine we have a remote site at which we are monitoring servers, storage, printers and network equipment. The site is connected back to the corporate network by a single WAN link, and—horrifyingly—that link is about to die. What do the monitoring systems tell us?


  • Network Management: I lost touch with the edge router and six switches.
  • Storage Management: I lost touch with the storage array.
  • Virtualization Management: I lost touch with these 15 VMs.
  • Performance Management: These elements (big list) are unresponsive.


Who monitors those systems? Do the alerts all appear in the same place, to be viewed by the same person? If not, that's the first issue, as spotting the (perhaps obvious) relationship between these events requires a meat-bag (human) to realize that if storage, compute and network all suddenly go down, there's likely a common cause. If this set of alerts went in different directions, in all likelihood the virtualization team, for example, might not be sure whether their hypervisor went down, a switch died, or something else, and they may waste time investigating all those options in an attempt to access their systems.

Centralize your alert feeds

Suppressing Alerts

If all the alerts are coming into a single place, the next problem is that in all likelihood the router failure event led to the generation of a lot of alerts at the same time. Looking at it holistically, it's pretty obvious that the real alert should be the loss of a WAN link; everything else is a consequence of losing the site's only link to the corporate network. Personally in that situation, I'd ideally like the alert to look like this:


2016/07/28 01:02:03.123 CRITICAL: WAN Node <a.b.c.d> is down. Other affected downstream elements include (list of everything else).


This isn't a new idea by any means; alert suppression based on site association is something we should all strive to achieve, yet so many of us fail to do so. One of the biggest challenges with alert monitoring is being overwhelmed by a large number of messages, where the signal-to-noise ratio makes it impossible to see the important information. This is a topic I will come back to, but for now let's treat the alert volume as a necessary evil.
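Here's a rough sketch, with made-up device names, of the suppression logic described above: if a parent WAN node is down, collapse its downstream alerts into a single root-cause message and let unrelated alerts through.

```python
# Map each WAN node to the elements that depend on it (hypothetical topology data;
# a real system derives this from its discovery and dependency database).
DOWNSTREAM_OF = {
    "wan-rtr-remote": ["sw-01", "sw-02", "storage-array-01", "vm-web-01", "vm-db-01"],
}

def suppress(alerts: list[str]) -> list[str]:
    """Collapse downstream alerts into one root-cause alert when their WAN parent is down."""
    result = []
    for root, children in DOWNSTREAM_OF.items():
        if root in alerts:
            affected = [a for a in alerts if a in children]
            result.append(f"CRITICAL: WAN node {root} is down. "
                          f"Suppressed {len(affected)} downstream alerts: {affected}")
            alerts = [a for a in alerts if a != root and a not in children]
    return result + [f"ALERT: {a}" for a in alerts]

# One WAN failure, two downstream casualties, one unrelated alert.
print(suppress(["wan-rtr-remote", "sw-01", "vm-db-01", "core-sw-99"]))
```

The hard part in practice isn't this loop; it's building and maintaining the dependency map it consumes.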

Suppress unnecessary alert noise

Always On The Move

In addition to receiving several hundred alerts from the devices impacted by the WAN failure, now it seems the application team is troubleshooting an issue with the e-commerce servers. The servers themselves seem fine, but the user-facing web site is generating an error when trying to populate shipping costs during the checkout process. For some reason the call to the server calculating shipping costs isn't able to connect, which is odd because it's based in the same datacenter as the web servers.


The security team is called in and begins running a trace on the firewall, only to confirm that the firewall is correctly permitting a session from the e-commerce server to an internal address on port tcp/5432 (postgres).


The network team is called in to find out why the TCP session to shipsrv01.ecomm.myco.corp is not establishing through the firewall, and they confirm that the server doesn't seem to respond to ping. Twenty minutes later, somebody finally notices that the IP returned for shipsrv01.ecomm.myco.corp is not in the local datacenter. Five minutes after that, the new IP is identified as being in the site that just went down; it looks like somebody had moved the VM to a hypervisor in the remote site, presumably by mistake, when trying to balance resources across the servers in the data center. Nobody realized that the e-commerce site had a dependency on a shipping service that was now located in a remote site, so nobody associated the WAN outage with the e-commerce issue. Crazy. How was anybody supposed to have known that?
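A cheap sanity check could have caught this earlier: compare the address a dependency resolves to against the local datacenter's address plan. The subnets and hostname below are hypothetical:

```python
import ipaddress

# Hypothetical address plan: the local datacenter owns this subnet.
LOCAL_DC = ipaddress.ip_network("10.10.0.0/16")

def is_local(resolved_ip: str) -> bool:
    """Sanity-check that a dependency resolves inside the local datacenter."""
    return ipaddress.ip_address(resolved_ip) in LOCAL_DC

# The shipping service used to be local; after the VM migration it is not.
print(is_local("10.10.4.21"))   # the pre-migration address: local
print(is_local("10.99.4.21"))   # the post-migration address: flag it
```

Run periodically against the DNS records of known service dependencies, a check like this turns "how was anybody supposed to know?" into an alert.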


It seems that despite having all those management systems I'm still a way from having true knowledge of my infrastructure. When I post next, I'll look at some of the things I'd want to do in order to get a better and more holistic view of my network so that I can embrace the inner peace I desire so much.


The Actuator - July 27th

Posted by sqlrockstar Employee Jul 27, 2016

Just when you thought 2016 couldn't get any crazier, you wake up to find that Verizon has bought Yahoo and that you are more interested in reading about the drone that delivered a Slurpee. Welcome to my world.


Here are the items I found most amusing from around the Internet. Enjoy!


Verizon to Purchase Yahoo’s Core Business for $4.8 Billion

I'm shocked Yahoo is worth even that much. I'm also hoping that someone will give me $57 million to do nothing.


Canadian Football League Becomes First Pro Football Organization To Use Sideline Video During Games

As our technology advances at an ever-increasing pace and is applied in new situations, it is up to someone in IT to make it all work. It's all about the data, folks, as data is the most valuable asset any company (or team) can own.


Nearly Half of All Corporate Data is Out of IT Department’s Control

Honestly, I think that number is much higher.


GOP delegates suckered into connecting to insecure Wi-Fi hotspots

I am certain the GOP leaders were tech savvy enough not to fall for this trick, right?


Snowden Designs a Device to Warn if Your iPhone’s Radios Are Snitching

Showing what he's been doing with his free time while living in exile, Snowden reveals how our phones have been betraying us for years.


Status Report: 7 'Star Trek' Technologies Under Development

With the release of the new Star Trek movie last week I felt the need to share at least one Star Trek link. But don't get your hopes up for warp drive or transporters anytime soon.


I wanna go fast: HTTPS' massive speed advantage

"If you wanna go fast, serve content over HTTPS using HTTP/2."


Watch The First Slurpee Delivery By Drone

Because who doesn't love a Slurpee in the summertime?


Meanwhile, in Redmond:


I need to deep-fry a turbaconducken.


This isn't a want, no. This is a primal need of mine.


I feel so strongly about this that it's on my bucket list. It is positioned right below hiring two private investigators to follow each other, and right above building an igloo with the Inuit.


Deep frying a turkey is a dangerous task. You can burn your house down if you are not careful. Why take the risk? Because the end result, a crispy-juicy turkey bathed in hot oil for 45 minutes, is worth the effort. Or so I've been told. Like I said, it's on my bucket list.


Being the good data professional that I am, I started planning out how to prepare for the day that I do, indeed, deep-fry my own turkey. As I laid out my plans, it struck me that there was a lot of similarity between an exploding turkey and the typical "database is on fire" emergency many of us know all too well.


So here's my list for you to follow for any emergency, from exploding turkeys to databases catching fire and everything in between. You're welcome.


Don't Panic


People who panic are the same people who are not prepared. A little bit of planning and preparation go a long way toward helping you avoid "panic mode" in any emergency situation. Whenever I see someone panicking (like ripping out all their network cables just because their mouse isn't working), it is a sure sign that they have little to no practical experience with the situation at hand.


Planning will help you from feeling the need to panic. If your database is on fire you can recover from backups, because you prepared for such a need. And if your turkey explodes you can always go to a restaurant for a meal.


Rely on all your practice and training (you have practiced this before, right?). Emergency response people train in close-to-real-life situations, often. In fact, firefighters even pay people to burn down their spare barns.


Go to your checklist. You do have a checklist, right? And a process to follow? If not, you may find yourself in a pile of rubble, covered in glitter.


Assess the Situation


Since you aren't panicking you are able to calmly assess the situation. A turkey on fire inside your oven would require a different response than a turkey that explodes in a fireball on your deck and is currently burning the side of your house. Likewise, an issue with your database that affects all users will require a different set of troubleshooting steps than an issue affecting only some users or queries.


In order to do a proper assessment of the situation you will be actively gathering data. For database servers you are likely employing some type of monitoring and logging tools. For turkeys, it's likely a thermometer to make certain it has completely thawed before you drop it into the hot oil.


You also need to know your final goal. Perhaps your goal is to stop your house from being engulfed in flames. Perhaps your goal is to get the systems back up and running, even if it means you may have some data loss.


Not every situation is the same. That's why a proper assessment is necessary when dealing with emergencies...and you can't do that while in a panic.


Know Your Options


Your turkey just exploded after you dropped it into a deep fryer. Do you pour water on the fire quickly? Or do you use a fire extinguisher?


Likewise, if you are having an issue with a database server should you just start rebooting it in the hopes that it clears itself up?


After your initial assessment is done, you should have a handful of viable options to explore. You need to know the pros and cons of each of these options. That's where the initial planning comes in handy, too. Proper planning reduces panic, lets you assess the situation calmly, and helps you understand all your viable options along with their pros and cons. See how all this works together?


It may help for you to phone a friend here. Sometimes talking through things can help, especially when the other person has been practicing and helping all along.


Don't Make Things Worse


Pouring water on the grease fire on your deck is going to make the fire spread more quickly. And running 17 different DBCC commands isn't likely to make your database issue any better, either.


Don't be the person that makes things worse. If you are able to calmly assess the situation, and you know your options well, then you should be able to make an informed decision that doesn't make things worse. Also, don’t focus on blame. Now isn't the time to worry about blame. That will come later. If you focus on fault, you aren’t working on putting out the fire right now. You might as well grab a stick and some marshmallows for making s’mores while your house burns to the ground.


Also, a common mistake here, specifically for database issues, is trying to do many things at once. If you make multiple changes, you may never know what worked, or the changes may cancel each other out, leaving you with a system that is still offline. Know the order of the actions you want to take and do them one at a time.


And it wouldn't hurt you to take a backup now, before you start making changes, if you can.


Learn From Your Mistakes


Everyone makes mistakes; I don't care what their marketing department may tell you. Making mistakes isn't as big of a concern as not learning from your mistakes. If you burned your house down the past two Thanksgivings, don't expect a lot of people showing up for dinner this year.


Document what you’ve done, even if it is just a voice recording.  You might not remember all the details afterwards, so take time to document events while they are still fresh in your memory.


Review the events with others and gather feedback along the way as to how things could have been better or avoided. Be open to criticism, too. There's a chance the blame could be yours. If that's the case, accept that you are human and lay out a training plan that will help you to avoid making the same mistake in the future.


I'm thankful that my database server isn't on fire. But if it was, I know I'd be prepared.



Many agencies are already practicing excellent cyber hygiene; others are still in implementation phases. Regardless of where you are in the process, it is critical to understand that security is not a one-product solution. Having a solid security posture requires a broad range of products, processes and procedures.


Networks, for example, are a critical piece of the security picture; agencies must identify and react to vulnerabilities and threats in real time. You can implement automated, proactive security strategies that will increase network stability and have a profound impact on the efficiency and effectiveness of the overall security of the agency.


How can agencies leverage their networks to enhance security? Below are several practices you can begin to implement today, as well as some areas of caution.


Standardization. Standardizing network infrastructure is an often-overlooked method of enhancing network performance and security.


Start by reviewing all network devices and ensure consistency across the board. Next, make sure you’ve got multiple, well-defined networks. Greater segmentation will provide two benefits: greater security, as access will not necessarily be granted across each unique segment, and greater ability to standardize, as segments can mimic one another to provide enhanced control.


Change management. Good change management practices go a long way toward enhanced security. Specifically, software that requires a minimum of two unique approvals before changes can be implemented can prevent unauthorized changes. In addition, make sure you fully understand the effect changes will have across the infrastructure before granting approval.


Configuration database. It’s important to have a configuration database for backups, disaster recovery, etc. If you have a device failure, being able to recover quickly can be critical; implementing a software setup that can do this automatically can dramatically reduce security risks. Another security advantage of a configuration database is the ability to scan for security-policy compliance.
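As a minimal sketch of the backup half of a configuration database (real products add versioning, search, and the compliance scanning mentioned above), timestamped copies plus a "give me the newest" lookup are the core of it. The device name and config text are hypothetical:

```python
import datetime
import pathlib
import tempfile

def backup_config(store: pathlib.Path, device: str, config: str) -> pathlib.Path:
    """Save a timestamped copy of a device config; recovery becomes a one-file restore."""
    store.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S%f")
    path = store / f"{device}-{stamp}.cfg"
    path.write_text(config)
    return path

def latest_config(store: pathlib.Path, device: str) -> str:
    """Fetch the newest backup for a device, e.g. after a hardware failure."""
    newest = max(store.glob(f"{device}-*.cfg"))  # zero-padded stamps sort chronologically
    return newest.read_text()

# Demo: back up a config, then "recover" it.
store = pathlib.Path(tempfile.mkdtemp())
backup_config(store, "edge-rtr-01", "hostname edge-rtr-01\n")
print(latest_config(store, "edge-rtr-01"))
```

Automating the `backup_config` half on a schedule is what turns a device failure from a rebuild-from-memory crisis into a file copy.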


Compliance awareness. Compliance can be a complicated business. Consider using a tool that automates vulnerability scanning and FISMA/DISA STIG compliance assessments. Even better? A tool that also automatically sends alerts of new risks by tying into the NIST NVD, then checking that information against your own configuration database.


Areas of caution:

Most security holes are related to inattention to infrastructure. In other words, inaction can be a dangerous choice. Some examples are:


Old inventory. Older network devices inherently have outdated security. Invest in a solution that will inventory network devices and include end-of-life and end-of-support information. This also helps forecast costs for new devices before they quit or become a security liability.


Not patching. Patching and patch management is critical to security. Choose an automated patching tool to be sure you’re staying on top of this important task.


Unrestricted bring-your-own-device policies. Allow BYOD, but with restrictions. Separate the unsecure mobile devices on the network and closely monitor bandwidth usage so you can make changes on the fly as necessary.


There is no quick-and-easy solution, but tuning network security through best practices will not only enhance performance, but will also go a long way toward reducing risks and vulnerabilities.


Find the full article on Government Computer News.

In my previous post, I listed some best practices for help desk IT pros to follow to save time resolving issues. The responses I received from that post made me realize that the best solution for one IT organization may not necessarily be the same for another. An organization’s size, business model, functional goals, organizational structure, etc. create unique challenges for those charged with running the help desk function, and these factors directly affect IT support priorities.


With this knowledge in mind, I decided to take a different approach for this post. Below, I have listed some of the easy ways that help desk organizations – irrespective of their differences – can improve their help desk operations through automation to create a chaos-free IT support environment.


  1. Switch to centralized help desk ticketing
    Receiving help desk requests from multiple channels (email, phone, chat, etc.), and manually transferring them onto a spreadsheet creates a dispersed and haphazard help desk environment. Switching to a centralized help desk ticketing system will help you step up your game and automate the inflow of incidents and service requests.
  2. Automate ticket assignment and routing
    Managing help desk operations manually can lead to needless delays in assigning tickets to the right technician, and potential redundancy if you happen to send the same requests to multiple technicians. To avoid this, use a ticketing system that helps you assign tickets to technicians automatically, based on their skill level, location, availability, etc.
  3. Integrate remote support with help desk
    With more people working remotely, traditional help desk technicians have to adapt and begin to resolve issues without face-to-face interactions. Even in office settings, IT pros tend to spend about 30% of their valuable time visiting desks to work on issues. By integrating a remote support tool into your help desk, you can resolve issues remotely, taking care of on- and off-site problems with ease.
  4. Resolve issues remotely without leaving your desk
    A recent survey by TechValidate states that 77% of surveyed help desk technicians feel that using remote support decreased their time-to-resolution of trouble tickets. Using the right remote support tool helps you easily troubleshoot performance issues and resolve complex IT glitches without even leaving your desk.
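The automated assignment described in point 2 above can be sketched like this. The technician roster and routing rules are hypothetical; a real ticketing system drives this logic from its own database:

```python
# Hypothetical technician roster with skills, location, and current workload.
TECHS = [
    {"name": "Avery", "skills": {"network", "vpn"}, "location": "HQ",     "open_tickets": 4},
    {"name": "Blake", "skills": {"email", "vpn"},   "location": "HQ",     "open_tickets": 1},
    {"name": "Casey", "skills": {"network"},        "location": "Remote", "open_tickets": 0},
]

def route_ticket(category: str, site: str) -> str:
    """Assign a ticket to the least-loaded qualified technician, preferring on-site staff."""
    qualified = [t for t in TECHS if category in t["skills"]]
    if not qualified:
        return "UNASSIGNED"  # escalate to a queue manager instead
    onsite = [t for t in qualified if t["location"] == site] or qualified
    pick = min(onsite, key=lambda t: t["open_tickets"])
    pick["open_tickets"] += 1  # ticket goes to exactly one technician, never duplicated
    return pick["name"]

print(route_ticket("vpn", "HQ"))      # Blake: on-site and least loaded
print(route_ticket("network", "HQ"))  # Avery: on-site beats a less-loaded remote tech
```

Incrementing the workload counter on assignment is also what prevents the duplicate-routing problem described in point 2.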


These are some of the simple yet powerful ways that organizations can create a user-friendly help desk. Are you managing your help desk the easy way or the hard way?


(Infographic: Four Easy Ways to Create a Chaos-free Help Desk)


To download this infographic, click here. Share your thoughts on how you reduce workload and simplify help desk support in the comments section.
