
Geek Speak


Interop 2016 kicked off the week with two days of IT summits that covered an amazing range of topics, including cloud, containers and microservices, IT leadership, and cybersecurity, plus hands-on hacking tutorials. The following three days included the Expo floor opening as well as the session tracks.


Since the IT Leadership Summit was sold out, I decided to join the Dark Reading Cyber Security Summit Day 1. I was only planning on attending Day 1, but the content was so good that I eschewed Container Summit and attended Dark Reading's Day 2. To kick things off, the editors at Dark Reading shared some interesting insights followed by industry thought leaders.


DevOps - SecOps Relational image via @petecheslock and his Austin DevOps Days 2015 presentation.


My top 10 takeaways from the Dark Reading Cybersecurity Summit Days are below.

  1. $71.1B was spent on cybersecurity last year.
  2. Security pros spend most of their time patching legacy stuff and fixing vulnerabilities versus addressing targeted, sophisticated attacks, which happens to be their primary security concern. Number two is phishing and social engineering attacks.
  3. Security is one of the most important priorities and one of the least resourced by IT organizations. Security pros make policy decisions, but non-security people make purchasing decisions.
  4. The weakest link is the end user. End users make up the surface area of vulnerability.
  5. There are not enough skilled security ops people. 500K to 2M more security pros are needed by 2020.
  6. The most talented security pros are hackers.
  7. The average time to detect an intrusion is 6-7 months.
  8. 92% of the intrusions, incidents, and attacks of the past 10 years fall into nine distinct patterns, which can be further reduced down to three.
  9. The cost of a breach is roughly $254 per record for breaches involving 100 records, but only $0.09 per record for breaches involving 100M records. Note that the cost is a multi-variable function with many dimensions to factor in.
  10. Only 40% of attacks are malware, so stopping malware is not enough.
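
To put takeaway 9 in perspective, here is a quick back-of-the-envelope sketch that turns the two per-record figures into total costs. It assumes the total is simply records multiplied by the per-record cost, which ignores the other dimensions mentioned above; the names are made up for illustration.

```python
# Per-record costs from takeaway 9, held in integer cents to avoid float rounding.
SMALL_BREACH = {"records": 100, "cents_per_record": 25_400}      # $254.00 per record
LARGE_BREACH = {"records": 100_000_000, "cents_per_record": 9}   # $0.09 per record


def total_cost_dollars(breach):
    """Total cost in dollars, assuming cost = records x per-record cost."""
    return breach["records"] * breach["cents_per_record"] / 100


if __name__ == "__main__":
    for name, breach in (("small", SMALL_BREACH), ("large", LARGE_BREACH)):
        print(f"{name} breach: ${total_cost_dollars(breach):,.0f}")
```

Under that assumption, the per-record cost falls sharply with scale while the total bill still climbs from tens of thousands of dollars into the millions.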


Attached below is my DART IT Skills Framework presentation from my Interop IT Leadership speaking session. One of the CIO's SLAs is security, so the Cybersecurity Summit was timely.


Let me know what you think of the security insights, as well as my presentation below, in the comment section. I would be happy to present my DART session to our community if there is enough interest, so let me know and I will make it so.


The Actuator - May 11th

Posted by sqlrockstar Employee May 11, 2016

I'm back from Liverpool and SQLBits. It was a brilliant event, as always. If you were there I hope you came by to say hello.


Here's this week's Actuator, filled with things I find amusing from around the Internet...


What is ransomware and how can I protect myself?

You recover from backups. If you don't have backups then you are hosed.


Ivy League economist ethnically profiled, interrogated for doing math on American Airlines flight

To be fair, he is a member of the al-Gebra movement, and was carrying weapons of math instruction.


The Year That Music Died

Wonderful interactive display of the top five songs every day since 1958. Imagine if you had this kind of interaction with your monitoring data, with some machine learning on top.


Apple Stole My Music. No, Seriously.

Since we are talking about music, here's yet another reason why reading the fine print is important.


Apple's Revenue Declines For The First Time In 13 Years

I am certain it has *nothing* to do with the issues inherent in their software and services like Apple Music. None.


The Formula One Approach to Security

This article marks the first time I have seen the phrase "security intelligence" and now I'm thinking it will be one of the next big buzzwords. Still a great read and intro to NetFlow for those that haven't heard about that yet.


Study: Containers Are Great, but Skilled Admins Are Scarce

I wonder how long they spent studying this. I believe it's always been the case that skilled admins are scarce, which is why we have so many accidental admins in the world. There's more tech work available than tech people available.


My secret to avoiding jet lag for events revealed:

[Image: NDQE7187 copy.jpg]

In the world of networking, you would be hard pressed to find a more pervasive and polarizing topic than that of SDN. The concept of controller-based, policy-driven, and application-focused networks has owned the headlines for several years as network vendors have attempted to create solutions that allow everyone to operate with the optimization and automation as the large Web-scale companies do. The hype started in and around data center networks, but over the past year or so, the focus has sharply shifted to the WAN, for good reason.


In this three-part series we are going to take a look at the challenges of current WAN technologies, what SD-WAN brings to the table, and what some drawbacks may be in pursuing an SD-WAN strategy for your network.


Where Are We Now?


In the first iteration of this series, we’re going to identify and discuss some of the limitations in and around WAN technology in today’s networks. The lists below are certainly not comprehensive, but speak to the general issues faced by network engineers when deploying, maintaining, and troubleshooting enterprise WANs.


Perspective – The core challenge in creating a policy-driven network is perspective. For the most part, routers in today's networks make decisions independent of the state of peer devices. While there certainly are protocols that share network state information (routing protocols being the primary example), actions based off of this exchanged information are exclusively determined through the lens of the router's localized perspective of the environment.


This can cause non-trivial challenges in the coordination of desired traffic behavior, especially for patterns that may not follow the default/standard behavior that a protocol may choose for you. Getting every router to make uniform decisions, each utilizing a different perspective, can be a difficult challenge and add significant complexity depending on the policy trying to be enforced.


Additionally, not every protocol shares every piece of information, so it is entirely possible that one router is making decisions off of considerably different information than what other routers may be using.


Application Awareness - Routing in current-generation networks is remarkably simple. A router considers whether or not it is aware of the destination prefix, and if so, forwards the packet on to the next hop along the path. Information outside of the destination IP address is not considered when determining path selection. Deeper inspection of the packet payload is possible on most modern routers, but that information does not play into route selection decisions. Due to this limitation in how we identify forwarding paths, it is incredibly difficult to differentiate routing policy based off of the application traffic being forwarded.
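
As a minimal sketch of that destination-only behavior, the snippet below does a longest-prefix match against a small routing table. The prefixes, next-hop names, and the packet dictionary layout are all hypothetical; the point is only that the application field never influences the result.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-a",
    ipaddress.ip_network("10.0.0.0/8"): "mpls",
    ipaddress.ip_network("10.20.0.0/16"): "branch-link",
}


def next_hop(dst, routes=ROUTES):
    """Return the next hop for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


def forward(packet):
    """Pick a path for a packet; only the destination address is consulted."""
    return next_hop(packet["dst"])


if __name__ == "__main__":
    voice = {"dst": "10.20.5.9", "app": "voice"}
    backup = {"dst": "10.20.5.9", "app": "backup"}
    print(forward(voice), forward(backup))
```

A voice call and a bulk backup headed to the same address get the same path, which is exactly the limitation described above.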


Error Detection/Failover – Error detection and failover in current generation routing protocols is a fairly binary process. Routers exchange information with their neighbors, and if they don’t hear from them in some sort of pre-determined time window, they tear down the neighbor relationship and remove the information learned from that peer. Only at that point will a router choose to take what it considers to be an inferior path. This solution works well for black-out style conditions, but what happens when there is packet loss or significant jitter on the link? The answer is that current routing protocols do not take these conditions into consideration when choosing an optimal path. It is entirely possible for a link to have 10% packet loss, which significantly impacts voice calls, and have the router plug along like everything is okay since it never loses connection with its neighbor long enough to tear down the connection and choose an alternate path. Meanwhile, a perfectly suitable alternative may be sitting idle, providing no value to the organization.
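
The brown-out problem can be sketched with a toy model of a hello/dead timer. The 10-second hello and 40-second dead interval are illustrative values chosen for this example, and the loss patterns are deterministic stand-ins rather than measurements from a real link.

```python
def neighbor_stays_up(hello_received, hello_interval=10, dead_interval=40):
    """Return False once no hello has been heard for dead_interval seconds.

    hello_received is a sequence of booleans, one per hello interval.
    """
    silent = 0
    for heard in hello_received:
        silent = 0 if heard else silent + hello_interval
        if silent >= dead_interval:
            return False
    return True


# Brown-out: every tenth hello is lost, i.e. 10% loss for an hour.
brownout = [i % 10 != 0 for i in range(360)]
# Black-out: the link works briefly, then goes completely silent.
blackout = [True] * 5 + [False] * 10

if __name__ == "__main__":
    print("brown-out keeps neighbor up:", neighbor_stays_up(brownout))
    print("black-out keeps neighbor up:", neighbor_stays_up(blackout))
```

In this model the lossy link never misses enough consecutive hellos to trip the dead timer, so the router keeps using it, while the black-out is detected as expected.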


Load Balancing/Efficiency - Also inherent in the way routing protocols choose links is the fact that all protocols are looking to identify the single best path (or paths, if they are equal cost) and make it active, leaving all other paths passive until the active link(s) fail. EIGRP could be considered an exception to this rule as it allows for unequal cost load balancing, but even that is less than ideal since it won’t detect brown-out conditions on a primary link and move all traffic to the secondary. This means that organizations have to purchase far more bandwidth than necessary to ensure each link, passive or active, has the ability to support all traffic at any point. Since routing protocols do not have the ability to load balance based off of application characteristics, load balancing and failover are an all-or-nothing proposition.


As stated previously, the above list is just a quick glance at some of the challenges faced in designing and managing the WAN in today’s enterprise network.  In the second part of this series we are going to take a look at what SD-WAN does that helps remediate many of the above challenges.  Also keep your eyes peeled for Part 3, which will close out the series by identifying some potential challenges surrounding SD-WAN solutions, and some final thoughts on how you might take your next step to improving your enterprise’s WAN.

Did the title of this blog entry scare you and make you think, "Why in the world would I do that?"  If so, then there is no need to read further.  The point of this blog post is not to tell you why you should be doing so, only why some have chosen to do so, and what issues they find themselves dealing with after having done so. If you still think that the idea of moving any of your data center to the cloud is simply ludicrous, you may go back to your regularly scheduled programming.


If the demand on your company's IT resources is consistent throughout the week and year, then the biggest reason for moving to the cloud really doesn't apply to you.  Consider how Amazon Web Services (AWS) got built. They discovered that most of the demand on their company's IT resources came from a few days of the year: Black Friday, Mother's Day, Christmas, etc. The rest of the year, the bulk of their IT resources were going unused. They asked themselves whether there might be other people who had the need for their IT resources when they weren't using them, and AWS was born. It has, of course, grown well beyond the simple desire to sell excess capacity into one of their most profitable business lines.

If your company's IT systems have a demand curve like that, then the public cloud might be for you. Why pay for servers to sit there for an entire year when you can rent them when demand is high and give them back when demand is low?  In fact, some companies even rent extra computing capacity by the hour when the demand is high. Imagine being able to scale the capabilities of your data center within minutes in order to meet the increased demand created by a Slashdot article or a viral video. This is the reason to go to the cloud. Then, once the demand goes down, simply give that capacity back.


The challenge for IT people looking to replace portions of their data center with the public cloud is automating it, and making sure that what they automate fits within the budget.  While a public cloud vendor can typically scale to whatever demand level you find yourself with, the bill will automatically scale as well. Unless the huge spike in demand is directly related to a huge spike in sales, your CFO might not take kindly to an enormous bill when your video goes viral. Make sure you plan for that ahead of time so you don't end up having to pay a huge and unexpected cost. Perhaps the decision will be made to just let things get slow for a while. After all, that ends up in the news, too. And if you believe all publicity is good publicity, then maybe it wouldn't be such a bad thing.
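
One way to picture "automate it, but within budget" is a scaling rule with a spending cap. This is only a sketch: the function, its parameters, and the numbers in the usage example are hypothetical and not tied to any particular cloud vendor's API.

```python
import math


def instances_to_run(requests_per_sec, capacity_per_instance,
                     hourly_rate, hourly_budget, minimum=1):
    """Return (instance_count, capped) for the current demand level.

    capped is True when the budget, not demand, decided the count,
    which is the 'let things get slow for a while' case.
    """
    needed = max(minimum, math.ceil(requests_per_sec / capacity_per_instance))
    affordable = max(minimum, int(hourly_budget // hourly_rate))
    return min(needed, affordable), needed > affordable


if __name__ == "__main__":
    # A normal day versus a viral spike, at $0.50 per instance-hour
    # with a $10 per hour ceiling agreed with the CFO.
    print(instances_to_run(500, 100, 0.50, 10.0))
    print(instances_to_run(5000, 100, 0.50, 10.0))
```

Returning the capped flag alongside the count makes the trade-off explicit, so someone can decide whether the spike is worth raising the ceiling.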


There are plenty of companies that have replaced all their data centers with the cloud. Netflix is perhaps the most famous company that runs their entire infrastructure in AWS.  But they argue that the constant changes in demand for their videos make them a perfect match for such a setup. Make sure the way your customers use your services is consistent with the way the public cloud works, and make sure that your CFO is ready for the bill if and when it happens. That's how to move things into the cloud.

As an avid cloud user, I'm always amused by people who suggest that moving things to the cloud means you don't have to manage them.  And, of course, when I say "amused," what I really mean is I feel like Inigo Montoya in The Princess Bride.  "You keep using that word.  I do not think it means what you think it means."


Why do I say this?  Because I am an avid cloud user and I manage my cloud assets all the time.  So where do we get this idea?  I'd say it starts with the idea that you don't have to manage the hardware.  Push a few buttons and a "server" magically appears in your web browser.  This is so much easier than creating a real server, which actually works similarly these days.  Push a few buttons on the right web site, and an actual server shows up at your front door in a few days.  All you have to do is plug it in, load the appropriate OS and application stack and you're ready to go.  The cloud VM is a little bit easier.  It appears in minutes and comes preloaded with the OS and application stack that you specified during the build process.


I think what most people think when they say their cloud resources don't need to be managed is that they don't have to worry about the hardware.  They know that the VM is running on highly resilient hardware that is being managed for them.  They don't have to worry about a failed disk drive, network controller, PCI card, etc.  It just manages itself. But anyone who thinks this is all that needs to be managed for a server must never have actually managed any servers.


There are all sorts of things that must be managed on a server that have nothing to do with hardware.  What about the filesystems?  When you create the VM, you create it with a volume of a certain size.  You need to make sure that volume doesn't fill up and take your server down with it.  You need to monitor the things that would fill it up for no reason, such as web logs, error logs, database transaction logs, etc.  These need to be monitored and managed.  Speaking of logs, what about those error logs?  Is anyone looking at them? Are they scanning them for errors that need to be addressed?  Somebody should be, of course.
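
As a small illustration of the volume-filling-up problem, here is a sketch that flags filesystems above a usage threshold using Python's standard `shutil.disk_usage`. The 90% threshold and the list of paths are assumptions for the example; a real monitoring tool would do far more than this.

```python
import shutil


def volume_alerts(paths, threshold_percent=90.0, usage=shutil.disk_usage):
    """Return (path, percent_used) for each volume at or above the threshold.

    usage can be swapped out for testing; it must return an object
    with total and used attributes, as shutil.disk_usage does.
    """
    alerts = []
    for path in paths:
        stats = usage(path)
        percent_used = stats.used * 100 / stats.total
        if percent_used >= threshold_percent:
            alerts.append((path, round(percent_used, 1)))
    return alerts


if __name__ == "__main__":
    # Hypothetical mount points; adjust for the VM in question.
    print(volume_alerts(["/"]))
```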


Another thing that can fill up a filesystem is an excessive number of snapshots.  They need to be managed as well.  Older snapshots need to be deleted and certain snapshots may need to be kept for longer periods of time or archived off to a different medium. Snapshots do not manage themselves.


What about my favorite topic of backups?  Is that VM getting backed up?  Does it need to be?  If you configured it to be backed up, is it backing up?  Is anyone looking at those error logs?  One of the biggest challenges is figuring out when a backup didn't run. It's relatively easy to figure out when a backup ran but failed; however, if someone configured the backup to not run at all, there's no log of that.  Is someone looking for backups that just magically disappeared?
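
The "backup that never ran" problem is easier to catch if you start from the list of systems that should be protected rather than from the backup logs. Here is a minimal sketch of that idea; the host names, the 24-hour window, and the shape of the `last_success` dictionary are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def missing_backups(expected_hosts, last_success, now,
                    max_age=timedelta(hours=24)):
    """Return hosts with no successful backup inside the max_age window.

    last_success maps host name -> datetime of the last good backup.
    Hosts absent from last_success (never backed up, or silently
    removed from the schedule) are reported too.
    """
    missing = []
    for host in sorted(expected_hosts):
        finished = last_success.get(host)
        if finished is None or now - finished > max_age:
            missing.append(host)
    return missing


if __name__ == "__main__":
    now = datetime(2016, 5, 11, 8, 0)
    last_success = {
        "web01": datetime(2016, 5, 11, 2, 0),  # ran last night
        "db01": datetime(2016, 5, 8, 2, 0),    # stale for three days
    }
    # app01 has no record at all, which no failure log would reveal.
    print(missing_backups({"web01", "db01", "app01"}, last_success, now))
```

Driving the check from the inventory is the design choice that matters here: a job that was disabled leaves no failure entry, but it still shows up as a gap.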

Suffice it to say that the cloud doesn't remove the need for management.  It just moves it to a different place.  Some of these things may be able to be offloaded to the cloud vendor, of course.  But even if that's the case someone needs to watch the watcher.  There is no such thing as a free lunch and there is no such thing as a server that manages itself.

Network variation is hurting us

Network devices like switches, routers, firewalls and load-balancers ship with many powerful features. These features can be configured by each engineer to fit the unique needs of every network. This flexibility is extremely useful and, in many ways, it's what makes networking cool. But there comes a point at which this flexibility starts to backfire and become a source of pain for network engineers.

Variation creeps up on you.  It can start with harmless requests for some non-standard connectivity, but I've seen those requests grow to the point where servers were plugging straight into the network core routers.  In time, these one-off solutions start to accumulate and you can lose sight of what the network ‘should’ look like.  Every part of the network becomes its own special snowflake.

I’m not judging here. I've managed quite a few networks and all of them end up with high degrees of variation and technical debt. In fact, it takes considerable effort to fight the storm of snowflakes. But if you want a stable and useful network you need to drive out variation. Of course you still need to meet the demands of the business, but only up to a point. If you're too flexible you will end up hurting your business by creating a brittle network which cannot handle changes.

Your network becomes easier and faster to deploy, monitor, map, audit, understand and fix if you limit your network to a subset of standard components. Of course there are great monitoring tools to help you manage messy networks, but you’ll get greater value from your tools when you point them towards a simple structured network.

What’s so bad about variety?

Before we can start simplifying our networks we have to see the value in driving out that variability. Here are some thoughts on how highly variable (or heterogeneous) networks can make our lives harder as network engineers:

  • Change control - Making safe network change is extremely difficult without standard topologies or configurations. Making a change safely requires a deep understanding of the current traffic flows - and this will take a lot of time. Documentation makes this easier, but a simple standardized topology is best. The most frustrating thing is that when you do eventually cause an outage, the lessons learned from your failed change cannot be applied to other dissimilar parts of your network.
  • Discovery time can be high. How do you learn the topology of your network in advance of problems occurring? A topology mapping tool can be really helpful to reduce the pain here, but most people have just an outdated Visio diagram to rely on.
  • Operations can be a nightmare in snowflake networks.  Every problem will be a new one, but probably one that could have been avoided - it's likely that you'll go slowly mad. Often you'll start troubleshooting a problem and then realize, ‘oh yeah, I caused this outage with the shortcut I took last week. Oops’.  By the way, it’s a really good sign when you start to see the same problems repeatedly. Operations should be boring. It means you can re-orient your Ops time towards 80/20 analysis of issues, rather than spending your days firefighting.
  • Stagnation -  You won't be able to improve your network until you simplify and standardize your network. Runbooks are fantastic tools for your Ops and Deployment teams, but the runbook will be useless if the steps are different for every switch in your network. Think about documenting a simple task...if network Y do step1, except if feature Z enabled then do something else, except if it’s raining or if it's a leap year.  You get the message.
  • No-Automation - If your process is too complicated to capture in a runbook you shouldn't automate it. Simplify your network, then your process, then automate.
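
One rough way to see how many snowflakes you have is to group devices by their settings and count the distinct variants. The sketch below assumes you already have each device's relevant settings as a simple dictionary; the device names and setting keys are invented for the example.

```python
from collections import defaultdict


def config_variants(device_settings):
    """Group devices that share identical settings.

    Returns a list of device-name lists, largest group first. One big
    group means a standardized network; many small groups mean snowflakes.
    """
    groups = defaultdict(list)
    for device, settings in device_settings.items():
        fingerprint = tuple(sorted(settings.items()))
        groups[fingerprint].append(device)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    devices = {
        "sw1": {"stp": "rapid", "mtu": 1500},
        "sw2": {"stp": "rapid", "mtu": 1500},
        "sw3": {"stp": "mst", "mtu": 9000},  # the one-off
    }
    for group in config_variants(devices):
        print(len(group), group)
```

A runbook only needs one branch per group, so the number of groups is a decent proxy for how painful documentation and automation will be.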



Network variation can be a real source of pain for us engineers. In this post we looked at the pain it causes and why we need to simplify and standardize our networks. In Part 2 we'll look at the root causes for these complicated, heterogeneous networks and how we can begin tackling the problem.

Data center consolidations have been a priority for years, with the objectives of combatting server sprawl, centralizing and standardizing storage, streamlining application management, and establishing shared services across multiple agencies.


But consolidation has created challenges for federal IT professionals, including:

  • Managing the consolidation without an increase in IT staff
  • Adapting to new best practices like shared services and cloud computing
  • Shifting focus to optimizing IT through more efficient computing platforms


Whether agencies have finished their consolidation or not, federal IT pros have definitely felt the impact of the change. But how do the remaining administrators manage the growing infrastructure and issues while meeting SLAs?


One way data center administrators can stay on top of all the change is to modernize their monitoring system, with the objective of improved visibility and troubleshooting.


The Value of Implementing Holistic Monitoring


A holistic approach to monitoring provides visibility into how each individual component is running and impacting the environment as a whole. It can bridge the gap that exists between the IT team and the program groups through connected visibility.



Who is responsible for what? Shared services can be hard to navigate.


Even though the data center team now owns the infrastructure and application operations, the application owners still need to ensure application performance. Both teams require visibility into performance with a single point of truth, which streamlines communication and eases the transition to shared services.


Application Performance

Application performance is critical to executing agency missions, so when users provide feedback that an application is slow, it is up to data center administrators to find the problem and fix it—or escalate it—quickly.


Individually checking each component of the IT infrastructure—the application, servers, storage, database or a virtualized environment—can be tedious, time consuming and difficult. End-to-end visibility into how each component is performing allows for quick identification and remediation of the issues.



Virtualization can introduce complexities and management challenges. In a virtual environment, virtual machines can be cloned and moved around so easily and often that the impact on the entire environment can be missed, especially in a dynamically changing infrastructure.


Consolidated monitoring and comprehensive awareness of the end-to-end virtual environment is the answer to effective change management in the virtualized environment.



Efficiency was a key driver behind consolidations, but this can seem near impossible for the remaining data centers. But with integrated monitoring that provides end-to-end visibility, data center administrators can troubleshoot issues in seconds instead of hours or days and proactively manage their IT. With the right tools, administrators can provide end-users with high service levels.


Consolidation is part of the new reality for data center administrators. Holistic, integrated monitoring and management of the dynamically changing IT environment will help to refine the new responsibilities of being a shared service, ensure mission-critical applications are optimized and improve visibility into virtualized environments.


Find the full article on Signal.

Practitioners in nearly every technology field are facing revolutionary changes in the way systems and networks are built. Change, by itself, really isn't all that interesting. Those among us who have been doing this a while will recognize that technological change is one of the few reliable constants. What is interesting, however, is how things are changing.


Architects, engineers, and the vendors that produce gear for them have simply fallen in love with the concept of abstraction. The abstraction flood gates have metaphorically been flung open following the meteoric rise of the virtual machine in enterprise networks. As an industry, we have watched the abstraction of the operating system -- from the hardware it lives on -- give us an amazing amount of flexibility in the way we deploy and manage our systems.  Now that the industry has fully embraced the concept of abstraction, we aim to implement it everywhere.


Breaking away from monolithic stack architecture


If we take a look at systems specifically, it used to be that the hardware, the operating system, and the application all existed as one logical entity.  If it was a large application, we might have components of the application split out across multiple hardware/OS combos, but generally speaking the stack was a unit. That single unit was something we could easily recognize and monitor as a whole. SNMP, while it has its limitations, has done a decent job of allowing operators to query the state of everything in that single stack.


Virtualization changed the game a bit as we decoupled the OS/Application from the hardware. While it may not have been the most efficient way of doing it, we could still monitor the VM like we used to when it was coupled with the hardware.  This is because we hadn't really changed the architecture.  Abstraction gave us some significant flexibility but our applications still relied on the same components, arranged in a similar pattern to the bare-metal stacks we started with.  The difference is that we now had two unique units where information collection was required: the hardware remained as it always had, and the OS/Application became a secondary monitoring target.  It took a little more configuration but it didn't change the nature of the way we monitored the systems.


Cloud architecture changes everything


Then came the concept of cloud infrastructure. With it, developers began embracing the elastic nature of the cloud and started building their products to take advantage of it. Rather than sizing an application stack based off of guesstimates of the anticipated peak load, it can now be sized minimally and scaled out horizontally when needed by adding additional instances. Previously, just a handful of systems would have handled peak loads. Now those numbers could be dozens, or even hundreds of dynamically built systems scaled out based on demand. As the industry moves in this direction, our traditional means of monitoring simply do not provide enough information to let us know if our application is performing as expected.


The networking story is similar in a lot of ways. While networking has generally been resistant to change over the past couple of decades, the need for dynamic/elastic infrastructure is forcing networks to take several evolutionary steps rather quickly.  In order to support the cloud models that application developers have embraced, the networks of tomorrow will be built with application awareness, self-programmability, and moment-in-time best path selection as core components.


Much like in the systems world, abstraction is one of the primary keys to achieving this flexibility. Whether the new model of networks is built upon new protocols, or overlays of existing infrastructure, the traditional way of statically configuring networks is coming to an end. Rather than having statically assigned primary, secondary, and tertiary paths, networks will balance traffic based off of business policy, link performance, and application awareness. Fault awareness will be built in, and traffic flows will be dynamically routed around trouble points in the network. Knowing the status of the actual links themselves will become less important, much like physical hardware that applications use. Understanding network performance will require understanding the actual performance of the packet flows that are utilizing the infrastructure.


At the heart of the matter, the end goal appears to be an ephemeral state for both network path selection and systems architecture.


So how does this change monitoring?


Abstraction inherently makes application and network performance harder to analyze. In the past, we could monitor hardware state, network link performance, CPU, memory, disk latency, logs, etc. and come up with a fairly accurate picture of what was going on with the applications using those resources. Distributed architectures negate the correlation between a single piece of underlying infrastructure and the applications that use it.  Instead, synthetic application transactions and real-time performance data will need to be used to determine what application performance really looks like. Telemetry is a necessary component for monitoring next generation system and network architectures.
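
To make "synthetic application transactions" a little more concrete, here is a minimal sketch that runs a scripted transaction several times and reports failures and median response time. The `transaction` callable is a placeholder for whatever a real check would do (log in, run a query, fetch a page); nothing here reflects a specific monitoring product.

```python
import statistics
import time


def run_synthetic_checks(transaction, runs=5, clock=time.perf_counter):
    """Time a scripted transaction and summarize the results.

    transaction is any zero-argument callable; an exception counts
    as a failed run and is left out of the timing statistics.
    """
    timings = []
    failures = 0
    for _ in range(runs):
        start = clock()
        try:
            transaction()
        except Exception:
            failures += 1
            continue
        timings.append(clock() - start)
    median = statistics.median(timings) if timings else None
    return {"runs": runs, "failures": failures, "median_seconds": median}


if __name__ == "__main__":
    # Stand-in transaction; a real one would exercise the application.
    print(run_synthetic_checks(lambda: time.sleep(0.01)))
```

The measurement is taken from the user's side of the application, so it stays meaningful no matter how many instances are behind it at that moment.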


Does this mean that SNMP is going away?


While many practitioners wouldn't exactly shed a tear if they never needed to touch SNMP again, the answer is no. We still will have a need to monitor the underlying infrastructure even though it no longer gives us the holistic view that it once did. The widespread use of SNMP as the mechanism for monitoring infrastructure means it will remain a component of monitoring strategies for some time to come. Next generation monitoring systems will need to integrate the traditional SNMP methodologies with deeper levels of real-time application testing and awareness to ensure operators can remain aware of the environments they are responsible for managing.

“With me, everything turns into mathematics.”

– Rene Descartes



Ransomware is not new. Beginning as misleading ads, and warnings that your computer is infected, Symantec traces ransomware deployments (including crypto lockers) back to 2005.[1] Early crypto locking extortion scams were not that successful. However, current business owners face increasing risk of cyber extortion, and crypto locking ransomware has been on the rise over the past two years. It has become so prevalent that the FBI issued a warning highlighting the increasing threat to businesses.[2]  Given the increasing velocity of deployment, the ease of infiltration, and the dire consequences of infection, we believe ransomware is a significant risk to businesses.


There are two primary factors contributing to the rise of ransomware:


  1. More real-time business data has been digitized, especially in health care and loan processing, which has increased the available pool of targets.
  2. Anonymous payment systems make monetizing ransomware easy, efficient, and risk-free for cyber criminals.


Observed samples of ransomware in 2014 totaled almost 9 million, yet in Q2 2015 alone, samples hit 4 million. This run rate is doubling year over year. Ransomware, unlike many vulnerabilities and malware, does not require administrative privileges, as its purpose is to encrypt the files useful to the end-user. Furthermore, the same types of scams and hooks that make ransomware successful on Windows are being deployed against other platform targets. 
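
The run-rate claim can be checked with simple arithmetic. The sketch below assumes that Q2 2015 is representative of the whole year and rounds the 2014 figure to 9 million, so treat the result as a rough estimate.

```python
SAMPLES_2014 = 9_000_000      # "almost 9 million" for the full year
SAMPLES_Q2_2015 = 4_000_000   # one quarter

# Annualize the quarterly figure, assuming the other quarters look similar.
annualized_2015 = SAMPLES_Q2_2015 * 4
growth = annualized_2015 / SAMPLES_2014

if __name__ == "__main__":
    print(f"annualized 2015 samples: {annualized_2015:,}")
    print(f"growth vs 2014: {growth:.2f}x")
```

Under those assumptions the annualized figure is a little under twice the 2014 total, which is roughly in line with the doubling described above.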

What systems are at risk?

Cyber criminals have built ransomware kits that target a wide range of systems, including Windows, Linux, Android, and recently (March 2016) Mac OS. While the majority of ransomware successes are still on Windows, users should be alert to the increasing risk of ransomware on Android, which is on the rise.  Android ransomware could become particularly troubling in dedicated devices used in health care, manufacturing, and retail.

How does ransomware behave?

On Windows, ransomware works to impair your computer in one of three common ways:


  1. Encrypt your files (Locky and Cerber).
  2. Prevent you from accessing certain apps (FakeBsod – locks browser).[3]
  3. Restrict access to the operating system itself (Reveton – locks PC).


On Android, ransomware falls generally into one of two types:


  1. Screen locking.
  2. File encrypting.


Unfortunately for Android users, both forms of ransomware are increasingly seen in the wild. The chronology of Android ransomware follows a similar pattern to the Windows chronology; it begins with a fake antivirus, then fake police demands, followed by full cryptographic file locking. Versions of Simplocker malware on Android encrypt files on the SD card; versions of Lockerpin acquire administrative privileges and prevent access to the device.[4]


On Linux, the most common target is web servers. The ransomware Linux.Encoder.1 has been reported in the wild since November 2015. This variant does require root privileges, and it walks the web server file directory structure as well as nginx, /root and others.[5]  The reported ransom for this variant is one bitcoin.


Fortunately for Mac OS users, the first reported ransomware that encrypts Mac OS files has not been widely deployed or successful. With only 6500 downloads identified, Mac OS ransomware is a drop in the proverbial bucket.

What organizations are likely targets?

As mentioned above, real-time access needs for critical data create the easiest targets for ransomware. While no individual or business is free from worry, public service (police stations) and health care (hospitals) have been successfully targeted in the last 12 months. We can infer that other businesses, such as title companies, car dealerships, and other loan processors are likely targets as well. The criticality of data in these organizations is intuitive, and most cyber criminals keep the ransom amount “reasonable” (around $10,000). This amount is low enough that it appears to be economically rational for businesses that need to restore access quickly. Additionally, setting up a bitcoin wallet is relatively straightforward, with a number of YouTube how-to videos readily accessible. For an individual system, or business with less real-time critical data, the price is usually a single bitcoin.  


What defensive steps can you take?

Prevention is, of course, the goal. However, between the range of infection vectors (SMS on Android, browser exploitation, spam malware, and exploit kits) and the volume of ransomware samples observed in the wild, the risk of initial infection is difficult to eliminate. Therefore, a combination of preventative tactics and planning for incident remediation is the best risk-mitigating course of action.


Preventative Actions


  1. Educate your users on the risk. Users who process a large number of inbound attachments and emails, such as accounts receivable processors, account managers, and marketing personnel, are particularly vulnerable.
  2. Maintain patches on desktop users’ systems, as well as critical data servers.  Desktop users are often updated in a haphazard manner, or not at all, which makes them vulnerable to exploitation.
  3. Reduce or eliminate automatic mapping of drives. Recommended by thwack community member Stephen Black, eliminating automatic drive mapping means the ransomware won’t be able to walk your network from one initial infected system.
  4. Monitor for infections to prevent contagion.  If you use LEM, there is a monitoring rule you can download and use.
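To make the monitoring idea concrete, here is a minimal sketch of the kind of logic such a rule might express. It is not the LEM rule mentioned above, and it uses no LEM API. It assumes you already have a time-ordered feed of file-write events as (timestamp, host, path) tuples, and the extension list and thresholds are illustrative guesses rather than tuned values.

```python
from collections import deque

# Illustrative only: extensions associated with the Locky and Cerber families.
SUSPICIOUS_EXTENSIONS = {".locky", ".cerber"}


def find_suspect_hosts(events, window_seconds=60, max_writes=100):
    """Flag hosts that look like they are mass-encrypting files.

    events: iterable of (timestamp_seconds, host, path), sorted by timestamp.
    A host is flagged if it writes a file with a suspicious extension, or if
    it produces more than max_writes write events within window_seconds.
    """
    recent = {}  # host -> deque of timestamps inside the sliding window
    suspects = set()
    for ts, host, path in events:
        if any(path.lower().endswith(ext) for ext in SUSPICIOUS_EXTENSIONS):
            suspects.add(host)
        window = recent.setdefault(host, deque())
        window.append(ts)
        # Drop timestamps that have aged out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_writes:
            suspects.add(host)
    return suspects
```

A flagged host is only a candidate for investigation; backup jobs and bulk file copies will trip a rate threshold too, so the numbers need tuning per environment.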


Incident remediation

If you find yourself in the unfortunate situation where a system has become locked with ransomware, you have limited options. While some researchers have been successful reverse engineering ransomware, the ability to do so takes time and depends on vulnerabilities in the ransomware code itself. If you were lucky enough to be hit by one of these old variants, you can use the techniques the researchers have published.[6]  But, realistically, for most situations there are only two real options:


  1. Restore from backup.
  2. Pay the ransom.


If your business fits in the class of organizations currently being targeted, or shares characteristics with organizations being targeted, it would be prudent to actually test your ability to restore from your backup media, whether that is a cloud backup, local backup, or offsite backup. Businesses with Android users are encouraged to explore mobile device backup, or at least educate your users on their options.[7] Unfortunately, the only time the restore from backup process is usually tested or validated is during an audit, or test of a business continuity or disaster recovery plan, which may be too late.
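One way to make a restore test more than a checkbox is to restore to a scratch location and compare the result against the live data, file by file. The sketch below is a drill under stated assumptions: both directory trees are reachable from one machine, the live copy is still healthy, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or different in restored_dir."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        # A file fails the check if it was not restored or its content differs.
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

An empty list means every file in the source tree came back intact. Files that changed after the backup was taken will also show up as differences, so run the drill against data that was quiet in between.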


Do you have a favorite way to use LEM to look for malware? 

When did you last test your business continuity plan? 

Know anyone who has successfully recovered files after a ransomware attack?

Share your stories so we can all benefit.

[1] Symantec, Internet Security Threat Report, 2016 pg. 58







I'm on my way to Liverpool for SQLBits. So if you are reading this and find yourself near Liverpool this week, head on over to SQLBits and say hello.


Things I find amusing from around the Internet...


Star Wars: A Bad Lip Reading

In case you haven't seen this yet, I figured this was a good way to celebrate May the Fourth. Also? I want a wooden snowman.


Nearly All of Your ATMs are Insecure

Not sure what I find more amusing here, the fact that 'ATM not secure' is seen as something new or the fact that it's a Russian firm cited in the report.


Automating Change With Help From Fibonacci

As a math geek and IT pro, this article is full of so much win that I want to place it inside a Golden Rectangle and place it on top of a Klein Bottle.


Scientists Have Figured Out How To Put Electronics Inside Your Body

"We've always dreamed of infusing our bodies with technology". Um, no, we haven't. And as if building robots wasn't enough, now we have this to help usher in the singularity. (Darth Vader)


The advent of the citizen developer

Otherwise known as "shadow IT", and not something new. End users will turn to whatever tools they can find in an effort to do their jobs better than the day before.


FBI Says It Won't Disclose How It Accessed Locked iPhone

Because they don't know what they did, kinda like how I can never explain to my mother what I did to her computer to get her email working again.


Digital Genies

How can we ensure safety for humans when the robots rise up? Fascinating post here about how AI could go horribly wrong if it *thinks* it has the right data, but doesn't.




Containers in The Real World

Posted by jdgreen May 3, 2016

All industry-changing trends have an uncomfortable period where the benefit of adoption is understood but real-world use is often exaggerated. Because the modern use of containers fundamentally changes the paradigm with which operations folks run their data centers, the case for adoption needs to be extremely compelling before anyone will move forward.


Also, since change is hard, major industry-shifting trends come with lots of pushback from people who have built a career on the technology that is being changed, disrupted, or even displaced. In the case of containers, there exists a sizeable assembly of naysayers and not shockingly, they generally come from an Operations (and specifically virtualization) background.


To that end, I decided to dig deep into a handful of case studies and interview industry acquaintances about their experiences with containers in production. Making the case that containers can be handy for two developers on their laptops is easy; I was curious to find out what happens when companies adopt a container-based data center practice throughout the entire software lifecycle and at substantial scale. Here is what I found.

It’s Getting Better

One of the major challenges many people reported with early containerization products like Docker Engine and rkt was that they were very difficult to manage at scale. Natively, these tools didn’t include any sort of single-pane-of-glass management or higher-level orchestration.


As the container paradigm has matured, tools like Docker Swarm, Kubernetes, and Cloud Foundry have helped adopters make sense of what’s happening across their entire environment and begin to more successfully automate and orchestrate the entire software development lifecycle.

Small Businesses Are Last, As Usual

As with other pivotal data center technologies like server virtualization, small businesses are sometimes least likely to see a valuable return by jumping on the bandwagon. Because of their small data center footprint, they don’t see the dramatic impact to the bottom line that enterprises do when making a change to the way their data center operates. While that’s obviously not always the case, my discussions with colleagues in the field and research into case studies seem to indicate that just like all the big shifts before it, full-steam-ahead containerization is primarily for the data centers of scale, at least for now.


One way this might change in the future is software distribution by manufacturers in a container format. While small businesses might not need to leverage containers to accelerate their software development practice, they may start getting forced into containerization by the software manufacturers they deal with. Just as many ISVs today deliver their offering in an OVA format to be deployed into a virtualized environment, we may begin to see lots of containers delivered as the platform for running a particular software offering.

Containers are Here to Stay

As much as the naysayers and conservative IT veterans speculate about containers being mostly hype, the anecdotal evidence I’ve collected seems to indicate that many organizations have indeed seen dramatic improvement in their operations, fewer defects, and ultimately an impact on their bottom line.


I try to be very careful about buying in to hype, but it doesn’t look like containers are slowing down any time soon. The ecosystem that is developing around the paradigm is quite substantial, and as a part of the overall DevOps methodology trend, I see container-based technologies enabling the overall vision as much as any other sort of technology. It will be interesting to see how the data center landscape looks with regard to containers in 2020; will it be like the difference between virtualization in 2005 and 2015?

Today’s users demand access to easy-to-use applications even though the IT landscape has become a complex mishmash of end-user devices, connectivity methods, and siloed IT organizations, some of which contain further silos for applications, databases and back-end storage.


These multiple tiers of complexity, combined with end users’ increasing dependency on accessible applications, create significant difficulties for IT professionals across the globe, but especially in government agencies, with all their regulations and policies.


Figuring out how to maintain application performance in these complex environments has become a key objective for federal IT staff. Here are five methods for preserving a high-performance app stack:


1. Simplifying application stack management


A significant part of the effort lies in simplifying management of the application stack (app stack) itself, which includes the application, middleware and the extended infrastructure the application requires for performance. Think about the entire environment.


Rather than looking at networks, storage, servers and clients as distinct silos of individual responsibility, federal IT departments can reduce the complexity of the sometimes conflicting information they use to manage these silos. The simplification lies in the practice of monitoring all applications and the resources they use as a single application ecosystem, recognizing the relationships.


Working through the entire app stack lets federal IT pros understand where performance is degraded and improves troubleshooting.


2. Monitoring servers


Server monitoring is a significant part of managing the app stack. Servers are the engines that provide application services to the end user. And applications need sufficient CPU cycles, memory, storage I/O and network bandwidth to work effectively.


Monitoring current server conditions and analyzing historical usage trends is the key to ensuring problems are resolved rapidly or prevented.
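As a small illustration of what analyzing historical usage trends can look like, the sketch below fits a straight line to daily disk-usage percentages and estimates how many days remain before the volume fills. It assumes evenly spaced daily samples (at least two) and roughly linear growth, which real workloads often violate; the function name is hypothetical.

```python
def days_until_full(daily_usage_pct, capacity_pct=100.0):
    """Estimate days from the last sample until usage reaches capacity_pct.

    Uses an ordinary least-squares line through the samples. Returns None
    when usage is flat or shrinking, since no fill date can be projected.
    """
    n = len(daily_usage_pct)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_usage_pct) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_usage_pct)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity_pct - intercept) / slope
    # Days are counted from the most recent sample, never below zero.
    return max(0.0, day_full - (n - 1))
```

For example, five daily readings climbing two points a day from 50% project to about three weeks of headroom, which is the kind of lead time that turns an outage into a planned change.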


3. Monitoring virtualization


Monitoring the virtualization infrastructure is key. Federal IT pros should monitor how and when VMs move from one host or cluster to another, as well as the status of shared hosts, networks and storage resources, especially if they are over-subscribed.


Federal IT pros should prioritize how individual VMs on a host are working together, whether resource contention is occurring on a host or a cluster, and what applications are causing those conflicts. In addition, federal IT pros should keep tabs on network latency.


4. Monitoring user devices


Today’s users are running applications on all types of devices with a range of capabilities and connectivity options, all of which are significant factors in maintaining a healthy app ecosystem.


5. Bring it together with alerting


The last component is alerting, which notifies technicians when there is an issue with a component of the app stack prior to the first end-user noticing the problem.


The ability to set proactive performance baselines for devices and applications to signal when app stack issues arise helps both in day-to-day monitoring and future capacity planning.
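A performance baseline can be as simple as "normal is the historical mean, and anything more than a few standard deviations above it is worth an alert." The sketch below shows that idea under stated assumptions: the metric is roughly stable over the baseline period, the three-sigma default is an illustrative choice, and the function names are hypothetical rather than taken from any monitoring product.

```python
from statistics import mean, stdev


def build_baseline(samples, sigmas=3.0):
    """Return (average, alert threshold) for a list of historical samples."""
    average = mean(samples)
    return average, average + sigmas * stdev(samples)


def breaches(baseline_samples, new_samples, sigmas=3.0):
    """Return the new samples that exceed the baseline threshold."""
    _, threshold = build_baseline(baseline_samples, sigmas)
    return [value for value in new_samples if value > threshold]
```

The same average and threshold feed capacity planning: if the baseline itself drifts upward from month to month, that is a growth trend rather than an incident.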


In short, it’s critical for federal IT pros to be aware of, monitor and set up notifications across the app stack – from back end storage, through application services and processes to front-end users – and provide high performance from a holistic perspective.


Find the full article on Government Computer News.

At the recent AWS Summit in Australia, a case was presented that had most I.T. folks in shock. A business user had gone outside of the I.T. controls of his organization to test a business capability in the Cloud. The organization was Australia’s largest provider of electricity and LPG gas and this guy was on stage as a hero.


In the post-session write-up, the media was quick to clarify that only dummy data was used and no customer data was at risk. The person who initiated this didn’t want to go through the long and tedious process of an I.T. proof of concept just to run some data analytics. His heart was in the right place, with a drive to improve their business, but I.T. was getting in the way. You can read an article about it here.


So why did the rest of us have a heart attack at this news? Well, not only was AWS not on the organization’s approved vendors list, access to the platform had actually been blocked from the corporate network. The workaround? Use the free Wi-Fi across the road.

I’m sure this isn’t the only example of the business going around the outside of I.T.


When you work so hard to keep the Enterprise (or even SMB) secured, stable and legally compliant, it’s frustrating to know that those efforts can be completely ignored with a corporate credit card (or even a free trial)! What’s the solution if you’ve even blocked the website from your network?

SaaS is the hardest Cloud capability to integrate into an existing environment. It can impact so much of your I.T. footprint, with a system that you have very little control over. Secure data integration, identity management, access management, data storage, terms of use, APIs … the list goes on. There’s no point running a proof of concept if you don’t have answers for the longer term operation, maintenance and security of a SaaS application. But if it’s not needed as a long-term capability (such is the beauty of SaaS), is it worth having ALL the answers before we allow a dummy data test? Or do we want to get the hopes up of the business users, only to tell them there’s no way it would work with live data because it doesn’t meet your compliance regulations? Is it a “chicken or the egg” type question?

The current reality is it IS easy for the business to go ahead without I.T. backing, though I’d love to see the reactions from the Legal & Compliance teams. With dummy data available, the business CAN try some cool stuff without touching Production systems or real data, minimising some of the risk. Are we making it too hard for the business to innovate, or are we protecting them from themselves?

Do you have a way to support fast initiation of SaaS proof of concept initiatives?  Does the risk just make it too hard? Is someone else in your organization holding up the NO card when it comes to Cloud (and SaaS in particular)?  Let me know what you think.




P.S. I'll be at Interop in Las Vegas this week from May 4-6, where I'll get to meet some SolarWinds Head Geeks in person! It's a long flight from Brisbane, Australia, so come and find me and say Hi if you are attending.


In Logging We Trust

Posted by SomeClown May 3, 2016


Everyone in IT loves to log. We love to log our servers, our networks, our security devices, and our security. We log all the things. Sometimes we even look at those logs, but mostly we dump them to a tool which paints a nice, happy little dashboard… just right there, and then we forget about it until we get those pesky notices that something has gone wrong.


The challenges here are myriad, however, and not always easy to address because they require political as well as technical fixes. IT personnel are generally great with technology, but not so much with the politicking. Kissing hands and shaking babies is apparently the wrong approach to take, and so when rebuffed by the suits, or sometimes the Bobs, we retreat to our happy little world of dashboards and log data.


One major challenge to logging is mentally getting beyond logging. We don’t need logging for logging’s sake. What we actually need is correlation. What do I mean? I mean that all of the bits of information we collect from all of our disparate systems sit idly by, locked in their own little bubbles, sending occasional notices that send us squirreling off to solve a problem. None of the information we collect is analyzed collectively; it’s not correlated at all, and so we miss patterns.


Think about it. We collect from all of our systems in a structured way. We also collect vast amounts of machine data from systems as diverse as badge readers, BLE beacons, tweets, failed domain lookups, etc., but we don’t do anything with it as a whole. What we need to do is to start looking at our data instead of random datum. If we normalize as much of that data as possible, make intelligent connections between it all, and use intelligent analysis, we can start to make sense of the random noise on the wire and stop chasing squirrels.
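To show what "normalize, then connect" might look like in miniature, here is a sketch that maps records from two sources onto a common shape and then reports users who show up in more than one source within a short time window. Everything specific in it is an assumption for illustration: the source names, the record field names (`employee`, `swiped_at`, `client_user`, `epoch`), and the five-minute window.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def normalize(source, record):
    """Map a source-specific record onto a common {time, source, user} shape."""
    if source == "badge":
        # Hypothetical badge reader export: ISO 8601 timestamps, mixed-case names.
        when = datetime.fromisoformat(record["swiped_at"])
        user = record["employee"]
    elif source == "dns":
        # Hypothetical failed-lookup log: Unix epoch seconds.
        when = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        user = record["client_user"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {"time": when, "source": source, "user": user.lower()}


def correlate(events, window=timedelta(minutes=5)):
    """Return {user: sources} for users seen in 2+ sources within the window."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user"]].append(event)
    hits = {}
    for user, user_events in by_user.items():
        user_events.sort(key=lambda e: e["time"])
        for i, first in enumerate(user_events):
            # Sources seen from this event up to `window` later.
            sources = {
                e["source"]
                for e in user_events[i:]
                if e["time"] - first["time"] <= window
            }
            if len(sources) > 1:
                hits[user] = hits.get(user, set()) | sources
    return hits
```

The interesting work in practice is in `normalize`: agreeing on one identity and one clock across systems is what lets a badge swipe and a failed domain lookup be read as parts of the same story.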


The other problem we face is the major impediment to what I’ve just described: silos. Let’s face it, within the IT industry as a whole we segment ourselves off by specialty. Security, networking, systems, applications, storage, voice, wireless, and probably even more subcategories are all common areas of expertise, and those areas are very frequently operated as different departments within the larger IT organization, either de facto or de jure. And as often as not those departments don’t work together, don’t always like each other, and sometimes even work against one another.


So, each segment of the IT organization is gathering data using different tools and methodologies, and with a varying amount of fidelity to what the data is telling them. Data correlation in the big picture is mostly worthless if it’s not done in a deliberate way across the entire organization. To get a detailed picture of your organization you need everything collected, not just the bits from groups who get along. Without that, you won’t realize the benefits of big data analytics, what I’ve been calling correlation, in any meaningful way. You won’t be able to connect the proverbial dots to a place from which valid, useful conclusions may be drawn. And without that, we might as well go back to our insular worlds, and work on our squirrel chasing.


Save the date(s) September 14th & 15th for THWACKcamp, our annual virtual conference. This year’s going to be bigger and better than ever before! EMEA THWACKcamp will live-stream in local time on September 15th!


For an overview of what happened last year, and to access the on-demand sessions, head on over to: THWACKcamp 2015


What are you looking forward to most with this year’s THWACKcamp? Tell us below, or join the conversation on social using #THWACKcamp.
