
Geek Speak


The increasing rate of change in applications, and the growing footprint of that change, are causing a lot of consternation within IT organizations. It’s no coincidence, either, since everything revolves around the application, which is innovation personified. It’s the revenue generator and the value-added differentiator, and it's potentially an industry game changer. Think Uber, Facebook, Netflix, Airbnb, Amazon, and Alibaba.

 

Accordingly, the rate and scale of change are products of the application lifecycle. For instance, applications deployed in a virtualization stack will live for months or years, while applications deployed in a cloud stack will live for hours or weeks. Applications deployed in containers or with microservices will live for microseconds or milliseconds.

AppLifeCycle.png

From my Interop 2016 DART Framework presentation.

 

For IT professionals, it’s good to know where job security is. As such, I’ve been keeping monthly tabs on the number of jobs on dice.com with the keywords virtualization, cloud, or (containers AND microservices). In the past year, since June 2015, the number of jobs with the keyword "virtualization" has remained flat at around 2,600 job openings. In that same time frame, the number of cloud jobs has increased by over 30% to 8,900 job openings, while the number of container/microservices jobs has more than doubled to almost 600 job openings.

 

These trends reaffirm the hybrid IT paradigm and the need for IT organizations to deal efficiently and effectively with change in their application ecosystems. Let me know what you think in the comment section below.

The vast majority of my customers are highly virtualized, and quite possibly using Amazon or Azure in a shadow IT kind of approach. Some groups within the organization have deployed workloads into these large public provider spaces, simply because they need to gain access to resources and deploy them as rapidly as possible.

 

Certainly, Development and Testing groups have been building systems and destroying them as testing moves forward toward production. But marketing and other groups may also find that the IT team is less than agile in providing these services in a timely manner. Thus, a credit card is swiped, and development occurs. The first indication that these things are taking place is when the bills come.

 

Often, the best solution is a shared environment, in which certain workloads are deployed into AWS, Azure, or even SoftLayer, and others into peer data centers for a shared but less public workload. This can provide ideal circumstances for the organization.

 

Certainly, these services are quite valuable to organizations. But are they secure, or do they potentially expose the company's data to vulnerabilities or provide an entrée into the corporate network? Are there compliance issues? How about the costs? If your organization could provide these services in a way that would satisfy the user community, would that be a more efficient, cost-effective, compliant, and consistent platform?

 

These are really significant questions. The answers, though, are rarely simple. Today, there are applications, such as Cloudgenera, which will analyze the new workload and advise the analyst as to whether any of these issues are significant. They will also advise on current cost models to prove out the costs over time. Having that knowledge prior to deployment could be the difference between agility and vulnerability.

 

Another issue to address when opening your environment up to a hybrid or public workload is the learning curve of adopting a new paradigm within your IT group. This can be daunting. To address these kinds of shifts in approach, a new world of public ecosystem partners has emerged. Their tools create workload deployment methodologies that bridge the gap between your internal virtual environment and the public cloud, and ease or even facilitate that transition. Platform9, for example, provides a software tool that allows the administrator to decide, from within a Platform9 panel in vCenter, where to deploy a workload. Deploying the tool is as simple as downloading an OVF and deploying it into your vCenter. Platform9 leverages the VMware APIs and the AWS APIs to integrate seamlessly into both worlds. It is simple and elegant, and the learning curve is minimal.

 

There are other avenues to be addressed, of course. For example, what about latencies to the community? Are there storage latencies? Network latencies? How about security concerns?

 

Well, analytics against these workloads as well as those within your virtual environment will no longer be a nice-to-have, but actually a must-have.

 

Lately, I’ve become particularly enthralled with the sheer level of log detail provided by Splunk. There are many SIEM (Security Information and Event Management) tools out there, but in my experience, no other tool is as functional as Splunk. To be sure, other tools, like SolarWinds, provide this level of analytics as well, and do so with aplomb. Splunk is unparalleled as a data collector, but beyond that, it lets you tailor your dashboards to show the trends, analytics, and pertinent data from all of that volume in a functional, at-a-glance way. The tool can stretch to cover all your workloads, security, thresholds, and so on, and present them so that the monitor panel or dashboard shows you simply where your issues and anomalies lie.

 

There is a large open source community of SIEM software as well. Tools such as OSSIM, Snort, OpenVAS, and BackTrack are all viable options, but remember that, being open source, they rarely provide the robust dashboards that SolarWinds or Splunk do. They will cost far less, but may require much more hand-holding, and support will likely be far less functional.

 

When I was starting out in the pre-sales world, we began talking of the Journey to the Cloud. It became a trope.  We’re still on that journey. The thing is, the ecosystem that surrounds the public cloud is becoming as robust as the ecosystem that exists surrounding standard, on-prem workloads.

interop.logo.2.jpg

I'm flying home after another incredible Interop experience. It’s the perfect time to capture the conversations, ideas, and feelings I experienced this week in the desert, before they fade like the tan lines I got while waiting ten minutes outside for an Uber.

 

100Gbps (The summary)

 

If money were no object, I would honestly say that this should be on our MUST ATTEND list every year. Even for a conference newbie like me, who probably missed a ton of opportunities along the way, Interop generated an incredibly diverse set of interactions, stories, and ideas.

 

Even if money is an object (which happens to be true for most people and organizations), I would still say that making Interop a priority would reap rewards that totally justify the expense.

 

While vendors are certainly present at Interop, the overall tone is refreshingly agnostic compared to events like Cisco Live, Microsoft Ignite, and VMworld. That means sessions are more focused on the real shortcomings of products and solutions, which allows for conversations about work-arounds, alternatives, and comprehensive solutions.

 

It's not hard to guess what the big stories were at the show this year: cloud, security, and SDN all had places in the sun. More surprising was the level to which the DevOps narrative bled into conversations that were once considered pure networking.

 

Fat Pipe (The details)

One example of that DevOps/NetOps transition was a talk by Jason Edelman about using Ansible to perform configuration backups on legacy (meaning SSH-connected, command-line driven) network devices. While it might sound strange to the THWACK® community, familiar as we are with tools like NCM, it represents an extension of existing skills and technology to teams that are used to using Ansible to deploy and manage cloud- and hybrid-cloud-based environments.

 

There were also a few deep-dive sessions on building and leveraging coding skills, such as Python™ for network outcomes, mostly in relation to SDN, NFV, and the like.

 

This, in turn, led to an ongoing dialogue between speakers and attendees in several sessions on the best ways for network professionals to identify, acquire, and develop new skills that will allow them to make the leap to the new age of networking.

 

All of this built up to a narrative that was best championed during Martin Casado’s keynote. In one of the best comparisons I've heard to date, Casado compared the current movement away from traditional data centers, networking, servers, and storage to the evolution from in-car navigation systems to running Waze on your phone.

 

He pointed out that every layer of the data center that once featured specialized hardware-based solutions is now completely contained at the software layer.

 

This overall shift is leading to the "rise of the developer," as Casado put it. This means no silo will be safe from hardware being optimized by a software solution. It also means developers will have more influence over choosing operational frameworks, i.e., the solutions that run the business.

 

Developers, Casado pointed out, care little for Gartner®, or vendor-specific certifications that tie IT pros to specific solutions, or sales relationships, or the vagaries of bureaucratic procurement cycles.

 

The result is that this shift in software-as-infrastructure has the potential to disrupt everything we used to know about the business of IT. 

 

Packet Footer (Summary)

Were you at Interop, and did you see, hear, or discuss something I missed? Do you have a different take than mine? Do you want to hear more on a specific topic? Let me know in the comments below!

 

All of this and more (I haven't even gotten into the discussions about IoT, SDN, or IPv6 that I was able to participate in), made this one of the best conferences I have attended in a very long time.

 

It got me even more excited for conferences to come. Next up is Cisco Live in Las Vegas, July 10-14. I hope to see you there!

Interop 2016 kicked off the week with two days of IT summits that covered an amazing range of topics, including cloud, containers and microservices, IT leadership, and cybersecurity, plus hands-on hacking tutorials. The following three days included the Expo floor opening as well as the session tracks.

 

Since the IT Leadership Summit was sold out, I decided to join the Dark Reading Cyber Security Summit Day 1. I was only planning on attending Day 1, but the content was so good that I eschewed Container Summit and attended Dark Reading's Day 2. To kick things off, the editors at Dark Reading shared some interesting insights followed by industry thought leaders.

DevOps-Sec.png

DevOps - SecOps Relational image via @petecheslock and his Austin DevOps Days 2015 presentation.

 

My top 10 takeaways from the Dark Reading Cybersecurity Summit Days are below.

  1. $71.1B was spent on cybersecurity last year.
  2. Security pros spend most of their time patching legacy stuff and fixing vulnerabilities versus addressing targeted, sophisticated attacks, which happens to be their primary security concern. Number two is phishing and social engineering attacks.
  3. Security is one of the most important priorities and one of the least resourced by IT organizations. Security pros make policy decisions, but non-security people make purchasing decisions.
  4. The weakest link is the end user; end users make up the surface area of vulnerability.
  5. There are not enough skilled security ops people. 500K to 2M more security pros are needed by 2020.
  6. The most talented security pros are hackers.
  7. The average time to detect an intrusion is 6-7 months.
  8. 92% of the intrusions, incidents, and attacks of the past 10 years fall into nine distinct patterns, which can be further reduced down to three.
  9. The cost of a breach is roughly $254 per record for breaches involving 100 records, but only $0.09 per record for breaches involving 100M records. Note that the cost is a multi-variable function with many dimensions to factor in.
  10. Only 40% of attacks are malware, so stopping malware is not enough.
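
Takeaway 9 quotes two per-record figures, and it can help to see what they imply in total. The sketch below simply multiplies each quoted per-record cost by its record count. The function name is illustrative, and the straight multiplication is an assumption that ignores the other dimensions the takeaway mentions.

```python
def total_breach_cost(records: int, cost_per_record: float) -> float:
    """Total cost as record count times the quoted per-record cost."""
    return records * cost_per_record


small = total_breach_cost(100, 254.00)          # 100 records at $254 each
large = total_breach_cost(100_000_000, 0.09)    # 100M records at $0.09 each

print(f"Small breach: ${small:,.0f}")
print(f"Large breach: ${large:,.0f}")
```

The per-record cost falls sharply with scale, yet the total for the large breach is still several hundred times the total for the small one.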

 

Attached below is my DART IT Skills Framework presentation from my Interop IT Leadership speaking session. One of the CIO's SLAs is security, so the Cybersecurity Summit was timely.

 

Let me know what you think of the security insights, as well as my presentation below, in the comment section. I would be happy to present my DART session to our community if there is enough interest, so let me know and I will make it so.


The Actuator - May 11th

Posted by sqlrockstar Employee May 11, 2016

I'm back from Liverpool and SQLBits. It was a brilliant event, as always. If you were there I hope you came by to say hello.

 

Here's this week's Actuator, filled with things I find amusing from around the Internet...

 

What is ransomware and how can I protect myself?

You recover from backups. If you don't have backups then you are hosed.

 

Ivy League economist ethnically profiled, interrogated for doing math on American Airlines flight

To be fair, he is a member of the al-Gebra movement, and was carrying weapons of math instruction.

 

The Year That Music Died

Wonderful interactive display of the top five songs every day since 1958. Imagine if you had this kind of interaction with your monitoring data, with some machine learning on top.

 

Apple Stole My Music. No, Seriously.

Since we are talking about music, here's yet another reason why reading the fine print is important.

 

Apple's Revenue Declines For The First Time In 13 Years

I am certain it has *nothing* to do with the issues inherent in their software and services like Apple Music. None.

 

The Formula One Approach to Security

This article marks the first time I have seen the phrase "security intelligence" and now I'm thinking it will be one of the next big buzzwords. Still a great read and intro to NetFlow for those that haven't heard about that yet.

 

Study: Containers Are Great, but Skilled Admins Are Scarce

I wonder how long they spent studying this. I believe it's always been the case that skilled admins are scarce, which is why we have so many accidental admins in the world. There's more tech work available than tech people available.

 

My secret to avoiding jet lag for events revealed:

NDQE7187 copy.jpg

In the world of networking, you would be hard pressed to find a more pervasive and polarizing topic than SDN. The concept of controller-based, policy-driven, and application-focused networks has owned the headlines for several years as network vendors have attempted to create solutions that allow everyone to operate with the same optimization and automation as the large Web-scale companies do. The hype started in and around data center networks, but over the past year or so, the focus has sharply shifted to the WAN, for good reason.

 

In this three-part series we are going to take a look at the challenges of current WAN technologies, what SD-WAN brings to the table, and what some drawbacks may be in pursuing an SD-WAN strategy for your network.

 

Where Are We Now?

 

In the first iteration of this series, we’re going to identify and discuss some of the limitations in and around WAN technology in today’s networks. The lists below are certainly not comprehensive, but speak to the general issues faced by network engineers when deploying, maintaining, and troubleshooting enterprise WANs.

 

Perspective – The core challenge in creating a policy-driven network is perspective. For the most part, routers in today's networks make decisions independent of the state of peer devices. While there certainly are protocols that share network state information (routing protocols being the primary example), actions based off of this exchanged information are exclusively determined through the lens of the router's localized perspective of the environment.

 

This can cause non-trivial challenges in the coordination of desired traffic behavior, especially for patterns that may not follow the default/standard behavior that a protocol may choose for you. Getting every router to make uniform decisions, each utilizing a different perspective, can be a difficult challenge and add significant complexity depending on the policy trying to be enforced.

 

Additionally, not every protocol shares every piece of information, so it is entirely possible that one router is making decisions off of considerably different information than what other routers may be using.

 

Application Awareness - Routing in current-generation networks is remarkably simple. A router considers whether or not it is aware of the destination prefix, and if so, forwards the packet on to the next hop along the path. Information outside of the destination IP address is not considered when determining path selection. Deeper inspection of the packet payload is possible on most modern routers, but that information does not play into route selection decisions. Due to this limitation in how we identify forwarding paths, it is incredibly difficult to differentiate routing policy based on the application traffic being forwarded.
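
To illustrate the point about destination-only forwarding, here is a minimal sketch of a longest-prefix-match lookup. It is not how any particular router is implemented; the routing table, next-hop names, and port numbers are all made up for the example.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop. Addresses are illustrative.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-gateway",
    ipaddress.ip_network("10.0.0.0/8"): "mpls-core",
    ipaddress.ip_network("10.20.0.0/16"): "branch-router",
}


def next_hop(dst_ip: str, dst_port: int) -> str:
    """Pick the next hop by longest-prefix match on the destination address.

    dst_port is accepted but deliberately unused: the lookup has no way
    to tell a voice call from a file transfer.
    """
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


# Two different applications, same destination, same path.
print(next_hop("10.20.5.9", 5060))   # SIP signalling
print(next_hop("10.20.5.9", 445))    # file sharing
```

Because the port never enters the decision, both flows leave through the same next hop, which is exactly why application-specific policy is hard to express with destination-based routing alone.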

 

Error Detection/Failover – Error detection and failover in current-generation routing protocols is a fairly binary process. Routers exchange information with their neighbors, and if they don’t hear from them in some sort of pre-determined time window, they tear down the neighbor relationship and remove the information learned from that peer. Only at that point will a router choose to take what it considers to be an inferior path. This solution works well for black-out style conditions, but what happens when there is packet loss or significant jitter on the link? The answer is that current routing protocols do not take these conditions into consideration when choosing an optimal path. It is entirely possible for a link to have 10% packet loss, which significantly impacts voice calls, and have the router plug along like everything is okay, since it never loses connection with its neighbor long enough to tear down the connection and choose an alternate path. Meanwhile, a perfectly suitable alternative may be sitting idle, providing no value to the organization.
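
The difference between a black-out and a brown-out can be sketched with a toy model of hello-based neighbor detection. This is a simplification under stated assumptions: the threshold of three missed hellos is an illustrative value, not a protocol default, and the loss pattern is evenly spaced to keep the example deterministic.

```python
def neighbor_declared_down(hellos_received, dead_after_missed=3):
    """Return True if `dead_after_missed` consecutive hellos are lost.

    hellos_received is a sequence of booleans, one per hello interval.
    The threshold of 3 is an illustrative value, not a protocol default.
    """
    missed = 0
    for received in hellos_received:
        missed = 0 if received else missed + 1
        if missed >= dead_after_missed:
            return True
    return False


# Brown-out: every 10th hello is lost (10% loss), never 3 in a row.
brown_out = [i % 10 != 9 for i in range(100)]
# Black-out: the link goes completely dark after 20 intervals.
black_out = [i < 20 for i in range(100)]

print(neighbor_declared_down(brown_out))  # the lossy link stays active
print(neighbor_declared_down(black_out))  # the dead link triggers failover
```

In this model the link with 10% loss is never declared down, so traffic keeps flowing over it while the alternate path sits idle.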

 

Load Balancing/Efficiency - Also inherent in the way routing protocols choose links is the fact that all protocols are looking to identify the single best path (or paths, if they are equal cost) and make it active, leaving all other paths passive until the active link(s) fail. EIGRP could be considered an exception to this rule, as it allows for unequal cost load balancing, but even that is less than ideal since it won’t detect brown-out conditions on a primary link and move all traffic to the secondary. This means that organizations have to purchase far more bandwidth than necessary to ensure each link, passive or active, has the ability to support all traffic at any point. Since routing protocols do not have the ability to load balance based on application characteristics, load balancing and failover are an all-or-nothing proposition.

 

As stated previously, the above list is just a quick glance at some of the challenges faced in designing and managing the WAN in today’s enterprise network.  In the second part of this series we are going to take a look at what SD-WAN does that helps remediate many of the above challenges.  Also keep your eyes peeled for Part 3, which will close out the series by identifying some potential challenges surrounding SD-WAN solutions, and some final thoughts on how you might take your next step to improving your enterprise’s WAN.

Did the title of this blog entry scare you and make you think, "Why in the world would I do that?"  If so, then there is no need to read further.  The point of this blog post is not to tell you why you should be doing so, only why some have chosen to do so, and what issues they find themselves dealing with after having done so. If you still think that the idea of moving any of your data center to the cloud is simply ludicrous, you may go back to your regularly scheduled programming.

 

If the demand on your company's IT resources is consistent throughout the week and year, then the biggest reason for moving to the cloud really doesn't apply to you.  Consider how Amazon Web Services (AWS) got built. They discovered that most of the demand on their company's IT resources came from a few days of the year: Black Friday, Mother's Day, Christmas, etc. The rest of the year, the bulk of their IT resources were going unused. They asked themselves whether there might be other people who had the need for their IT resources when they weren't using them, and AWS was born. It has, of course, grown well beyond the simple desire to sell excess capacity into one of their most profitable business lines.


If your company's IT systems have a demand curve like that, then the public cloud might be for you. Why pay for servers to sit there for an entire year when you can rent them when demand is high and give them back when demand is low?  In fact, some companies even rent extra computing capacity by the hour when the demand is high. Imagine being able to scale the capabilities of your data center within minutes in order to meet the increased demand created by a Slashdot article or a viral video. This is the reason to go to the cloud. Then, once the demand goes down, simply give that capacity back.

 

The challenge for IT people looking to replace portions of their data center with the public cloud is automating it, and making sure that what they automate fits within the budget.  While a public cloud vendor can typically scale to whatever demand level you find yourself with, the bill will automatically scale as well. Unless the huge spike in demand is directly related to a huge spike in sales, your CFO might not take kindly to an enormous bill when your video goes viral. Make sure you plan for that ahead of time so you don't end up having to pay a huge and unexpected cost. Perhaps the decision will be made to just let things get slow for a while. After all, that ends up in the news, too. And if you believe all publicity is good publicity, then maybe it wouldn't be such a bad thing.
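
One way to keep the bill from scaling without limit is to cap automated scaling at what the budget allows. The sketch below shows that idea only; the function name, the hourly rate, and the budget are hypothetical numbers, and a real policy would have more inputs.

```python
def instances_to_run(needed: int, hourly_rate: float, hourly_budget: float) -> int:
    """Return how many instances to run: demand, capped by the hourly budget."""
    affordable = int(hourly_budget // hourly_rate)
    return min(needed, affordable)


# Hypothetical numbers: $0.50 per instance-hour, $40 per hour budget.
print(instances_to_run(needed=12, hourly_rate=0.50, hourly_budget=40.0))   # normal day
print(instances_to_run(needed=500, hourly_rate=0.50, hourly_budget=40.0))  # viral spike
```

On the normal day demand is met in full; during the spike the cap holds the fleet at what the budget covers, which is the "let things get slow for a while" decision made explicit.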

 

There are plenty of companies that have replaced all their data centers with the cloud. Netflix is perhaps the most famous company that runs their entire infrastructure in AWS.  But they argue that the constant changes in demand for their videos make them a perfect match for such a setup. Make sure the way your customers use your services is consistent with the way the public cloud works, and make sure that your CFO is ready for the bill if and when it happens. That's how to move things into the cloud.

As an avid cloud user, I'm always amused by people who suggest that moving things to the cloud means you don't have to manage them.  And, of course, when I say "amused," what I really mean is I feel like Inigo Montoya in The Princess Bride.  "You keep using that word.  I do not think it means what you think it means."

 

Why do I say this?  Because I am an avid cloud user and I manage my cloud assets all the time.  So where do we get this idea?  I'd say it starts with the idea that you don't have to manage the hardware.  Push a few buttons and a "server" magically appears in your web browser.  This is so much easier than creating a real server, which actually works similarly these days.  Push a few buttons on the right web site, and an actual server shows up at your front door in a few days.  All you have to do is plug it in, load the appropriate OS and application stack and you're ready to go.  The cloud VM is a little bit easier.  It appears in minutes and comes preloaded with the OS and application stack that you specified during the build process.

 

I think what most people mean when they say their cloud resources don't need to be managed is that they don't have to worry about the hardware.  They know that the VM is running on highly resilient hardware that is being managed for them.  They don't have to worry about a failed disk drive, network controller, PCI card, etc.  It just manages itself. But anyone who thinks this is all that needs to be managed for a server must never have actually managed any servers.

 

There are all sorts of things that must be managed on a server that have nothing to do with hardware.  What about the filesystems?  When you create the VM, you create it with a volume of a certain size.  You need to make sure that volume doesn't fill up and take your server down with it.  You need to monitor the things that would fill it up for no reason, such as web logs, error logs, database transaction logs, etc.  These need to be monitored and managed.  Speaking of logs, what about those error logs?  Is anyone looking at them? Are they scanning them for errors that need to be addressed?  Somebody should be, of course.
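
As a minimal sketch of the kind of check this implies, the snippet below reports how full a filesystem is using Python's standard library. The 85% threshold and the function names are assumptions for illustration; a real monitoring tool would also track growth over time.

```python
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_attention(path: str, threshold: float = 85.0) -> bool:
    """True when usage is at or above the (illustrative) threshold."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    print(f". is {percent_used('.'):.1f}% full; alert: {needs_attention('.')}")
```

Running something like this on a schedule against the volumes that hold web logs, error logs, and database transaction logs is the simplest form of the monitoring described above.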

 

Another thing that can fill up a filesystem is an excessive number of snapshots.  They need to be managed as well.  Older snapshots need to be deleted, and certain snapshots may need to be kept for longer periods of time or archived off to a different medium. Snapshots do not manage themselves.
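
A retention rule of that kind can be sketched in a few lines. The snapshot names, the 14-day limit, and the idea of a pinned "keep" set are all assumptions for the example; the function only decides what to delete and does not call any cloud provider API.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, max_age_days=14, keep=frozenset()):
    """Return names of snapshots older than max_age_days, except pinned ones.

    snapshots maps a snapshot name to its creation time; `keep` holds names
    that must be retained (for example, ones awaiting archival).
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, created in snapshots.items()
        if created < cutoff and name not in keep
    )


now = datetime(2016, 5, 11)
snaps = {
    "pre-upgrade": datetime(2016, 3, 1),
    "weekly-17": datetime(2016, 4, 24),
    "weekly-18": datetime(2016, 5, 1),
    "nightly": datetime(2016, 5, 10),
}
print(snapshots_to_delete(snaps, now, keep={"pre-upgrade"}))
```

Here only the old weekly snapshot is selected; the even older "pre-upgrade" snapshot survives because it was explicitly pinned.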

 

What about my favorite topic of backups?  Is that VM getting backed up?  Does it need to be?  If you configured it to be backed up, is it backing up?  Is anyone looking at those error logs?  One of the biggest challenges is figuring out when a backup didn't run. It's relatively easy to figure out when a backup ran but failed; however, if someone configured the backup to not run at all, there's no log of that.  Is someone looking for backups that just magically disappeared?
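
Finding the backup that never ran means comparing what should have happened against what the log says happened. The sketch below assumes a simple inventory list and a log of (host, status) pairs; both formats and the host names are hypothetical.

```python
def missing_backups(expected_hosts, backup_log):
    """Return hosts that should be backed up but have no log entry at all.

    backup_log is a list of (host, status) tuples from the last backup window.
    A failed backup still leaves an entry; a backup that never ran does not.
    """
    seen = {host for host, _status in backup_log}
    return sorted(set(expected_hosts) - seen)


expected = ["web01", "web02", "db01"]      # hypothetical inventory
log = [("web01", "success"), ("db01", "failed")]

print(missing_backups(expected, log))      # hosts with no backup attempt
```

The failed backup on db01 would show up in any error report, but web02 only surfaces because the check starts from the inventory rather than from the log.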


Suffice it to say that the cloud doesn't remove the need for management.  It just moves it to a different place.  Some of these things may be able to be offloaded to the cloud vendor, of course.  But even if that's the case, someone needs to watch the watcher.  There is no such thing as a free lunch, and there is no such thing as a server that manages itself.

Network variation is hurting us

Network devices like switches, routers, firewalls and load-balancers ship with many powerful features. These features can be configured by each engineer to fit the unique needs of every network. This flexibility is extremely useful and, in many ways, it's what makes networking cool. But there comes a point at which this flexibility starts to backfire and become a source of pain for network engineers.

Variation creeps up on you.  It can start with harmless requests for some non-standard connectivity, but I've seen those requests grow to the point where servers were plugging straight into the network core routers.  In time, these one-off solutions start to accumulate and you can lose sight of what the network ‘should’ look like.  Every part of the network becomes its own special snowflake.

I’m not judging here. I've managed quite a few networks and all of them ended up with high degrees of variation and technical debt. In fact, it takes considerable effort to fight the storm of snowflakes. But if you want a stable and useful network you need to drive out variation. Of course you still need to meet the demands of the business, but only up to a point. If you're too flexible you will end up hurting your business by creating a brittle network which cannot handle changes.

Your network becomes easier and faster to deploy, monitor, map, audit, understand and fix if you limit your network to a subset of standard components. Of course there are great monitoring tools to help you manage messy networks, but you’ll get greater value from your tools when you point them towards a simple structured network.

What’s so bad about variety?

Before we can start simplifying our networks we have to see the value in driving out that variability. Here are some thoughts on how highly variable (or heterogeneous) networks can make our lives harder as network engineers:

  • Change control - Making a safe network change is extremely difficult without standard topologies or configurations. Making a change safely requires a deep understanding of the current traffic flows - and this will take a lot of time. Documentation makes this easier, but a simple standardized topology is best. The most frustrating thing is that when you do eventually cause an outage, the lessons learned from your failed change cannot be applied to other dissimilar parts of your network.
  • Discovery time can be high. How do you learn the topology of your network in advance of problems occurring? A topology mapping tool can be really helpful to reduce the pain here, but most people have just an outdated Visio diagram to rely on.
  • Operations can be a nightmare in snowflake networks.  Every problem will be a new one, but probably one that could have been avoided - it's likely that you'll go slowly mad. Often you'll start troubleshooting a problem and then realize, ‘oh yeah, I caused this outage with the shortcut I took last week. Oops’.  By the way, it’s a really good sign when you start to see the same problems repeatedly. Operations should be boring. It means you can re-orient your Ops time towards 80/20 analysis of issues, rather than spending your days firefighting.
  • Stagnation -  You won't be able to improve your network until you simplify and standardize it. Runbooks are fantastic tools for your Ops and Deployment teams, but the runbook will be useless if the steps are different for every switch in your network. Think about documenting a simple task...if network Y do step1, except if feature Z enabled then do something else, except if it’s raining or if it's a leap year.  You get the message.
  • No-Automation - If your process is too complicated to capture in a runbook you shouldn't automate it. Simplify your network, then your process, then automate.
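
Before simplifying, it helps to measure how much variation there is. A rough sketch of one way to do that is to group devices by the settings they share and count the groups. The device names and settings below are invented for illustration, and a real comparison would work from full configurations rather than three fields.

```python
from collections import defaultdict


def group_by_config(device_settings):
    """Group device names by their (hashable) set of configured settings."""
    groups = defaultdict(list)
    for device, settings in device_settings.items():
        groups[frozenset(settings.items())].append(device)
    return sorted((sorted(devs) for devs in groups.values()), key=len, reverse=True)


# Hypothetical access switches and a few settings pulled from each.
switches = {
    "sw1": {"stp": "rapid-pvst", "mtu": 1500, "uplinks": 2},
    "sw2": {"stp": "rapid-pvst", "mtu": 1500, "uplinks": 2},
    "sw3": {"stp": "rapid-pvst", "mtu": 1500, "uplinks": 2},
    "sw4": {"stp": "mst", "mtu": 9000, "uplinks": 1},
}
groups = group_by_config(switches)
snowflakes = [devs[0] for devs in groups if len(devs) == 1]
print(f"{len(groups)} distinct configurations; snowflakes: {snowflakes}")
```

A single runbook can cover every device in the large group; each one-member group is a snowflake that needs its own exceptions.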

 

Summary

Network variation can be a real source of pain for us engineers. In this post we looked at the pain it causes and why we need to simplify and standardize our networks. In Part 2 we'll look at the root causes for these complicated, heterogeneous networks and how we can begin tackling the problem.

Data center consolidations have been a priority for years, with the objectives of combatting server sprawl, centralizing and standardizing storage, streamlining application management, and establishing shared services across multiple agencies.

 

But, consolidation has created challenges for federal IT professionals, including:

  • Managing the consolidation without an increase in IT staff
  • Adapting to new best practices like shared services and cloud computing
  • Shifting focus to optimizing IT through more efficient computing platforms

 

Whether agencies have finished their consolidation or not, federal IT pros have definitely felt the impact of the change. But how do the remaining administrators manage the growing infrastructure and issues while meeting SLAs?

 

One way data center administrators can stay on top of all the change is to modernize their monitoring system, with the objective of improving visibility and troubleshooting.

 

The Value of Implementing Holistic Monitoring

 

A holistic approach to monitoring provides visibility into how each individual component is running and impacting the environment as a whole. It can bridge the gap that exists between the IT team and the program groups through connected visibility.

 

Responsibility

Who is responsible for what? Shared services can be hard to navigate.

 

Even though the data center team now owns the infrastructure and application operations, the application owners still need to ensure application performance. Both teams require visibility into performance with a single point of truth, which streamlines communication and eases the transition to shared services.

 

Application Performance

Application performance is critical to executing agency missions, so when users provide feedback that an application is slow, it is up to data center administrators to find the problem and fix it—or escalate it—quickly.

 

Individually checking each component of the IT infrastructure (the application, servers, storage, database, or a virtualized environment) can be tedious, time consuming, and difficult. End-to-end visibility into how each component is performing allows for quick identification and remediation of issues.

 

Virtualization

Virtualization can introduce complexities and management challenges. In a virtual environment, virtual machines can be cloned and moved around so easily and often that the impact on the entire environment can be missed, especially in a dynamically changing infrastructure.

 

Consolidated monitoring and comprehensive awareness of the end-to-end virtual environment are the answer to effective change management in the virtualized environment.

 

Efficiency

Efficiency was a key driver behind consolidations, but it can seem nearly impossible to achieve for the remaining data centers. With integrated monitoring that provides end-to-end visibility, however, data center administrators can troubleshoot issues in seconds instead of hours or days and proactively manage their IT. With the right tools, administrators can provide end-users with high service levels.

 

Consolidation is part of the new reality for data center administrators. Holistic, integrated monitoring and management of the dynamically changing IT environment will help to refine the new responsibilities of being a shared service, ensure mission-critical applications are optimized and improve visibility into virtualized environments.

 

Find the full article on Signal.

Practitioners in nearly every technology field are facing revolutionary changes in the way systems and networks are built. Change, by itself, really isn't all that interesting. Those among us who have been doing this a while will recognize that technological change is one of the few reliable constants. What is interesting, however, is how things are changing.

 

Architects, engineers, and the vendors that produce gear for them have simply fallen in love with the concept of abstraction. The abstraction floodgates have metaphorically flung open following the meteoric rise of the virtual machine in enterprise networks. As an industry, we have watched the abstraction of the operating system -- from the hardware it lives on -- give us an amazing amount of flexibility in the way we deploy and manage our systems. Now that the industry has fully embraced the concept of abstraction, we aim to implement it everywhere.

 

Breaking away from monolithic stack architecture

 

If we take a look at systems specifically, it used to be that the hardware, the operating system, and the application all existed as one logical entity.  If it was a large application, we might have components of the application split out across multiple hardware/OS combos, but generally speaking the stack was a unit. That single unit was something we could easily recognize and monitor as a whole. SNMP, while it has its limitations, has done a decent job of allowing operators to query the state of everything in that single stack.

 

Virtualization changed the game a bit as we decoupled the OS/application from the hardware. While it may not have been the most efficient way of doing it, we could still monitor the VM like we used to when it was coupled with the hardware. This is because we hadn't really changed the architecture. Abstraction gave us some significant flexibility, but our applications still relied on the same components, arranged in a similar pattern to the bare-metal stacks we started with. The difference is that we now had two unique units where information collection was required: the hardware remained as it always had, and the OS/application became a secondary monitoring target. It took a little more configuration, but it didn't change the nature of the way we monitored the systems.

 

Cloud architecture changes everything

 

Then came the concept of cloud infrastructure. With it, developers began embracing the elastic nature of the cloud and started building their products to take advantage of it. Rather than sizing an application stack based off of guesstimates of the anticipated peak load, it can now be sized minimally and scaled out horizontally when needed by adding additional instances. Previously, just a handful of systems would have handled peak loads. Now those numbers could be dozens, or even hundreds of dynamically built systems scaled out based on demand. As the industry moves in this direction, our traditional means of monitoring simply do not provide enough information to let us know if our application is performing as expected.
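
The sizing idea above can be sketched in a few lines of Python. This is only an illustration: the function name, the requests-per-second unit, and the capacity figure of 500 per instance are assumptions made up for the example, not numbers from any particular platform.

```python
import math


def instances_needed(load_rps: float, capacity_per_instance_rps: float,
                     min_instances: int = 1) -> int:
    """Return how many identical instances are needed to serve the load.

    Both arguments are hypothetical measurements in requests per second.
    """
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity_per_instance_rps must be positive")
    required = math.ceil(load_rps / capacity_per_instance_rps)
    # Never scale below the minimal footprint the application starts with.
    return max(min_instances, required)


# A quiet period fits on the minimal footprint; a peak scales out horizontally.
quiet = instances_needed(120, 500)
peak = instances_needed(42_000, 500)
print(quiet, peak)
```

The point for monitoring is that the instance count is now an output of demand rather than a fixed inventory, so any per-host view has to tolerate systems appearing and disappearing.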

 

The networking story is similar in a lot of ways. While networking has generally been resistant to change over the past couple of decades, the need for dynamic/elastic infrastructure is forcing networks to take several evolutionary steps rather quickly.  In order to support the cloud models that application developers have embraced, the networks of tomorrow will be built with application awareness, self-programmability, and moment-in-time best path selection as core components.

 

Much like in the systems world, abstraction is one of the primary keys to achieving this flexibility. Whether the new model of networks is built upon new protocols, or overlays of existing infrastructure, the traditional way of statically configuring networks is coming to an end. Rather than having statically assigned primary, secondary, and tertiary paths, networks will balance traffic based off of business policy, link performance, and application awareness. Fault awareness will be built in, and traffic flows will be dynamically routed around trouble points in the network. Knowing the status of the actual links themselves will become less important, much like physical hardware that applications use. Understanding network performance will require understanding the actual performance of the packet flows that are utilizing the infrastructure.

 

At the heart of the matter, the end goal appears to be ephemeral state in both network path selection and systems architecture.

 

So how does this change monitoring?

 

Abstraction inherently makes application and network performance harder to analyze. In the past, we could monitor hardware state, network link performance, CPU, memory, disk latency, logs, etc. and come up with a fairly accurate picture of what was going on with the applications using those resources. Distributed architectures negate the correlation between a single piece of underlying infrastructure and the applications that use it.  Instead, synthetic application transactions and real-time performance data will need to be used to determine what application performance really looks like. Telemetry is a necessary component for monitoring next generation system and network architectures.
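
To make the idea of a synthetic transaction concrete, here is a minimal Python sketch. It assumes the "transaction" is any callable that exercises the application the way a user would; `fake_login_and_search` is a hypothetical stand-in that just sleeps, and the half-second threshold is an arbitrary example value.

```python
import time
from typing import Callable


def run_synthetic_check(transaction: Callable[[], None],
                        threshold_seconds: float) -> dict:
    """Run one scripted transaction and report its latency and outcome."""
    start = time.perf_counter()
    error = None
    try:
        transaction()
    except Exception as exc:  # a failed transaction is a result, not a crash
        error = repr(exc)
    elapsed = time.perf_counter() - start
    return {
        "elapsed_seconds": elapsed,
        "ok": error is None and elapsed <= threshold_seconds,
        "error": error,
    }


def fake_login_and_search() -> None:
    # Hypothetical stand-in for a real user journey (log in, search, log out).
    time.sleep(0.01)


result = run_synthetic_check(fake_login_and_search, threshold_seconds=0.5)
print(result["ok"], round(result["elapsed_seconds"], 3))
```

Because the check measures what a user would experience, it keeps working no matter how many instances sit behind the application or where they happen to be running.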

 

Does this mean that SNMP is going away?

 

While many practitioners wouldn't exactly shed a tear if they never needed to touch SNMP again, the answer is no. We still will have a need to monitor the underlying infrastructure even though it no longer gives us the holistic view that it once did. The widespread use of SNMP as the mechanism for monitoring infrastructure means it will remain a component of monitoring strategies for some time to come. Next generation monitoring systems will need to integrate the traditional SNMP methodologies with deeper levels of real-time application testing and awareness to ensure operators can remain aware of the environments they are responsible for managing.

“With me, everything turns into mathematics.”

– Rene Descartes

 

 

Ransomware is not new. Beginning as misleading ads and warnings that your computer is infected, Symantec traces ransomware deployments (including crypto lockers) back to 2005.[1] Early crypto locking extortion scams were not that successful. However, current business owners face increasing risk of cyber extortion, and crypto locking ransomware has been on the rise over the past two years. It has become so prevalent that the FBI issued a warning highlighting the increasing threat to businesses.[2] Given the increasing velocity of deployment, the ease of infiltration, and the dire consequences of infection, we believe ransomware is a significant risk to businesses.

 

There are two primary factors contributing to the rise of ransomware:

 

  1. More real-time business data has been digitized, especially in health care and loan processing, which has increased the available pool of targets.
  2. Anonymous payment systems make monetizing ransomware easy, efficient, and risk-free for cyber criminals.

 

Observed samples of ransomware in 2014 totaled almost 9 million, yet in Q2 2015 alone, samples hit 4 million. This run rate is doubling year over year. Ransomware, unlike many vulnerabilities and malware, does not require administrative privileges, as its purpose is to encrypt the files useful to the end-user. Furthermore, the same types of scams and hooks that make ransomware successful on Windows are being deployed against other platform targets. 
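
The run-rate comparison above can be checked with quick arithmetic. The sketch below rounds "almost 9 million" up to 9,000,000 and annualizes the single Q2 2015 figure by multiplying by four, which assumes the other quarters look the same; both are simplifications for illustration.

```python
samples_2014 = 9_000_000      # "almost 9 million", rounded for the example
samples_q2_2015 = 4_000_000   # one quarter only

# Annualize the quarter, assuming the other three quarters are similar.
annualized_2015 = samples_q2_2015 * 4
growth = annualized_2015 / samples_2014

print(f"annualized 2015: {annualized_2015:,} samples, {growth:.2f}x the 2014 total")
```

Under those assumptions the 2015 pace works out to a little under twice the 2014 total.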

What systems are at risk?

Cyber criminals have built ransomware kits that target a wide range of systems, including Windows, Linux, Android, and recently (March 2016) Mac OS. While the majority of ransomware successes are still on Windows, users should be alert to the increasing risk of ransomware on Android. Android ransomware could become particularly troubling in dedicated devices used in health care, manufacturing, and retail.

How does ransomware behave?

On Windows, ransomware works to impair your computer in one of three common ways:

 

  1. Encrypt your files (Locky and Cerber).
  2. Prevent you from accessing certain apps (FakeBsod – locks browser).[3]
  3. Restrict access to the operating system itself (Reveton – locks PC).

 

On Android, ransomware falls generally into one of two types:

 

  1. Screen locking.
  2. File encrypting.

 

Unfortunately for Android users, both forms of ransomware are increasingly seen in the wild. The chronology of Android ransomware follows a similar pattern to the Windows chronology: it begins with a fake antivirus, then fake police demands, followed by full cryptographic file locking. Versions of Simplocker malware on Android encrypt files on the SD card; versions of Lockerpin acquire administrative privileges and prevent access to the device.[4]

 

On Linux, the most common target is web servers. The ransomware Linux.Encoder.1 has been reported in the wild since November 2015. This variant does require root privileges, and it walks the web server file directory structure as well as nginx, /root and others.[5]  The reported ransom for this variant is one bitcoin.

 

Fortunately for Mac OS users, the first reported ransomware that encrypts Mac OS files has not been widely deployed or successful. With only 6500 downloads identified, Mac OS ransomware is a drop in the proverbial bucket.

What organizations are likely targets?

As mentioned above, real-time access needs for critical data create the easiest targets for ransomware. While no individual or business is free from worry, public service (police stations) and health care (hospitals) have been successfully targeted in the last 12 months. We can infer that other businesses, such as title companies, car dealerships, and other loan processors are likely targets as well. The criticality of data in these organizations is intuitive, and most cyber criminals keep the ransom amount “reasonable” (around $10,000). This amount is low enough that it appears to be economically rational for businesses that need to restore access quickly. Additionally, setting up a bitcoin wallet is relatively straightforward, with a number of YouTube how-to videos readily accessible. For an individual system, or business with less real-time critical data, the price is usually a single bitcoin.  

 

What defensive steps can you take?

Prevention is, of course, the goal. However, between the range of infection vectors (SMS on Android, browser exploitation, spam malware, and exploit kits) and the volume of ransomware samples observed in the wild, the risk of initial infection is difficult to eliminate. Therefore, a combination of preventative tactics and planning for incident remediation is the best risk-mitigating course of action.

 

Preventative Actions

 

  1. Educate your users on the risk. Users who process a large number of inbound attachments and emails, such as accounts receivable processors, account managers, and marketing personnel, are particularly vulnerable.
  2. Maintain patches on desktop users’ systems, as well as critical data servers. Desktop systems are often updated in a haphazard manner, or not at all, which makes them vulnerable to exploitation.
  3. Reduce or eliminate automatic mapping of drives. Recommended by thwack community member Stephen Black, eliminating automatic drive mapping means the ransomware won’t be able to walk your network from one initial infected system.
  4. Monitor for infections to prevent contagion.  If you use LEM, there is a monitoring rule you can download and use. https://thwack.solarwinds.com/docs/DOC-186700
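
As one illustration of what monitoring for infection can look for, the Python sketch below uses a common heuristic: encrypted data looks statistically random, so a file whose contents suddenly show very high byte entropy is worth a second look. This is not the LEM rule linked above; the function names and the 7.5 bits-per-byte threshold are assumptions chosen for the example, and compressed formats such as ZIP or JPEG will also score high.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform random."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # Hypothetical threshold; tune it against known-good files before relying on it.
    return shannon_entropy(data) >= threshold


plain_text = b"quarterly invoice for customer 1042 " * 200
random_like = bytes(range(256)) * 16  # every byte value equally often

print(looks_encrypted(plain_text), looks_encrypted(random_like))
```

A check like this is most useful as a change detector: a document share where many files flip from low to high entropy in a short window is a much stronger signal than any single file.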

 

Incident remediation

If you find yourself in the unfortunate situation where a system has become locked with ransomware, you have limited options. While some researchers have been successful reverse engineering ransomware, the ability to do so takes time and depends on vulnerabilities in the ransomware code itself. If you were lucky enough to be hit by one of these old variants, you can use the techniques the researchers have published.[6]  But, realistically, for most situations there are only two real options:

 

  1. Restore from backup.
  2. Pay the ransom.

 

If your business fits in the class of organizations currently being targeted, or shares characteristics with organizations being targeted, it would be prudent to actually test your ability to restore from your backup media, whether that is a cloud backup, local backup, or offsite backup. Businesses with Android users are encouraged to explore mobile device backup, or at least educate their users on their options.[7] Unfortunately, the only time the restore-from-backup process is usually tested or validated is during an audit, or a test of a business continuity or disaster recovery plan, which may be too late.
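
A restore test can be as simple as restoring to a scratch location and comparing checksums against the originals. The Python sketch below shows one way to do that; `verify_restore` and the directory layout are hypothetical names for the example, and the temporary directories stand in for a real source share and a real restore target.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> List[str]:
    """Return the files that are missing or different in the restored copy."""
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"changed: {relative.as_posix()}")
    return problems


# Demonstration with throwaway directories standing in for real data.
with tempfile.TemporaryDirectory() as tmp:
    source, restored = Path(tmp, "source"), Path(tmp, "restored")
    source.mkdir()
    restored.mkdir()
    (source / "ledger.csv").write_text("id,amount\n1,100\n")
    (source / "notes.txt").write_text("quarter close\n")
    (restored / "ledger.csv").write_text("id,amount\n1,100\n")
    # notes.txt is deliberately left out of the restored copy.
    problems = verify_restore(source, restored)

print(problems)
```

An empty result means every source file came back intact; anything else is a gap you would rather find during a drill than during an incident.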

 

Do you have a favorite way to use LEM to look for malware? 

When did you last test your business continuity plan? 

Know anyone who has successfully recovered files after a ransomware attack?

Share your stories so we can all benefit.



[1] Symantec, Internet Security Threat Report, 2016 pg. 58

[2] https://www.fbi.gov/news/podcasts/thisweek/ransomware-on-the-rise.mp3/view

[3] https://www.microsoft.com/security/portal/mmpc/shared/ransomware.aspx

[4] http://www.welivesecurity.com/2015/09/10/aggressive-android-ransomware-spreading-in-the-usa/

[5] http://vms.drweb.com/virus/?i=7704004&lng=en

[6] https://labs.bitdefender.com/2015/11/linux-ransomware-debut-fails-on-predictable-encryption-key/

http://www.darkreading.com/cloud/cisco-offers-free-decryption-tool-for-ransomware-victims/d/d-id/1320188

[7] http://www.gottabemobile.com/2016/01/11/how-to-backup-android/

I'm on my way to Liverpool for SQLBits. So if you are reading this and find yourself near Liverpool this week, head on over to SQLBits and say hello.

 

Things I find amusing from around the Internet...

 

Star Wars: A Bad Lip Reading

In case you haven't seen this yet, I figured this was a good way to celebrate May the Fourth. Also? I want a wooden snowman.

 

Nearly All of Your ATMs are Insecure

Not sure what I find more amusing here, the fact that 'ATM not secure' is seen as something new or the fact that it's a Russian firm cited in the report.

 

Automating Change With Help From Fibonacci

As a math geek and IT pro, this article is full of so much win that I want to place it inside a Golden Rectangle and place it on top of a Klein Bottle.

 

Scientists Have Figured Out How To Put Electronics Inside Your Body

"We've always dreamed of infusing our bodies with technology". Um, no, we haven't. And as if building robots wasn't enough, now we have this to help usher in the singularity. (Darth Vader)

 

The advent of the citizen developer

Otherwise known as "shadow IT", and not something new. End users will turn to whatever tools they can find in an effort to do their jobs better than the day before.

 

FBI Says It Won't Disclose How It Accessed Locked iPhone

Because they don't know what they did, kinda like how I can never explain to my mother what I did to her computer to get her email working again.

 

Digital Genies

How can we ensure safety for humans when the robots rise up? Fascinating post here about how AI could go horribly wrong if it *thinks* it has the right data, but doesn't.

 

Containers in The Real World

Posted by jdgreen May 3, 2016

All industry-changing trends have an uncomfortable period where the benefit of adoption is understood but real-world use is often exaggerated. The way the modern use of containers fundamentally changes the paradigm with which operations folks run their data centers means that the case for adoption needs to be extremely compelling before anyone will move forward.

 

Also, since change is hard, major industry-shifting trends come with lots of pushback from people who have built a career on the technology that is being changed, disrupted, or even displaced. In the case of containers, there exists a sizeable assembly of naysayers and not shockingly, they generally come from an Operations (and specifically virtualization) background.

 

To that end, I decided to dig deep into a handful of case studies and interview industry acquaintances about their experiences with containers in production. Making the case that containers can be handy for 2 developers on their laptops is easy; I was curious to find out what happens when companies adopt a container-based data center practice throughout the entire software lifecycle and at substantial scale. Here is what I found.

It’s Getting Better

One of the major challenges many people reported in the early stages of containerization, with products like Docker Engine and rkt, was that containers were very difficult to manage at scale. Natively, these tools didn’t include any sort of single-pane-of-glass management or higher-level orchestration.

 

As the container paradigm has matured, tools like Docker Swarm, Kubernetes, and Cloud Foundry have helped adopters make sense of what’s happening across their entire environment and begin to more successfully automate and orchestrate the entire software development lifecycle.

Small Businesses Are Last, As Usual

As with other pivotal data center technologies like server virtualization, small businesses are sometimes least likely to see a valuable return by jumping on the bandwagon. Because of their small data center footprint, they don’t see the dramatic impact to the bottom line that enterprises do when making a change to the way their data center operates. While that’s obviously not always the case, my discussions with colleagues in the field and research into case studies seem to indicate that just like all the big shifts before it, full-steam-ahead containerization is primarily for the data centers of scale, at least for now.

 

One way this might change in the future is software distribution by manufacturers in a container format. While small businesses might not need to leverage containers to accelerate their software development practice, they may start getting forced into containerization by the software manufacturers they deal with. Just as many ISVs today deliver their offering in an OVA format to be deployed into a virtualized environment, we may begin to see lots of containers delivered as the platform for running a particular software offering.

Containers are Here to Stay

As much as the naysayers and conservative IT veterans speculate about containers being mostly hype, the anecdotal evidence I’ve collected seems to indicate that many organizations have indeed dramatically improved their operations, limited defects, and ultimately seen the impact on their bottom line.

 

I try to be very careful about buying in to hype, but it doesn’t look like containers are slowing down any time soon. The ecosystem that is developing around the paradigm is quite substantial, and as a part of the overall DevOps methodology trend, I see container-based technologies enabling the overall vision as much as any other sort of technology. It will be interesting to see how the data center landscape looks with regard to containers in 2020; will it be like the difference between virtualization in 2005 and 2015?

Today’s users demand access to easy-to-use applications even though the IT landscape has become a complex mishmash of end-user devices, connectivity methods, and siloed IT organizations, some of which contain further silos for applications, databases, and back-end storage.

 

These multiple tiers of complexity, combined with end users’ increasing dependency on accessible applications, create significant difficulties for IT professionals across the globe, but especially in government agencies, with all their regulations and policies.

 

Figuring out how to maintain application performance in these complex environments has become a key objective for federal IT staff. Here are five methods for preserving a high-performance app stack:

 

1. Simplifying application stack management

 

A significant part of the effort lies in simplifying management of the application stack (app stack) itself, which includes the application, middleware and the extended infrastructure the application requires for performance. Think about the entire environment.

 

Rather than looking at networks, storage, servers and clients as distinct silos of individual responsibility, federal IT departments can reduce the complexity of the sometimes conflicting information they use to manage these silos. The simplification lies in the practice of monitoring all applications and the resources they use as a single application ecosystem, recognizing the relationships.

 

Working through the entire app stack lets federal IT pros understand where performance is degraded and improves troubleshooting.

 

2. Monitoring servers

 

Server monitoring is a significant part of managing the app stack. Servers are the engines that provide application services to the end user. And applications need sufficient CPU cycles, memory, storage I/O and network bandwidth to work effectively.

 

Monitoring current server conditions and analyzing historical usage trends is the key to ensuring problems are resolved rapidly or prevented.

 

3. Monitoring virtualization

 

Monitoring the virtualization infrastructure is key. Federal IT pros should monitor how and when VMs move from one host or cluster to another, as well as the status of shared hosts, networks, and storage resources, especially if they are over-subscribed.

 

Federal IT pros should prioritize how individual VMs on a host are working together, whether resource contention is occurring on a host or a cluster, and what applications are causing those conflicts. In addition, federal IT pros should keep tabs on network latency.
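
One simple way to spot hosts at risk of contention is to compare what has been allocated to VMs with what physically exists. The Python sketch below does this for vCPUs; the host names, the inventory numbers, and the 2.0 alert threshold are all made up for illustration, and a sensible ratio depends heavily on the workload.

```python
def oversubscription_ratio(allocated: float, physical: float) -> float:
    """Ratio of resources promised to VMs versus resources the host really has."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


# Hypothetical inventory: vCPUs handed out to VMs vs. physical cores per host.
hosts = {
    "host-a": {"vcpus_allocated": 96, "physical_cores": 32},
    "host-b": {"vcpus_allocated": 48, "physical_cores": 32},
}

THRESHOLD = 2.0  # example value only

flagged = sorted(
    name
    for name, h in hosts.items()
    if oversubscription_ratio(h["vcpus_allocated"], h["physical_cores"]) > THRESHOLD
)
print(flagged)
```

A high ratio does not prove contention is happening, but it tells you which hosts to watch first when VMs start competing for CPU.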

 

4. Monitoring user devices

 

Today’s users are running applications on all types of devices with a range of capabilities and connectivity options, all of which are significant factors in maintaining a healthy app ecosystem.

 

5. Bring it together with alerting

 

The last component is alerting, which notifies technicians when there is an issue with a component of the app stack prior to the first end-user noticing the problem.

 

The ability to set proactive performance baselines for devices and applications to signal when app stack issues arise helps both in day-to-day monitoring and future capacity planning.
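
A performance baseline can be as simple as "the normal range we have seen so far." The Python sketch below builds one from historical response times and flags new readings that fall well outside it. The sample values, the function names, and the choice of three standard deviations are assumptions for the example, not a recommendation for any particular tool.

```python
from statistics import mean, stdev


def build_baseline(samples, sigmas=3.0):
    """Return (typical value, upper alert limit) from historical samples."""
    center = mean(samples)
    return center, center + sigmas * stdev(samples)


def breaches_baseline(value, baseline):
    """True when a new reading is above the upper limit of the baseline."""
    _, upper_limit = baseline
    return value > upper_limit


# Hypothetical response times in milliseconds from a normal week.
history_ms = [120, 118, 125, 130, 122, 119, 127, 124]
baseline = build_baseline(history_ms)

for reading in (126, 180):
    status = "ALERT" if breaches_baseline(reading, baseline) else "ok"
    print(reading, status)
```

The same stored history serves capacity planning: if the typical value drifts upward month over month, the baseline itself becomes the early warning.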

 

In short, it’s critical for federal IT pros to be aware of, monitor and set up notifications across the app stack – from back end storage, through application services and processes to front-end users – and provide high performance from a holistic perspective.

 

Find the full article on Government Computer News.
