
Geek Speak


It was a busy week for service disruptions and security breaches. We had Amazon S3 showing us that, yes, the cloud can go offline at times. We found out that our teddy bears may be staging an uprising. And we learned that Uber has decided to use technology and data to continue operating illegally in cities and towns worldwide. Not a good week for those of us who enjoy having data safe, secure, and available.


So, here's a handful of links from the intertubz I thought you might find interesting. Enjoy!


Data from connected CloudPets teddy bears leaked and ransomed, exposing kids' voice messages

The ignorance (or hubris) of the CloudPets CEO is on full display here. I am somewhat surprised that anyone could be this naive with regard to security issues these days.


Yahoo CEO Loses Bonus Over Security Lapses

Speaking of security breaches, Yahoo is in the news again. You might think that losing $2 million USD would sting a bit, but considering the $40 million she gets for running Yahoo into the ground, I think she will be okay for the next few years, even living in the Valley.


Hackers Drawn To Energy Sector's Lack Of Sensors, Controls

I'd like to think that someone, somewhere in our government, is actively working to keep our grid safe. Otherwise, it won't be long before we start to see blackouts as a result of some bored teenager on a lonely summer night.


Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region

Thanks to a typo, the Amazon S3 service was brought to a halt for a few hours last week. In the biggest piece of irony, the popular website Is It Down Right Now? Website Down or Not? was, itself, down as a result. There's a lot to digest with this outage, and it deserves its own post at some point.


How Uber Deceives the Authorities Worldwide

I didn't wake up looking for more reasons to dislike how Uber is currently run as a business, but it seems that each week they reach a new low.


Thirteen thousand, four hundred, fifty-five minutes of talking to get one job

A bit long, but worth the read as it helps expose the job hiring process and all the flaws in the current system used by almost every company. I've written about bad job postings before, as well as how interviews should not be a trivia contest, so I enjoyed how this post took a deeper look.


If the Moon Were Only 1 Pixel - A tediously accurate map of the solar system

Because I love things like this and I think you should, too.


Just a reminder that the cloud can, and does, go offline from time to time:




Last week Amazon Web Services S3 storage in the East region went offline for a few hours. Since then, AWS has published a summary review of what happened. I applaud AWS for their transparency, and I know that they will use this incident as a lesson to make things better going forward. Take a few minutes to read the review and then come back here. I'll wait.


Okay, so it's been a few days since the outage. We've all had some time to reflect on what happened. And some of us have decided that now is the time to put on our Hindsight Glasses and run down a list of lingering questions and comments about the outage.


Let's break this down!


"...we have not completely restarted the index subsystem or the placement subsystem in our larger regions for many years."

This, to me, is the most inexcusable part of the outage. Anyone who does business continuity planning will tell you that annual checks are needed on such playbooks. You cannot just wave that away with, "Hey, we've grown a lot in the past four years, so the playbook is out of date." Nope. Not acceptable.


"The servers that were inadvertently removed supported two other S3 subsystems."

The engineers were working on a billing system, and they had no idea that removing those billing servers would take down two key S3 subsystems. Which raises the question, "Why are those systems related?" Great question! This reminds me of the age-old debate regarding dedicated versus shared application servers. Shared servers sound great until one person needs a reboot, right? No wonder everyone is clamoring for containers these days. Another few years and mainframes will be under our desks.


"Unfortunately, one of the inputs to the command was entered incorrectly, and a larger set of servers was removed than intended."

But the command was accepted as valid input, which means the code doesn't have any check to make certain that the command was indeed valid. This is the EXACT scenario that resulted in Jeffrey Snover adding the -WhatIf and -Confirm parameters to PowerShell. I'm a coding hack, and even I know the value of sanitizing your inputs. This isn't just about preventing SQL injection. It's also about making certain that, as a cloud provider, you don't delete a large number, or percentage, of servers by accident.
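To illustrate, here's a minimal sketch in Python (emphatically not AWS's actual tooling) of the kind of guardrail those parameters provide: validate the input, offer a preview mode, and require explicit confirmation before anything destructive happens.

```python
# A hypothetical capacity-removal command with WhatIf/Confirm-style guards.
# All names and behavior here are invented for illustration.

def remove_capacity(fleet, count, what_if=False, confirm=None):
    """Remove `count` servers from `fleet`, refusing obviously bad input."""
    # Sanitize the input first: a bad value should never reach the fleet.
    if not isinstance(count, int) or count <= 0:
        raise ValueError(f"invalid server count: {count!r}")
    if count > len(fleet):
        raise ValueError(f"asked to remove {count} of {len(fleet)} servers")
    # Preview mode: report what would happen, change nothing.
    if what_if:
        return f"WhatIf: would remove {count} of {len(fleet)} servers"
    # Optional confirmation hook before the destructive step.
    if confirm is not None and not confirm(count, len(fleet)):
        return "aborted by operator"
    del fleet[:count]
    return f"removed {count} servers, {len(fleet)} remain"

fleet = [f"s3-host-{i}" for i in range(10)]
print(remove_capacity(fleet, 3, what_if=True))  # preview only, fleet untouched
print(remove_capacity(fleet, 3))                # the real removal
```

The point isn't the specific checks; it's that destructive operational commands deserve the same input validation we demand of public-facing code.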


"While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly."

So, they don't ever ask themselves, "What if?" along with the question, "Why?" These are my favorite questions to ask when designing/building/modifying systems. The 5-Whys is a great tool to find the root cause, and the use of "what if" helps you build better systems that help avoid the need for root cause reviews.
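As a "what if" thought experiment, the fix AWS describes for "too much capacity removed too quickly" might look something like this sketch; the thresholds are invented for illustration.

```python
# Hypothetical safeguards on a capacity-removal tool: cap each removal at a
# fraction of the fleet and never drop below a minimum capacity floor.
MAX_REMOVAL_FRACTION = 0.05   # never remove more than 5% in one operation
MIN_CAPACITY = 50             # never drop below this many servers

def safe_removal_count(fleet_size, requested):
    # Cap at a fraction of the fleet.
    allowed = min(requested, int(fleet_size * MAX_REMOVAL_FRACTION))
    # Never dip below the capacity floor.
    allowed = min(allowed, max(0, fleet_size - MIN_CAPACITY))
    return allowed

print(safe_removal_count(1000, 10))   # 10: within limits
print(safe_removal_count(1000, 200))  # 50: capped at 5% of the fleet
print(safe_removal_count(60, 200))    # 3: small fleet, small cap
```

Two lines of "what if" thinking, and a fat-fingered input can no longer take out a region.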


"We will also make changes to improve the recovery time of key S3 subsystems."

Why wasn't this a thing already? I cannot understand how AWS could get to this point without high availability already built into these key subsystems. My only guess is that building such systems costs more, and AWS isn't interested in things costing more. In the race to the bottom, corners are cut, and you get an outage every now and then.


"...we were unable to update the individual services’ status on the AWS Service Health Dashboard (SHD) because of a dependency the SHD administration console has on Amazon S3."

The AWS dashboard for the East Region was dependent upon the East Region being online. Just let that sink in for a bit. Hey, AWS, let me know if you need help with monitoring and alerting. We'd be happy to help you get the job done.


"Other AWS services in the US-EAST-1 Region that rely on S3 for storage...were also impacted while the S3 APIs were unavailable."

Many companies that rely on AWS to be up and running were offline. My favorite example: the popular website Is It Down Right Now? Website Down or Not? was itself down as a result of the outage. If you migrate your apps to the cloud, you still need to take responsibility for availability. Otherwise, you run the risk of being down with no way to get back up.


Look, things happen. Stuff breaks all the time. The reason this was such a major event is because AWS has done amazing work in becoming the largest cloud provider on the planet. I'm not here to bury AWS, I'm here to highlight the key points and takeaways from the incident to help you make things better in your shop. Because if AWS, with all of its brainpower and resources, can still have these flaws, chances are your shop might have a few, too. 

I have been talking about the complexity of resolving performance issues in modern data centers, particularly about how it is a multi-dimensional problem, and how virtualization significantly increases the number of dimensions for performance troubleshooting. My report of having been forced to use Excel to coordinate performance data brought some interesting responses. Excel is, indeed, a very poor tool for consolidating performance data.


I have also written in other places about management tools that are focused on the data they collect, rather than helping to resolve issues. What I really like about PerfStack is the ability to use the vast amount of data in the various SolarWinds tools to identify the source of performance problems.


The central idea in PerfStack is to gain insights across all of the data gathered by the various SolarWinds products. Importantly, PerfStack allows the creation of ad hoc collections of performance data. Performance graphs for multiple objects and multiple resource types can be stacked together to identify correlation. My favorite part was adding performance counters from different layers of the infrastructure to a single screen. This is where I had the Excel flashback, only here the consolidation is done programmatically; no need for me to make sure the time series match up. I loved that the performance graphs re-drew in real time as new counters were added. Even better, the re-draw was fast enough that counters could be added on the off chance that they were relevant. When they are not, they can simply be removed. The hours I wasted building Excel graphs translate into minutes of building a PerfStack workspace.


I have written elsewhere about systems management tools that get too caught up in the cool data they gather. These tools typically have fixed dashboards that give pretty overviews, often cramming as much data as possible into one screen. What I tend to find is that these tools are inflexible about the way the data is combined. The result is a dashboard that is good at showing that everything is, or is not, healthy, but does not help a lot with resolving problems. The dynamic nature of the PerfStack workspace lends itself to getting insight out of the data and helping identify the root cause of problems. Being able to quickly assemble the data on the load on a hypervisor and the VM operating system, as well as the application statistics, speeds troubleshooting. The ability to quickly add performance counters for the other application dependencies lets you pinpoint the cause of the issue quickly. It may be that the root cause is a domain controller that is overloading its CPU, while the symptom is a SharePoint server that is unresponsive.


PerfStack allows very rapid discovery of issue causes. The value of PerfStack will vastly increase as it is rolled out across the entire SolarWinds product suite.


You can see the demonstrations of PerfStack that I saw at Tech Field Day on Vimeo: NetPath here and SAM here.

As IT professionals, we have a wide variety of tools at our disposal for any given task. The same can be said for the attackers behind the increasing strength and number of DDoS attacks. The latest trend of hijacked IoT devices, like the Mirai botnet, deserves a lot of attention because of these devices' prevalence and ability to scale, mostly due to a lack of security and basic protections. This is the fault of both manufacturers and consumers. However, DDoS attacks at scale are not really a new thing, because malware-infected zombie botnets have been around for a while. Some fairly old ones are still out there, and attackers don’t forget their favorites.


One of the largest attacks in 2016 came in October and measured in at 517 Gbps. This attack was not a complex application-layer hack or a massive DNS reflection, but a massive attack from malware that has been around for more than two years, called Spike. Spike is commonly associated with x86 Linux-based devices (often routers with unpatched vulnerabilities), and is able to generate large amounts of application-layer HTTP traffic. While Mirai and other IoT botnets remained top sources of DDoS traffic in 2016, they were not alone.




The complexity of these attacks continues to evolve. What used to be simple volumetric flooding of UDP traffic has moved up the stack over time. Akamai reports that between Q4 2015 and Q4 2016 there was a 6% increase in infrastructure layer attacks (layer 3 & 4), and a 22% increase in reflection-based attacks. At the same time, while overall web application attacks decreased, there was a 33% increase in SQLi attacks.


Application-layer attacks are becoming increasingly difficult to mitigate due to their ability to mimic real user behavior. They are harder to identify and often carry larger payloads, and they are frequently combined with lower-level attacks for variety and a larger attack surface. This requires vigilance on the part of those responsible for the infrastructure we rely on, to protect against all possible attack vectors.




Not surprising is the fact that China and the United States are the primary sources of DDoS attacks, with China dominating Q1, Q2, and Q3 of 2016. The United States “beat” China in Q4, spiking to 24% of global DDoS traffic for that quarter. The increase in the number of source IP addresses here is dramatic, with the U.S. numbers leaping from about 60K in Q3 to 180K in Q4. This is largely suspected to be due to a massive increase in IoT (Mirai) botnet sources. Black Friday sales, perhaps?


While attacks evolve, becoming larger and more complex, some simple, tried-and-true methods of disrupting the internet can still be useful. Old tools can become new again. Reports from major threat centers consistently show that Conficker is still one of the most prevalent malware variants in the wild, and it has been around since 2008.


Malware is often modeled after real biological viruses, like the common cold, and it is not easily eliminated. A handful of infected machines can re-populate and re-infect thousands of others in short order, and this is what makes total elimination a near impossibility.


There is no vaccine for malware, but what about treating the symptoms?


A concerted effort is required to combat the looming and real threat these DDoS attacks pose. Manufacturers of infrastructure products, consumer IoT devices, and mobile phones, along with service providers, enterprise IT organizations, and even the government, are on the case. Each must actively do their part to reinforce against, protect from, and identify sources of malware to slow the pace of this growing problem.


The internet is not entirely broken, but it is vulnerable to the exponential scale of the DDoS threat.

By Joe Kim, SolarWinds Chief Technology Officer


It’s time to stop treating data as a commodity and create a secure and reliable data recovery plan by following a few core strategies.


1. Establish objectives


Establish a Recovery Point Objective (RPO) that determines how much data loss is acceptable. Understanding acceptable risk levels can help establish a baseline understanding of where DBAs should focus their recovery efforts.


Then, work on a Recovery Time Objective (RTO) that shows how long the agency can afford to be without its data.
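The arithmetic behind an RPO is simple but worth making explicit. A hypothetical sketch: if the agency can afford to lose at most 15 minutes of data, a nightly full backup alone cannot meet the objective; log backups must run at least that often.

```python
# Illustrative only: checking a proposed backup schedule against an RPO.
# The numbers are invented for the example.
RPO_MINUTES = 15          # maximum tolerable data loss
LOG_BACKUP_MINUTES = 10   # proposed transaction log backup interval

meets_rpo = LOG_BACKUP_MINUTES <= RPO_MINUTES
worst_case_loss = LOG_BACKUP_MINUTES  # data written since the last log backup
print(f"meets RPO: {meets_rpo}, worst-case loss: {worst_case_loss} min")
```

The same back-of-the-envelope check applies to the RTO: add up restore time, validation time, and failover time, and compare the total to what the agency says it can tolerate.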


2. Understand differences between backups and snapshots


There’s a surprising amount of confusion about the differences between database backups, server tape backups, and snapshots. For instance, many people have a misperception that a storage area network (SAN) snapshot is a backup, when it’s really only a set of data reference markers. Remember that a true backup, either on- or off-site, is one in which data is securely stored in the event that it needs to be recovered.


3. Make sure those backups are working


Although many DBAs will undoubtedly insist that their backups are working, the only way to know for sure is to test the backups by doing a restore. This will provide assurance that backups are running — not failing — and highly available.
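The only backup that counts is one you've actually restored. Here's a self-contained sketch of that principle, using SQLite purely because Python ships with it: take a backup, restore it to a scratch copy, and verify the restored data against the original.

```python
# A toy restore test. In production this would restore a real backup file
# to a scratch server and run checksum/row-count validation queries.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
src.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (24.50,)])
src.commit()

backup = sqlite3.connect(":memory:")   # stand-in for the backup file
src.backup(backup)                     # take the "backup"

restored = sqlite3.connect(":memory:")
backup.backup(restored)                # the restore test itself

orig = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rest = restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
assert orig == rest, "restore verification failed"
print(f"restore verified: {rest} rows")
```

A scheduled job that fails loudly when the restore or the verification breaks turns "I'm sure the backups are fine" into evidence.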


4. Practice data encryption


DBAs can either encrypt the database backup file itself, or encrypt the entire database. That way, if someone takes a backup, they won’t be able to access the information without a key. DBAs must also ensure that if a device is lost or stolen, the data stored on the device remains inaccessible to users without proper keys.
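For illustration, here's a sketch using the third-party `cryptography` package's Fernet recipe (my choice of library, not a prescription; any vetted symmetric cipher works). The point is simply that a stolen backup file without the key is just noise.

```python
# A toy example of encrypting a backup file's contents before it leaves
# the server. Key management (vault, rotation) is out of scope here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # store in a key vault, never beside the backup
cipher = Fernet(key)

backup_bytes = b"-- pretend this is a database backup file --"
encrypted = cipher.encrypt(backup_bytes)

assert backup_bytes not in encrypted            # plaintext is not visible
assert cipher.decrypt(encrypted) == backup_bytes  # key holder can recover it
print("backup encrypted and round-tripped successfully")
```

The same reasoning covers lost or stolen devices: full-disk or database-level encryption means the data on the device is inaccessible without the key.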


5. Monitor and collect data


Combined with network performance monitoring and other analysis software, real-time monitoring and real-time data collection can improve performance, reduce outages, and maintain network and data availability.


Real-time collection of information can be used to do proper data forensics. This will make it easier to track down the cause of an intrusion, which can be detected through monitoring.


Monitoring, database analysis, and log and event management can help DBAs understand if something is failing. They’ll be able to identify potential threats through things like unusual queries or suspected anomalies. They can compare the queries to their historical information to gauge whether or not the requests represent potential intrusions.
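That comparison against historical information can be as simple as a z-score. A toy example, with invented numbers:

```python
# Flag a query rate that sits far outside the historical mean.
# The history and threshold are illustrative, not a tuned detector.
from statistics import mean, stdev

hourly_query_counts = [120, 135, 110, 128, 140, 125, 118, 132]  # history
current = 310                                                   # this hour

mu, sigma = mean(hourly_query_counts), stdev(hourly_query_counts)
z_score = (current - mu) / sigma
if z_score > 3:
    print(f"possible intrusion: {current} queries is "
          f"{z_score:.1f} sigma above normal")
```

Real log and event management tools do far more than this, but the core idea is the same: today's behavior is only suspicious relative to yesterday's.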


6. Test, test, test


If you’re managing a large database, there’s simply not enough space or time to restore and test it every night. DBAs should test a random sampling taken from their databases. From this information, DBAs can gain confidence that they will be able to recover any database they administer, even if that database is in a large pool. If you’re interested in learning more, check out this post, which gets into further detail on database sampling.
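The sampling itself is trivial to automate. A sketch, with hypothetical database names:

```python
# Pick a nightly random sample of databases to restore-test, so every
# database gets exercised eventually without restoring the whole estate.
import random

databases = [f"agency_db_{i:03d}" for i in range(200)]  # hypothetical names
tonight = random.sample(databases, k=10)                # 5% of the estate

for db in tonight:
    print(f"restore-testing {db}")   # hand off to the restore harness
```

Over a few weeks of nightly runs, the sample coverage adds up, and you build justified confidence that any database in the pool can be recovered.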


Data is quickly becoming a truly precious asset to government agencies, so it is critical to develop a sound data recovery plan.


Find the full article on our partner DLT’s blog, Technically Speaking.

I’ve long held the belief that for any task there are correct approaches and incorrect ones. When I was small, I remember being so impressed by the huge variety of parts my father had in his tool chest. Once, I watched him repair a television remote control, one that had shaped and tapered plastic buttons. The replacement from RCA/Zenith, I believe at the time, cost upwards of $150. He opened the broken device, determined that the problem was that the tongue on the existing button had broken, and rather than epoxy the old one back together, he carved and buffed an old bakelite knob into the proper shape, attached it in place of the original one, and ultimately, the final product looked and performed as if it were the original. It didn’t even look different than it had. This, to me, was the ultimate accomplishment. Almost as the Hippocratic Oath dictates, above all, do no harm. It was magic.


When all you have is a hammer, everything is a nail, right? But that sure is the wrong approach.


Today, my favorite outside-work activity is building and maintaining guitars. When I began doing this, I didn’t own some critical tools. For example, an entire series of needle files and crown files is appropriate for the shaping and repair of frets on the neck. While not a very expensive purchase, no other tool will do for the task at hand. The correct Allen wrench is necessary for adjusting the torsion rod in the neck. And the ideal soldering iron is critical for proper wiring of pickups, potentiometers, and the jack. Of course, when sanding, a variety of grades is also necessary. Not to mention a selection of paints, brushes, stains, and lacquers.


The same can be said of DevOps. Programming languages are designed for specific purposes, and there have been many advances in the past few years pointing to what a scripting task may require. Many might use Bash, batch, or PowerShell to do their tasks. Others may choose PHP or Ruby on Rails, while still others choose Python as their scripting tools. Today, it is my belief that no one tool can accommodate every action that's necessary to perform these tasks. There are nuances to each language, but one thing is certain: many tasks require the collaborative conversation between these tools. To accomplish these tasks, the ideal tools will likely call functions back and forth from other scripting languages. And while some bits of code are required here and there, currently it's the best way to approach the situation, given that many tools don't yet exist in packaged form. The DevOps engineer, then, needs to write and maintain these bits of code to help ensure that they are accurate each time they are called upon. 
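That collaborative conversation between languages often looks like one tool wrapping another. A minimal sketch on a POSIX system, where the shell command is a stand-in for whatever Bash or PowerShell step you already own:

```python
# Python orchestrating a shell step and consuming its output.
# The echo command is a placeholder for a real script in your environment.
import subprocess

result = subprocess.run(
    ["sh", "-c", "echo 3 files changed"],   # placeholder shell step
    capture_output=True, text=True, check=True,
)
changed = int(result.stdout.split()[0])     # Python picks up where shell left off
print(f"shell reported {changed} changed files")
```

The glue language matters less than the discipline around it: check return codes, parse output deliberately, and fail loudly when the other tool misbehaves.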


As comments on my previous post correctly noted, I need to stress that these custom pieces of code must be tested before use, to help ensure that any changes within the infrastructure are accounted for each time these scripts are set to task.


I recommend that anyone who is in DevOps get comfortable with these and other languages, and learn which do the job best so that DevOps engineers become more adept at facing challenges.


At some point, there will be automation tools with slick GUI interfaces that may address many or even all of these needs when they arise. But for the moment, I advise learning, utilizing, and customizing scripting tools. In the future, when these tools do become available, the question is: will they surpass the automation you’ve already created through your own DevOps work? I cannot predict.

As you spend more time in security, you start to understand that keeping up with the latest trends is not easy. Security is a moving target, and many organizations simply can’t keep up. Fortunately for us, Cisco releases an annual security report that can help us out in this regard. You can find this year's report, as well as past reports, here. In this post, I wanted to share a few highlights that illustrate why I believe security professionals should be aware of these reports.


Major findings

A nice feature of the Cisco 2017 Annual Cyber Security Report is the quick list of major findings. This year, Cisco notes that the three leading exploit kits -- Angler, Nuclear, and Neutrino -- are vanishing from the landscape. This is good to know, because we might be spending time and effort looking for these popular attacks while other, lesser-known exploit kits work their way into the network. Based on Cisco’s findings, most companies use several security vendors, with more than five security products in their environment, and only about half of the security events received in a given day are reviewed. Of that number, 28% are deemed legitimate, and less than half of those are remediated. We’re having a hard time keeping up, and our time needs to be spent on live targets, not threats that are no longer prevalent.


Gaining a view to adversary activity

In the report's introduction, Cisco covers the strategies that adversaries use today. These include taking advantage of poor patching practices, social engineering, and malware delivery through legitimate online content, such as advertising. I personally feel that you can't defend your network properly unless you know how you’re being attacked. I suppose you could look at it this way. Here in the United States, football is one of the most popular sports. It’s common practice for a team to study films of their opponents before playing them. This allows them to adjust their offensive and defensive game plan ahead of time. The same should be true for security professionals. We should be prepared to adjust to threats, and reviewing Cisco’s security report is similar to watching those game films.


In the security report, Cisco breaks down the most commonly observed malware by the numbers. It also discusses how attackers pair remote access malware with exploits in deliverable payloads. Some of what I gleaned from the report shows that the methods being used are the same as what was brought out in previous reports, with some slight modifications.


My take

From my point of view, the attacks are sophisticated, but not in a way that’s earth shattering. What I get from the report is that the real issue is that there are too many alerts from too many security devices, and security people can't sort through them efficiently. Automation is going to play a key role in security products. Until our security devices are smart enough to distinguish noise from legitimate attacks, we’re not going to be able to keep up. However, reading reports like this can better position our security teams to look in the right place at the right time, cutting down on some of the breaches we see. So, to make a long story short, be sure to read up on the Cisco Annual Security report. It’s written well, loaded with useful data, and helps security professionals stay on top of the security landscape.

In our pursuit of Better IT, I bring you a post on how important data is to functional teams and groups. Last week we talked about anti-patterns in collaboration, covering things like data mine-ing and other organizational dysfunctions. In this post we will be talking about the role shared data, information, visualizations, and analytics play in helping ensure your teams can avoid all those missteps from last week.


Data! Data! Data!

These days we have data. Lots and lots of data. Even Big Data, data so important we capitalize it! As much as I love my data, we can't solve problems with just raw data, even if we enjoy browsing through pages of JSON or log data. That's why we have products like Network Performance Monitor (NPM), Server & Applications Monitor (SAM), and Database Performance Analyzer (DPA) to help us collect and parse all that data. Each of those products collects specialized metrics and provides visualizations that help specialized sysadmins leverage that data. These administrators probably don't think of themselves as data professionals, but they are. They choose which data to collect, which levels to be alerted on, and which to report upon. They are experts in this data, and they have learned to love it all.

Shared Data about App and Infrastructure Resources

Within the SolarWinds product solutions, data about the infrastructure and application graph is collected and displayed on the Orion Platform. This means that cross-team admins share the same set of resources and components, and the data about their metrics. Now we have PerfStack, with features for cross-team collaboration via data. We can see the entities we want to analyze, then see all the other entities related to them. This is what I call the Infrastructure and Application Graph, which I'll be writing about later. After choosing entities, we can discover the metrics available for each one and choose those that make the most sense to analyze, based on the troubleshooting we are doing now.




Metrics Over Time


Another data feature that's critical to analyzing infrastructure issues is the ability to see data *over time*. It's not enough to know how CPU is doing right now; we need to know what it was doing earlier today, yesterday, last week, and maybe even last month on the same day of the month. By having a view into the status of resources over time, we can intelligently make sense of the data we are seeing today. End-of-month processing going on? Now we know why there might be a slight spike in CPU pressure.
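A toy sketch of that reasoning, with invented numbers: judge the current reading not just against a baseline, but against what the same hour looked like at the last month-end.

```python
# Compare current CPU to same-hour history, including the prior month-end.
# All values are made up for illustration.
from statistics import mean

same_hour_history = [42, 45, 40, 44, 43, 88]  # last value: prior month-end spike
current_cpu = 85

baseline = mean(same_hour_history[:-1])        # typical-day baseline
if current_cpu > baseline * 1.5 and same_hour_history[-1] > baseline * 1.5:
    print("high CPU, but it also spiked at last month-end: "
          "likely batch processing, not a new problem")
```

Without the historical view, today's 85% looks like an incident; with it, it looks like the monthly batch job doing exactly what it did last month.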


Visualizations and Analyses


The beauty of PerfStack is that by choosing these entities and metrics, we can easily build data visualizations of the metrics and overlay them to discover correlations and causes. We can then interact with the information we now have by working with the data or the visualizations. By overlaying the data, we can see how the statuses of resources are impacting each other. This collaboration of data means we are performing "team troubleshooting" instead of silo-based "whodunits." We can find the issue, which until now might have been hiding in data in separate products.




So we've gone from data to information to analysis in just minutes. Another beautiful feature of PerfStack is that once we've built the analyses that show our troubleshooting results, we can copy the URL, send it off to team members, and they can see the exact same analysis -- complete with visualizations -- that we saw. If we've done similar troubleshooting before and saved projects, we might be doing this in seconds.


This is often hours, if not days, faster than how we did troubleshooting in our previous silo-ed, data mine-ing approach to application and infrastructure support. We accomplished this by having quick and easy access to shared information that united differing views of our infrastructure and application graph.


Data -> Information -> Visualization -> Analysis -> Action


It all starts with the data, but we have to love the data into becoming actions. I'm excited about this data-driven workflow in keeping applications and infrastructure happy.

Needles and haystacks have a storied past, though never in a positive sense. Troubleshooting network problems comes as close as anything to the process that pair alludes to. Sometimes we just don't know what we don't know, and that leaves us with a problem: how do we find the information we're looking for when we don't know what we're looking for?


The geeks over at SolarWinds obviously thought the same thing and decided to do something to make life easier for those hapless souls who frequently find themselves tail over teakettle in the proverbial haystack; that product is PerfStack.


PerfStack is a really cool component of the Orion Platform as of the new 12.1 release. In a nutshell, it allows you to find all sorts of bits of information that you're already monitoring and view them in one place for easy consumption. Rather than going from this page to that, one IT discipline to another, or ticket to ticket, PerfStack gives you the freedom to mix and match, to see only the bits pertinent to the problem at hand, whether those are in the VoIP systems, wireless, applications, or network. Who would have thought that would be useful, and why haven't we thought of that before?




In and of itself, those features would be a welcome addition to the Orion suite -- or any monitoring suite, for that matter -- but SolarWinds took it one step further and designed PerfStack in such a way that you can create your own "PerfStacks" on the fly, as well as pass them around for other people to use. Let's face it: having a monitoring solution with a lot of canned reporting, stuff that just works right out of the box, is a great thing, but having the flexibility to create your own reports at a highly granular level is infinitely better. Presumably you know your environment better than the team at SolarWinds, or me, or anyone else. You shouldn't be forced into a modality that doesn't fit your needs.


Passing dashboards ("PerfStacks") around to your fellow team members, or whomever, is really a key feature here. Often we have a great view into the domain we operate within, whether that's virtualization, applications, networking, or storage, but we don't have the ability to share that view with other people. That's certainly the case with point products, but even when we are all sharing the same tools, it's not historically been as smooth a process as it could be. That's unfortunate, but PerfStack goes a long way toward breaking through that barrier.




There are additional features to PerfStack that bear mentioning: real-time updates to dashboards without redrawing the entire screen, saving of the dashboards, importing new polling targets/events in real time, etc. I will cover those details next time, but what we've talked about so far should be enough to show the value of the product. SolarWinds doesn't seem to believe in tiny rollouts. They've come out of the gates fast and strong with this update, and with good reason. It really is a great and useful product that will change the way you look at monitoring and troubleshooting.

Back in the office this week and excited for the launch of PerfStack. If you haven't heard about PerfStack yet, you should check out the webcast tomorrow: PerfStack Livecast


As usual, here's a handful of links from the intertubz I thought you might find interesting. Enjoy!


Cloudflare Coding Error Spills Sensitive Data

A nice reminder that you, not someone else, are responsible for securing your data. Although Cloudflare® was leaking data, a company such as 1Password was not affected, because they were encrypting their data in more than one way. In short, 1Password assumed that SSL/TLS *can* fail, and took responsibility for securing their data rather than relying on someone else to do it for them. We should all be mindful about how we treat our data.


Microsoft Invests in Real-time Maps for Drones, and Someday, Flying Cars

Can we skip autonomous cars and go right to flying cars? Because that would be cool with me. And Microsoft® is doing their part to make sure we won't need to use Apple® Maps with our flying cars.


Expanding Fact Checking at Google

Nice to see this effort underway. I'm not a fan of crowdsourced entities such as Wikipedia, as they have inherent issues with veracity. It would be good for everyone if we could start verifying data posted online as fact (versus opinion, or just fake).


Wikipedia Bots Spent Years Fighting Silent, Tiny Battles With Each Other

Did I mention I wasn't a fan of Wikipedia? As if humans arguing over facts aren't bad enough, someone thought it was a good idea to create bots to do the job instead.



Besides the issue with fact checking, the internet is also a cesspool of misery. Perspective is an attempt to use Machine Learning to help foster better (or nicer) conversations online. I'm curious to see how this project unfolds.


Microsoft Surface: NSA Approves Windows 10 Tablets for Classified Work

Interesting to note here that this is only for devices manufactured by Microsoft, and not other vendors such as HP® or Dell®. What's more interesting to note is how Microsoft continues to make progress in areas of data security for both their devices and the hosted services (Azure®).


Alphabet's Waymo Alleges Uber Stole Self-Driving Secrets

I am simply amazed at how many mistakes Uber® can make as a company and still be in business.


The weather has been warm and spring-like, so I decided to put the deck furniture out. So now, if it snows two feet next week, you know who to blame.


I’ve discussed the idea that performance troubleshooting is a multi-dimensional problem and that virtualization adds more dimensions. Much of the time it is sufficient to look at the layers independently. The cause of a performance problem may be obvious in an undersized VM or an overloaded vSphere cluster. But sometimes you need to correlate the performance metrics across multiple layers. Worst of all is when the problem is intermittent; apparently random application slowdowns are the hardest to troubleshoot. The few times that I have needed to do this correlation, I have always had a sinking feeling. I know that I am going to end up gathering a lot of performance logs from different tools, identifying the metrics that are important, and usually graphing them together. That feeling sets in when I realize I need to get the data from Windows Perfmon, the vSphere client, the SAN, and maybe a network monitor into a single set of graphs.


My go-to tool for consolidating all this data is still Microsoft Excel, mostly because I have a heap of CSV files and want a set of graphs. Consolidating this data has a few challenges. The first is getting consistent start and finish times for the sample data. The CSV files are generated by separate tools, and the time stamps may even be in different time zones. Usually looking at one or two simple graphs identifies the problem time window. Once we know when to look, we can trim each CSV file to the time range we want. Then there are challenges with getting consistent intervals for the graphs. Some tools log every minute and others every 20 minutes. On occasion, I have had to re-sample the data to the lowest time resolution just to get everything on one graph. That graph also needs sensible scales, meaning applying scaling to the CSV values before we graph them. I’m reminded how much I hate having to do this kind of work and how much it seems like something that should be automated.
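The manual shuffle described here (trimming the problem window, normalizing time zones, and re-sampling everything to a common interval) is exactly the kind of thing a short script can do. Here is a minimal sketch in Python; the `timestamp,value` column layout, the timestamp format, and the time-zone offset handling are all assumptions for illustration, since every tool exports its own CSV format:

```python
# Sketch: consolidating performance CSVs with mismatched clocks and intervals.
# The column names, timestamp format, and offset handling are illustrative.
import csv
from datetime import datetime, timedelta
from io import StringIO

def load_samples(csv_text, fmt="%Y-%m-%d %H:%M:%S", tz_offset_hours=0):
    """Parse timestamp/value rows, normalizing an optional time-zone offset."""
    samples = []
    for row in csv.DictReader(StringIO(csv_text)):
        ts = datetime.strptime(row["timestamp"], fmt) - timedelta(hours=tz_offset_hours)
        samples.append((ts, float(row["value"])))
    return samples

def resample(samples, interval_minutes, start, end):
    """Trim to the problem window and bucket samples to one fixed interval,
    averaging within each bucket (re-sampling to the lowest resolution)."""
    step = timedelta(minutes=interval_minutes)
    buckets = {}
    for ts, value in samples:
        if start <= ts < end:
            bucket = start + step * int((ts - start) / step)
            buckets.setdefault(bucket, []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
```

Run each tool's CSV through `load_samples` with its own format and offset, resample everything to the coarsest interval over the same window, and the resulting series share timestamps and can be scaled and graphed together.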


Usually, when I’m doing this I am an external consultant brought in to deal with a high visibility issue. Senior management is watching closely and demanding answers. Usually, I know the answer early and spend hours putting together the graph that proves the answer. If the client had a good set of data center monitoring tools and well-trained staff, they would not need me. It troubles me how few organizations spend the time and effort in getting value out of monitoring tools.


I have been building this picture of the nightmare of complex performance troubleshooting for a reason. Some of you have guessed it: PerfStack will be a great tool for avoiding exactly this problem. Seeing an early demo of PerfStack triggered memories. Not good memories.

Ensure your IoT devices are secure, or face the consequences. That’s the message being sent to some hardware manufacturers by the Federal Trade Commission. In the aftermath of the ever-increasing number of attacks perpetrated by compromised IoT devices like routers and cameras, the Federal Trade Commission’s Bureau of Consumer Protection has targeted companies such as TRENDnet, ASUS, and more recently, D-Link.



Back in 2013, the FTC settled its very first action against a manufacturer of IP-enabled consumer products, TRENDnet. TRENDnet’s SecurView cameras were widely used by consumers for purposes including home security and baby monitoring. The product name alone seemingly marketed them as “secure." The FTC accused TRENDnet of a number of failures, including:


  • Failing to use reasonable security to design and test its software
  • Failing to secure camera passwords
  • Transmitting user login credentials in the clear
  • Storing consumers’ login information in clear, readable text on their mobile devices


In January of 2012, a hacker exposed these flaws and made them public, resulting in almost 700 live feeds being posted and freely available on the internet. These included babies sleeping in their cribs.



Once again the FTC fired a shot across the bow at manufacturers of consumer IoT devices when they leveled a complaint against ASUSTek Computer, Inc. This time, the security of their routers was questioned. ASUS had marketed their consumer line of routers with claims they would “protect computers from any unauthorized access, hacking, and virus attacks” and “protect the local network against attacks from hackers.” However, the FTC found several flaws in the ASUS products, including:


  • Easily exploited security bugs in the router’s web-based control panel
  • Allowing consumers to set and retain the default login credentials on every router (admin/admin)
  • Vulnerable cloud storage options AiCloud and AiDisk that exposed consumers’ data and personal information to the internet


In 2014, hackers used these and other vulnerabilities in ASUS routers to gain access to over 12,900 consumers’ storage devices.



Now, in 2017, the FTC has targeted D-Link Corporation, a well-known manufacturer of consumer and SMB/SOHO networking products. This complaint alleges that D-Link has “failed to take reasonable steps to secure its routers and Internet Protocol (IP) cameras, potentially compromising sensitive consumer information including live video and audio feeds from D-Link IP cameras.”

The FTC complaint notes how D-Link has promoted the security of its devices with marketing and advertising claims such as “easy to secure" and “advanced network security," but alleges several issues:


  • Hard-coded login credentials (guest/guest) in D-Link camera software
  • Software vulnerable to command injection that could enable remote control of consumer devices
  • Mishandling of a private code signing key, which was openly available on a public website for six months
  • User login credentials stored in clear, readable text on mobile devices


The severity of an exposed and vulnerable router is amplified by the fact that the router is a home network’s primary means of defense. Once it is compromised, everything behind it is potentially exposed to the hacker, and the FTC emphasizes that computers, smartphones, IP cameras, and IP-enabled appliances could be attacked as a result.


The DDoS Landscape

According to Akamai’s quarterly State of the Internet report, DDoS attacks continue to flourish and evolve as a primary means to attack both consumers and businesses. In Q4 2016, there was a 140 percent increase in attacks greater than 100 Gbps, a 22 percent increase in reflection-based attacks, and a 6 percent increase in Layer 3 and 4 attacks. At the application layer, a 44 percent increase in SQLi attacks was observed over the same period. These examples are further evidence that these types of attacks are moving ever upward in the stack.


Not surprisingly, the United States continues to be the largest source of these attacks, accounting for approximately 28 percent of global web application attacks in Q4 2016. As IoT devices continue to proliferate at exponential rates, and companies like TRENDnet, ASUS, and D-Link fail to secure them, these numbers may only increase.


There is hope, however, that organizations like the FTC can send a strong message to device manufacturers in the coming months as they continue to identify and hold accountable the companies that fail to protect consumers, and the rest of us, from exposed and vulnerable devices.


Do you feel the FTC and FCC (or other government organizations) should be more or less involved in the enforcement of IoT security?

DevOps, the practice of creating code internally to streamline the administrative processes that fall within the sysadmin’s purview, is still emerging within IT departments across the globe. These tasks have traditionally revolved around the mechanical functions of the sysadmin’s job. However, another whole category of administration is now becoming far more vital to the role, and that’s the deployment of applications and their components within the environment.


Application development is undergoing a big change. Methods like microservices and containers, a relatively new paradigm about which I’ve spoken before, make the deployment of these applications very different. Now, a sysadmin must be more responsive to the needs of the business to get these bits of code and/or containers into production far more rapidly. As a result, the sysadmin needs tools in place to respond as precisely, actively, and consistently as possible. Application code is now being delivered so dynamically that it must be deployed, or rolled back, just as rapidly.


When I worked at VMware, I was part of an SDDC group whose main goal was assisting the rollouts of massive deployments (SAP, JD Edwards, etc.) to an anywhere/anytime type of model. This was DevOps in the extreme. Our expert code jockeys were tasked with writing custom code for each deployment. While that approach is vital to the goals of many organizations, today the tools exist to do these tasks in a more elegant manner.


So, what tools would an administrator require to push out or roll back these applications in a potentially seamless manner? There are tools that will roll out applications, say, from your VMware vCenter to whichever VMware infrastructure you have in your server farms, but there are also ways to leverage that same VMware infrastructure to deploy outbound to AWS, Azure, or hybrid, non-VMware infrastructures. A great example is the wonderful Platform9, which exists as a separate panel within vCenter and allows the admin to push out full deployments to wherever the management platform is deployed.


There are other tools, like Mesos, which help orchestrate Docker-style container deployments. This is where many administrators place their hopes for Docker administration.


But, as of yet, the microservices piece of the puzzle remains unsolved. As a result, the sysadmin is currently under the gun for the same type of automation toolset. For today, and for the foreseeable future, DevOps holds the key. We need to not only deploy these parts and pieces to the appropriate places, but also ensure that they get pushed to the correct location, that they’re tracked, and that they can be pulled back should that be required. So, what key components are critical? Version tracking, lifecycle, permissions, and specific locations must all be maintained.
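The bookkeeping called out here (version tracking per location, lifecycle, and pull-back) can be sketched in a few lines. This is purely illustrative; the `Target` class and its methods are hypothetical stand-ins for whatever your orchestration platform actually provides:

```python
# Sketch: minimal version tracking with rollback for one deployment target.
# The class and method names are illustrative, not any real tool's API.
from dataclasses import dataclass, field

@dataclass
class Target:
    """One deployment location: a cluster, a cloud region, etc."""
    name: str
    history: list = field(default_factory=list)  # ordered version strings

    def deploy(self, version):
        """Record a new active version at this target."""
        self.history.append(version)
        return version

    def rollback(self):
        """Pull the current version back; the previous one becomes active."""
        if len(self.history) < 2:
            raise RuntimeError("nothing to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def active(self):
        return self.history[-1] if self.history else None
```

The point of the sketch is the design choice: keeping an ordered history per location is what makes both "where is version X running?" and "pull it back" cheap to answer.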

I imagine that what we’ll be seeing in the near future are standardized toolsets that leverage orchestration elements for newer application paradigms. For the moment, we will need to rely on our own code to assist us in the management of the new ways in which applications are being built.

By Joe Kim, SolarWinds Chief Technology Officer


While it’s essential to have a website that is user-friendly, it’s equally important to make sure that the backend technologies that drive that site are working to deliver a fast and fluid performance. In short, good digital citizen engagement combines reliability, performance, and usability to help connect governments and citizens.


Assuming you’ve already developed a streamlined user interface (UI), it’s time to start centering your attention on the behind-the-scenes processes that will help you build and maintain that connection. Here are three strategies to help you achieve this ultimate ideal of form and function.


Closely monitor application performance


Slow or unresponsive applications can undermine federal, state, and local governments’ efforts to use online solutions to connect with citizens. What’s the incentive for constituents to use their government’s digital platform if a site is slow and doesn’t easily or quickly lead to information that answers their questions? They might as well pick up the phone or (shudder) pay a visit to their local office.


Monitoring application performance is imperative to helping ensure that your digital platforms remain a go-to resource for citizens. Use application monitoring solutions to track and analyze performance and troubleshoot potential issues. The data you collect can help you identify the causes of problems, allowing you to address them quickly and, ideally, with minimal impact to site performance.
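As a rough illustration of the kind of analysis an application monitoring solution automates, here is a sketch that flags slow responses against a simple baseline. The sample data, the function name, and the twice-the-mean threshold are all assumptions for illustration, not any particular product's method:

```python
# Sketch: flag request samples that are slow relative to a baseline.
# Threshold choice (2x the mean) is an illustrative assumption.
from statistics import mean

def slow_requests(samples, threshold_ms=None):
    """Return the (url, latency_ms) pairs slower than a threshold.

    If no threshold is given, use twice the mean latency as a crude baseline.
    """
    if threshold_ms is None:
        threshold_ms = 2 * mean(ms for _, ms in samples)
    return [(url, ms) for url, ms in samples if ms > threshold_ms]
```

A real monitoring tool would collect the samples continuously and use a smarter baseline, but the principle is the same: compare each measurement against what "normal" looks like and surface the outliers.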


Log and manage events


The feds need to take care to plug any potential holes, and part of that effort entails logging and managing events. Events can range from a new user signing up to receive emails about local information, to potential intrusions or malware designed to collect personal information or compromise website performance.


Putting steps in place to monitor all types of website events and requests will help you identify, track, and respond to potential incidents before they do lasting damage. You can monitor and manage unauthorized users, failed login attempts, and other events, and even mitigate internal threats by changing user privileges.
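To make the failed-login example concrete, here is a small sketch that counts failed attempts per user from log lines and surfaces the accounts worth investigating. The log format, the regular expression, and the alert threshold are assumptions for illustration; real log and event management tools parse many formats and correlate far more:

```python
# Sketch: count failed login events per user and flag repeat offenders.
# The "LOGIN FAILED user=..." log format is an illustrative assumption.
import re
from collections import Counter

FAILED = re.compile(r"LOGIN FAILED user=(\S+)")

def failed_logins(log_lines, alert_threshold=3):
    """Return users whose failed-attempt count meets the alert threshold."""
    counts = Counter()
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return {user: n for user, n in counts.items() if n >= alert_threshold}
```

Feeding an auth log through a filter like this turns a wall of events into a short list of accounts, which is the essence of what an event management pipeline does at scale.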


Test site performance


Once application monitoring and log and event management processes are in place, continue to test your website’s performance to ensure that it delivers an optimal and user-friendly experience.


The goal is to identify any slow web services that may be impacting that experience. Use web performance monitoring solutions to identify infrastructure issues and various site elements that could be causing latency issues.


Your site should be tested from multiple devices, locations, and browsers to provide your users with a fast and reliable experience. These tests should be done proactively and on a regular basis to help ensure a consistently optimal performance that delivers on the promise of true citizen engagement.


Remember that as you strive to achieve that promise, it’s important to invest in the appropriate backend solutions and processes to power your efforts. They’re just as important as the surface details. Without them, you run the risk of having your site be just another pretty face that citizens may find wanting.


Find the full article on GovLoop.

Leon Adato

Trite Business Lies

Posted by Leon Adato Expert Feb 27, 2017


When someone brings up the topic of lies at work, most of the time people think of the big stuff, like embezzlement, falsifying data, or fudging fantasy football stats. But there's a whole other category of lying that happens all the time. Not only are these lies tolerated, but through some combination of repetition, vehemence, and mind control, employees of all stripes have come to believe them as true.


Part of the reason for this is that the lies are small (and therefore insidious). They often pass themselves off as sage advice or commonly-understood truisms. At worst, they may appear to be clichés or business tropes. Many are hard to argue with unless you've thought it through.


Nevertheless, they are lies, and especially in this era of #AlternateFacts, deserve to be exposed as such.


So take a look below and see which of these phrases match up with your personal experience or reality. If you have stories, reactions, or your own additions, let us know about them in the comments below!


“Perception is reality”

No, it’s not. I perceive myself to be an AMAZING dancer. But 48 years of evidence (as well as testimony from my wife and children. ESPECIALLY my children!) indicate otherwise.


The truer statement is that perception is powerful, and can override facts. Knowing this is so does NOT mean we must kowtow to a requestor’s built-in perceptions or preconceived notions, but rather that we need to understand those perceptions and build compelling stories that bring the person to the actual truth.


“The customer is always right”

The truth is that customers can be, and often are, wrong. This can be due to the fact that they’ve been misled, or un- (or under-) informed, or one of a hundred other reasons why they are asking for the wrong thing, looking for the wrong solution. And then there’s the rare (but not rare enough) case when they are simply bat-guano crazy and want something stupid.


However, the customer IS always the customer. They are always the one who wants to buy, and we are always the one who wants to sell. Understanding this relationship doesn’t mean we have to sell our soul, ethics, or values to sell our product. Rather, it gives us the freedom to find the right customer for our product, and come to terms with the fact that not EVERYONE is a customer (or at least not a customer right now.)


“Work smarter, not harder”

What am I supposed to think here? That I've been working stupider up until now? That I've willfully withheld a portion of my intellectual capacity? Or that I'm just phoning it in? Regardless of which inference you make, it's not a kind reflection on me, nor on what you think of my work product and work habits.


If you think I stink, or that I'm slacking off, or that I've gotten into a behavioral rut, then please just say it. If you think I'm overlooking something obvious, then say THAT. And if you don't think EITHER of those things, don't say this to fill the dead air.


“Think outside the box”

Like "work smarter, not harder," this one has the one-two punch of an insult AND the implication that I can’t solve a particular problem.


I list this as a lie because there IS no box. I may have fallen into a rut or a set of sub-optimal habits. I might be hyperfocused on a particular outcome or trajectory; I could just be lazy and unwilling to put in the extra effort that thinking about something differently might require.


Whatever the reason for my inability to find a novel solution is, it's not "a box." Calling it that doesn't get me any closer to changing my behavior OR solving the challenge.


“Lean and Mean”

Back in 2004, Bob Lewis translated this as "emaciated and unpleasant," and that has stuck with me. Lean is all well and good, as long as we mean "healthy" rather than "underfed.”


But "mean" (unless you mean "average,” which you probably don't) is a trait I would not find advantageous in the workplace. When would it be considered organizationally good to be rude, dismissive, short-tempered, or (to use Lewis' term) unpleasant?


Rather, we should strive for our teams and organizations to be healthy, focused, and determined. You can even throw in “hungry” if you want, as long as it’s not due to a lack of sustenance (i.e., resources).


"I just need you to hold out until..."

In my experience, this is a statement that comes in three-month cycles. Just keep working on-call for a little longer. Just keep putting in the extra hours. Just deal with this (last) fire drill.


Just put up with this sub-optimal situation.


After two years of ongoing struggle where the answer was routinely, "This is only going to be until the end of the quarter,” I finally confronted a manager about how the situation hadn't gotten better. The situation may have changed, but the workload, sense of crisis, etc. had not improved. His response was surprisingly transparent: "Oh, I didn't mean it would get better. I just meant that it would change."


A company, department, or team that has gotten into the habit of trying to wait out bad situations is one that has given up on solving problems. Just remember that.


If you find yourself on the receiving end of this particular lie, one response is to ask for specifics. Such as:

  1. Then what? As in, what is the situation going to be at the end of the <time period>?
  2. How are we going to get there between now and then?
  3. What are your expectations of me during that period?
  4. What will we do if we miss our target (date or end-state)?


By using these questions, and then the logical follow-up responses along with copious documentation, you are asking management to commit to a set of goals and outcomes (this is a technique also known as “managing up”). It also gives you a fairly accurate barometer as to how hard you should be looking for a better situation.


“This company is like family”

No, it isn't. Most companies are too large to have the group dynamics of a family. A company lacks the long-term history, shared genealogy, etc. that create lasting bonds.


That's not to say that people working at a company can't feel a close friendship and camaraderie. But it's still not family.


Those who make this type of comment are typically trying to instill in you a sense of loyalty to them right before they ask you to do something that goes against your personal interests.


“Human Resources”

I know, I know, this is the name of a department. But the name of the group is a misnomer bordering on a trite falsehood. Almost all of the frustration I've ever had (or heard coworkers have) with HR stems from misunderstanding their core mission.


In my experience, HR exists for two primary reasons:

  1. To keep the company out of lawsuits arising from employee interactions
  2. To shield upper management from the messier aspects of employees, including salary negotiations, grievances, etc.


They are NOT there to help you grow as an employee. They are NOT there to provide a sounding board. They are NOT there to help create a positive work environment.


They are NOT a department designed to help the employee in any way, unless "help" intersects with one of the two areas above. Use them in the same way you would use the legal department. Because that's pretty much what they are an extension of.


“Treat Your Users Like Your Customers”

I have a few issues with this. First, when I run my own company I can choose which customers I want to deal with. I do that by deciding how and where I market, which jobs I accept, and which I'm too busy to take on right now, and by setting a price for each job that reflects the level of effort and aggravation I expect to have while doing the work.


Equally, my customers can choose whether to hire me or the person down the street.


Within a company, NONE of those things are true. I can't say no to the accounting department, and they can't find someone ELSE in the company to provide the same set of services.


In addition, customers and vendors come and go. But Sarah in the mail room today will become Sarah the head of accounting tomorrow. So how I treat her today matters for the duration of our time at this company.


“The squeaky wheel gets the grease”

I learned long ago that this is SOMETIMES true. But more often, the more accurate phrase is "The squeaky wheel gets REPLACED!"


When, how, and to whom one squeaks is a lesson many of us learn by experience (meaning: doing it wrong) over time.


So, anyone who blithely tells you this probably fits one of the following descriptions, none of which will be good for YOU:

  1. They don't realize the truth of the situation either.
  2. They have the same complaint and know better than to open their own mouths, but they are perfectly willing for you to lead the charge and take all the heat.
  3. They want to watch you make a spectacle of yourself for their entertainment.


“The elevator pitch”

Brevity is wonderful when it pushes us to create simple elegance. But often it just causes us to stress more, talk faster, widen the margins, shrink the font, and try to jam more into less.


Effective storytelling is partially about knowing when your forum and format fit the story you want to tell. Otherwise, you end up babbling like a fool and ruining your chances of making a case later on.


Some issues, requests, and explanations are complicated and can't be reduced to a 30-second overview. If you run into someone who demands that all of your interactions, requests, etc. be put in that format, then they're not really listening anyway.


“That's not part of our corporate culture/DNA”

There's a famous story about five gorillas (you can read it here).


Phrases like "that's not how we do things,” or "that's just how it's done here,” or "it's not part of our DNA” are all lies, and all fall under the heading of Not Invented Here (NIH). It is often code for, "I don't want to," or worse, "The thought of doing things that way scares me."


As stated earlier, companies are not family. They also aren't organisms and thus don't have DNA. If they have a culture, it is because the people who make up the company actively choose to propagate a set of habits or a particular perspective when doing business. And culture or no culture, a good idea is a good idea. People want to do well, want to succeed, want to get ahead.


How you respond to this lie depends largely on your stake in doing something differently, and your role in the company.


“If you hire good people, they won't require supervision”

Stephen Covey famously said:

"If you can hire people whose passion intersects with the job, they won't require any supervision at all. They will manage themselves better than anyone could ever manage them. Their fire comes from within, not from without. Their motivation is internal, not external."


That's great. Let's see if they file their expense forms correctly and on time.


People at all levels of the organizational chart require supervision. What they may not need is meddling middle managers.


The best supervisors are part janitor, part secretary, and part cheerleader. They keep things clean (meaning they ensure an unobstructed path for their staff to pursue work); they attend higher level meetings and report back the information honestly and transparently so that staff can take the actions that support the business goals; and they publicly recognize successes so that their team feels validated in their work.


Also, this is insulting in the same way that "work smarter" and "think outside the box" are. It's a form of managerial "negging." It implies that if you DO need supervision, you are obviously not passionate enough about your work and may be the wrong person for the job.


“It's easier to ask forgiveness than permission”

Yes. And all planes are equipped to make a water landing. Well, at least once. Whether they can take off after that is just a minor detail, right?


In an unhealthy environment, people do things on the sly and hope they don't get caught. If they DO get caught, it seems many believe throwing themselves on the mercy of the court is as good a strategy as any.


But in mature, professional, adult environments, asking for permission is always preferable and always easier for everyone.


“Failure is not an option”

Nope, in some situations (often ones where this lie is uttered), it's practically a sure thing!


There is no guaranteed success. There is no outcome that is 100% predictable. Some failures, once they are in motion, are unavoidable no matter how much planning was done beforehand, or how much staff are on hand during the failure to try to save it.


Not only is this statement a lie, it's also not particularly helpful advice.


“So what's the point?”

My point is certainly not that everything, or even most things, said at work are a lie. What I AM saying is that some of these trite and overused clichés have reached the end of their useful life.


Or as Sam Goldwyn said, "Let's have some new clichés."


Ain't that the truth.
