
Geek Speak


The story so far:


  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)
  5. It's Not Always The Network! Or is it? Part 5 -- by John Herbert (jgherbert)


Things always crop up when you least expect them, don't they? Here's the sixth installment, by Tom Hollingsworth (networkingnerd).


The View From Above: James, CEO


One of the perks of being CEO is that I get to eat well. This week was no exception, and on Tuesday night I found myself at an amazing French restaurant with the Board of Directors. The subject of our recent database issues came up, and the rest of the Board expressed how impressed they were with the CTO's organization, in particular the technical leadership and collaboration shown by Amanda. It's unusual that they get visibility of an individual in that way, so she has clearly made a big impact. Other IT managers have also approached me and told me how helpful she is; I think she has a great career ahead of her here. As dessert arrived and the topic of conversation moved on, I felt my smartwatch buzzing as a text message came in. I glanced down at my wrist and turned pale at the first lines of the message on the screen:


URGENT! We have a security breach...


I excused myself from the table and made a call to find out more. The news was not good. Apparently, we had been sent a message saying that our customer data has been obtained, and it will be made available on the black market if we don't pay them a pretty large sum of money. It made no sense; we have some of the best security tools out there, and we follow all those compliance programs to the letter. At least, I thought we did. How did this data get out? More to the point, would we be able to avoid paying the ransom? And even if we paid it, would the data be sold anyway? If this gets out, the damage to our reputation alone will cause us to lose new business, and I dread to think how many of our affected customers won't trust us with their data any more. The security team couldn't answer my questions, so I hung up and made another call, this time to Amanda.



The View From The Trenches: Amanda (Sr Network Manager)


I used to flinch every time I picked up phone calls from James. Now I can't help but wonder what problem he wants me to solve next. I must admit that I'm learning a lot more about the IT organization around here and it's making my ship run a lot tighter. We're documenting more quickly and anticipating the problems before they happen, and we have the Solarwinds tools to thank for a large portion of that. So I was pretty happy to answer a late evening call from James earlier this week, but this call was different. The moment he started speaking I knew something bad had happened, but I wasn't expecting to hear that our customer data had been stolen and was being ransomed. How far did this go? Did they just take customer data, or have they managed to extract the whole CRM database?


It's one thing to be fighting a board implementing bad ideas, but fighting hackers? This is huge! We're about to be in for a lot of bad press, and James is going to be spending a lot of time apologizing and hoping we don't lose all our customers. James told me that I am part of the Rapid Response Team being set up by Vince, the Head of IT Security, and I have the authority to do whatever I need to do to help them find out how to get this fixed. James says he's willing to pay the ransom if the team is unable to track the breach, but he's worried that unless we find the source, he'll just be asked to pay again a week later. I grabbed my keys and drove to the office.


I had barely sat down at my desk when Vince ran into my office. He was panting as he fell into one of my chairs, and breathlessly explained the problem in more detail. The message from the hacker included an attachment - a 'sample' containing a lot of sensitive customer data, including credit card numbers and social security numbers. The hacker wanted thousands of dollars in exchange for not selling it on the black market, and there was a deadline of just two days. I asked Vince if he had verified the contents of the attachment. He nodded his head slowly. There's no question about it. Somebody has access to our data.


I asked Vince when the last firewall audit happened. Thankfully, Vince said that his team audited the firewalls about once a month to make sure all the rules were accurate. I smiled to myself that we finally had someone in IT that knew the importance of regular checkups. Vince told me that he kept things up to date just in case he had to pull together a PCI audit. I told him to put the firewalls on the back burner and think about how the data could have been exfiltrated. He told me he wasn't sure on that one. I asked if he had any kind of monitoring tool like the ones I used on the network. He told me that he had a Security Information and Event Management (SIEM) tool budgeted for next year. Isn't that always the way? I told him it was time we tried something out to get some data about this breach fast. We only had a couple of days before the hacker's deadline, so we needed to get some idea of what was going on, and quickly.


While the security engineers on the Rapid Response team continued their own investigations, Vince and I downloaded the Solarwinds Log and Event Manager (LEM) trial and installed it on my Solarwinds server. It only took an hour to get up and running. We pointed it at our servers and other systems and had it start collecting log data. We decided to create some rules for basic things, like best practices, to help us sort through the mountain of data we just started digesting. Vince and I worked to put in the important stuff, like our business policies about access rights and removable media, as well as telling the system to start looking for any strange file behavior.


As we let the system do its thing for a bit, I asked Vince if the hacker could have emailed the files out of the network. He smiled and told me he didn't think that was possible because they had just finished installing Data Loss Prevention (DLP) systems a couple of months ago. It had caught quite a few people in accounting sending social security numbers in plain text emails, so Vince was sure that anything like that would have been caught quickly. I was impressed that Vince clearly knew what he was doing. He only took over as Head of IT Security about nine months back, and it seems like he has been transforming the team and putting in just the right processes and tools. His theory was that it was some kind of virus that was sending the data out a covert channel. Being in networking, I often hear things being blamed on the latest virus of the week, so I reserved my judgement until we knew more. All we could do now was wait while LEM did its thing, and the other security engineers continued their efforts as well. By this time it was well after midnight, and I put on a large pot of coffee.


When morning came and people started to come into work, we looked at the results from the first run at the data. Vince noted a few systems which needed to be secured to fall completely within PCI compliance rules. There was nothing major found, though; just a couple of little configurations that were missed. As we scrolled down the list though, Vince found a potential smoking gun. LEM had identified a machine in sales that had some kind of unknown trojan. On the same screen, the software offered the option to isolate the machine until it could be fixed. We both agreed that it needed to be done, so we removed the network connectivity for the machine through the LEM interface until we could send a tech down to remove the virus in person. More and more people were coming online now, so perhaps one of those systems would provide another possible cause.


We kept pushing through the data; we were now 18 hours into the two-day deadline. I was looking over the list of things we needed to check on when a new event popped up on the screen. I scrolled up to the top and read through it. A policy violation had occurred in our removable device policy rule. It looked like someone had unplugged a removable USB drive from their computer, and the system was powered off right after that. I checked the ID on the machine: it was one of the sales admins. I asked Vince if they had a way of tracking violations of the USB device policy. He told me that there shouldn't have been any violations as they had set a group policy in AD to prevent USB drives from being usable. I asked him about this machine in particular. Vince knitted his eyebrows together as he thought about the machine. He told me he was sure that it was covered too, but we both decided to walk down and take a look at it anyway.


We booted up the machine, and everything looked fine as it did the usual POST and came up to the Windows login screen. Wait, though; the background for the login screen was wrong. We have a corporate image on our machines with the company logo as the wallpaper. It wasn't popular but it also prevented incidents with more colorful pictures ... like the one I was looking at right now. Wow. Somehow this user had figured out how to change their wallpaper. I wondered what else this could mean. Vince and I spent an hour combing through the system. There were lots of non-standard things we found; lots of changes that shouldn't have been possible with our group policies (including the USB device policy), and the browser history of the user was clean. Not just clean from a perspective of sites visited, but completely cleared. Vince and I started to think that this system's user was someone we wanted to chat with.


I called James and told him we had a couple of possibilities to check out. He asked us to get back to him quickly; he had notified the rest of the Board, and they were pushing to hear that we had a solution as quickly as possible. Vince and I returned to my office and I scanned the SIEM tool for any new events while Vince contacted one of his team to arrange to have the suspect computer removed and re-imaged. Five minutes in, another event popped up. The same suspect system with the group policy had triggered an event for the insertion of a USB drive. I printed out the event, and Vince and I hurried back to the sales office to find out who had turned the computer on. We found the user hard at work, typing away; until, that is, we walked up to his desk. A flurry of mouse clicks later, he was back at his desktop. Vince asked him if he had anything plugged into his computer that wasn't supposed to be there. The user, a young man called Josh, said that he didn't. Vince showed him the event printout showing a USB drive being plugged in to the computer, but Josh shook his head and said that he didn't know what that was all about.


Vince wasn't having any of it. He started asking the sales admin all about the unauthorized changes on the machine that violated the group policies in place on the network. The sales admin didn't have an answer. He started looking around and stammering a bit as he tried to explain it. Finally, Vince said that he'd had enough. It was obvious something was going on and he wanted to get to the bottom of it. He told Josh to step away from the computer. Josh stood up and moved to the side, and Vince sat down at the computer, clicking around the system and looking for anything out of place. He glanced at the report from the Solarwinds SIEM tool, which showed that the drive was mounted in a specific folder location and not as a drive. As soon as he started clicking in the folder structure, Josh got visibly nervous. He kept inching closer to the chair and looked like he was about to grab the keyboard. When Vince clicked into the folder structure of the drive, his eyes got wide. Josh's head dropped and he stared resolutely at the carpet.


The post-mortem after that was actually pretty easy. Josh was the hacker who had stolen the information from our database. He had stored a huge amount of customer records on the USB drive and was adding more every day. He must have hit on the idea to ask us to pay for the records as a ransom, and he might have been planning on selling them even if we paid up, although we'll never know. Vince's team analyzed the hard drive and found the exploits Josh had used to elevate his privileges enough to reverse the group policies that prevented him from reading and copying the customer data. We later found those privilege escalations in the mountain of data the SIEM collected. If we'd only had this kind of visibility before, we might have avoided this whole situation.


James came down to deal with the issue personally. Josh was pretty much frog-marched into a conference room, with James following close behind. The door slammed shut and the ensuing muffled shouting gave me some uncomfortable flashbacks to the day that my predecessor, Paul, was fired. Then Sam from Human Resources arrived with two of our attorneys from Legal in tow, and half an hour later Josh was being escorted from the building. I'm not privy to exactly what the attorneys had Josh sign, but apparently he won't be making any noise about what he did.


From my perspective, I've built a really good relationship with the security team now, and of course, they've asked to keep Solarwinds Log and Event Manager. LEM paid for itself many times over this week, and there's no question that at some point it will help us avoid another crisis. For now though, James told Vince and me to take the rest of the week off. I'm not going to argue; I need some sleep!



>>> Continue reading this story in Part 7


(image courtesy of Marvel)


...I learned from "Doctor Strange"

(This is the last installment of a 4-part series. You can find part 1 here, part 2 here and part 3 here)


Don't confuse a bad habit that works for a good habit

The Ancient One observes that Strange isn't, "…motivated by power or the need for acclaim. You simply have a fear of failure." He replies, "I guess that fear is what made me a great doctor." She calls him on this little bit of b.s., saying,


"Fear is what has held you back from true greatness. Arrogance and fear still keep you from learning the simplest and most significant lesson of all."


Strange asks, "Which is?"

The answer? "It's not about you."


After 30 years in IT, I've come to realize that our daily work is full of positive rewards for poor choices. We work long hours, come in early after an overnight change control, check systems on our days off, learn new skills for work on our own time, don't venture too far from a network connection, just in case, and so on. We do this because we are rewarded for giving 110 percent. We’re lionized (at least for a moment) when we manage to bring up the crashed system in record time; we receive bonuses and other incentives for closing the largest number of tickets, and so on.


But that doesn't make any of those behaviors good.


I'm not saying that sometimes putting in longer hours, or more effort, or rushing to help rescue a system or team is a bad thing. But our motivation for doing so – like fear of failure – should be identified for what it is and dealt with honestly.


RTFM before you try running commands

After being firmly warned about the perils of manipulating time, Strange grumps, "Why don't they put the warnings before the spell?" Later, he repeats this sentiment as the villain is hoisted on his own mystical petard.


Often, we find a potential solution and rush pell-mell into implementation without testing. Or, in the case of code you find in the middle of a long forum thread, without reading to the end to find out that it doesn't really address your issue and, in fact, breaks several other things. Or worse, someone decides to be a smart@ss and tells you the solution is to run rm -fr / as root. If you don't read down to the next post, you may never find the warning that tells you this would erase all the files on your system.


This is the reason all IT pros should know the magical incantation, RTFM.


Being flawed doesn't mean you're broken.

Kaecilius, the villain of the movie, points out at one point that Kamar Taj is filled with broken souls to whom the Ancient One teaches "parlor tricks, and keeps the real power for herself.” While the second half of that sentiment is clearly not true, the first half has some merit. Look closely and you can see that each character you meet in the mystical fortress is flawed, either externally (in the case of Master Hamir, who is missing his left hand) or internally (as with Mordo, battling his inner demons). What is interesting is that, while some of the characters succumb to obstacles related to these flaws, none allow themselves to be defined by those flaws.


It is obvious to the point of cliché that none of us are perfect. Nor have any of us had perfect IT training, or career paths, or experiences. But those flaws, deficiencies, and missteps do not invalidate us as people, nor do they disqualify us as credible sources of IT expertise.


Artist Allie Jenson once said,


"I am proud of my flaws and mistakes. They are the building blocks of my strengths and beauty.”


In fact, the Japanese practice of Kintsugi is the art of taking flaws in an object and emphasizing them to create even greater beauty in the piece.


We need to remind ourselves that the ways in which we live with – and sometimes overcome – our flaws are often what makes us special.


The path to mastery is not easy, but simple

Sitting at the feet of the Ancient One, Strange despairs of learning the secrets of the magic she offers. "But even if my fingers were able to do that," he says, "How do I get from here..." (indicating where he's sitting) "...to there." (pointing to where she sits.) She asks, "How did you become a doctor?" He answers, "Study and practice. Years of it."


Over the course of my 30-year career in IT, I’ve had the privilege to work with an astounding number of brilliant minds. These talented engineers and designers have unselfishly passed along hints and secrets on a daily basis. For that, I am sincerely grateful.


Even so, none of what we do comes easily. It requires, as Doctor Strange observed, study and practice, and often years of it to truly develop mastery. And usually in IT, the thing we're trying to master is a moving target, morphing from one form to another as technology continues to evolve at a breakneck pace.


But despite that, the mastery we acquire is rarely as impossible as it feels on that first day when we attempt to write our first line of code, configure our first router, or install our first server.



Even if words aren’t spells, they have power and must be treated with care

In the moments before Strange exposes the secret of the Ancient One's long life, she warns him, "Choose your next words very carefully, Doctor Strange." Not heeding her warning, Strange barrels on. In doing so, he sows the seeds of distrust and anger that ultimately lead to his friend Mordo becoming a lifelong nemesis.


It's important to recognize that nothing that Strange said was wrong. Nor was he wrong in challenging the Ancient One's choices. But doing so publicly, and in anger, and using the words he did, created more problems than he could have ever predicted.


In IT, we place great value in the Truth. In fact, I’ve written about it a lot lately:


But there is a difference between being honest and being insulting; between being assertive and aggressive; between uncovering the truth and exposing faults purely for the sake of diminishing.


It's an undeniable reality that the world has become more crass. Dangerously so, in fact. Not just as IT professionals, but as good faith participants in humanity, we have the ability and responsibility to change that trend, if we can. It means that even when we understand the pure facts, we, like Doctor Strange, must also choose our words very carefully.


Never doubt, diminish, or dismiss your value or importance

Denying that magic exists, Doctor Strange exclaims, "We are made of matter and nothing more. We're just another tiny, momentary speck in an indifferent universe." This is the point at which the Ancient One opens Strange’s eyes to the infinitude of reality, and asks, "Who are you in this vast multiverse, Mr. Strange?" The question is not meant to diminish Strange, but to point out that there is, in fact, a place and role and opportunity for greatness for every living being.


Walk into the convention hall at Cisco Live!, Microsoft Ignite, VMworld, or CeBIT, and you begin to grasp the enormity of the IT community. In doing so, it's easy to believe that nothing we have to say or contribute is new or even meaningful in any way. We fall into the trap of being a technological Ecclesiastes, thinking there's nothing new under the sun.


The truth is that nothing could be further from the truth. It is our experiences, and our willingness to share them, that make IT such a vibrant profession and community of individuals. Our struggles provide the motivation for solutions that otherwise would never be imagined. It is the intersection of our humanity with our abilities that creates the compelling stories that inspire the next generation of IT professionals.


Did you find your own lesson when watching the movie? Discuss it with me in the comments below.



It takes a radio signal about 1.28 seconds to get to the Moon (about 239,000 miles away), and about 2.5 seconds for round trip communication between our secret moon base and Earth. So, therefore this common SQL Server error message number 833...


SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\SQL\database.mdf] in database [database]. The OS file handle is 0x0000000000000000. The offset of the latest long I/O is: 0x00000000000000


...implies that the round trip time is over 15 seconds, so using 7.5 seconds (as a minimum estimate, we really don't know how long it is taking) we see the underlying SAN disks are over 1,396,500 miles away, or about 5.8 times as far away as the Moon. No, I don't have any idea how they got there, either. But how else to explain this error? For all I know this SAN could be on Mars!
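For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a speed of light of roughly 186,282 miles per second, which is why the result lands slightly above the rounded figure quoted above; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "SAN on the Moon" arithmetic.
SPEED_OF_LIGHT_MILES_PER_SEC = 186_282  # approximate value, an assumption of this sketch
MOON_DISTANCE_MILES = 239_000           # figure quoted in the post


def implied_distance_miles(round_trip_seconds: float) -> float:
    """One-way distance a signal covers, given a round-trip time."""
    one_way_seconds = round_trip_seconds / 2
    return one_way_seconds * SPEED_OF_LIGHT_MILES_PER_SEC


distance = implied_distance_miles(15)  # the 15-second threshold in error 833
moons = distance / MOON_DISTANCE_MILES
print(f"{distance:,.0f} miles, about {moons:.1f} times the distance to the Moon")
```

Since 15 seconds is only the minimum the error message reports, the real "distance" could be much greater.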


Now, I've seen this error message many times in my career. The traditional answers you find on the internet tell you to look at your queries and try to figure out which ones are causing you the I/O bottleneck. In my experience, this guidance was wrong more than 95% of the time. In fact, this is the type of guidance that usually results in people wanting to just throw hardware at the problem. I've seen that error message appear with little to no workload being run against the instance.


In my experience the true answer was almost always "shared storage" or "the nature of the beast that is known as a SAN". Turns out that when several servers share the same storage arrays you can end up being a victim to what is commonly called a "noisy neighbor". One workload, on one particular server, causing performance pain for a seemingly unrelated server elsewhere.


What's more frustrating is that sometimes the only hint of the issue is with the SQL Server error message. Often the conventional tools used to monitor the SAN don't necessarily show the problem, as they are focusing on the overall health of the SAN and not on the health of specific applications, database servers, or end-user experience.


And just when I thought I had seen it all when it comes to the error message above, along comes something new for me to learn.


Snapshots and Checkpoints

No, they aren't what's new. I've been involved with the virtualization of database servers for more than eight years now and the concepts of snapshots and checkpoints are not recent revelations for me. I've used them from time to time when building personal VMs for demos and I've seen them used sparingly in production environments. Why the two names? To avoid confusion, of course. (Too late.)


The concept of a snapshot or checkpoint is simple: to create a copy of the virtual machine at a point in time. The reason for wanting this point in time copy is simple as well: recovery. You want to be able to quickly put the virtual machine back to the point in time created by the snapshot or checkpoint. Think of things like upgrades or service packs. Take a snapshot of the virtual machine, apply changes, sign off that everything looks good, and remove the snapshot. Brilliant!


How do they work?

For snapshots in VMware, the documentation is very clear:

When you create a snapshot, the system creates a delta disk file for that snapshot in the datastore and writes any changes to that delta disk.

So, that means the original file(s) used for the virtual machine become read-only, and this new delta file stores all of the changes. I liken this to the "copy-on-write" technology in database snapshots inside of SQL Server. In fact, this VMware KB article explains the process in the same way:

The child disk, which is created with a snapshot, is a sparse disk. Sparse disks employ the copy-on-write (COW) mechanism, in which the virtual disk contains no data in places, until copied there by a write.
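To make the idea concrete, here is a toy model in Python. It is only a sketch of the behavior described above (a read-only base plus a delta that starts empty and holds changed blocks), not how VMware actually implements its disk format; the class and method names are made up for illustration.

```python
class SnapshottedDisk:
    """Toy model: a read-only base disk plus a sparse delta of changed blocks."""

    def __init__(self, base_blocks):
        self._base = tuple(base_blocks)  # frozen at the moment of the snapshot
        self._delta = {}                 # block number -> data written since then

    def write(self, block_no, data):
        # Writes never touch the base; they land in the delta only.
        if not 0 <= block_no < len(self._base):
            raise IndexError("block number out of range")
        self._delta[block_no] = data

    def read(self, block_no):
        # Reads check the delta first, then fall back to the base.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base[block_no]

    def revert(self):
        # Going back to the snapshot simply discards the delta.
        self._delta.clear()

    def delta_size(self):
        # Number of blocks currently held in the delta.
        return len(self._delta)


disk = SnapshottedDisk(["a", "b", "c"])
disk.write(1, "B")
print(disk.read(0), disk.read(1), disk.delta_size())
disk.revert()
print(disk.read(1), disk.delta_size())
```

In this model every read has to consult the delta before the base, and the delta only grows as more blocks change, which hints at where the overhead comes from.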

OK, so we know how they work, so let's talk about their performance impact.


Are they bad?

Not at first, no. But just like meat left out overnight they can become bad, yes. And the reason why should be very clear: the longer you have them, the more overhead you will have as the delta disk keeps track of all the changes. Snapshots and checkpoints are meant to be a temporary thing, not something you would keep around. In fact, VMware suggests that you keep a snapshot for no more than 72 hours, due to the performance impact. Here's a brief summary of other items from the "Best practices for virtual machine snapshots in the VMware environment" KB article:


  • Snapshots are not backups, and do not contain all the info needed to restore the VM. If you delete the original disk files, the snapshot is useless.
  • The delta files can grow to be the same size as the original files. Plan accordingly.
  • Up to 32 snapshots are supported (unless you run out of disk), but you are crazy to use more than 2-3 at any one time.
  • Rarely, if ever, should you use a snapshot on high-transaction virtual machines such as email and database servers.
  • Snapshots should only be left unattended for 24-72 hours, and don't feed them after midnight, ever.


OK, I made that last part up. You aren't supposed to feed them, ever, otherwise they become like your in-laws at Christmas and they will never leave.
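The 72-hour guideline is easy to turn into a check. The sketch below assumes you already have a list of snapshots with creation times from whatever inventory tool you use; the record layout and the VM names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SNAPSHOT_AGE = timedelta(hours=72)  # upper bound from the KB summary above


def stale_snapshots(snapshots, now):
    """Return (vm, snapshot name) pairs whose age exceeds MAX_SNAPSHOT_AGE."""
    return [
        (snap["vm"], snap["name"])
        for snap in snapshots
        if now - snap["created"] > MAX_SNAPSHOT_AGE
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    {"vm": "sql01", "name": "pre-service-pack", "created": datetime(2016, 12, 1, 9, 0)},
    {"vm": "web01", "name": "pre-upgrade", "created": datetime(2016, 12, 12, 9, 0)},
]

check_time = datetime(2016, 12, 13, 9, 0)
for vm, name in stale_snapshots(inventory, check_time):
    print(f"{vm}: snapshot '{name}' is older than 72 hours")
```

Passing the current time in as an argument, rather than calling datetime.now() inside the function, keeps the check easy to test.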


So, snapshots and checkpoints can have an adverse effect on performance! And I found out about it through this Spiceworks thread, then from other articles on the internet that detailed this very same issue.


So this performance issue wasn't exactly an unknown, but rather new to me since I hadn't come across issues related to snapshots or thought to check for them in production. And, from what I can tell, most people don't have this experience either, hence the reason for scratching our heads when we see the effects of snapshots and checkpoints on our database servers.


Do I have one?

I don't know, you'll need to look for yourself. For VMware, you have three methods as detailed in this KB article:


1. Using vSphere

2. Using the virtual machine Snapshot Manager

3. Viewing the virtual machine configuration file

4. Viewing the virtual machine configuration file on the ESX host


Yes, that's four things. I didn't write the KB article. You can read it for yourself. Consider number 4 to be a bonus option or something. Or maybe they meant to combine the last two. Again, I didn't write the article, I'm just pointing you to what it says.


Now, for Hyper-V, we can look at the Hyper-V Manager GUI as well, which is essentially similar to using vSphere. But we could also use the Hyper-V PowerShell cmdlets as listed here. In fact, this little piece of code is all you really need:

PS C:\> Get-VM -Name "vmName" | Get-VMSnapshot

Also worth mentioning here is that Virtualization Manager tracks snapshots as well. You can find information about sprawl and snapshots here.



Snapshots and checkpoints are fine tools for you to use, but when you are done with them you should get rid of them, especially for a database server. Otherwise you can expect to see a lot of disk latency and high CPU as a result. And should you see such things but your server team reports back that everything looks normal, I hope this post will stick in your head enough for you to remember to go looking for any rogue snapshots that may exist.

With Christmas just under two weeks away most of the corporate world is in what I call "holiday mode", that period of time when work needs to get done but the urgency wanes as everyone is forced to balance work and holiday tasks. Toss in a few snow days that close or delay school and it's easy to see how work schedules can be hectic for a period of time well beyond the holiday season.


Of course that won't stop me from putting together the Actuator each week. So here's a bunch of links I found on the Intertubz that you may find interesting, enjoy!


'Crime as a Service' a Top Cyber Threat for 2017

Just a reminder that things can, and will, get worse before they get better.


Microsoft to offer option of 16 years of Windows Server, SQL Server support through new Premium Assurance offer

Just what we wanted for Christmas, six more years of supporting Windows 2008 and SQL Server 2008!


Six maps that show the anatomy of America’s vast infrastructure

Because I like maps and I think you should too, here's one showing roads, railroads, and even the state of disrepair of our bridges. I'd like to see these same graphs over time, to get a sense if we are falling further behind on infrastructure upkeep.


Who needs traditional storage anymore?

Eventually we will reach a point where all the "nerd knobs" are taken away. We won't be tuning hardware. The traditional resource bottlenecks (memory, CPU, disk, network) will be out of our hands.


We’re set to reach 100% renewable energy — and it’s just the beginning

Say what you want about Google, but their efforts in this area are quite admirable.


How I detect fake news - O'Reilly Media

The downside to this is the amount of effort it takes to verify a story. Trusted resources are hard to come by these days. This is especially true when most "news" programming is essentially editorials and opinions. Gone are the days of merely reporting on an event, now we are subjected to an endless (and mindless) spouting of opinions AS facts, leading to mass confusion for everyone.


Using AWS Lambda to call and text you when your servers are down

Hey folks, I just wanted you to know that tools like this already exist. No need to reinvent the wheel here, you know.


A primer on blockchain

In case you were wondering what all the hype was about blockchain, here's an easily digestible infographic.


Total Cost of Ownership (TCO) Calculator

In case you needed to provide some data to your CFO on reasons to migrate to the cloud.


Last week in Orlando I delivered 4 sessions in 3 days at 2 events, but this was by far my favorite:

(image: thwack - 1.jpg)

It can be tough to get a good handle on government agencies’ increasingly complex database environments. Today, federal database administrators are in charge of everything ranging from on-premises solutions to cloud or hybrid systems. DBAs are like the central nervous system of the human body -- they are in charge of disseminating information throughout the entire agency.


That’s a big responsibility, and things are not going to get much easier anytime soon. The amount of data will skyrocket, and concerns surrounding security, efficiency and cost will continue. Fortunately, there are a few ways DBAs can reduce headaches and database management complexities.


1. Make sure that everything is on the same page, especially when it comes to application response times.


In order to streamline processes, it’s vitally important to ensure that all databases have a common set of goals, metrics and service-level agreements. Acceptable application response times will vary depending on unique needs.


Work with management to determine appropriate response times, and then implement the solutions that can deliver on that agreement. If applications aren’t responsive, or databases aren’t doing their jobs, then productivity and uptime could be significantly impacted, affecting the delivery of the agency’s mission.
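
As a rough illustration of turning an agreed response time into something you can check, here is a minimal Python sketch. The 500 ms target, the choice of the 95th percentile, and the sample timings are all assumptions made up for the example; your own agreement with management defines the real numbers.

```python
from statistics import quantiles

def sla_report(samples_ms, target_ms, percentile=95):
    """Return (percentile_value, meets_sla) for a list of response times in ms."""
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    value = cuts[percentile - 1]
    return value, value <= target_ms

# Hypothetical measurements: four quick responses and one slow outlier.
p95, ok = sla_report([120, 180, 210, 250, 900], target_ms=500)
print(f"95th percentile: {p95} ms, within target: {ok}")
```

Using a percentile rather than an average keeps one slow outlier from being hidden by many fast responses, which is usually closer to what users actually experience.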


2. Carefully document your processes and implement log and event management.


To help keep a close eye on all of the data that’s passing through a network and to ensure its security, establish a documentation system. Begin by documenting a consistent set of processes for database backup and restore, data encryption, detection of anomalies and potential security threats.


Log and event management tools can send alerts when suspicious activity is spotted in the log data. That lets you respond to threats in a timely manner and automatically kill suspicious applications.
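
To make the idea of alerting on log data concrete, here is a minimal Python sketch that flags accounts with repeated failed logins. The log line format, the threshold of three failures, and the function name are assumptions for illustration; a real log and event management tool handles far more than this.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "... Login failed for user 'name' ..."
FAILED_LOGIN = re.compile(r"login failed for user '(?P<user>[^']+)'", re.IGNORECASE)

def suspicious_users(log_lines, threshold=3):
    """Return the sorted list of users with at least `threshold` failed logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("user")] += 1
    return sorted(user for user, n in counts.items() if n >= threshold)

sample_log = [
    "2016-12-01 02:14:07 Login failed for user 'sa' from 10.0.0.5",
    "2016-12-01 02:14:09 Login failed for user 'sa' from 10.0.0.5",
    "2016-12-01 02:14:11 Login failed for user 'sa' from 10.0.0.5",
    "2016-12-01 08:30:00 Login failed for user 'bob' from 10.0.0.9",
    "2016-12-01 08:30:20 Login succeeded for user 'bob' from 10.0.0.9",
]
print(suspicious_users(sample_log))
```

A single mistyped password stays below the threshold, while a burst of failures against one account is surfaced for follow-up.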


3. Reduce workload costs by planning ahead.


If you are considering moving to the cloud, there are a couple of things to keep in mind. First, carefully map out a strategy and establish guidelines. Be sure to deploy on a certified platform, and plan everything to ensure that the transition is seamless.


Second, consider moving to cloud solutions with lower licensing costs or to open source software, which is often less expensive. Remember that the goal of a DBA is not only to help provide colleagues with better, faster and more secure data access, it’s also to help save the agency money.


4. Keep things in perspective so you don’t go crazy.


No one said database administration was going to be easy. Government data is a tough business, and it’s only going to get tougher.


But, it can also be incredibly rewarding. Think of it: DBAs are the foundation of everything that happens in the agency. They control where the information goes, whether or not critical applications are working properly and, in effect, how effectively the agency completes its mission.


Yes, a DBA’s role is extremely complex. But making a few simple adjustments can reduce that complexity, ensuring that information keeps pumping and the agency’s vital operations stay healthy.


Find the full article on Government Computer News.

The story so far:


  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)


Easter is upon the team before they know it, and they're being pushed to make a major software change. Here's the fifth installment, by John Herbert (jgherbert).


The View From Above: James (CEO)


Earlier this week we pushed a major new release of our supply chain management (SCM) platform into production internally. The old version simply didn't have the ability to track and manage our inventory flows and vendor orders as efficiently as we wanted, and the consequence of that has been that we've missed completing a few large orders in their entirety because we have been waiting for critical components to be delivered. Despite the importance of this upgrade to our reputation for on-time delivery (not to mention all the other cost savings and cashflow benefits we can achieve by managing our inventory on a near real-time basis), the CTO has been putting this off for months because the IT teams have been holding back on giving the OK. Finally the Board of Directors had enough with the CTO's push back, and as a group we agreed that there had been plenty enough time for testing, and the directive was issued that unless there were documented faults or errors in the system, IT should proceed with the new software deployment within the month.


We chose to deploy the software over the Easter weekend. That's usually a quieter time for our manufacturing facilities, as many of our customers close down for the week leading up to Easter. I heard grumbling from the employees about having to work on Easter, but there's no way around it. The software has to launch, and we have to do whatever we need to do to make that happen, even if that means missing the Easter Bunny.


The deployment appeared to go smoothly, and the CTO was pleased to report to the Board on Monday morning that the supply chain platform had been upgraded successfully over the weekend. He reported that testing had been carried out from every location, and every department had provided personnel to test their top 10 or so most common activities after the upgrade so that we would know immediately if a mission-critical problem had arisen. Thankfully, every test passed with flying colors, and the software upgrade was deemed a success. And so it was, until Tuesday morning when we started seeing some unexplained performance issues, and things seemed to be getting worse as the day progressed.


The CTO reported that he had put together a tiger team to start troubleshooting, and opened an ongoing outage bridge. This had the Board's eyes on it, and he couldn't fail now. I asked him to make sure Amanda was on that team; she has provided some good wins for us recently, and her insight might just make the difference. I certainly hope so.


The View From The Trenches: Amanda (Sr Network Manager)


With big network changes I've always had a rule for myself that just because the change window has finished successfully, it doesn't mean the change was a success, regardless of what testing we might have done. I tend to wait a period of time before officially calling the change a success, all the while crossing my fingers for no big issues to arise. Some might call that paranoia, and perhaps they are right, but it's a technique that has kept me out of trouble over time. This week has provided another case study for why my rule has a place when we make more complex changes.


Obviously I knew about the change over the Easter weekend; I had the pleasure of being in the office watching over the network while the changes took place. SolarWinds NPM made that pretty simple for me; no red means a quiet time, and since there were no specific reports of issues, I really had nothing to do. On Monday the network looked just fine as well (not that anybody was asking), but by Tuesday afternoon it was clear that there were problems with the new software, and the CTO pulled me in to a war room where a group of us were tasked to focus on finding the cause of performance issues being reported with the new application.


There didn't seem to be a very clear pattern to the performance issues, and reports were coming in from across the company. On that basis we agreed to eliminate the wide area network (WAN) from our investigations, except at the common points, e.g. the WAN ingress to our main data center. The server team was convinced it had to be a network performance issue, but when I got them to do some ping tests from the application servers to various components of the application and the data center, responses were coming back in 1 or 2ms. NPM also still showed the network as clean and green, but experience has taught me not to dismiss any potential cause until we can disprove it by finding what the actual problem is, so I shared that information cautiously but left the door open for it to still be a network issue that simply wasn't showing in these tests.


One of the server team suggested perhaps it was an MTU issue. A good idea, but when we issued some pings with large payloads to match the MTU of the server interface, everything worked fine. MTU was never really a likely cause--if we had MTU issues, you'd have expected the storage to fail early on--but there's no harm in quickly eliminating it, and that's what we were able to do. We double checked interface counters looking for drops and errors in case we had missed something in the monitoring, but those were looking clean too. We looked at the storage arrays themselves as a possible cause, but checking SolarWinds Storage Resource Monitor we confirmed that there were no active alerts, there were no storage objects indicating performance issues like high latency, and there were no capacity issues, thanks to Mike using the capacity planning tool when he bought this new array!
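
For anyone wanting to repeat the MTU check, the arithmetic is simple enough to sketch in Python. This assumes IPv4 with no IP options (a 20-byte header) plus the 8-byte ICMP header, and it assumes a Linux host where iputils ping accepts -M do to prohibit fragmentation; the story doesn't say which platform or MTU was actually used, and the host name is a placeholder.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header

def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def linux_ping_command(host, mtu, count=3):
    """Build (but do not run) a Linux ping command that forbids fragmentation."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-c", str(count), "-M", "do", "-s", str(size), host]

print(icmp_payload_for_mtu(1500))  # standard Ethernet MTU
print(icmp_payload_for_mtu(9000))  # a common jumbo-frame MTU
print(" ".join(linux_ping_command("app-server.example.com", 1500)))
```

If a ping at that payload size succeeds with fragmentation prohibited but a slightly larger one fails, the path MTU matches what you expected.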


We asked the supply chain software support expert about the software's dependencies. He identified the key dependencies as the servers the application ran on, the NFS mounts to the storage arrays and the database servers. We didn't know about the database servers, so we pulled in a database admin and began grilling him. We discovered pretty quickly that he was out of his depth. The new software had required a shift from Microsoft SQL Server to an Oracle database. This was the first Oracle instance the DB team had ever stood up, and while they were very competent monitoring and administering SQL Server, the admin admitted somewhat sheepishly that he really wasn't that comfortable with Oracle yet, and had no idea how to see if it was the cause of our problems. This training and support issue is something we'll need to work on later, but what we needed right then and there was some expertise to help us look into Oracle performance. I was already heading to the SolarWinds website because I remembered that there was a database tool, and I was hopeful that it would do what we needed.


I checked the page for SolarWinds' Database Performance Analyzer (DPA), and it said: "Response Time Analysis shows you exactly what needs fixing - whether you are a database expert or not." That sounded perfect given our lack of Oracle expertise, so I downloaded it and began the installation process. It wasn't long before I had DPA monitoring our Oracle database transactions (checking them every second!) and starting to populate data and statistics. Within an hour it became clear what the problem was; DPA identified that the main cause for performance problems was occurring on database updates, where entire tables were being locked rather than using a more granular lock, like row-level locking. Update queries were being forced to wait while the previous query executed and released the lock on the table, and the latency in response was having a knock-on effect on the entire application. We had not noticed this at the weekend because the transaction loads were so low out of normal business hours that this problem didn't raise its head. But why didn't this happen on Monday? On a hunch I dug into NPM and looked at the network throughput for the application servers. As I had suspected, the Monday after Easter showed the servers handling about half the traffic that hit them on the Tuesday. At a guess, a lot of people took a 4-day weekend, and when they returned to work on Tuesday, that tipped the scales on the locking/blocking issue.
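
To see why table-level locking hurts so much more under load, here is a toy Python model. It is not how Oracle or DPA works internally; it simply assumes every update arrives at the same moment, touches one row, and holds its lock for a fixed time. The row numbers and the 10 ms duration are invented for the example.

```python
from collections import defaultdict

def completion_times(updates, granularity):
    """Finish time (ms) of each update, all submitted at t=0 in list order.

    updates: list of (row_id, duration_ms) tuples.
    granularity: "table" (one lock for everything) or "row" (one lock per row).
    """
    lock_free_at = defaultdict(int)  # lock key -> time the lock is next free
    finished = []
    for row_id, duration in updates:
        key = "table" if granularity == "table" else row_id
        start = lock_free_at[key]          # wait until the lock is released
        end = start + duration
        lock_free_at[key] = end
        finished.append(end)
    return finished

updates = [(1, 10), (2, 10), (3, 10), (1, 10)]
print("table lock:", completion_times(updates, "table"))
print("row lock:  ", completion_times(updates, "row"))
```

With a table lock every update queues behind the one before it, so the wait grows with the number of concurrent updates. With row locks only the two updates to row 1 have to wait for each other, which is why a quiet weekend hid the problem and a busy Tuesday exposed it.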


While we discussed this discovery, our supply chain software expert had been tapping away on his laptop. "You're not going to believe this," he said. "It turns out we are not the first people to find this problem. The vendor says that they posted a HotFix for the query code about a week after this release came out, but I just checked, and we definitely do not have that HotFix installed. I don't know how we missed that, but we can get it installed overnight while things are quiet, and maybe we'll get lucky." I checked my watch; I couldn't believe it was 7:30PM already. We really couldn't get much more done that night anyway, so we agreed to meet at 9AM and monitor the results of the application of the HotFix.

The next morning we met as planned, and watched nervously as the load ramped up as each time zone came on line. By 1PM we had hit a peak load exceeding Tuesday's peak, and not a single complaint had come in. SolarWinds DPA now indicated that the blocking issue had been resolved, and there were no other major alerts to deal with. Another bullet dodged, though this one was a little close for comfort. We prepared a presentation for the Board explaining the issues (though we tried not to throw the software expert under the bus for missing the HotFix), and presented a list of lessons learned / actions, which included:


  • Set up a proactive post-change war-room for major changes
  • Monitor results daily for at least one week for changes to key business applications
  • Provide urgent Oracle training for the database team (the accelerated schedule driven by the Board meant this did not happen in time)
  • Configure DPA to monitor our SQL Server installations too


We wanted to add another bullet saying "Don't be bullied by the Board of Directors into doing something when we know we aren't ready yet", but maybe that's a message best left for the Board to mull on for itself. Ok, we aren't perfect, but we can get better each time we make mistakes, so long as we're honest with ourselves about what went wrong.



>>> Continue reading this story in Part 6

I’m a little late, but I wanted to do a quick wrap-up of last week’s challenge.

Day 3: Search

Richard Phillips reminded us that it’s not all about immediate gratification: “In a day when search means typing something into a browser and getting back an answer we need to remember that the search isn't just about getting an answer, it's about learning and gaining data that can be used now and in the future.”


Meanwhile, EBeach waxed philosophical with the quote: “A man travels the world over in search of what he needs and returns home to find it – George A. Moore.”


And tomiannelli added some philosophical thoughts of his own, but closed with:

“Remember that sometimes not getting what you want is a wonderful stroke of luck.”

  ― Dalai Lama XIV


Day 4: Understand

mlotter pointed out that “A true mark of Successful person is the ability to listen to understand instead of listening with the intent to reply.”


Not to be outdone, mtgilmore1 invoked the first man on the moon with, "Mystery creates wonder and wonder is the basis of man's desire to understand." (Neil Armstrong)


And a THWACK MVP countered one word with another when he said, “To me learning something new requires one of two two things acceptance or understanding.” He followed it up with a clip of comedian Michael Jr as he explains the power of Why.


Michael Jr: Know Your Why - YouTube


Day 5: Accept

desr kept it short and sweet: “Accept who you are and what you are that is all that matters. If you accept yourself your true beauty will shine.”


silverwolf started off with enthusiasm: “Working in IT, Studying in IT,  even waay back in elementary school...or even before that, I've always known I would be involved with IT way or another, with computers...with technology! I Accepted that fact a looong time ago. I mean... it was all so COOL!  It STILL IS! It's an ADVENTURE!” and then brought a series of awesome Star Trek memes.


And in what I hope is a purposeful mis-spelling, wbrown said, “Sometimes we just have to except that everything has an acception”


Day 6: Believe

Several folks started out their entries with some type of definition of the word of the day, but bleggett managed to weave the word into their analysis of the definition of the word (how meta): “Interestingly, the etymology of believe is not quite what I expected.  Not unbelievable, but I believe it's credible. 

Online Etymology Dictionary


Another commenter also started off with a definition, but with a bit more detail: “Belief is objective. Unfortunately, today belief is too often accepted as subjective. By quick definition:

An objective perspective is one that is not influenced by emotions, opinions, or personal feelings - it is a perspective based in fact, in things quantifiable and measurable. A subjective perspective is one open to greater interpretation based on personal feeling, emotion, aesthetics, etc.”


sparda963 reflected on how belief informs his work: “I am not much of a believer honestly. I don't take it on faith that something is going to work unless I know it is going to work. Far to many people who I have worked with in IT over the years believe that just because they did something that it will work exactly as their believe it will. This often does not turn out to be the case.”


Day 7: Choose

Day 7 is notable because it marked the first post by another member of the SolarWinds staff – in this case Head Geek Destiny Bertucci. In response, Richard Phillips highlighted one of the biggest career divides that IT pros encounter: “Ahh, Decisions, Decisions. I've heard lots of complaints over the years "That boss couldn't do my job half as good as I do and he makes so much more money" But what you see in successful people and those that "climb the ladder." is their ability to make decisions, quickly and decisively. They own the fact that they won't always be the best or sometimes even the correct decision, but they own it and own the results. That's what keeps them moving and that's why people follow them.”


prowessa chose to use their entry to express their support: “I choose to go the write path and be a better Thwackster.”


And Michael Kent quoted one of the greatest sysadmins in history (if you judge by the length of his beard), Albus Dumbledore, who said “Dark times lie ahead of us, and there will be a time when we must choose between what is easy and what is right”


Day 8: Hear

Radioteacher related the word to his experience on the set of THWACKcamp this year: “I am in the white first year ThwackCamp shirt below. When sqlrockstar spoke on stage right I would stare at the back of Patrick's head knowing that from the cameras point of view it would look like I was looking at sqlrockstar. In some ways that it made it easier to focus on his voice, hearing what was said and reacting.”


Meanwhile, nedescon reflected on how hearing loss has affected their perception of the value of receiving information: “I think the most important thing to take away is there is a lot of noise that will get in the way if you let it. I truly am the only thing that I have any power or influence over. In other words, I can give my attention, but I can't always get another's.”


And another commenter succinctly pointed out the difference between hearing and listening: “hearing is one of the 5 senses of the human body...there is no skill involved. Listening is a skill and an art, and requires focus and commitment...and separates managers from leaders.”


Day 9: Observe

SolarWinds product marketing specialist Diego Fildes Torrijos submitted the lead essay for day 9, analyzing the way we observe the world around us (and how often we choose not to).


Another commenter pointed out how the power of observation can serve us both personally and as monitoring enthusiasts: “The response to the "It's the [insert scapegoat here...but we know they are going to say 'network']" is a tool like Orion that allows you to observe your environment in near real time, coorelate data points and makee a logic deduction. Observer the world around you and make wise decisions. Observe the environment you are monitoring and make informed decisions.”


Then jamison.jennings offered a suggestion for the next SolarWinds product line: “In the future, when AI becomes more of a regular integral part of IT, SolarWinds should reserve OBSERVE for a name for their AI module. Observe will be the watcher for changes that happen to nodes on the network and learn and adapt to the ever changing network. New interface turned up over night during a maintenance window, Observe automagically starts monitoring. Drives removed and new ones added, no problem. Observe sees those "down" volumes and will go and look for new ones and add them in without relying on a scheduled network discovery.”


And Steve Hawkins used a quote by Andrew Carnegie to elaborate on how observation can inform our understanding: "As I grow older, I pay less attention to what men say. I just watch what they do." Sometimes the most important statement people make is how they react to a particular situation.


Keep the amazing comments coming (they’re worth 200 THWACK points each day) and tune in every day for the next essay challenge. Thank you to all the amazing contributors!

Starting Thursday, I'll be in Israel to meet some customers, attempt to eat my body weight in kosher shwarma, and speak at DevOpsDays Tel Aviv.


Since I'll be tweeting about it (@LeonAdato and @DevOpsDaysTLV) incessantly, I figured I would give you all a heads up and let you know what I hope to achieve and hope to learn. You know, besides how much shwarma I can eat before it kills me. But what a way to go!


First, very much like my time at DevOpsDays Ohio, I hope to continue to have conversations about monitoring in a world of "cattle, not pets."


Second, I am looking forward to soaking up as much knowledge as I can as our industry continues the shift from on-premises to cloud. Seeing how companies big and small are adapting to the new reality of computing is both exciting to me as a veteran of IT and a source of great insight for where monitoring may be going in the future.


Finally, I am eager to see how the flavor of DevOps changes outside of the United States. You see, even within the U.S., there are nuances. In Austin, the crowd was almost entirely developers-who-do-ops. But in Ohio, it was 70% operations folks who were coming to grips with how they've also become developers. So I expect the event in Tel Aviv is going to teach me some more about this amazing, vibrant, and diverse community.


More to come on this after the event next week!

The story so far:


  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)


The holidays are approaching, but that doesn't mean a break for the network team. Here's the fourth installment of the story, by Tom Hollingsworth (networkingnerd).


The View From Above: James (CEO)


I'm really starting to see a turnaround in IT. Ever since I put Amanda in charge of the network, I'm seeing faster responses to issues and happier people internally. Things aren't being put on the back burner until we yell loud enough to get them resolved. I just wish we could get the rest of the organization to understand that.


Just today, I got a call from someone claiming that the network was running slow again when they tried to access one of their applications. I'm starting to think that "the network is slow" is just code to get my attention after the unfortunate situation with Paul. I decided to try and do a little investigation of my own. I asked this app owner if this had always been a problem. It turns out that it started a week ago. I really don't want to push this off on Amanda, but a couple of my senior IT managers are on vacation and I don't have anyone else I can trust. But I know she's going to get to the bottom of it.



The View From The Trenches: Amanda (Sr Network Manager)


Well, that should have been expected. At least James was calm and polite. He even told me that he'd asked some questions about the problem and got some information for me. I might just make a good tech out of the CEO after all!


James told me that he needed my help because some of the other guys had vacation time they had to use. I know that we're on a strict change freeze right now, so I'm not sure who's getting adventurous. I hope I don't have to yell at someone else's junior admin. I decided I needed to do some work to get to the bottom of this. The app in question should be pretty responsive. I figured I'd start with the most basic of troubleshooting - a simple ping. Here's what I found out:


icmp_seq=0 time=359.377 ms

icmp_seq=1 time=255.485 ms

icmp_seq=2 time=256.968 ms

icmp_seq=3 time=253.409 ms

icmp_seq=4 time=254.238 ms


Those are terrible response times! It's like the server is on the other side of the world. I pinged other routers and devices inside the network to make sure the response times were within reason. A quick check of other servers confirmed that response times were in the single digits, not even close to the bad app. With response times that high, I was almost certain that something was wrong. Time to make a phone call.
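
If you want to turn that eyeball check into something repeatable, a few lines of Python can pull the round-trip times out of ping output and compare the average to a threshold. The 10 ms threshold is an assumption based on the "single digits" seen elsewhere on this network, and the function names are invented for the example.

```python
import re
from statistics import mean

# The ping results for the problem app server, as shown above.
SAMPLE = """\
icmp_seq=0 time=359.377 ms
icmp_seq=1 time=255.485 ms
icmp_seq=2 time=256.968 ms
icmp_seq=3 time=253.409 ms
icmp_seq=4 time=254.238 ms
"""

RTT = re.compile(r"time=(\d+(?:\.\d+)?) ms")

def rtts_ms(ping_output):
    """Extract every round-trip time (in ms) from ping output text."""
    return [float(value) for value in RTT.findall(ping_output)]

def looks_slow(ping_output, threshold_ms=10.0):
    """True if the average round-trip time exceeds the (assumed) threshold."""
    samples = rtts_ms(ping_output)
    return bool(samples) and mean(samples) > threshold_ms

print(f"average RTT: {mean(rtts_ms(SAMPLE)):.1f} ms, slow: {looks_slow(SAMPLE)}")
```

For the output above the average works out to roughly 276 ms, well beyond anything you would expect from a server in the same building.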


Brett answered when I called the server team. I remember we brought him on board about three months ago. He's a bit green, but I was told he's a quick learner. I hope someone taught him how to troubleshoot slow servers. Our conversation started off as well as expected. I told him what I found and that the ping time was abnormal. He said he'd check on it and call me back. I decided to go to lunch and then check in on him when I got finished. That should give him enough time to get a diagnosis. After all, it's not like the whole network was down this time, right?


I got back from lunch and checked in on Brett The New Guy. When I walked in, he was massaging his temples behind a row of monitors. When I asked what was up, he sighed heavily and replied, "I don't know for sure. I've been trying to get into the server ever since you called. I can communicate with vCenter, but trying to console into the server takes forever. It just keeps timing out."


I told Brett that the high ping time probably means that the session setup is taking forever. Any lost packets just make the problem worse. I started talking through things at Brett's desk. Could it be something simple? What about the other virtual machines on that host? Are they all having the same problem?


Brett shrugged his shoulders. His response, "I'm not sure? How do I find out where they are?"


I stepped around to his side of the desk and found a veritable mess. Due to the way the VM clusters were set up, there was no way of immediately telling which physical host contained which machines. They were just haphazardly thrown into resource pools named after comic book characters. It looked like this app server belonged to "XMansion" but there were a lot of other servers under "AsteroidM". I rolled my eyes at the fact that my network team had strict guidelines about naming things so we could find them at a glance, yet the server team could get away with this. I reminded myself that Brett wasn't to blame and kept digging.


It took us nearly an hour before we even found the server. In El Paso, TX. I didn't even know we had an office in El Paso. Brett was able to get his management client to connect to the server in El Paso and saw that it contained exactly one VM - The Problem App Server. We looked at what was going on and figured that it would work better if we moved it back to the home office where it belonged. I called James to let him know we fixed the problem and that he should check with the department head. James told me to close the ticket in the system since the problem was fixed.


I hung up Brett's phone. Brett spun his chair back to his wall of monitors and put a pair of headphones on his head. I could hear some electronic music blaring away at high volume. I tapped Brett on the shoulder and told him, "We're not done yet. We need to find out why that server was halfway across the country."


Brett stopped his music and we dug into the problem. I told Brett to take lots of notes along the way. As we unwound the issues, I could see the haphazard documentation and architecture of the server farm was going to be a bigger problem to solve down the road. This was just the one thing that pointed it all out to us.


So, how does a wayward VM wind up in the middle of Texas? It turns out that the app was one of the first ones ever virtualized. It had been running on an old server that was part of a resource pool called "SavageLand". That pool only had two members: the home server for the app and the other member of the high availability pair. That HA partner used to be here in the HQ, but when the satellite office in El Paso was opened, someone decided to send the HA server down there to get things up and running. Servers had been upgraded and moved around since then, but no one documented what had happened. The VMs just kept running. When something would happen to a physical server, HA allowed the machines to move and keep working.


The logs showed that last week, the home server for the app had a power failure. It rebooted about ten minutes later. HA decided to send the app server to the other HA partner in El Paso. The high latency was being caused by a traffic trombone. The network traffic was going to El Paso, but the resources the server needed to access were back here at the HQ. So the server had to send traffic over the link between the two offices, listen for the response, and then send it back over the link. Traffic kept bouncing back and forth between the two offices, which saturated the link. I was shocked that the link was even fast enough to support the failover link. According to Brett's training manuals, it barely met the minimum. We were both amused that the act of failing the server over to the backup caused more problems than just waiting for the old server to come back up.


Brett didn't know enough about the environment to know all of this. And he didn't know how to find the answers. I made a mental note to talk to James about this at the next department meeting after everyone was back from vacation. I hoped they had some kind of documentation for that whole mess. Because if they didn't, I was pretty sure I knew where I could find something to help them out.



>>> Continue reading this story in Part 5

Wow, can you believe it? 2016 is almost over, the holidays are here, and I didn’t even get you anything! It’s been a bit of a wild rollercoaster of a year through consolidation, commoditization, and collaboration!


I’m sure you have some absolute favorite trends or notable things that occurred throughout 2016. Here are some that have been recurring trends throughout the year.



  • Companies going private such as SolarWinds (closed in February) and DellEMC (closed in September)
  • Companies buying other companies and consolidating industry like Avago buying Broadcom (Closed Q1), Brocade buying Ruckus (Closed Q3), Broadcom buying Brocade (Initiated in October)
  • Or companies divesting of assets like Dell selling off SonicWall and Quest, and Broadcom selling off Brocade’s IP division



Alright, so that’s at least a small snapshot of the rollercoaster. As for the impact those decisions will have on practitioners like you and me, only time will tell (I promise some of those will be GREAT and some of those, not so much!)


But what else, what else?! Some items I’ve very recently discussed include.



The net-net benefit of all three of these is that we will continue to see better technology, with deeper investment and ultimately (potentially) lower costs!


On the subject of Flash, though: if you haven’t been tracking them, the density profiles have been insane this year alone, and that trend is only continuing with further adoption and better price economics with technology like NVMe. I particularly love this image, as it reflects the shrinking footprint of the data center while reflecting our inevitable need for more.


(image: Moore's Law of Storage)



This is hardly everything that happened in 2016, but these are particular items which are close to my heart and, respectively, my infrastructure. I will offer hearty congratulations on this being the 16th official “Year of VDI,” a title we continue to grant even though VDI continues to fall short of its promises.


With 2016 quickly drawing to a close, there are a few areas you’ll want to be on the watch for in 2017!


  • Look for Flash Storage to get even cheaper, and even denser
  • Look to see even more competition in the Cloud space from Microsoft Azure, Amazon AWS and Google GCP
  • Look to Containers to become something you MIGHT actually use on a regular basis and more rationally than the very obscure use-cases promoted within organizations
  • Look to vendors to provide more of their applications and objects as Containers (EMC did this with their ESRS, Secure Remote Support)
  • Obviously 2017 WILL be the Year of VDI… so be sure to bake a cake
  • And strangely, with the exception of pricing economics driving adoption of 10GigE+ and Wireless Wave 2, we’ll see a lot more of the same as we saw this year, maybe even some retraction in hardware innovation
  • Oh and don’t forget, more automation, more DevOps, more “better, easier, smarter”


But enough about me and my predictions. What were some of your favorite and notable trends of 2016, and what are you looking forward to seeing in 2017?


And if I don’t get a chance to… Happy Holidays and a Happy New Year to y’all!

After the network perimeter is locked down, servers are patched, and password policies are enforced, end-users themselves are the first line of defense in IT security. They are often the target of a variety of attack vectors, making them the first step of triage when a security incident is suspected. Security awareness training, which should be a part of any serious IT security program, should be based in common sense, but what security professionals consider common sense isn’t necessarily common sense for the average end-user.


In order to solve this problem and get everyone on the same page, end-users need the awareness, knowledge, and tools to recognize and prevent security threats from turning into security breaches. To that end, a good security awareness program should be guided by these three basic principles:


First, security awareness is a matter of culture.


Security awareness training should seek to change or create a culture of awareness in an organization. This means different things to different security professionals, but the basic idea is that everyone in the organization should have a common notion of what good security looks like. This doesn’t mean that end-users know how to spot suspicious malformed packets coming into a firewall, but it does mean that it’s part of company culture to be suspicious of email messages from unknown sources or even from known sources but with unusual text.


The concerns of the organization’s security professionals need to become part of the organization's culture. This isn’t a technical endeavor but a desire to create a heightened awareness of security concerns among end-users. They don’t need to know about multi-tenant data segmentation or versions of PHP, but they should have an underlying concern for a secure environment. This is definitely somewhat ambiguous and subjective, but this is awareness.


Second, security awareness training should empower end-users with knowledge.


After a culture of security awareness has been established, end-users need to know what to actually look for. A solid security awareness program will train end-users on what current attacks look like and what to do when facing one. This may be done simply with weekly email newsletters or required quarterly training sessions.


End-users need to actually learn why it’s not good to plug a USB stick found in the parking lot into their computer, and users need to get a good feel for what phishing emails look like. They should know that they can hover over a suspicious link and sometimes see the actual hidden URL, and they should know that even that can be faked.
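To make the hover-over-the-link advice a little more concrete, here is a minimal sketch of the same check done in code: compare the address a link displays with the address it actually points to. The class and function names (`LinkCollector`, `host_of`, `suspicious_links`) are hypothetical and only for illustration, and the check is deliberately naive. It catches the obvious mismatch and nothing more, which is exactly why users should know that even a matching link can be faked.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only record text while we are inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(value):
    """Return the lowercase hostname of a URL-like string, or None."""
    if "://" not in value:
        value = "http://" + value
    try:
        return urlparse(value).hostname
    except ValueError:
        return None


def suspicious_links(html_body):
    """Return links whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html_body)
    flagged = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a web address.
        looks_like_url = " " not in text and "." in text
        if looks_like_url and host_of(text) != host_of(href):
            flagged.append((href, text))
    return flagged
```

Feeding it a message body where the link text reads `www.mybank.example.com` but the `href` points at `evil.example.net` would flag that link, while an ordinary "Click here" link is left alone because its text does not claim to be an address.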


Ultimately, they need to know what threats look like. The culture of awareness makes them concerned, and knowledge gives them the ability to identify actual problems in the real world.


Third, security awareness training is concerned with changing behavior.


The whole point here is that end-users take action when there is suspicion of malicious activity. Security awareness training is useless if no one takes action and actually acts like the first line of defense they really are (or can be).


A good security awareness program starts with culture, empowers end-users with knowledge, and seeks to change behavior. This means making significant effort to provide end-users with clear directions for what to do when encountering a suspected security incident. Telling users to simply “create a ticket with the helpdesk” is just not enough. End-users need clear direction as to what they can actually do in the moment when they are dealing with an issue. This is where the whole “first line of defense” becomes a reality and not just a corporate platitude.


For example, what should end-users actually do (or not do) when they receive a suspected phishing email? The directions don’t need to be complicated, but they need to exist and be communicated clearly and regularly to the entire organization.


Security awareness training is the most cost-effective part of a security program in that it doesn’t require purchasing millions of dollars of appliances and software licenses. There is a significant time investment, but the return on investment is huge if done properly. A strong security awareness training program needs to be based in common sense, change culture, empower end-users with knowledge, and change behavior.


(image courtesy of Marvel)


...I learned from "Doctor Strange"

(This is part 3 of a 4-part series. You can find part 1 here and part 2 here)


Withhold judgment and give respect when seeking answers

Standing outside the door to Kamar Taj, having just been saved from muggers, Strange is still glib and sarcastic about the nature of the environment he is in. Mordo stops him and says,


"I was in your place, once. I, too, was disrespectful. So might I offer you some advice? Forget everything that you think you know."


Recently, I was involved in a discussion about monitoring containers. I said,  "Maybe I'm being naive, but it seems like we already solved this problem. Only it was 2001 and we called it LPARs running on AIX." There was some nervous laughter, a few old-timers got the joke, and the rest of the group explained how containers were completely different, and all that old stuff wouldn't apply.


I wrote about this a year ago ("Respect Your Elders") and that sentiment still holds true. If you are not willing to give respect and credence to older ideas (if not older IT pros), then you are going to insult a lot of people, miss a lot of solutions, and spend a lot of extra time fixing old problems all over again.


Redundancy is your friend

In the movie, we discovered that the world is protected from mystical threats by three Sanctum Sanctorums, located in London, Hong Kong, and New York. When London falls, the world is still protected by the other two. Only after Hong Kong falls can the world be overwhelmed by hostile forces.


The message to us in IT is clear: failover systems, disaster recovery plans, high availability solutions, and the rest are all good things.


To say any more about this would be redundant.


Find a teacher and trust them to lead you

Stephen Strange travels to Kathmandu, to the mystical school of Kamar Taj, and meets the Ancient One. His mind is opened to the existence of magic in the world, and he begs to be accepted as a student. The Ancient One then guides Strange in his journey to master the mystical arts, monitoring his progress and helping him avoid pitfalls along the way. Later, she rebukes him by saying, "When you came here you begged me to teach you. Now I'm told you question every lesson and prefer to study on your own."


The correlating lesson for us in IT is that many of us tend to fall into the trap of solitary study. We find our education in the form of online blog posts, web-based tutorials, and PDFs. But there is something to be said for having a teacher, a mentor who understands you; where you started, where you'd like to go, how you learn best, and what your shortcomings are. If you are learning a single skill, self-directed learning is a great way to go. But when you are thinking about your career, it's worth taking the time to find a trusted advisor and stick with them. They will often see things in you that you cannot see in yourself.


Be comfortable with confusion

At one point in the story, Strange complains, "This doesn't make any sense!" The Ancient One replies, "Not everything does. Not everything has to." The point in the movie is that Strange has to let go of his need for things to make sense before he engages with them. Sometimes it needs to be enough to know that something simply is, regardless of how. Or that something works a particular way, irrespective of why.


"Yes, but now I know how it works," is what I say after I've burned hours de-constructing a perfectly working system. It's not that the education wasn't important, it's that it may not have been important at that moment. When our need for things to make sense impedes our ability to get on with our daily work, that's when we need to take a step back and remember that not everything has to make sense to us now, and that inevitably, some things in IT will never make sense to us.


When events pull you a certain direction, take a moment and listen

In the middle of a fight, Strange reaches for an axe hanging on the wall, only to have his semi-sentient cloak pull him toward a different wall. Despite repeated attempts to get the weapon, the cloak insistently pulls him away, until Strange finally realizes that the cloak is trying to tell him about an artifact that would restrain, rather than harm, his opponent. (For comic book geeks, that was a more down-to-earth version of the Crimson Bands of Cyttorak).


Despite our best laid plans and deepest desires, sometimes life pushes us in a different direction. This isn't strictly relegated to our career plans. Sometimes you believe the best solution lies with a particular coding technique, or even a specific language. Or with your chosen hardware platform, a trusted vendor, or even a specific software package.


And yet, despite your rock-solid belief that this is the best and truest way to achieve your goal, you can't seem to get it done.


In those moments, it's useful to look around and see where events are pushing you. What is over there? Is it something useful?


Even if others label it useless, be proud of the knowledge you have

During surgery, the anesthesiologist quizzes Doctor Strange on his musical knowledge, asking him to identify Chuck Mangione's hit, "Feels So Good." Later on, in an aside that goes by too fast for many in the audience, Strange tells his colleague that he traveled to Kathmandu. She asks "Like the Bob Seger song?" He responds, "Beautiful Loser album, 1975, A-side, third cut? Yes. In Nepal."


No, having this knowledge didn't help our hero save the day, but it was still a tangible part of who he was. Strange is a gifted doctor, an unapologetically arrogant ass, a talented sorcerer... and an unashamed music geek.


We in IT have to remember that we are also whole people. We're not just storage engineers or SysAdmins or pen testers or white hat hackers. We have other aspects of our lives that are important to us, even if they aren't central or even relevant to the plot of our story. They provide richness and depth of character. We shouldn't lose sight of that, and we shouldn't ignore our need for hobbies, interests, and non-IT outlets in our life.


Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for part 4, coming next week.

It would be really easy to just post this link to Sam Harris’ TED talk and say “Discuss!” Sam Harris: Can we build AI without losing control over it?


But for you, busy people, let me distill some of Sam’s points and add a few of my own.

Sam does a brilliant job of pointing out that we’re not as worried about the impact of artificial intelligence as we should be.


“Death by science fiction is fun. Death by famine is not fun.”


If we knew we would all die in 50 years from a global famine, we’d do a heck of a lot to stop it. Sam is concerned that there’s a risk to humans once artificial intelligence surpasses us. And it will; it’s only a matter of time.


"Worrying about AI safety is like worrying about overpopulation on Mars."


So, we’re using a time frame as an excuse? That we shouldn’t worry our pretty little heads about it because it won’t occur in our lifetime? In half my lifetime, I’ve gone from having an Amstrad CPC 6128 running DOS to now carrying the Internet in my pocket. Also, I have kids and hopefully one day grandkids, so I’m a little worried for them.


Information processing is the source of intelligence. And we wouldn't consider for a moment the option that we'd ever stop improving our technology. We will continue to improve our intelligent machines until they are more intelligent than we are, and they will continue to improve themselves.


Elon Musk’s OpenAI group released Universe this week, providing a way for machines to play games and use websites like humans do. That's not a big deal if you’re not worried about the PC beating you at GTA V. It's a slightly bigger deal if you are a travel agent and the machines can now use comparison websites and book the cheapest fare without you. And while you’d hopefully have a moral compass screening what you’d do online, do the machines have one? Could they get cheeky and ship their enemies glitter, or something more sinister?


Robert "Uncle Bob" Martin (author of The Clean Coder and other books), sets out 10 rules for software developers that he calls "The Scribe's Oath". One of those rules is you will not write code that does harm. But the issue isn't that a human will write code that shuts down a city's water treatment plant. The issue is that we're writing code that constructs deep learning neural networks, allowing machines to make decisions by themselves. We're enabling them to become harmful on their own, if we're not able to code a sense of morals, ethics and values into them.


Then we get into the ethical debates. If there are only two outcomes for an incident with a self-driving car, one that preserves the life of the driver and one that preserves the life of a pedestrian or another car driver, which one should the machine choose? Do we instil a human-like self-preservation/survival instinct?


Is this all the fault of (or a challenge for) the software developer? How does this apply to systems administrators & systems architects?


We've talked about autonomic computing before. If we are configuring scripted and self-healing systems, are we adding to the resilience of the machines, and will this ultimately be detrimental to us? How outlandish does that seem right now, though: that we'd enable machines to be so self-preserving that they won't even die if we want them to? We've even laughed in these comments about whether the machines will let us pull the power plug on them. Death by science fiction is funny. But the machines can now detect when we are lying, because we built them to be able to do that. Ooops.


Technosociologist Zeynep Tufekci says, “We cannot outsource our responsibilities to machines. We must hold on ever tighter to human values and human ethics.” Does this mean we need to draw a line about what machines will do and what they won’t do? Are we conscious in our AI developments about how much power and decision-making control we are enabling the machines to have, without the need for a human set of eyeballs and approval first? Are we building this kind of safety mechanism in? With AI developments scattered among different companies with no regulation, how do we know that advances in this technology are all ethically and morally good? Or are we just hoping they are, relying on the good nature of the human programmers?


Ethics in AI has come up a few times in the comments of my articles so far. Should we genuinely be more worried about this than we are? Let me know what you think.



Updating your Active Directory schema is something that needs to be done from time to time, whether we like it or not. It is done either to support domain controllers running a new version of the OS or because an AD-integrated application such as Exchange, Skype for Business, or SCCM requires the update. Regardless of the reason, the mere mention of an Active Directory (AD) schema update can make administrators cringe. The fear is mostly due to the fact that this is an update that cannot be undone. There is no uninstall button that allows you to reverse your changes. Things get more complicated if you have AD-integrated applications or third-party applications that have also extended your schema.


Active Directory is like a beating heart


For those not sure what Active Directory is, it is a database of objects that represent the users, computers, groups, etc. in your network, and it is also used for authentication and authorization. The schema is the component of Active Directory that defines all of those objects in terms of classes and attributes. The schema differs with each version of Windows Server Domain Services; for instance, it is different between AD 2003, AD 2008, and AD 2012. When you introduce a new domain controller with a newer version of the OS, you will need to update your schema.
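To illustrate how the schema tracks the OS version, here is a small sketch that maps schema version numbers to the Windows Server release that introduced them. The numbers are the commonly documented `objectVersion` values stored on the schema container; the function names (`describe_schema`, `needs_schema_update`) are hypothetical and just for illustration. The sketch does not query a directory. It assumes you have already read the current `objectVersion` value by whatever means you normally use.

```python
# Commonly documented objectVersion values recorded on the AD schema container.
SCHEMA_VERSIONS = {
    13: "Windows 2000 Server",
    30: "Windows Server 2003",
    31: "Windows Server 2003 R2",
    44: "Windows Server 2008",
    47: "Windows Server 2008 R2",
    56: "Windows Server 2012",
    69: "Windows Server 2012 R2",
    87: "Windows Server 2016",
}


def describe_schema(object_version):
    """Return the OS release associated with a schema version number."""
    return SCHEMA_VERSIONS.get(
        object_version, f"unknown schema version {object_version}"
    )


def needs_schema_update(current_version, target_os):
    """True when the current schema is older than the one target_os ships with."""
    for version, name in SCHEMA_VERSIONS.items():
        if name == target_os:
            return current_version < version
    raise ValueError(f"no known schema version for {target_os!r}")
```

For example, a forest still at schema version 47 (Windows Server 2008 R2) would need an update before a Windows Server 2012 domain controller could be introduced, while a forest already at 69 would not.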



I sometimes refer to AD as the heart of the network. The flow of the network, your enterprise objects, passes through this beating heart, and if it has a brief hiccup or is slowed down, it can affect the overall function of your network. Users not being able to log in to their computers can have a major impact on the business, and lost productivity costs real dollars. A non-working heart can be almost paralyzing for some businesses.


Upgrade all things NOW!


At the mention of a schema update, most administrators would delay an upgrade until they felt it was “safe.” Microsoft's push to release new products every 18-24 months has prompted a rethinking of sorts. In an effort to reduce the fear and increase upgrades, Microsoft has made these schema updates a little less painful and sometimes almost transparent. With each new release, the update becomes simpler and easier to deploy.


With Windows Server 2012, they simplified the upgrade process. The functions of adprep with /forestprep and /domainprep have now been wrapped up into the Active Directory Domain Services role installation process, making the update much easier through a few clicks of Next. You can still run the command manually if you want to be old school.



Schema updates are now required for almost every Exchange Service Pack or major CU. The same can be said for other Microsoft applications such as Skype for Business and SCCM. They have made it so easy that in some cases, such as installing a CU for Exchange 2013, the schema update process is built into the application update. Provided that the account you use to run the Exchange update has all the appropriate permissions to update the AD schema, the update is easy and seamless.

I think the level of fear of schema updates has decreased somewhat in the past several years, with administrators having to do them more often and Microsoft making the update process easier. Now, if you have third-party applications that extend your schema, that may not be as pain-free. As with any upgrade/update, you should always plan accordingly and test as much as possible, even the simple point-and-click ones.

I'm in Orlando this week for SQL Live as well as the Orlando SWUG meeting. With three sessions to deliver at SQL Live, working the booth there, and the session at the SWUG it is going to be a busy week. If you are in the Orlando area I hope you can stop by and say hello.


Anyway, here's a bunch of links I found on the Intertubz that you may find interesting, enjoy!


Canadian Money May Contain Animal Fat, Bank Of Canada Confirms

Gives new meaning to "put your money where your mouth is", because it's so tasty!


Everything announced at AWS re:Invent 2016

Oh, yeah, AWS re:Invent was this past week, in case you didn't know. Here's the list of everything they announced. I like the idea of Snowmobile, wonderful marketing gimmick. And now I know why Amazon operates with such thin margins, so they can afford things like Snowmobile.


Apple is reportedly using drones to beat Google Maps

Well, if the drones use Apple Maps, this might take a while.


34 tips to boost iPhone & iPad battery life

After fighting with applications slowly consuming the memory on my iMac this week I figured I would share some tips on battery life for your iPhone. Yes, I am assuming you have an iPhone, because I know you don't have a Windows Phone, but many of the tips work for Android as well.


Five Things You Need to Know About the U.K.’s Mass Surveillance Law

In case you had not heard about this, the UK government has made it legal to do some questionable data collection.


NYPD plans to expand Smart car fleet to replace scooters

Not only are these cars adorable to look at, but I'm guessing they are a stepping stone to autonomous patrolling and data collection.


Big Data Poised to Get Much Bigger in 2017

All I want for Christmas is data, of course. Let's just hope it doesn't sit out and ROT all year.


Am I in the Christmas spirit yet? You be the judge:

