
Geek Speak


It's good to be home after two weeks on the road, and just in time for a foot of snow. How do I unsubscribe from Winter?


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Geek Squad's Relationship with FBI Is Cozier Than We Thought

If you are committing a crime, and someone finds out you are committing a crime, they have every right to notify the authorities. There is no such thing as a "Geek Squad client privilege."


Audit finds Department of Homeland Security's security is insecure

They literally have the word "Security" in their title. Oh, the irony.


MoviePass removes 'unused' location feature that tracked cinema-goers' movements

But not until after their CEO bragged about how they were tracking everyone. Stay classy, MoviePass.


Half of All Orgs Hit with Ransomware in 2017

Well, half of the folks that reply to this survey, sure. But clickbait headlines aside, there is one important fact in this article. The fact that paying a ransom does not guarantee you get your data. If you don't have backups, you will have problems.


Fake News: Lies spread faster on social media than truth does

There's a link in this article to the MIT research paper, which is a bit of a longer read, but worth your time if you are interested. I think humans have a need to be "in the know" ahead of others, and this leads to our innate desire to spread false information faster than the truth (because we assume the truth is already known, or boring, perhaps). I think Paul McCartney said it best: "Sunday's on the phone to Monday. Tuesday's on the phone to me."


Waymo self-driving trucks are hauling gear for Google data centers

It's either self-driving trucks, or filming for the Maximum Overdrive reboot has begun.


Cory Doctorow: Let's Get Better at Demanding Better from Tech

This. So much this. The world of tech advances at an accelerated rate. It's time we find a way to demand better from tech, before we dig too deep a hole.


Live-action footage of me shoveling snow yesterday:


Risk Management is an important part of IT. Being able to identify risks and remediation options can make a huge difference if or when disaster strikes. If you've moved part of or all of your enterprise to Office 365, you now have no control over a large portion of your IT environment. But what sorts of risks do you face, and how do you deal with them?




Office 365 has become unavailable in the past for one reason or another, and it is very likely to happen again. One of the great things about using a cloud-based platform such as Office 365 is that enterprise IT doesn't need to maintain large amounts of infrastructure. One of the big downsides is that when the platform fails, it is still enterprise IT's problem to deal with. But what sorts of implications could this have?


What is your organization's plan if, all of a sudden, Exchange Online is unavailable? Will it grind things to a halt, or will it be a minor inconvenience? The same holds true for services such as SharePoint. If all of your critical marketing material is in SharePoint Online and the service goes down, will your salespeople be left high and dry?




Not all risk is equal. Chances are that the risk of a user deleting a document won't have the same impact as something like inbound email coming to a halt. That is why you need to measure these risks. You'll want to consider the likelihood of an event occurring, and what the impact will be if it does.
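One common way to make that measurement concrete is a likelihood-times-impact score. The sketch below is only an illustration: the 1-to-5 scales, the `Risk` class, and the sample scores are all assumptions made up for the example, not values from any standard or from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple risk-matrix scoring: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical register entries; the numbers are illustrative only.
register = [
    Risk("User deletes a document", likelihood=5, impact=1),
    Risk("Inbound email stops", likelihood=2, impact=5),
    Risk("SharePoint Online outage", likelihood=2, impact=4),
]

for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.name}")
```

With these made-up numbers, the frequent but low-impact document deletion ranks below the rarer email outage, which is the kind of ordering that helps justify priorities and budgets.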


Why is this step important? By performing an assessment, you'll be able to identify areas where you can mitigate, or possibly eliminate, risks. Knowing their impact is extremely important for justifying priorities, as well as budgets.




As enterprise customers, we can't control how Microsoft maintains their services. But what we can do is understand what our critical business processes are, and build contingency plans for when things fall apart.


Let's use an inaccessible Exchange Online service as an example. How can you mitigate this risk? If you are running a hybrid deployment, you might be able to leverage your on-premises services to get some folks back up and running. Other options might be services from Microsoft partners. There are, for example, services that allow you to use third-party email servers to send and receive emails if Exchange Online goes offline. When service returns, the mailboxes are merged, and you can keep chugging along like nothing happened.


If you measured your risks ahead of time, you'll hopefully have noted such a possibility.




Service availability isn't the only risk. Data goes missing. Whether it is "lost," accidentally deleted, or maliciously targeted, data needs to be backed up. If you've moved any data into Office 365, you need to think about how you are going to back it up. Not only that, but what if you have to do a large restore? How long would it take you to restore 1 TB of data back into SharePoint? What impact would that window have on users?
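A back-of-the-envelope calculation helps put a number on that restore window. The sketch below assumes a sustained throughput figure that you would have to measure yourself; the `efficiency` factor is a made-up allowance for throttling and protocol overhead, not a published Office 365 limit.

```python
def restore_hours(data_gb: float, throughput_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move data_gb gigabytes at throughput_mbps megabits per second.

    efficiency is an assumed factor in (0, 1] for throttling and overhead.
    """
    if throughput_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("throughput must be positive and efficiency in (0, 1]")
    megabits = data_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / (throughput_mbps * efficiency)
    return seconds / 3600


# 1 TB at an assumed 100 Mbps, first ideal, then at an assumed 50% efficiency.
print(round(restore_hours(1000, 100), 1))
print(round(restore_hours(1000, 100, efficiency=0.5), 1))
```

At an assumed 100 Mbps, 1 TB works out to roughly 22 hours even in the ideal case, and about double that at 50% efficiency, so the restore window is worth estimating before you need it.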


Although a lot of the "hands-on" management is removed from IT shops when they migrate to Office 365, that doesn't mean that their core responsibilities are shifted. At the end of the day, IT staff are responsible for making sure that users can do their jobs. Just because something is in the cloud doesn't mean that it will be problem free.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist


Here is an interesting article from my colleague Joe Kim, in which he explores database health and performance.


Part of the problem with managing databases is that many people consider database health and performance to be one and the same, but that’s not necessarily the case. Let’s take a closer look at these terms.


Health versus performance: What’s the difference?


Health and performance are certainly closely related, even interconnected. But assuming they are one and the same is potentially a recipe for disaster. If you’re homed in exclusively on your database’s health, you may be overlooking critical metrics that affect your database’s performance. Here’s why:


Database health is described by resource data points. When you take into consideration such factors as CPU utilization, I/O statistics, and memory pressure, you can determine whether your database is capable of proper performance. But these metrics alone cannot confirm that the system is performing optimally.


Database performance integrates an element of time measurement to explain how database queries are being executed. It’s this time component that comes into play when talking about true performance.


Diagnosing the root cause: Database performance management best practices


Identifying the true root cause of database performance issues is the goal of every federal database manager. And yet, without the proper metrics in hand, you lack the tools necessary to resolve more comprehensive problems.


That said, let’s take a closer look at some best practices that take into account both health and performance to create efficient, well-optimized database processes.


Acquire data and metrics. You need granular metrics like resource contention and a database’s workload to identify the root cause of a performance issue. Without good, deep intelligence, you lack the ability to troubleshoot accurately and effectively.


Establish meaningful data management. Every database manager has his or her own way of arranging data, but the key is to arrange it in a way that will help you quickly identify and resolve the root cause of a potential problem. Establishing a system that allows you to do so quickly can help keep your databases running efficiently.


Triangulate issues. The ability to triangulate makes it easy to answer all-important questions regarding who, what, when, where, and why. These questions help you determine the details of a performance issue. Understanding who and what was impacted by poor performance and what caused the impact are important to know.


Review execution plans. Query optimizers are critical database components that analyze Structured Query Language (SQL) queries and determine efficient execution for those queries. The problem is that optimizers can be a bit of a black box; it’s often difficult to see what’s going on inside of them.


Establish a baseline. It’s impossible to tell if your database isn’t performing optimally if you lack a baseline of normal, day-to-day performance.
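As a rough illustration of the baseline idea, the sketch below summarizes a set of "normal" query response times and flags a new measurement that sits far above them. The function names, the sample values, and the three-standard-deviation threshold are all assumptions chosen for the example; a real monitoring tool will have its own method.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize normal query response times (milliseconds) as (mean, standard deviation)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    return mean(samples_ms), stdev(samples_ms)


def is_abnormal(value_ms, baseline, threshold=3.0):
    """Flag a measurement more than `threshold` standard deviations above the baseline mean."""
    avg, sd = baseline
    if sd == 0:
        return value_ms > avg
    return (value_ms - avg) / sd > threshold


# Hypothetical day-to-day response times for one query, in milliseconds.
normal_day = [100, 110, 90, 105, 95]
baseline = build_baseline(normal_day)

print(is_abnormal(104, baseline))  # within normal variation
print(is_abnormal(200, baseline))  # far above the baseline
```

Note that the time component is what makes this a performance check rather than a health check: the samples are response times, not CPU or memory readings.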


It’s a marathon, not a sprint


IT pros want their databases to be in good health and to perform optimally. While both are equally important, it’s the end result that matters.


So, make sure you are looking at all criteria of database health and performance. If you are deploying the best practices and tools to help ensure the overall health and performance of your database, your stakeholders will thank you for it.


Find the full article on our partner DLT’s blog Technically Speaking.

Leon Adato

Traveling With Joy

Posted by Leon Adato Expert Mar 12, 2018

Recently, two people I respect very much tweeted about travel, and how to remain positive and grateful while you do it. You can read those tweets here and here.


When I saw Jessica's first tweet, I wanted to respond, but thought, "She doesn't need my noise in her Twitter feed." But when Josh jumped in with his thoughtful response, I had to join in. If you prefer tweets, you can find the starting point here. For old-fashioned folks who still like correct spelling, complete sentences, and non-serialized thoughts, read on:


First, you need to understand that I have some very strong opinions about how someone should carry themselves if they are lucky enough to get to do "exciting" travel for work. When I say exciting travel, I mean:

  • Travel to some place that YOU find exciting
  • Travel that someone ELSE might find exciting


Here's why I feel so strongly:


As I've written before, my dad was a musician. His combination of talent, youth, and connections (mostly talent) gave him the opportunity to join a prestigious orchestra, one that traveled extensively from the time he joined (in 1963) until he retired 46 years later. My dad went everywhere. He was escorted through Checkpoint Charlie twice in the 60s. He wandered around cold-war, iron-curtain Moscow around the same time. He traveled to Australia, Mexico, all over Europe, and, of course, to almost every state in the United States.


It was a charmed life. To be sure, he worked hard to get where he was and made sacrifices along the way. But at the end of the day, he got to play great music with talented colleagues in front of sell-out audiences around the world. It was SO remarkable, that people sometimes had a hard time believing that was all he did.


Because I would "go to work" with him from time to time (which meant a lot of sitting in the green room, wandering backstage, and standing next to him during intermission when he'd come out for some fresh air), I was privy to him meeting audience members without really being part of their conversation, which would often follow a very specific pattern:


"So what do you do during the day?" they'd ask, figuring that he--like the musicians they probably knew--did this as a side gig while they worked an office job or plied a trade to pay the bills. When they found out that this was ALL he did, that he got paid a living wage to perform music, their sense of amazement increased. That's when they would begin asking (i.e. gushing) about the traveling. While some of these people were well-off, many were folks who often had never left the state where they were born, let alone the country, let alone been on a plane. That's when it became hard to watch.


He'd shrug and say, "I get on a plane, sleep, get off the plane, get on the bus, go to the hall, rehearse, eat, play the concert, get on a bus, go to the next town, sleep, get up, rehearse, eat, play. I could be in Timbuktu or Topeka."


From my fly-on-the-wall vantage point, I'd watch the other person deflate. They had hoped to feel a sense of wonder imagining the exotic, the special. Instead, they had the dawning recognition that they might as well have been talking to a plumber about the stores he visits. (No disrespect to plumbers. You folks rock.)


As I grew up and settled into a career in IT, I never thought I'd have the kind of work that would give me opportunities to travel the way my dad did. Which is why, years later, I stood crying under the Eiffel Tower. Not because of the wonder of the structure, but for the miracle that I was standing there AT ALL. I was overwhelmed by the sheer impossible magic of being in a role where traveling from Cleveland, Ohio to Paris was possible in any context other than a once-in-a-lifetime, piggy-bank-breaking vacation.


A three-month project in Brussels followed Paris. A year in Switzerland came after that. In between were shorter trips, no less inspiring for being closer to home. Just getting onto a plane and taking off was an adventure in itself.


And through it all were the people. As Jessica said in her tweet, "Thousands of unseen humans help me get to my destination." I was meeting these people, hearing their stories, and being asked to tell mine.


In those moments--in the Lyft on the way to the airport; checking in at the hotel; sitting next to someone on the shuttle to the car rental area--I'm reminded of those moments when I stood next to my dad during intermission. While there are many things about the man that I admire, he's not infallible, and there are definitely habits of his that I choose not to emulate. This is one of them.


So I try to write (sometimes more than is strictly required of me) when I go to new and different places. When I have the time and focus, I write before I go about what I hope to see/do/learn; and then I write again afterward, detailing what I saw, who I met, and how it went.


As Head Geek for SolarWinds, I write these essays partly because it's actually my job. (Best. Job. Ever.) But I also do it because I'm aware that jobs like mine are unique. I want to provide a vicarious experience for those who might want it, so that they can share a sense of wonder about the exotic, the special.


I also write so that someone who has chosen to forgo these types of opportunities, whether due to ambivalence, anxiety, or uncertainty, might find motivation, reassurance, or insight; so that in reading about my experiences, they might realize they have more to gain than they thought.


Finally, I write about my travels for myself. To remind me that, like both Jessica and Josh said, in each trip, thousands of things go right and thousands of people are helping me get where I need to go. To remind me of the wonder, the exotic, the special.


And the blessing.

In the final blog of this series, we’ll look at ways to integrate Windows event logs with other telemetry sources to provide a complete picture of a network environment. The most common way of doing this is by forwarding event logs to a syslog server or SIEM tool.


The benefits of telemetry consolidation are:

  1. Scalability and performance – log collectors are built for and focused on collecting logs.
  2. False Positive Reduction – some events, even if they generate an alert, are not meaningful on their own. By combining them with other events in a query, the security analyst can determine if there was a compromise. For example, multiple login failures must be examined in conjunction with other events to distinguish a threat from user error.
  3. Determination of the extent of a compromise – once an attack is detected and verified, the next step is to look for lateral movement, the route of entry to the asset initially compromised, any user-specific data gleaned from the activities, and either the failure of a security element such as a firewall or IPS to detect the issue or, conversely, the point at which the threat was blocked by successful application of the security policy. Visibility across the breadth of the organization is critical to incident response and remediation.
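To illustrate the false-positive point, here is a minimal sketch of one possible correlation: a burst of failed logons (event 4625) is only flagged when the same source then logs on successfully (event 4624) shortly afterward. The dictionary keys, the thresholds, and the sample addresses are assumptions for the example; a real log collector or SIEM will have its own field names and query language.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FAILED_LOGON = 4625      # Windows Security log: an account failed to log on
SUCCESSFUL_LOGON = 4624  # Windows Security log: an account was successfully logged on


def failures_then_success(events, min_failures=5, window=timedelta(minutes=10)):
    """Return source IPs with at least min_failures failed logons followed by a success within window.

    Each event is a dict with hypothetical keys: "time", "event_id", "source_ip".
    """
    failures = defaultdict(list)
    flagged = set()
    for event in sorted(events, key=lambda e: e["time"]):
        ip = event["source_ip"]
        if event["event_id"] == FAILED_LOGON:
            failures[ip].append(event["time"])
        elif event["event_id"] == SUCCESSFUL_LOGON:
            # Count only the failures that happened shortly before this success.
            recent = [t for t in failures[ip] if event["time"] - t <= window]
            if len(recent) >= min_failures:
                flagged.add(ip)
    return flagged


# Made-up sample: one noisy source that eventually gets in, one ordinary typo.
start = datetime(2018, 3, 1, 9, 0)
sample = (
    [{"time": start + timedelta(minutes=i), "event_id": FAILED_LOGON, "source_ip": "10.0.0.5"}
     for i in range(5)]
    + [{"time": start + timedelta(minutes=5), "event_id": SUCCESSFUL_LOGON, "source_ip": "10.0.0.5"}]
    + [{"time": start, "event_id": FAILED_LOGON, "source_ip": "10.0.0.9"}]
    + [{"time": start + timedelta(minutes=1), "event_id": SUCCESSFUL_LOGON, "source_ip": "10.0.0.9"}]
)

print(failures_then_success(sample))
```

A single mistyped password followed by a good logon stays quiet, while the repeated failures that end in a success are surfaced for a closer look.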


Windows Event Logs to a Syslog Host and Beyond


The following is an example of forwarding Windows event logs to a syslog server and from there pushing these events to a basic SIEM tool. I’m showing SolarWinds Event Log Forwarder to Kiwi Syslog Server to ELK (Elasticsearch, Logstash, Kibana) because they are great tools for illustrating the process, and they are all free in their basic form, which means you can have some gratuitous fun testing things out.


Step 1: Configure the event log forwarder agent on the host that is collecting the Windows event logs (refer to last week’s blog for configuring forwarding and collection).


Define the transport to the syslog server.



Define the event log subscription, which is the list of events to be sent to the syslog server.


Step 2: The syslog server should be configured to listen on the correct port. It will receive those events defined in the subscription above.


Step 3: The syslog server can be configured to forward events to another device, such as a SIEM tool. The example below shows how to configure an action that will forward the Windows events from the syslog server via syslog to another host. The events may have an RFC 3164 syslog header added to them to indicate the original IP of the syslog server (useful if NAT may change the source address of the IP datagram), or you can send the syslog message using the IP of the original source of the event. Another option is to use just the original source IP address of the syslog host. This decision often depends on how the receiving host application processes and indexes events.
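For readers who have not looked at an RFC 3164 header up close, the sketch below builds one by hand. It is not how any particular syslog server constructs its messages; the function name, host name, tag, and message text are all made up for illustration. The header carries a priority value (facility times 8 plus severity), a timestamp, and a host field, and the host field is what lets the receiver see where the message came from even if the packet's source address was rewritten along the way.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def rfc3164_message(facility, severity, hostname, tag, text, when):
    """Build a BSD-syslog (RFC 3164) style line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{MONTHS[when.month - 1]} {when.day:>2} {when:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}"


# Hypothetical event: facility 16 (local0), severity 6 (informational).
line = rfc3164_message(16, 6, "winhost01", "Security", "4625 logon failure",
                       datetime(2018, 3, 7, 9, 5, 1))
print(line)
```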



Step 4: Install and configure the SIEM tool, in this case Elasticsearch, Logstash, and Kibana, known as the ELK stack. There are some references for accomplishing this at the end of this blog.


The key concepts to bolt them together include defining a Logstash-simple.config file that takes an input (for example, the TCP/514 events coming from your syslog server) and outputs those events to Elasticsearch, which indexes your event data. Localhost:9200 is the default setting.


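input {
    # Listen for events forwarded by the syslog server over TCP/514
    tcp {
        port => 514
    }
}

output {
    # Index events in the local Elasticsearch instance (default port 9200)
    elasticsearch { hosts => ["localhost:9200"] }
}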



Once Kibana is installed it will be your user interface for viewing, indexing, searching and visualizing your events. By default it runs on localhost:5601.



Your Windows logs can then become part of an overall view of all the telemetry sources and types in your network, viewable and searchable through a single interface. This enables you to build queries across all your data types. By correlating events you increase the fidelity of your investigations by adding visibility.
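As a sketch of what such a query might look like once the events are indexed, the snippet below builds an Elasticsearch query body that filters on event IDs, a time range, and optionally a source address. The field names (`event_id`, `source_ip`, `@timestamp`) are assumptions; the real names depend entirely on how your Logstash pipeline parses the events. The snippet only builds and prints the JSON, and does not contact a server.

```python
import json


def build_event_query(event_ids, start, end, source_ip=None):
    """Build an Elasticsearch bool/filter query body for a set of event IDs in a time range.

    Field names are hypothetical; substitute whatever your pipeline emits.
    """
    filters = [
        {"terms": {"event_id": list(event_ids)}},
        {"range": {"@timestamp": {"gte": start, "lte": end}}},
    ]
    if source_ip:
        filters.append({"term": {"source_ip": source_ip}})
    return {"query": {"bool": {"filter": filters}}}


body = build_event_query([4624, 4625],
                         "2018-03-01T00:00:00Z", "2018-03-02T00:00:00Z",
                         source_ip="10.0.0.5")
print(json.dumps(body, indent=2))
```

The resulting JSON could be pasted into a search request against the index, or adapted into an equivalent Kibana search.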



Working example of a threat hunt


The following table summarizes the types of information that can be gathered and analyzed from a single pane of glass provided by a log aggregator with good search and index capabilities or a SIEM tool or service.

In this case, the initial trigger is potentially suspicious lateral movement within an organization. When investigating such an event, it’s important not to treat it as an isolated incident, even if you receive only one trigger or alert. Correlation is the key to eliminating false positives. Remember, the goal is to rule out false positives, and if the threat is legitimate, you must understand the extent of the attack and when and where it began.




Indicators of Compromise


Detect unusual host-to-host activity

  • 528, 529, 4624, 4625: Type 3 (network) or 10 (RDP) login/logout
  • Network Information (collect calling workstation information): Name, Source Network Address: IP, Source Port: Port

Verify Privilege Escalations

  • 552, 4648: Runas or privilege escalation
  • Account Whose Credentials Were Used: Account Domain: DOMAIN

Verify Scheduled Tasks

  • 602, 4698: Unusual task names scheduled and quickly deleted
  • Scheduled Task created: File Name: Name, Command: Cmd, Triggers: When run

Verify PsExec

  • 601, 4697: Remote code execution at CMD line following service installation
  • Attempt to install service: Service Name: Internal Svc Name, Service File Name: path/name, *Service Type: Code, *Service Start Type: Code

Check VirusScan logs on hosts

  • Filenames, process names, hashes
  • Activities may have been attempted by other tools on the host, then detected and blocked.

Check firewall policy, network access policies on AAA devices, and audit logs on other critical assets

  • Event timestamps, IPs, usernames
  • Determine if a FW or other security element should be modified to stop further attacks based on IP addresses, ports, or other IoCs.

Pull malicious file hashes

  • SHA-256, etc. Submit to a sandbox or analysis tool.
  • Derive other IoCs representative of this malware, and search events for other occurrences and a better idea of when the attack may have started.

Failure of rule-based element

  • Set of verifiable IoCs
  • Update rulesets, virus.dat’s, signature sets. Patch known vulnerabilities.

*The sc query command will show you information on the active services on a workstation.
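The event IDs in the table come in pairs, an older three-digit ID and its newer four-digit equivalent, and in every pair listed the two differ by exactly 4096 (528 and 4624, 552 and 4648, and so on). The sketch below uses that observation to map either form of an ID to the hunt step it supports. The dictionary, the step descriptions, and the function name are made up for illustration, and the offset is applied here only to the IDs from the table.

```python
LEGACY_OFFSET = 4096  # difference between each old/new ID pair listed in the table

HUNT_STEPS = {
    4624: "Detect unusual host-to-host activity (successful logon)",
    4625: "Detect unusual host-to-host activity (failed logon)",
    4648: "Verify privilege escalations",
    4698: "Verify scheduled tasks",
    4697: "Verify PsExec (service installation)",
}


def hunt_step(event_id):
    """Map a legacy or current event ID to the hunt step it supports, or None if unknown."""
    if event_id in HUNT_STEPS:
        return HUNT_STEPS[event_id]
    # Fall back to treating the ID as the legacy form of a listed event.
    return HUNT_STEPS.get(event_id + LEGACY_OFFSET)


for eid in (528, 4648, 602, 9999):
    print(eid, "->", hunt_step(eid))
```

A lookup like this can be handy when the environment contains a mix of old and new hosts and you want one query to cover both.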


From this example you can see it’s a best practice to start small by reacting to the initial trigger, and from there to collect other important artifacts that will help you cast a wider net across the entire network. Some of these artifacts will also help you to become more proactive, as IoCs can be mapped to security policies and rule sets and applied to key security elements.


Windows logs are an important tool in your attack detection toolbox. Hopefully this series has given you some useful information on best practices and deployment.


Recommended References:

I’m in Redmond this week for the Microsoft MVP Summit. This will be my ninth Summit, but I’m as excited as if it were my first. The opportunity to meet with the people who make and ship the bits, provide valued feedback on their products, and connect with other data professionals is something I treasure. Here’s hoping they keep me around for another year.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Here’s How Much Money Dropbox Saved by Moving Out of the Cloud

Probably none, because the article does not talk about how much money it costs Dropbox to manage the infrastructure themselves.


China's hypersonic aircraft would fly from Beijing to New York in two hours

That sounds cool and all, but, um, what about the G-force felt during acceleration, deceleration, and turns at that speed? Well, that sounds cool, too. Sign me up.


AI vs. Lawyers

OK, forget self-driving cars. The first thing I want AI to replace is lawyers. These results are encouraging.


The Man Who Claimed to Invent Bitcoin Is Being Sued for $10 Billion

Oh, good. With this case heading to court there’s a chance that under oath someone might unwittingly admit to how Bitcoin is more scam than currency. And I love how they want to be paid in Bitcoin. By the time this case is settled they may get enough to buy a cup of coffee, and the transaction will take six hours to process.


GitHub Hit by 1.35Tbps Memcached DDoS

“Hey, let’s just use GitHub for our source control! It’s FREE!”


New Study Shows 20% of Public AWS S3 Buckets are Writable

Proof that the cloud is just as secure as your own data center. People are going to misunderstand technology no matter where it is hosted.


The Deadlock Empire: Slay dragons, master concurrency!

Because I’m a database geek and I want y’all to understand that deadlocks are caused by application code, nothing more. The next time you have a deadlock, don’t blame the DBA. Instead, take a look in the mirror.


Being able to hang out with fellow Microsoft Data Platform geeks for three days is the highlight of my year:


Thus far, we have gone over how to classify our disasters and how to have some of those difficult conversations with our organization regarding Disaster Recovery (DR). We've also briefly touched on Business Continuity, an important piece of disaster recovery. Now the time has come to gather all our information and put together something formal in terms of a Disaster Recovery plan. As easy as it sounds, it can be quite a daunting task once you begin. DR plans, just like their disasters, come in all forms, and you can go as broad or as detailed as you like. There is no real “set in stone” template or set of instructions for DR plan creation. For example, some DR plans may just cover how to get services back up and going at the 100-foot level, maybe focusing on more of a server level. Others may contain application-specific instructions for restoring services, while others cover how to recover from yet another disaster at your secondary site. The point is that it’s your organization's DR plan, so you can do as you like. Just remember that it might not be you, or even your IT department, executing the failover, so the more details the better. That said, I mentioned that once we begin to create our DR plan, it can become quite overwhelming. That is why I always recommend starting at that 100-foot level and circling back to input details later.


So, with all that said, we can conclude that our DR plans can be structured however we wish, and that’s true. A quick Google search will yield hundreds of different templates for DR plans, each unique in their own way. However, to have a legible, solid, successful DR plan, there are five sections it needs to contain.


Introduction




The introduction of a DR plan is as important as one found in a textbook. Basically, this is where you summarize both the objectives and the scope of the plan. A good introduction will include all the IT services and locations that are protected, as well as the RTOs and RPOs associated with each. Aside from the technical aspect, the introduction should also contain the testing schedule and maintenance scope for the plan, as well as a history of revisions that have been made to the plan.


Roles and Responsibilities


We have talked a lot in this series about including stakeholders and application owners outside of the IT department in our primary discussions. This is the section of the plan where you will formally list all your internal and external departments and personnel who are key to each DR process that has been covered in our DR plan. Remember, execution of this plan is normally run under the event of a disaster, so names are not enough. You need brief descriptions of their duties, contact information, and even alternate contact information to ensure that no one is left in the dark.


Incident Response(s)


This is where you will include how a disaster event is declared, who has the power to do so, and the chain of communication that shall immediately follow. Remember, we can have many different types of disasters, therefore we can also have many different types of disaster declarations and incident responses. For instance, a major fire will yield a different incident response than an attempted ransomware attack. We need to know who is making the declaration, how they are doing so, and who will be contacted, and so on down the chain of command.


DR Procedures


Once your disaster has been declared, those outlined within the Roles and Responsibilities can begin to act on steps to bring the production environment back up within your secondary location. This is where those procedures and instructions are laid out, step by step, for each service that is identified within the plan’s scope. A lot of IT departments will jump right into this step, and this is where our plan creation can tend to get out of control. A rule of thumb is to start broad with your process, define any prerequisites, and then dive into details. Once you are done with that, you can circle back for yet another round of details.


For example, “Recover Accounting Services” may be a good place to start. You then can dive into the individual servers that support the service as a whole, listing out all the servers (names, IPs, etc.) you need to have available. You can then get into finer details about how to get each server up and running to support the service as a whole. Even further, you may need to make changes to the application for it to run at your secondary location (maybe you have a different IP scheme, different networks, etc.), or have support for external hardware, such as a fax server to send out purchase orders.
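The broad-to-detailed layering described above can be sketched as a small data structure: a service at the top, its prerequisites, then each supporting server with its own ordered steps. Everything in this example (class names, server names, IP addresses, and steps) is hypothetical and only meant to show the shape of the information, not a recommended format.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    ip: str
    steps: list = field(default_factory=list)  # ordered recovery steps for this server


@dataclass
class ServiceRecovery:
    service: str
    prerequisites: list = field(default_factory=list)
    servers: list = field(default_factory=list)

    def runbook(self):
        """Flatten the plan into ordered lines: prerequisites first, then each server's steps."""
        lines = [f"Recover: {self.service}"]
        lines += [f"  Prerequisite: {p}" for p in self.prerequisites]
        for server in self.servers:
            lines.append(f"  Server {server.name} ({server.ip})")
            lines += [f"    {n}. {step}" for n, step in enumerate(server.steps, 1)]
        return lines


# Hypothetical example; names, addresses, and steps are made up.
plan = ServiceRecovery(
    "Accounting Services",
    prerequisites=["Secondary-site network is up"],
    servers=[
        Server("acct-db01", "10.20.0.11",
               ["Restore latest database backup", "Start database service"]),
        Server("acct-app01", "10.20.0.12",
               ["Update connection string for the secondary-site IP scheme"]),
    ],
)

print("\n".join(plan.runbook()))
```

Starting with only the service name and server list filled in, and adding steps on later passes, mirrors the "start broad, then circle back" approach.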




Appendix


This is where you place a collection of any other documents that may be of value to your organization in the event of a disaster. Vendor contacts, insurance policies, and support contracts can all go into an appendix. If there is a certain procedure to recover a server (for example, you use the same piece of software to protect all services), and you've already provided--in the DR Procedures section--an exhaustive list of instructions, you can always add it here as well, and simply reference it from within the DR plan.


With these five sections filled out, you should be certain that your organization is covered in the event of a disaster. A challenge, however, may be keeping your document up to date as your production environment changes. Today’s data centers are far from the static providers they once were. We are always spinning up new services, retiring old ones, moving things to and from the cloud. Every time that happens--to be successful in DR--we need to reassess that service within our DR plan. It needs to be a living document, right from its creation, and must always be kept up to date! And remember, it’s your DR plan, so include any other documents or sections that you or your organization wants to. At the end of the day, it’s better to have more information available than not enough, especially if you aren’t the person responsible for executing it! Also, please store a copy of this at your secondary location and/or in the cloud. I’ve heard too many stories of organizations losing their DR plan along with their production site.
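One lightweight way to notice when the plan has drifted from production is to compare two lists: the services currently running and the services the DR plan covers. The sketch below assumes you can export both lists from somewhere (an inventory, a CMDB, the plan itself); the function name and service names are made up for illustration.

```python
def dr_plan_drift(production_services, dr_plan_services):
    """Compare what runs in production with what the DR plan covers."""
    production = set(production_services)
    covered = set(dr_plan_services)
    return {
        "missing_from_plan": sorted(production - covered),
        "retired_but_still_in_plan": sorted(covered - production),
    }


# Hypothetical inventories.
production = ["Accounting", "Email", "Intranet", "New CRM"]
dr_plan = ["Accounting", "Email", "Intranet", "Old fax gateway"]

print(dr_plan_drift(production, dr_plan))
```

Running a check like this whenever a service is added or retired is one way to keep the plan a living document.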


I’d love to hear your thoughts about all this! How do you structure your DR plans? Are you more detailed or broader in terms of laying out the instructions to recover? Have you ever had to execute a DR plan you weren’t a part of? If so, how did that change your views on creating these types of procedures and documents? Thanks for reading!

In the age of exploration, cartographers used to navigate around the world and map the coastlines of unexplored continents. The coastline of IT, and moreover its inner landscapes and features, has become much more complex than it was a decade ago. The cost and effort needed to perform adequate mapping the old way have risen sharply, and manual mapping is no longer an affordable endeavor, let alone a productive one. Organizations and administrators need a solution to the problem, but where to start?


To continue this analogy, explorers of old had a few things to help them: maps of the known world, navigation instruments, and the stars. They also set sail to discover the vast world and uncover its riches, at a price most of us now know. Back to our modern world: our goal is to understand which IT services and systems are critical to a business service, and the reason is clear. We want to ensure the delivery of IT services with the best possible uptime and performance, ideally without disruptions.


It’s essential to start from the business service view. Like explorers of old, we need to use existing maps and features as a reference point. Each organization will have its own way of documenting (hopefully), but the most probable starting point is a service Business Impact Assessment (BIA). The BIA describes the upstream and downstream dependencies of a given service and the application platforms (and possibly named systems) involved in supporting the service. From there, we may be led to documentation that describes an application, its components, architecture, and systems.
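Once dependencies have been captured, even from a handful of BIAs, they can be walked programmatically to list everything a business service ultimately relies on. The sketch below is a plain breadth-first traversal over a hypothetical dependency map; the service names and the shape of the map are assumptions for the example.

```python
from collections import deque


def all_dependencies(service, depends_on):
    """Return every direct and indirect dependency of `service` (breadth-first walk)."""
    seen = set()
    queue = deque(depends_on.get(service, []))
    while queue:
        item = queue.popleft()
        if item in seen or item == service:
            continue  # skip repeats and guard against circular references
        seen.add(item)
        queue.extend(depends_on.get(item, []))
    return seen


# Hypothetical map: each key lists the things it directly depends on.
depends_on = {
    "Online ordering": ["Web front end", "Payment gateway"],
    "Web front end": ["App server"],
    "App server": ["Order database"],
    "Payment gateway": [],
}

print(sorted(all_dependencies("Online ordering", depends_on)))
```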


Creating and maintaining a catalog of business impact assessments diverges from the usual kind of work IT personnel do. It might not even be a purely IT endeavor, as compliance departments in larger organizations may own the process. Nevertheless, it is essential that IT is involved because a BIA is the ideal place to capture criticality requirements. It helps articulate how a given process or service impacts the organization’s ability to conduct business operations, assess how the organization is impacted in case of failure, and determine the steps to recover the service. Capturing adverse impact is a key activity because it helps to classify the criticality of the service itself in case of failure. Impact can be financial (loss of revenue, loss of business), reputational (loss of trust from investors/customers/partners, press scrutiny), or regulatory (loss of trust from regulatory bodies/legislative authorities, regulatory scrutiny, regulatory audits, and eventually even revocation of license to operate in a given country/region for regulated businesses).


The inconvenience with any BIA or written document is that it is a point-in-time description of a service, cast in stone until the next documentation revision date. It is therefore necessary to engage with the business process owners, and eventually with application teams, to understand if any changes were introduced. While this allows for a better view of the current state, it has the disadvantage of being a manual process with a lot of back-and-forth interactions. Another challenge we might encounter is that the BIA strictly covers a single process, without mentioning any of the upstream/downstream dependencies, or perhaps mentioning them, but without referring to any document (because there was no BIA done for another service, for example). It might also be impossible to even get one done, because a given process could rely on a third-party service or data source, over which we have no control.


There’s also another challenge looming: Shadow IT. Shadow IT broadly characterizes any IT systems that support an organization’s business objectives, but fall outside of IT scope either by omission or by a deliberate attempt to conceal the existence of such systems from IT. Because these systems exist outside of a formally documented scope, or are not known to IT organizations, it is very difficult to assess their criticality, at least from an IT standpoint. Portions of business processes or entire business divisions may be leveraging external or third-party services, over which IT has no oversight or control, and yet IT would be held responsible in case of failure.


How can IT understand the criticality of a given application service in the context of a business service when the view is incomplete or even unknown?


  • From a business perspective, the organization’s leadership should assert or reassert IT’s role in the organization’s digital strategy, by making IT the one-stop shop for all IT-related matters. Roles and responsibilities must be well established, and the organization’s leadership (CIO/CTO) should take an official stance on how to handle shadow IT projects.
  • From a compliance perspective, clear processes must be established for documenting services and systems. The need to document business processes and the underlying technical systems/platforms is evident: services that are critical from a business perspective should be documented via a Business Impact Assessment, then collected and regularly reviewed in the documentation that covers the organization’s business continuity strategy (usually a Business Continuity Plan).
  • From a technical perspective, the IT organization should be involved in compliance/documentation processes not only for review purposes, but also to provide the technical standpoint and the technical steps that fall under the Business Continuity/Disaster Recovery strategy.


To encompass these three perspectives, regular checkpoints, meetings, or reviews can help maintain the consistency of the view and the strategy. Is this sufficient, however? Unfortunately, not always. Those concepts work perfectly with consistent and stateful processes/systems, but the gradual advent of ephemeral workloads that can be spun up or scaled down on demand makes it difficult to keep track of everything.


While a well-defined documentation framework is necessary to establish processes that must be adhered to, and while documented processes with prioritization and criticality levels are essential, it is also necessary to complement this approach with a dynamic and real-time view of the systems.


Modern IT operations management tools should allow the grouping of assets not only by category or location, but also by logical constructs, such as an application view or even a process view. These capabilities have existed in the past, but were always performed manually. Advanced management platforms should leverage traffic flow monitoring capabilities to understand which systems are interacting together, and logically group them based on traffic types. This requires a certain level of intelligence built into the tool. For example, in a Windows-based environment, many systems will communicate with the Active Directory domain controllers, or with a Microsoft System Center Configuration Manager installation. The existence of traffic between multiple servers and these servers doesn’t necessarily imply an application dependency. The same could be said of a Linux environment where traffic happens between many servers and an NTP server or a yum repository. On the other hand, traffic via other ports could hint at application relationships. A web server communicating with another server via port 3306 would probably mean a MySQL database is being accessed and would constitute plausible evidence of an application dependency.
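
To make the port-based reasoning concrete, here is a minimal Python sketch. The port lists, the flow tuple format, and the function name `infer_dependencies` are illustrative assumptions rather than the behavior of any particular management platform; a real tool would draw on far richer traffic data.

```python
from collections import defaultdict

# Assumption: ports that nearly every host talks to (DNS, Kerberos, NTP,
# LDAP, LDAPS). Traffic here says little about application dependencies.
INFRASTRUCTURE_PORTS = {53, 88, 123, 389, 636}

# Assumption: ports that usually point at an application-level dependency.
APPLICATION_PORTS = {3306: "MySQL", 5432: "PostgreSQL", 1433: "SQL Server"}


def infer_dependencies(flows):
    """Return {source: {(destination, service), ...}} for likely app dependencies.

    `flows` is an iterable of (source, destination, destination_port) tuples.
    """
    dependencies = defaultdict(set)
    for source, destination, port in flows:
        if port in INFRASTRUCTURE_PORTS:
            continue  # shared infrastructure traffic, ignored on purpose
        service = APPLICATION_PORTS.get(port)
        if service is not None:
            dependencies[source].add((destination, service))
    return dict(dependencies)


sample_flows = [
    ("web01", "dc01", 389),   # LDAP to a domain controller
    ("web01", "ntp01", 123),  # time synchronization
    ("web01", "db01", 3306),  # MySQL
    ("app02", "db01", 3306),  # MySQL
]
dependencies = infer_dependencies(sample_flows)
```

In this sample, only the two flows to port 3306 are kept as evidence of a dependency on `db01`; the domain controller and NTP traffic is filtered out.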


Knowing which services are critical to a business service doesn’t require the use of a Palantir. It should be a wise blend of relying on solid business processes and on modern IT operations management platforms, with a holistic view of interactions between multiple systems and intelligent categorization capabilities.

IT organizations manage security in different ways. Some companies have formalized security teams with board-level interest. In these companies, the security team will have firm policies and procedures that apply to network gear. Some organizations appoint a manager or director to be responsible for security with less high-level accountability. Smaller IT shops have less formal security organizations with little security-related accountability. The security guidance a network engineer receives from within their IT organization can vary widely across the industry. Regardless of the direction a network engineer receives from internal security teams, there are reasonable steps he or she can take to protect and secure the network.


Focus on the Basics


Many failures in network security happen due to a lack of basic security hygiene. While this problem extends up the entire IT stack, there are basic steps every network engineer should follow. Network gear should have a consistent, templated configuration across your organization. Ad-hoc configurations, varying password schemes, and a disorganized infrastructure open the door to mistakes, inconsistencies, and vulnerabilities. A well-organized, rigorously implemented network is much more likely to be a secure network.


As part of the standard configuration for your network, pay special attention to default passwords, SNMP strings, and unencrypted access methods. Many devices ship with standard SNMP public and private communities. Change these immediately. Turn off any unencrypted access methods like telnet or insecure web (HTTP). If your organization doesn't have a corporate password vault system, use a free password vault like KeePass to store enable passwords and other sensitive access information. Don't leave a password list lying around, stored on SharePoint, or unencrypted on a file share. Encrypt the disk on any computer that stores network configurations, especially engineer laptops, which can be stolen or accidentally left behind.
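
A simple script can check saved configurations for these basics. The sketch below assumes IOS-style syntax, and the `RISKY_PATTERNS` table and `audit_config` function are hypothetical names chosen for illustration. Adjust the patterns for your own vendor and templates.

```python
import re

# Assumption: IOS-style configuration lines. Other vendors need other patterns.
RISKY_PATTERNS = {
    "default SNMP community": re.compile(
        r"^snmp-server community (public|private)\b", re.IGNORECASE
    ),
    "telnet enabled": re.compile(
        r"^\s*transport input .*\btelnet\b", re.IGNORECASE
    ),
    "unencrypted HTTP management": re.compile(
        r"^ip http server\b", re.IGNORECASE
    ),
}


def audit_config(config_text):
    """Return a list of (line_number, issue, line) for risky settings."""
    findings = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        for issue, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, issue, line.strip()))
    return findings


sample_config = """\
hostname core-sw1
snmp-server community public RO
ip http server
ip http secure-server
line vty 0 4
 transport input ssh
line vty 5 15
 transport input telnet ssh
"""
findings = audit_config(sample_config)
```

This only catches the obvious cases named above. For instance, it would not flag `transport input all`, so treat it as a starting point rather than a complete audit.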


To Firewall or Not to Firewall


While many hyperscalers don't use firewalls to protect their services, the average enterprise still uses firewalls for traffic flowing through their corporate network. It's important to move beyond the legacy layer 4 firewall to a next-generation, application-aware firewall. For outbound internet traffic, organizations need to build policy based on more than the 5-tuple. Building policies based on username and application will make the security posture more dynamic without compromising functionality.
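
To illustrate the difference, here is a small Python sketch of rule matching. It is not modeled on any vendor's policy engine; the `Flow` and `Rule` classes, the group name, and the application labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    # The classic 5-tuple...
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    # ...plus the context a next-generation firewall can add.
    user: Optional[str] = None
    application: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    action: str
    dst_port: Optional[int] = None
    protocol: Optional[str] = None
    user_group: Optional[str] = None
    application: Optional[str] = None

    def matches(self, flow, group_members):
        """A field set to None acts as a wildcard."""
        if self.dst_port is not None and flow.dst_port != self.dst_port:
            return False
        if self.protocol is not None and flow.protocol != self.protocol:
            return False
        if self.application is not None and flow.application != self.application:
            return False
        if self.user_group is not None:
            if flow.user not in group_members.get(self.user_group, set()):
                return False
        return True


def evaluate(flow, rules, group_members):
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if rule.matches(flow, group_members):
            return rule.action
    return "deny"


groups = {"staff": {"alice"}}
layer4_rules = [Rule("allow", dst_port=443, protocol="tcp")]
app_aware_rules = [
    Rule("deny", application="file-sharing"),
    Rule("allow", dst_port=443, protocol="tcp", user_group="staff"),
]
upload = Flow("10.0.0.5", 50123, "203.0.113.10", 443, "tcp",
              user="alice", application="file-sharing")
browse = Flow("10.0.0.5", 50124, "203.0.113.20", 443, "tcp",
              user="alice", application="web-browsing")
```

The layer 4 rule set allows both flows because they look identical at the port level. The application-aware rule set still lets `alice` browse, but denies the file-sharing upload.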


Beyond the firewall, middle boxes like load balancers and reverse proxies have an important role in your network infrastructure. Vulnerabilities, weak ciphers, and misconfigurations can leave applications and services wide open to exploitation. There are many free web-based tools that can scan internet-facing hosts and report on weak ciphers and easy-to-spot vulnerabilities. Make use of these tools and then plan to remediate the findings.


Keep A Look Out for Vulnerabilities


When we think of patch cycles and vulnerability management, servers and workstations are top of mind. However, vulnerabilities exist in our networking gear too. Most vendors have mailing lists, blogs, and social media feeds where they post vulnerabilities. Subscribe to the relevant notification streams and tune your feed for information that's relevant to your organization. Make note of vulnerabilities and plan upgrades accordingly.


IT security is a broad topic that must be addressed throughout the entire stack. Most network engineers can't control the security posture of the endpoints or servers at their company, but they do control networking gear and middle boxes, which have a profound impact on IT security. In most instances, you can take practical, common-sense steps that will dramatically improve your network security posture.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist


Here's an interesting article from my colleague Leon Adato, in which he suggests that honesty is the best policy.


IT professionals have a tough job. They face the conundrum of managing increasingly complex and hybrid IT platforms. They must protect their networks from continually evolving threats and bad actors. Budgets are restrictive and resources slim. And there are political agendas.


Given all of these factors, it’s understandable if we might feel compelled to tell some little white lies to ourselves on occasion. “Everything’s fine,” we might say, even if we’re not entirely sure that it is true. We might also be willing to engage in some little excuses and statements of overconfidence.


However, it’s important we acknowledge we may not have all the answers. We must continue to be honest with ourselves to avoid living in a world of gray.


You Don’t Know What You Don’t Know


Sometimes it’s more difficult to truly know how your infrastructure operates. That’s especially true in hybrid IT models. It’s very difficult to gain a complete view of our entire operation without the proper monitoring tools.


As pessimistic as that may seem, sometimes users aren’t honest, particularly in agency environments with very strict rules.


If an agency has a policy against using USB devices, for example, what happens if an employee breaks that rule and introduces the potential for unnecessary risk? From the confines of IT, it is sometimes difficult to assess what might be going on in other sections of the agency, which could pose some problems.


Unearthing the Truth = No More Little White Lies


Keeping everyone honest is essential to maintaining network integrity. The best way to do that is to adopt monitoring solutions and strategies that allow our IT teams to maintain visibility and control over every aspect of our infrastructure, from applications hosted off-site to the mobile devices used over networks.


We should adopt monitoring tools that are comprehensive and encompass the full range of networked entities. These solutions should also be able to provide insight into network activity regardless of whether the infrastructure and applications are on-site or hosted. We must be able to monitor activity at the hosting site and as data passes from the hosting provider to the agency.


After all, a true monitoring solution must monitor and provide a true view of what’s going on within the network. Shouldn’t it offer the ability to probe? To drill down? Those capabilities are essential if we are to truly unearth the root cause of whatever issues we may be trying to address or avert. And with the ability to monitor connections to external sources, we’ll be able to better identify break points when an outage occurs.


Let’s not forget everyone else in the agency. It’s important to keep tabs on network traffic to identify red flags and shine a light on employees who may be using unauthorized applications, again, as a means to keep everyone honest.


Being left in the dark may lead us to rely on half-truths simply because we lack the full picture. Instead of fooling ourselves, we should seek out solutions that provide us with true clarity into our networks, rather than shades of gray. This will result in more effective and secure network operations.


Find the full article on Nextgov.

If you have done any work in enterprise networks, you are likely familiar with the idea of a chassis switch. They have been the de facto standard for campus and data center cores and the standard top tier in a three-tier architecture for quite some time, with the venerable and perennial Cisco 6500 having a role in just about every network that I’ve ever worked on. They’re big and expensive, but they’re also resilient and bulletproof. (I mean this in the figurative and literal sense. I doubt you can get a bullet through most chassis switches cleanly.) That being said, there are some downsides to buying chassis switches that don’t often get discussed. In this post, I’m going to make a case against chassis switching. Not because chassis switching is inherently bad, but because I find that a lot of enterprises just default to the chassis as a core because that’s what they’re used to. To do this, I’m going to look at some of the key benefits touted by chassis switch vendors and discuss how alternative architectures can provide these features, potentially in a more effective way.


High Availability


One of the key selling features of chassis switching is high availability. Within a chassis, every component should be deployed in N+1 redundancy. This means you don’t just buy one fancy and expensive supervisor, you buy two. If you’re really serious, you buy two chassis, because the chassis itself is an unlikely, but potential, single point of failure. The reality is that most chassis switches live up to the hype here. I’ve seen many chassis boxes that have been online for entirely too long without a reboot (patching apparently is overrated). The problem here isn’t a reliability question, but rather a blast area question. What do I mean by blast area? It’s the number of devices that are impacted if the switch has an issue. Chassis boxes tend to be densely populated with many devices either directly connected or dependent upon the operation of that physical device.


What happens when something goes wrong? All hardware eventually fails, so what’s the impact of a big centralized switch completely failing? Or more importantly, what’s the impact if it’s misbehaving, but hasn’t failed completely? (Gray-outs are the worst.) Your blast radius is significant and usually comprises most or all of the environment behind that switch. Redundancy is great, but it usually assumes total failure. Things don’t always fail that cleanly.


So, what’s the alternative? We can learn vicariously from our friends in Server Infrastructure groups and deploy distributed systems instead of highly centralized ones. Leaf-spine, a derivative of Clos networks, provides a mechanism for creating a distributed switching fabric that allows for up to half of the switching devices in the network to be offline with the only impact to the network being reduced redundancy and throughput. I don’t have the ability to dive into the details on leaf-spine architectures in this post, but you can check out this Packet Pushers Podcast if you would like a deeper understanding of how they work. A distributed architecture gives you the same level of high availability found in chassis switches but with a much more manageable scalability curve. See that section below for more details on scalability.
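
A bit of arithmetic shows why losing spines degrades the fabric gracefully rather than taking it down. This is a back-of-the-envelope Python sketch, assuming every leaf has one equal-speed uplink to every spine and that traffic is spread evenly across the healthy spines. The function names and the example numbers are mine, not from any vendor.

```python
def leaf_uplink_capacity(spines, failed_spines, uplink_gbps):
    """Per-leaf uplink capacity in Gbps with some spines offline.

    Assumption: one uplink of `uplink_gbps` from each leaf to each spine.
    """
    if not 0 <= failed_spines <= spines:
        raise ValueError("failed_spines must be between 0 and spines")
    return (spines - failed_spines) * uplink_gbps


def capacity_report(spines, uplink_gbps):
    """List (failed_spines, fraction_of_full_capacity) for every failure count."""
    full = leaf_uplink_capacity(spines, 0, uplink_gbps)
    return [
        (failed, leaf_uplink_capacity(spines, failed, uplink_gbps) / full)
        for failed in range(spines + 1)
    ]


# Hypothetical fabric: 4 spines, 40 Gbps uplinks from each leaf.
report = capacity_report(spines=4, uplink_gbps=40)
```

With two of the four spines offline, each leaf still has 80 Gbps of its original 160 Gbps toward the fabric. Every leaf can still reach every other leaf, just with less headroom and less redundancy.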


Complexity

Complexity can be measured in many ways. There’s management complexity, technical complexity, operational complexity, etc. Fundamentally though, complexity is increased with the introduction and addition of interaction surfaces. Most networking technologies are relatively simple when operated in a bubble (some exceptions do apply) but real complexity starts showing up when those technologies are intermixed and running on top of each other. There are unintended consequences to your routing architecture when your spanning-tree architecture doesn’t act in a coordinated way, for example. This is one of the reasons why systems design has favored virtualization, and now micro-services, over large boxes that run many services. Operation and troubleshooting become far more complex when many things are being done on one system.


Networking is no different. Chassis switches are complicated. There are lots of moving pieces and things that need to go right, all residing under a single control plane. The ability to manage many devices under one management plane may feel like reducing complexity, but the reality is that it’s just an exchange of one type of complexity for another. Generally speaking it’s easier to troubleshoot a single purpose device than a multi-purpose device, but operationally it’s easier to manage one or two devices rather than tens or hundreds of devices.


Scalability

You may not know this, but most chassis switches rely on Clos networking techniques for scalability within the chassis. Therefore, it isn’t a stretch to consider moving that same methodology out of the box and into a distributed switching fabric. With the combination of high speed backplanes/fabrics and multiple line card slots, chassis switches do have a fair amount of flexibility. The challenge is that you have to buy a large enough switch to handle anticipated and unanticipated growth over the life of the switch. For some companies, the life of a chassis switch can be expected to be upwards of 7-10 years. That’s quite a long time. You either need to be clairvoyant and understand your business needs half a decade into the future, or do what most people do: significantly oversize the initial purchase to help ensure that you don’t run out of capacity too quickly.


On the other hand, distributed switching fabrics grow with you. If you need more access ports, you add more leafs. If you need more fabric capacity, you add more spines. There’s also much greater flexibility to adjust to changing capacity trends in the industry. Over the past five years, we’ve been seeing the commoditization of 10Gb, 25Gb, 40Gb, and 100Gb links in the data center. Speeds of 400Gbps are on the not-too-distant horizon, as well. In a chassis switch, you would have had to anticipate this dramatic upswing in individual link speed and purchase a switch that could handle it before the technologies became commonplace.
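
The "grow with you" idea is easy to put numbers on. The Python sketch below computes a leaf's oversubscription ratio (access bandwidth divided by uplink bandwidth), again assuming one uplink from each leaf to each spine. The port counts and speeds are hypothetical examples.

```python
def oversubscription(access_ports, access_gbps, spines, uplink_gbps):
    """Ratio of a leaf's access bandwidth to its uplink bandwidth.

    Assumption: one uplink of `uplink_gbps` from the leaf to each spine.
    """
    return (access_ports * access_gbps) / (spines * uplink_gbps)


def total_access_ports(leaves, access_ports_per_leaf):
    """Access ports across the whole fabric."""
    return leaves * access_ports_per_leaf


# Hypothetical starting point: 48 x 10Gb access ports, 4 spines, 40Gb uplinks.
initial = oversubscription(48, 10, spines=4, uplink_gbps=40)
# Need more fabric capacity? Add two spines.
more_spines = oversubscription(48, 10, spines=6, uplink_gbps=40)
# Or move the uplinks to 100Gb when those links become affordable.
faster_uplinks = oversubscription(48, 10, spines=4, uplink_gbps=100)
# Need more access ports? Add leafs.
ports_at_8_leaves = total_access_ports(8, 48)
ports_at_12_leaves = total_access_ports(12, 48)
```

Here the ratio drops from 3:1 to 2:1 by adding spines, or to 1.2:1 by upgrading uplink speed, and neither change requires predicting the need years ahead.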


Upgrades

When talking about upgrading, there really are two types of upgrades that need to be addressed: hardware and software. We’re going to focus on software here, though, because we briefly addressed the hardware component above. Going back to our complexity discussion, the operation “under the hood” on chassis switches can often be quite complicated. With so many services so tightly packed into one control plane, upgrading can be a very complicated task. To handle this, switch vendors have created an abstraction for the processes and typically offer some form of “In Service Software Upgrade” automation. When it works, it feels miraculous. When it doesn’t, those are bad, bad days. I know few engineers who haven’t had ISSU burn them in one way or another. When everything in your environment is dependent upon one or two control planes always being operational, upgrading becomes a much riskier proposition.


Distributed architectures don’t have this challenge. Since services are distributed across many devices, losing any one device has little impact on the network. Also, since there is only loose coupling between devices in the fabric, not all devices have to be at the same software levels, like chassis switches do. This means you can upgrade a small section of your fabric and test the waters for a bit. If it doesn’t work well, roll it back. If it does, distribute the upgrade across the fabric.
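
That "test the waters, then roll forward or back" workflow can be sketched in a few lines of Python. The `upgrade`, `is_healthy`, and `rollback` callables are placeholders for whatever automation you already use; nothing here is a real vendor API.

```python
def staged_upgrade(devices, upgrade, is_healthy, rollback, canary_count=1):
    """Upgrade a few devices first, then the rest only if they stay healthy.

    `upgrade`, `is_healthy`, and `rollback` are caller-supplied functions
    that each take a device name (hypothetical hooks into your tooling).
    """
    canaries = list(devices[:canary_count])
    remainder = list(devices[canary_count:])

    for device in canaries:
        upgrade(device)

    if not all(is_healthy(device) for device in canaries):
        # The small test section misbehaved: put it back and stop.
        for device in canaries:
            rollback(device)
        return {"upgraded": [], "rolled_back": canaries}

    # The test section looks good: distribute the upgrade across the fabric.
    for device in remainder:
        upgrade(device)
    return {"upgraded": canaries + remainder, "rolled_back": []}
```

In practice you would likely let the canaries soak for a while and re-check health as the rollout proceeds, but the shape of the process is the same.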


Final Thoughts


I want to reiterate that I’m not making the case that chassis switches shouldn’t ever be used. In fact, I could easily write another post pointing out all the challenges inherent in distributed switching fabrics. The point of the post is to hopefully get people thinking about the choices they have when planning, designing, and deploying the networks they run. No single architecture should be the “go-to” architecture. Rather, you should weigh the trade-offs and make the decision that makes the most sense. Some people need chassis switching. Some networks work better in distributed fabrics. You’ll never know which group you belong to unless you consider factors like those above and the things that matter most to you and your organization.

I am in Germany this week, presenting sessions on database migrations and upgrades at SQL Konferenz. It’s always fun to talk data, and help people understand how to plan and execute data migration projects.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Intel partners with Microsoft, Dell, HP, and Lenovo to make 5G laptops

Stop me if you’ve heard this one before, but we are being told this next-gen network will solve all our problems.


Apple devices are butt dialing 9-1-1 from its refurbishing facility — 20 times per day

Hey, at least the calls are going through. Most days I can’t get any reception with mine.


Your orange juice exists because of climate change in the Himalayas millions of years ago

Forget bringing back dinosaurs, or fruits, I want to use DNA to bring back ancient bacon.


The FCC’s Net Neutrality Order Was Just Published, Now the Fight Really Begins

We are officially on the clock now, folks. And we still don’t know if this is a good or bad thing.


How to protect your browser from Unicode domain phishing attacks

I like the idea of browser extensions for safety, especially if they are part of a domain policy.


That microchipped e-passport you've got? US border cops still can't verify the data in it

Ten years. Embarrassing. On the upside, think of all the bottles of water they prevented from flying in that time.


The quantum internet has arrived (and it hasn’t)

I find myself fascinated by this concept. I’m also scared to think about my first quantum cable bill.


Made it to Darmstadt and it took only five minutes to find the best worscht in town:

So far in this series, we've covered setting expectations as well as migrating to Office 365. Now that your organization is up and running on the new platform, how do you measure your organization's health? Are you running as efficiently as you can, or as efficiently as you were before? Are there areas where you are being wasteful? In this post, we'll cover some steps that you can take to give your organization a health check.




One of the great things about Office 365 is that there is no shortage of packages to choose from. Whether you are looking to host a single email account, or if you need to host your entire communications platform--including phones--there are packages that will fit. But how can you tell if you have "right-sized" your licenses?


Office 365 has an easy to understand activity report. Pulling up this report will let you see statistics on a lot of the services being offered. For example, you can see who is using OneDrive and how much data they are storing. You can also see how popular Skype for Business is amongst your users.


At a high level, you can take this list and see who is or isn't using these features. Depending on their needs and the features involved, some users might be able to shift to a lower-tiered plan. Given the range of prices for the various plans, this can yield fairly significant savings.
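
If you export the usage data, a short script can shortlist the users worth a closer look. The Python sketch below works on a made-up CSV layout; the column names (`user`, `plan`, `onedrive`, `skype`) and the yes/no values are assumptions, not the actual format of the Office 365 activity report.

```python
import csv
import io


def downgrade_candidates(report_csv, premium_features):
    """Return users who show no activity in any of the given feature columns."""
    candidates = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        used = [f for f in premium_features if row[f].strip().lower() == "yes"]
        if not used:
            candidates.append(row["user"])
    return candidates


# Hypothetical export: one row per user, one yes/no column per feature.
sample_report = """\
user,plan,onedrive,skype
alice,E3,yes,yes
bob,E3,no,no
carol,E3,no,yes
"""
candidates = downgrade_candidates(sample_report, ["onedrive", "skype"])
```

In this sample only `bob` shows no activity in either feature. A result like this is a prompt for a conversation, not an automatic downgrade, since a user may simply not know the feature exists.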




Taking the same data above, you can find a list of folks who aren't using particular products or features. This is a perfect opportunity to find out why they aren't taking advantage of their license's full potential. Is it because they don't need the product/server? Maybe they aren't aware that they can access it. Or, maybe they don't know how to use it.


Taking this approach can be a great way to figure out which users need training and in what areas. Given how frequently Microsoft is adding new features to Office 365, it is also a great way to see adoption rates. Using this data, you can start to develop training schedules. Maybe once a month you can offer training sessions on some of the lesser-used areas. The great thing is, you will be able to easily tell if your training is useful by looking at the usage metrics again in the future.




One of the key points I highlighted back in the first post of this series was the value that this migration can bring to an enterprise. When planning out projects, we do so anticipating that we will get value out of this. Actually measuring the value after a migration is just as important. If reports and studies come back showing that your organization is, in fact, utilizing the new platform to its full potential, then great! If not, then you need to identify why not, and take the opportunity to fix it.


If you have been part of a team who migrated an enterprise environment to Office 365, how did you perform a health check? Did you uncover any surprises in your findings? Feel free to leave a comment below.


By Paul Parker, SolarWinds Federal & National Government Chief Technologist


It’s always good to have a periodic reminder to consider what we’re monitoring and why. Here's an applicable article from my colleague Joe Kim, in which he offers some tips on avoiding alert overload.


If you’re receiving so much monitoring information that you don’t see the bigger-picture implications, then you’re missing the value that information can provide. Federal IT pros have a seeming overabundance of tools available for network monitoring. Today, they can monitor everything from bandwidth to security systems to implementation data to high-level operational metrics.


Many federal IT pros are tempted to use them all to get as much information as possible, to ensure that they don’t miss even a bit of data that can help optimize network performance.


That is not the best idea.


First, getting too much monitoring information can cause monitoring overload. Why is this bad? Monitoring overload can lead to overly complex systems that, in turn, may create conflicting data. Conflicting data can then lead to management conflicts, which are counter-productive on multiple levels.


Second, many of these tools do not work together, providing a larger possibility for conflicting data, a greater chance that something important will be missed, and an even greater challenge seeing the bigger picture.


The solution is simpler than it may seem: get back to basics. Start by asking these three simple questions:


  1. For whom am I collecting this data?
  2. What metrics do I really need?
  3. What is my monitoring goal?


Federal IT pros should start by looking specifically at the audience for the data being collected. Which group is using the metrics—the operations team, the project manager, or agency management? Understand that the operations team will have its own wide audience and equally wide array of needs, so be as specific as possible in gathering “audience” information.


Once the federal IT pro has determined the audience, it will be much easier to determine exactly which metrics the audience requires to ensure optimal network performance—without drowning in alerts and data. Identify the most valuable metrics and focus on ensuring those get the highest priority.


The third question is the kicker, and should bring everything together.


Remember, monitoring is a means to an end. The point of monitoring is to inform and enhance operational decisions based on collected data. If a federal IT pro has a series of disconnected monitoring products, there is no way to understand the bigger picture; one cannot enhance operational decisions based on collected data if there is no consolidation. Opt for an aggregation solution, something that brings together information from multiple tools through a single interface that provides a single view.
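
As a rough picture of what consolidation means, the Python sketch below merges alerts from several tools into one view, keyed by host and metric. The alert fields, tool names, and numeric severities are invented for the example and do not reflect any specific product.

```python
from collections import defaultdict


def consolidate(alerts):
    """Merge alerts about the same host and metric into a single entry.

    `alerts` is an iterable of dicts with the keys 'tool', 'host', 'metric',
    and 'severity' (higher number = more severe). Returns a list of
    (host, metric, highest_severity, reporting_tools), most severe first.
    """
    merged = defaultdict(lambda: {"tools": set(), "severity": 0})
    for alert in alerts:
        entry = merged[(alert["host"], alert["metric"])]
        entry["tools"].add(alert["tool"])
        entry["severity"] = max(entry["severity"], alert["severity"])
    rows = [
        (host, metric, entry["severity"], sorted(entry["tools"]))
        for (host, metric), entry in merged.items()
    ]
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


sample_alerts = [
    {"tool": "netflow", "host": "core-sw1", "metric": "bandwidth", "severity": 2},
    {"tool": "snmp-poller", "host": "core-sw1", "metric": "bandwidth", "severity": 3},
    {"tool": "snmp-poller", "host": "edge-fw1", "metric": "cpu", "severity": 1},
]
single_view = consolidate(sample_alerts)
```

Two tools reporting on the same link become one line at the higher severity, which is the kind of single view that makes the bigger picture visible.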


Network monitoring and network optimization are getting more and more complex. Couple this with an increasing demand for a more digital government, and it becomes clear that gathering succinct insight into the infrastructure and application level of the IT operations within the agency is critical.


The most effective course of action is to get back to the basics. Focus on the audience and the agency’s specific needs. This will ensure a more streamlined monitoring solution that will help more effectively drive mission success.


Find the full article on Federal Technology Insider.



(This is the fourth and final part of a series. You can find Part One here, Part Two here and Part Three here.)


It behooves me to remind you that there are many spoilers beyond this point. If you haven't seen the movie yet, and don't want to know what's coming, bookmark this page to enjoy later.


New IT pros may take your tools and techniques and use them differently. Don't judge.


One of the interesting differences between Logan and Laura is that she has two claws that come from her hands (versus Logan's three), and one that comes out of her foot. Charles speculates that females of a species develop different weapons for protection versus hunting. Logan seems unimpressed even though he just witnessed Laura taking out at least three soldiers with her foot-claws alone.


The lesson for us is to remember that tools are there to be used. If it achieves the desired result and avoids downstream complications, then it doesn't matter if the usage diverges from "the way we did it in my day." Thinking outside the box (something my fellow Head Geek, Destiny Bertucci, talks about all the time) is a sign of creativity and engagement, two things that should never be downplayed.


Your ability to think will always trump the capability of your tools.


Yes, Logan is stab-y and can heal. But Charles, at the end of his life, can still flatten a city block.


And it is here where we descend into the realm of "who would win in a fight between Superman® and God?" This is, admittedly, a realm that the SolarWinds THWACK® March Madness bracket battle has been willing to take on for several years in a row, but I'm going to go there anyway. Logan/Wolverine® is one of the darlings of the X-Men® (and Marvel®) franchise. He's captured imaginations since his first appearance in 1974, and appeared in countless comics with the X-Men and solo. But even within the context of the X-Men movie franchise, he's far from the most powerful.


Magneto: “You must be Wolverine. That remarkable metal doesn't run through your entire body, does it?”


No, it's pretty clear that the most powerful being, certainly in Logan, but also in the mutant-verse, is Charles. Again, the ability to contact every human mind on the planet is nothing to sneeze at, and it puts healing ability and metal claws to shame.


Here’s what I want you to take from this: your ideas, thoughts, and ability to reason are the things that make you an IT powerhouse. It doesn’t matter that your PC has a quad-core processor and 128GB of RAM. Nobody cares that your environment is running the latest container technology, or that your network has fiber-to-the-desktop. You have a veritable encyclopedia of CLI commands or programming verbs in your head? So what.


You are valued for the things that you do with your tools. Choose wisely. Think actively. Engage passionately.


It's never about what you do (or what you have achieved, fixed, etc.). The story of your IT career has always been and will always be about who you met, who you helped, and who you built a connection with.


The movie Logan is not, at its heart, about stabbing people in the head with metal claws, or car chases, or mutant abilities. While there is plenty of that, the core of the movie is about two men coming to terms with themselves and their legacy, and how that legacy will affect the world after they are gone.


It is a movie about the very real father-son relationship between Logan and Charles - how they love each other but wish the other could be "better" in some way. They understand that they cannot change the other person, but have to learn to live with them.


It is also about caring for another person: about whether we choose to care or not, about how we express that care, about how those feelings are received by the other person and reciprocated (or not).


Once again, I am invoking the blog post by fellow Head Geek Thomas LaRock, "Relationships Matter More Than Money":


"When you use the phrase, 'It's not personal, it's just business,' you are telling the other person that money is more important than your relationship. Let that sink in for a minute. You are telling someone, perhaps a (current, maybe soon-to-be-former) friend of yours, that you would rather have money than their friendship. And while some jerk is now getting ready to leave the comment 'everything has a price,' my answer is 'not my friends.' If you can put a price on your friendships, maybe you need better ones."


Why are you in IT? Odds are very good it's not for the money. Okay, the money isn't bad, but no matter what the payout is, ultimately it’s probably not enough to keep you coming back into the office day after day. You are in IT for something else. Maybe you like the rush of finding a solution nobody else ever thought of. Or the pure beauty of the logic involved in the work. Or the chance to build something that someone else wanted but couldn't figure out how to make for themselves.


But underneath it all, you are probably in IT because you want to help people in some meaningful way.


That's the IT lesson we can take from Logan. The climax of the movie isn't when Laura shoots X24 in the head with an adamantium bullet.


It's when she clutches Logan's hand as he's dying and cries out, "Daddy!" in her loss and grief, and he accepts both that name and her love for him, even if he doesn't feel he's worthy of either.


We are here - on this planet, in this community, at this company, on this team, on this project, doing this job - to forge connections with the people that we meet. To learn, mentor, befriend, lead, help, teach, follow, grow, foster, and so much more. The rest is just technical details.


1 “Logan” (2017), Marvel Entertainment, distributed by 20th Century Fox
