Geek Speak

So, I wanted to at least touch base with everyone on the “scandal” of the week. Is it fake news? New ways for stock gouging? New ransom type embankments? Corporate espionage?


I waited until at least some of the dust had settled to write this post. I wanted to be able to make accurate judgment calls and present a level-headed offering of thoughts and ideas. Here they are:


  1. Yes, there are security flaws (over a dozen) within these processors.
  2. No, at this time they are not mission critical, because an attacker needs physical access AND administrator/root credentials.
  3. The lab that sent out these security flaws had stock associated with their finds.
  4. They only gave AMD 24 hours to resolve the issue before they went public with the flaws.


People are still discussing the processor story, so consider this an up-to-date discussion. Let it also be a friendly reminder that we have to check the general “sky is falling” mentality, especially in security. Key takeaway? Focus on best practices.



We should strive to have due diligence on the risk, determine appropriate measures to respond, and showcase the balance between risk and business as usual.


Since I believe you can benefit from them, here are my top three security practices:


Infrastructure monitoring

Determining baselines winds up bringing incredible value to any organization, department, and technology as a whole. The importance and power of baselines sometimes gets overlooked, and that saddens me. It is all too common for folks to wait until after they experience an incident to set up monitoring. That is simply a reaction, not a proactive approach.


Once you begin monitoring, you can start comparing solutions to risk. This is how you can test solutions to risks and vulnerabilities before you go full on “PLAID” mode (Spaceballs reference. #sorrynotsorry), only to find that you have created a larger issue than the risk itself. Comparative reporting is an excellent way to prove that you have done your due diligence in understanding the impact of the threat and the solution as a whole.
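
To make the comparison idea concrete, here is a minimal sketch of checking a post-change reading against a baseline. The CPU figures, the function names, and the three-standard-deviation tolerance are all assumptions for illustration, not output from any particular monitoring tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_outside_baseline(value, baseline, tolerance=3.0):
    """Return True when value is more than `tolerance` standard deviations from the mean."""
    average, spread = baseline
    return abs(value - average) > tolerance * spread


# Hypothetical CPU utilization (%) collected before any change was made.
before_change = [41, 39, 44, 40, 42, 38, 43, 41]
baseline = build_baseline(before_change)

# Hypothetical reading taken after applying a mitigation in a test environment.
after_change = 63
print(is_outside_baseline(after_change, baseline))
```

A reading that lands outside the baseline is not proof that the fix is worse than the risk, but it is the kind of evidence that belongs in a comparative report before the change goes everywhere.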


Threat management policies

You should determine a policy that addresses ways to deal with threats, vulnerabilities, and concerns immediately and openly.  It should live where everyone can access it, and be clearly outlined so everyone knows what is happening even before you have the solution. This helps to stop or at least slow down management fire alarms, universally expressed as, “What are we going to do NOW?”


The policy should include a timeline of events that everyone can understand. For example, let everyone know that there will be an email update outlining next steps within 48 hours of the incident. In other words, you are telling everyone, “Hey, I’m working on the issue and I’ll make sure I update you. In the meantime, I’m doing my due diligence to make sure the outcome is beneficial for our company.”


Asset Management

You can't quickly assess your infrastructure if you are not aware of everything you manage, period.


There is power in knowing what you are managing in many realms, but my first go-to is asset reports. I need to know quickly what could and, more importantly, what could not be associated with any new threats, concerns, or vulnerabilities.


The types of tools that allow me to monitor and update my assets give me much needed insight into where my focus should be, which is why I go there first. Doing so ensures that I won’t be distracted or overwhelmed by data points that aren’t relevant.
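
As a rough sketch of what that first look at an asset report can do, the snippet below splits an inventory into what could and could not be exposed to a processor-specific flaw. The `Asset` fields, hostnames, and vendors are hypothetical; a real asset tool will have its own schema.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    hostname: str
    cpu_vendor: str


def split_by_exposure(assets, affected_vendor):
    """Partition assets into (could be affected, could not be affected)."""
    affected = [a for a in assets if a.cpu_vendor == affected_vendor]
    unaffected = [a for a in assets if a.cpu_vendor != affected_vendor]
    return affected, unaffected


# Hypothetical inventory, standing in for an exported asset report.
inventory = [
    Asset("web-01", "AMD"),
    Asset("web-02", "Intel"),
    Asset("db-02", "AMD"),
]

affected, unaffected = split_by_exposure(inventory, "AMD")
print([a.hostname for a in affected])
print([a.hostname for a in unaffected])
```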


Finally, the responsibility of tracking and understanding any types of threat should be proactive and fully vetted. We should want to understand the issues before we blindly implement Band-Aids that can, potentially, hinder our business goals.


Using information to better the security within our organizations also brings us into the fabric of the business, assisting efforts to keep business costs low.


I hope you join this conversation because there are several touch points here. I’m very curious to hear your thoughts, comments, and opinions. For example, did you believe, when the flaws were released, that they were a form of ransom? Do you see other opportunities to manipulate a company’s earnings by highlighting exploits for others’ gain? Or, maybe you just sit back, watch the news with a scotch in your hand, and laugh.


Let's talk this over, shall we?




The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

With the influx of natural disasters, hacks, and increasingly more common ransomware, being able to recover from a disaster is quickly moving up the priority list for IT departments across the globe. In addition to awareness, we are seeing our data centers move from a very static deployment to an ever-changing environment. Each day we see more and more applications getting deployed, either on-premises or in the cloud, and each day we, as IT professionals, have the due diligence to ensure that when disaster strikes we can recover these applications. Without the proper procedures in place to consistently update our DR plans, no matter how well-crafted or detailed they are, the confidence in completing successful failovers decreases. So what now?


We’ve already discussed the first step in our DR process: creating our plan. We’ve also touched on the second step, which is to make it a living document to accommodate data center change. But there is one more step we need to put in place for a successful failover, and that's testing. Testing boosts confidence in the IT department and the organization as a whole.


Testing our DR plan - We learn by doing!


When thinking of DR plan testing, I always like to compare it to a child. I know, a weird analogy, but if we think about how children learn and get better, it begins to make sense. Children learn by doing; they learn to talk by talking, learn to play sports by playing, etc. The point is that by “walking the walk,” we tend to improve ourselves. The same applies to our DR plans. We can have as many details and processes laid out on paper as we want, but if we can't restore when we need to, we've failed. Essentially, our DR plans are set up for success by also walking the walk, aka testing.


Start small, get bigger!


I’m not recommending going and pulling the plug on your data center tomorrow to see if your plan works. That would certainly be a career-changing move. Instead, you should start small. Take a couple of key services as defined in your DR plan and begin to draft a plan for testing a failover of the components and servers contained within them. Just as when creating your DR plan, details and coordination are the key to success when creating your testing plan. Know exactly what you are testing for. Don’t simply count a booted server as a success. Instead, go deeper. Can you log into the application? Can you use the application? Can a member of the department that owns the application sign off stating that it is indeed functioning normally? By knowing exactly what the end goal is, you can sign off on a successful test or, on the flip side, take the failures that occurred, learn from them, update your plan to reflect any changes, and be prepared for the next testing cycle.
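
One way to keep yourself honest about "going deeper" is to write the sign-off criteria down before the test starts. The sketch below is only an illustration: the class, the check names, and the payroll service are hypothetical, and a real test plan would carry far more detail.

```python
from dataclasses import dataclass, field

# Hypothetical sign-off criteria; a booted server is only the first of them.
REQUIRED_CHECKS = (
    "server booted",
    "login works",
    "application usable",
    "owner sign-off",
)


@dataclass
class FailoverTest:
    service: str
    checks: dict = field(default_factory=dict)

    def record(self, check, passed):
        """Store the result of a single check."""
        self.checks[check] = passed

    def outstanding(self, required=REQUIRED_CHECKS):
        """Return required checks that failed or were never recorded."""
        return [name for name in required if not self.checks.get(name, False)]

    def passed(self, required=REQUIRED_CHECKS):
        """A test passes only when nothing is outstanding."""
        return not self.outstanding(required)


drill = FailoverTest("payroll")
drill.record("server booted", True)
drill.record("login works", True)
drill.record("application usable", False)

print(drill.passed())
print(drill.outstanding())
```

Whatever is still outstanding at the end of a test is exactly the list that should feed the next revision of the plan.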


Once you have a couple of services defined, go ahead and begin to integrate more, and ensure that recurring time has been set aside and defined within the DR plan to carry out these tests. A full-scale DR test is not something that can be performed on a regular basis, but we can carry out smaller tests on a monthly or quarterly basis. Without a consistent schedule and attention to detail, we can almost guarantee that items like configuration drift will soon creep in and cause our DR testing to fail, or worse, our DR execution to fail.


I’ve mentioned before that not keeping our DR plans up to date is perhaps one of the biggest flaws in the whole DR process. However, not applying a consistent testing plan trumps this. Disaster Recovery, in my opinion, cannot be classified as a project. It cannot have an end date and a closing. We must always ensure, when deploying new services and changing existing applications, that we revisit the DR plan, updating both the process of recovering and the process for testing said recovery. Testing our DR plan is a key component in ensuring that all the work we have done in creating our plan will be successful when the plan is most needed. Let’s face it. A failed recovery will put a blemish on the entire DR planning process and all the work that has gone into it. Test and test often to make sure this doesn't happen to you.


I’d love to hear from all of you regarding how you go about testing, or whether you test at all. Are there any specific starting points for tests that you recommend? Do you start small and then expand? Do you utilize any specific pieces of software, resources, or tools to help test your recovery? If you do test, how often? And finally, let’s hear those horror/success stories of any incidents gone bad (or extremely well) as they relate directly to your DR testing procedures. Thanks for reading!

By Paul Parker, SolarWinds Federal & National Government Chief Technologist


We all know that security concerns go hand in hand with IoT. Here's an interesting article from my colleague Joe Kim, in which he suggests ways to overcome the challenges.


Agencies should not wait on IoT security


The U.S. Defense Department is investing heavily to leverage the benefits provided by the burgeoning Internet of Things (IoT) environment.


With federal IoT spending already hitting nearly $9 billion in fiscal year 2015, according to research firm Govini, it’s a fair bet that IoT spending will continue to increase, particularly considering the department’s focus on arming warfighters with innovative and powerful technologies.


Security risks exist that must not be overlooked. An increase in connected devices leads to a larger and more vulnerable attack surface offering a greater number of entry points for bad actors to exploit.


While the BYOD wave might have been good prep for a connected future, the IoT ecosystem will make managing smartphones and tablets seem like child’s play. To quote my colleague Patrick Hubbard, “IoT is a slowly rising tide that will eventually make IoT accommodation strategies pretty quaint.” That’s because we are talking about many proprietary operating systems that will need to be managed individually.


DHS has acknowledged the problems that the IoT presents and the opportunity to address security challenges. Furthermore, the DoD is making significant strides to fortify the government’s IoT deployments. In addition to DoD’s overall significant investment in wireless devices, sensors and cloud storage, the NIST has issued an IoT model designed to provide researchers with a better understanding of the ecosystem and its security challenges.


The government IoT market remains very much in its nascent stage. While agencies might understand its promise and potential, the true security ramifications must still be examined. One thing’s for certain: Agency IT administrators must fortify their networks now.


A good first step toward meeting the security challenges is user device tracking, which lets administrators closely monitor devices and block rogue or unauthorized devices that could compromise security. With this strategy, administrators can track endpoint devices by media access control (MAC) and internet protocol (IP) addresses, and trace them to individual users.
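
As a minimal sketch of the idea, assuming the network can hand you a list of (MAC, IP) pairs and you keep an approved inventory keyed by MAC address, rogue devices are simply the ones seen on the wire but missing from the inventory. The addresses, user names, and function names below are hypothetical.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators so lookups are consistent."""
    digits = mac.lower().replace("-", "").replace(":", "").replace(".", "")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_rogue_devices(seen, approved):
    """Return (mac, ip) pairs seen on the network that are not in the approved inventory."""
    approved_macs = {normalize_mac(mac) for mac in approved}
    return [
        (normalize_mac(mac), ip)
        for mac, ip in seen
        if normalize_mac(mac) not in approved_macs
    ]


# Hypothetical approved inventory: MAC address mapped to the responsible user.
approved = {
    "AA-BB-CC-00-11-22": "jsmith",
    "aa:bb:cc:00:11:33": "mlee",
}

# Hypothetical devices observed on the network (documentation IP ranges).
seen = [
    ("aabb.cc00.1122", "192.0.2.10"),
    ("AA:BB:CC:00:11:99", "192.0.2.77"),
]

print(find_rogue_devices(seen, approved))
```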


In addition to tracking the devices themselves, administrators also must identify effective ways to upgrade the firmware on approved devices, which can be an enormous challenge. In government, many firmware updates are still executed through a manual process.


Simultaneously, networks eventually must be able to self-heal and remediate security issues within minutes instead of days, significantly reducing the damage hackers can cause. NSA, DHS, and Defense Advanced Research Projects Agency have been working on initiatives, some of which are well underway.


While the challenges of updates and remediation are being addressed, administrators must devise an effective safety net to catch unwanted intrusions. That’s where log and event management come into play. Systems can automatically scan for suspicious activity and actively respond to potential threats by blocking internet protocol addresses, disabling users, and barring devices from accessing an agency’s network. Log and event management provide other benefits, including insider threat detection and real-time event remediation.
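
To illustrate the "scan and respond" loop in its simplest form, the sketch below counts failed logins per source address and returns the addresses that cross a threshold. The event format, the threshold of five, and the addresses are assumptions; a real log and event management product would correlate far more than this.

```python
from collections import Counter


def addresses_to_block(events, threshold=5):
    """Return source addresses with at least `threshold` failed logins, sorted."""
    failures = Counter(
        event["source_ip"] for event in events if event["outcome"] == "failure"
    )
    return sorted(ip for ip, count in failures.items() if count >= threshold)


# Hypothetical parsed log events (documentation IP ranges).
events = (
    [{"source_ip": "203.0.113.9", "outcome": "failure"}] * 6
    + [{"source_ip": "198.51.100.4", "outcome": "failure"}] * 2
    + [{"source_ip": "198.51.100.4", "outcome": "success"}]
)

print(addresses_to_block(events))
```

The returned list would then feed whatever blocking mechanism the network actually uses; that step is deliberately left out here.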


Regardless of its various security challenges, the IoT has great promise for the Defense Department. The various connections, from warfighters’ uniforms to tanks and major weapons systems, will provide invaluable data for more effective modern warfare.


Find the full article on SIGNAL.

They lurk in the shadows, they creep in the dark

You may hear them shriek, howl, grunt, or bark

Fact or fiction, it’s hard to be sure

If these creatures are caught on camera, they’re only a blur

Their stories have been told for hundreds of years

Each one a lesson that forces you to confront your fears

Now it’s your turn to vote and decide forevermore

Who should be crowned the most legendary of all folklore?


Starting today, 33 of the most mythical creatures will battle it out until only one remains and reigns supreme as the ultimate legend.


The starting categories are as follows:


  • Cryptids
  • Half & Halfs
  • The Gruesomes
  • Fairy Tales


We picked the starting point and initial match-ups; however, just like in bracket battles past, it will be up to the community to decide who they think is the most legendary contestant.


*NEW* Submit your bracket: To up the ante this year, we’re giving you a chance to earn 1,000 bonus THWACK points if you correctly guess the final four bracket contestants. To do this, you’ll need to go to the personal bracket page and select your pick for each category. Points will be awarded after the final four are revealed.



Bracket battle rules:


Match-up analysis:

  • For each urban legend, we’ve provided reference links to wiki pages—to access these, just click on their name on the bracket
  • A breakdown of each match-up is available by clicking on the VOTE link
  • Anyone can view the bracket and match-ups, but in order to vote or comment, you must have a THWACK® account



  • Again, you must be logged in to vote and trash talk
  • You may vote ONCE for each match-up
  • Once you vote on a match, click the link to return to the bracket and vote on the next match-up in the series
  • Each vote earns you 50 THWACK points. If you vote on every match-up in the bracket battle, you can earn up to 1,550 points



  • Please feel free to campaign for your favorite legends and debate the match-ups via the comment section (also, feel free to post pictures of bracket predictions on social media)
  • To join the conversation on social media, use hashtag #SWBracketBattle
  • There is a PDF printable version of the bracket available, so you can track the progress of your favorite picks



  • Bracket release is TODAY, March 19
  • Voting for each round will begin at 10 a.m. CDT
  • Voting for each round will close at 11:59 p.m. CDT on the date listed on the bracket home page
  • Play-in battle opens TODAY, March 19
  • Round 1 OPENS March 21
  • Round 2 OPENS March 26
  • Round 3 OPENS March 29
  • Round 4 OPENS April 2
  • Round 5 OPENS April 5
  • Ultimate legend announced April 11


If you have any other questions, please feel free to comment below and we’ll get back to you!


Who (or what) will be crowned the ultimate legend?

We’ll let the votes decide!


Access the bracket overview HERE>>

In system design, every technical decision can be seen as a series of trade-offs. If I choose to implement Technology A, it will provide a positive outcome in one way, but introduce new challenges that I wouldn’t have if I had chosen Technology B. There are very few decisions in systems design that don’t come down to trade-offs like this. This is the fundamental reason why we have multiple technology solutions that solve similar problem sets. One of the most common trade-offs we see is in how tightly, or loosely, technologies and systems are coupled together. While coupling is often a determining factor in many design decisions, many businesses aren’t directly considering the impact of coupling in their decision-making process. In this article I want to step through this concept, defining what coupling is and why it matters when thinking about system design.


We should start with a definition. Generically, coupling is a term we use to indicate how interdependent individual components of a system are. A tightly coupled system will be highly interdependent, whereas a loosely coupled system will have components that run independently of each other. Let’s look at some of the characteristics of each.


Tightly coupled systems can be identified by the following characteristics:


  • Connections between components in the system are strong
  • Parts of the system are directly dependent on one another
  • A change in one area directly impacts other areas of the system
  • Efficiency is high across the entire system
  • Brittleness increases as complexity or components are added to the system


Loosely coupled systems can be identified by the following characteristics:


  • Connections between components in the system are weak
  • Parts within the system run independently of other parts within the system
  • A change in one area has little or no impact on other areas of the system
  • Sub-optimal levels of efficiency are common
  • Resiliency increases as components are added


So which is better?


Like all proper technology questions, the answer is “It depends!”  The reality is that technologies and architectures sit somewhere on the spectrum between completely loose and completely tight, with both having advantages and disadvantages.


When speaking of systems, efficiency is almost always something we’re concerned about so tight coupling seems like a logical direction to look. We want systems that act in a completely coordinated fashion, delivering value to the business with as little wasted effort or resources as possible. It’s a noble goal. However, we often have to solve for resiliency as well, which logically points to loosely coupled systems. Tightly coupled systems become brittle because every part is dependent on the other parts to function. If one part breaks, the rest are incapable of doing what they were intended to do. This is bad for resiliency.


This is better understood with an example, so let’s use DNS as a simple one.


Generally speaking, using DNS instead of directly referencing IP addresses gives efficiency and flexibility to your systems. It allows you to redirect traffic to different hosts at will by modifying a central DNS record rather than having to change an IP address reference in multiple locations. It is also a great central information repository on how to reach many devices on your network. We often recommend that applications use DNS lookups, rather than direct IP address references, because of the additional value they provide. The downside is that this name reference now introduces a false dependency. Many of your applications can work perfectly fine without referring to DNS, but by introducing it into them you have tightened coupling between the DNS system and your application. An application that could previously run independently now depends on name resolution, and your application fails if DNS fails.


In this scenario you have a decision to make. Do the value and efficiency of adding DNS lookups to your application outweigh the drawback of now needing both systems up and running for your application to work? You can see this is a very simple example, but as we begin layering technology on top of technology, the coupling and dependencies can become both very strong and very hard to actually identify. I’m sure many of you have been in the situation where the failure of one seemingly unrelated system has impacted another system on your network. This is due to hidden coupling, interaction surfaces, and the law of unintended consequences.
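
The DNS trade-off can be sketched in a few lines. This is an illustration under stated assumptions, not a recommendation: the function name, the hostname, and the idea of keeping a last known address are hypothetical, and a fallback address can of course go stale. The resolver is passed in as a parameter so the failure case can be shown without touching a real DNS server.

```python
import socket


def resolve_with_fallback(hostname, last_known_ip, resolver=socket.gethostbyname):
    """Resolve hostname through DNS, falling back to a last known address if lookup fails."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return last_known_ip


def failing_resolver(hostname):
    """Stand-in resolver that simulates a DNS outage."""
    raise socket.gaierror("simulated DNS outage")


def working_resolver(hostname):
    """Stand-in resolver that simulates a healthy DNS answer."""
    return "192.0.2.50"


# Tightly coupled to DNS: the answer follows whatever the central record says.
print(resolve_with_fallback("app.example.internal", "192.0.2.10", working_resolver))

# Loosened coupling: the application keeps working, but on possibly stale data.
print(resolve_with_fallback("app.example.internal", "192.0.2.10", failing_resolver))
```

The fallback buys resiliency at the cost of the central control that made DNS attractive in the first place, which is the trade-off in miniature.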


To answer the question “Which is better?” again, there is no right answer. We need both. There are times when highly coordinated action is required. There are times when high levels of resilience are required. Most commonly we need both. When designing and deploying systems, coupling needs to be considered so you can mitigate the downsides of each while taking advantage of the positives they provide.

Most enterprises rely on infrastructure and applications in the cloud. Whether it’s SaaS services like Office 365, IaaS in AWS, PaaS in Azure, or analytics services in Google Cloud, organizations now rely on systems that do not reside on their infrastructure. Unfortunately, connectivity requirements are often overlooked when the decision is made to migrate services to the cloud. Cloud service providers downplay connectivity challenges, and organizations new to cloud computing don’t know the right questions to ask. 


SaaS: It’s just the Internet

When organizations begin to discuss cloud infrastructure, an early assumption is that all connectivity will simply happen via the internet. While many SaaS services are accessible from anywhere via the internet, large organizations need to consider how new traffic patterns will affect their current infrastructure. For example, Office 365 recommends you plan for 10 TCP port connections per device. You can support, at most, 6,000 devices behind a single IP address. If you have a large network and a small PAT pool for client egress, PAT exhaustion will quickly become a problem.
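
The arithmetic behind that limit is worth running against your own numbers. In the sketch below, the 64,000 usable ports and 4,000 reserved ports per address are assumptions chosen because they reproduce the 6,000-device figure at 10 ports per device; check your own NAT platform and your provider's current guidance before relying on them. The 25,000-device network is hypothetical.

```python
import math


def max_devices_per_ip(usable_ports=64_000, reserved_ports=4_000, ports_per_device=10):
    """Estimate how many devices one public address can carry before PAT exhaustion."""
    return (usable_ports - reserved_ports) // ports_per_device


def pat_addresses_needed(device_count, devices_per_ip):
    """Round up to the number of public addresses the egress PAT pool needs."""
    return math.ceil(device_count / devices_per_ip)


per_ip = max_devices_per_ip()
print(per_ip)

# Hypothetical network of 25,000 client devices.
print(pat_addresses_needed(25_000, per_ip))
```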


Internet-based SaaS applications make hub-and-spoke networks with centralized internet less efficient. Many WAN solutions use local internet connections to build encrypted tunnels to other sites. You can dramatically reduce network traffic by offloading SaaS applications to a local internet connection instead of backhauling traffic to a centralized data center. However, be mindful of the impact of your security footprint as you decentralize internet access across your organization.


But What About the Data Center?

Invariably, as teams begin to build IaaS and PaaS infrastructure in the cloud, they need access to resources and data that live in an on-premises data center. Most organizations begin with IPsec tunnels to connect disparate resources. Care must be taken when building IPsec tunnels to understand cloud requirements. Many cloud teams assume dynamic routing with BGP over VPN tunnels. In my experience, most network engineers assume static routing over IPsec tunnels. Be sure to have conversations about requirements up front.


When building VPNs to the cloud, throughput can be an issue. Most VPN connections are built on underlying infrastructure with throughput limitations. If you need higher throughput than cloud VPN infrastructure will support, you will need to consider a direct connection to the cloud.


Plug Me In to the Cloud, Please

There are several options to connect directly to the cloud. If you have an existing MPLS provider, most offer services to provide direct connectivity into cloud services. There are technical limitations to these services, however. Pay special attention to your routing and segmentation requirements. MPLS connectivity will likely not be as simple as your provider describes in the sales meeting.


If you do not want to leverage MPLS service to connect to the cloud, you can provision a point-to-point circuit from your premises to a cloud service provider. Cloud services publish ample documentation for direct connections.


Another option is to lease space from a colocation provider that can peer with multiple cloud service providers (CSPs). You provide circuits and hardware that reside in the co-lo, and the co-lo provides peering services to one or more cloud providers. Be aware that each CSP charges a direct connect fee on top of your circuit costs. There may also be data ingress and egress fees.


You Want to Route What on my Network?

Cloud service providers operate their networks with technologies similar to service providers. Many SaaS services are routable only with public IP addresses. For example, if you want to connect to Salesforce, Office 365, or Azure Platform Services, you will need to route their public IP addresses on your internal network to force traffic across direct connect circuits. Network engineers who have always routed internet-facing traffic with a default route injected into their IGP will have to rethink their routing design to get full use of direct connectivity into the cloud.
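
The reason a default route alone is not enough comes down to longest-prefix matching: traffic only leaves over the direct connect circuit if a more specific route for the provider's public range exists. The sketch below models that decision with Python's standard `ipaddress` module. The prefixes are documentation ranges standing in for a provider's real public ranges, and the next-hop names are hypothetical.

```python
import ipaddress


def next_hop_for(destination, routes):
    """Return the next hop of the most specific route that contains destination, or None."""
    address = ipaddress.ip_address(destination)
    best_network, best_hop = None, None
    for prefix, hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_hop = network, hop
    return best_hop


routes = {
    "0.0.0.0/0": "internet-edge",         # default route injected into the IGP
    "203.0.113.0/24": "direct-connect",   # hypothetical stand-in for a provider's public prefix
}

print(next_hop_for("203.0.113.10", routes))
print(next_hop_for("198.51.100.7", routes))
```

Remove the more specific entry and both destinations fall back to the internet edge, which is the behavior engineers used to a single default route will see until the provider's prefixes are routed internally.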


I Thought the Cloud was Simple

The prevailing cloud messaging tells us that the cloud makes infrastructure simpler. There is some truth in this view from a developer’s perspective. However, for the network engineer, the cloud brings new connectivity challenges and forces us to think differently about how to engineer traffic through our networks. As you look to integrate cloud services into your on-premises data center, read up on the documentation from your cloud service provider and brush up on BGP. These tools will position you to address whatever challenges the cloud throws your way.

It's good to be home after two weeks on the road, and just in time for a foot of snow. How do I unsubscribe from Winter?


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Geek Squad's Relationship with FBI Is Cozier Than We Thought

If you are committing a crime, and someone finds out you are committing a crime, they have every right to notify the authorities. There is no such thing as a "Geek Squad client privilege."


Audit finds Department of Homeland Security's security is insecure

They literally have the word "Security" in their title. Oh, the irony.


MoviePass removes 'unused' location feature that tracked cinema-goers' movements

But not until after their CEO bragged about how they were tracking everyone. Stay classy, MoviePass.


Half of All Orgs Hit with Ransomware in 2017

Well, half of the folks that replied to this survey, sure. But clickbait headlines aside, there is one important fact in this article: paying a ransom does not guarantee you get your data back. If you don't have backups, you will have problems.


Fake News: Lies spread faster on social media than truth does

There's a link in this article to the MIT research paper, which is a bit of a longer read, but worth your time if you are interested. I think humans have a need to be "in the know" ahead of others, and this leads to our innate desire to spread false information faster than the truth (because we assume the truth is already known, or boring, perhaps). I think Paul McCartney said it best: "Sunday's on the phone to Monday. Tuesday's on the phone to me."


Waymo self-driving trucks are hauling gear for Google data centers

It's either self-driving trucks, or filming for the Maximum Overdrive reboot has begun.


Cory Doctorow: Let's Get Better at Demanding Better from Tech

This. So much this. The world of tech advances at an accelerated rate. It's time we find a way to demand better from tech, before we dig too deep a hole.


Live-action footage of me shoveling snow yesterday:


Risk Management is an important part of IT. Being able to identify risks and remediation options can make a huge difference if or when disaster strikes. If you've moved part of or all of your enterprise to Office 365, you now have no control over a large portion of your IT environment. But what sorts of risks do you face, and how do you deal with them?




Office 365 has become unavailable in the past for one reason or another. There is also a very high likelihood of it happening again in the future. One of the great things about using a cloud-based platform such as Office 365 is that enterprise IT doesn't need to maintain large amounts of the infrastructure. One of the big downsides is that an outage is still their problem to deal with. But what sorts of implications could this have?


What is your organization's plan if, all of a sudden, Exchange Online is unavailable? Will it grind things to a halt, or will it be a minor inconvenience? The same holds true for services such as SharePoint. If all of your critical marketing material is in SharePoint Online and the service goes down, will your salespeople be left high and dry?




Not all risk is equal. Chances are that the risk of a user deleting a document won't have the same impact as something like inbound email coming to a halt. That is why you need to measure these risks. You'll want to consider the likelihood of an event occurring, and what the impact will be if it does.


Why is this step important? By performing an assessment, you'll be able to identify areas where you can mitigate, or possibly eliminate, risks. Knowing their impact is extremely important to justify priorities, as well as budgets.
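
A simple way to measure is to score each risk as likelihood times impact and rank the results. The 1-to-5 scales and the scores assigned below are assumptions for illustration only; your own assessment will produce different numbers.

```python
def risk_score(likelihood, impact):
    """Score a risk as likelihood x impact, each on a 1 (low) to 5 (high) scale."""
    return likelihood * impact


# Hypothetical (likelihood, impact) ratings for a few Office 365 risks.
risks = {
    "user deletes a document": (4, 1),
    "inbound email stops": (2, 5),
    "SharePoint Online outage": (2, 4),
}

ranked = sorted(risks, key=lambda name: risk_score(*risks[name]), reverse=True)

for name in ranked:
    print(name, risk_score(*risks[name]))
```

Even a crude ranking like this gives you something to point at when priorities and budgets are being argued over.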




As enterprise customers, we can't control how Microsoft maintains their services. But what we can do is understand what our critical business processes are, and build contingency plans for when things fall apart.


Let's use an inaccessible Exchange Online service as an example. How can you mitigate this risk? If you are running a hybrid deployment, you might be able to leverage your on-premises services to get some folks back up and running. Other options might be services from Microsoft partners. There are, for example, services that allow you to use third-party email servers to send and receive emails if Exchange Online goes offline. When service returns, the mailboxes are merged, and you can keep chugging along like nothing happened.


If you measured your risks ahead of time, you'll hopefully have noted such a possibility.




Service availability isn't the only risk. Data goes missing. Whether it is "lost," accidentally deleted, or maliciously targeted, data needs to be backed up. If you've moved any data into Office 365, you need to think about how you are going to back it up. Not only that, but what if you have to do a large restore? How long would it take you to restore 1 TB of data back into SharePoint? What impact would that window have on users?
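
A back-of-the-envelope estimate helps frame the restore question. The sketch below assumes decimal units, a hypothetical 100 Mbps of available throughput, and a guessed efficiency factor; it ignores service-side throttling, which in practice may matter more than your link speed.

```python
def restore_hours(data_gb, throughput_mbps, efficiency=0.7):
    """Estimate restore time in hours for data_gb gigabytes at throughput_mbps megabits per second.

    efficiency is an assumed fraction of the link actually achieved.
    """
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    seconds = megabits / (throughput_mbps * efficiency)
    return seconds / 3600


# 1 TB (1,000 GB) over a hypothetical 100 Mbps link at 70% efficiency.
print(round(restore_hours(1000, 100), 1))
```

If the answer is measured in days rather than hours, that window belongs in the risk assessment alongside the outage scenarios.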


Although a lot of the "hands-on" management is removed from IT shops when they migrate to Office 365, that doesn't mean that their core responsibilities are shifted. At the end of the day, IT staff are responsible for making sure that users can do their jobs. Just because something is in the cloud doesn't mean that it will be problem free.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist


Here is an interesting article from my colleague Joe Kim, in which he explores database health and performance.


Part of the problem with managing databases is that many people consider database health and performance to be one and the same, but that’s not necessarily the case. Let’s take a closer look at these terms.


Health versus performance: What’s the difference?


Health and performance are certainly closely related, even interconnected. But assuming they are one and the same is potentially a recipe for disaster. If you’re homed in exclusively on your database’s health, you may be overlooking critical metrics that affect your database’s performance. Here’s why:


Database health is inclusive of data points. When you take into consideration such factors as CPU utilization, I/O statistics, and memory pressure, you can determine if your database is capable of proper performance. But these metrics alone cannot confirm that the system is performing optimally.


Database performance integrates an element of time measurement to explain how database queries are being executed. It’s this time component that comes into play when talking about true performance.
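As a minimal sketch of that time component, the snippet below wraps a query in a timer. It uses Python's built-in sqlite3 module and a made-up orders table purely as stand-ins; your own database and queries will differ.

```python
import sqlite3
import time


def build_demo_db(rows=10_000):
    """Create an in-memory table filled with hypothetical order data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(rows)],
    )
    return conn


def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_demo_db()
    rows, elapsed = timed_query(
        conn, "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (42,)
    )
    print(f"matched {rows[0][0]} orders in {elapsed * 1000:.3f} ms")
```

CPU and memory can look perfectly healthy while this elapsed time creeps upward, which is the gap between health and performance described above.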


Diagnosing the root cause: Database performance management best practices


Identifying the true root cause of database performance issues is the goal of every federal database manager. And yet, without the proper metrics in hand, you lack the tools necessary to resolve more comprehensive problems.


That said, let’s take a closer look at some best practices that take into account both health and performance to create efficient, well-optimized database processes.


Acquire data and metrics. You need granular metrics like resource contention and a database’s workload to identify the root cause of a performance issue. Without good, deep intelligence, you lack the ability to troubleshoot accurately and effectively.


Establish meaningful data management. Every database manager has his or her own way of arranging data, but the key is to arrange it in a way that will help you quickly identify and resolve the root cause of a potential problem. Establishing a system that allows you to do so quickly can help keep your databases running efficiently.


Triangulate issues. The ability to triangulate makes it easy to answer all-important questions regarding who, what, when, where, and why. These questions help you determine the details of a performance issue. It is important to understand who and what was impacted by poor performance, and what caused the impact.


Review execution plans. Query optimizers are critical database components that analyze Structured Query Language (SQL) queries and determine efficient execution for those queries. The problem is that optimizers can be a bit of a black box; it’s often difficult to see what’s going on inside of them.
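One way to peek inside that black box is to ask the optimizer for its plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as an illustration; the table and index names are hypothetical, and other database engines expose plans through their own commands and tools.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail text from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def demo_plans():
    """Show the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    sql = "SELECT total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))
    return before, after


if __name__ == "__main__":
    before, after = demo_plans()
    print("without index:", before)
    print("with index:   ", after)
```

The exact wording of the plan varies by SQLite version, but the shift from a full table scan to an index search is the kind of change a plan review is meant to surface.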


Establish a baseline. It’s impossible to tell if your database isn’t performing optimally if you lack a baseline of normal, day-to-day performance.
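A baseline can start as simply as the mean and standard deviation of a metric collected during normal operation. The sketch below is one hedged way to do that; the sample response times and the three-sigma threshold are illustrative assumptions, not recommendations.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, standard deviation) for metric samples from normal operation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """True if value sits more than threshold standard deviations from the mean."""
    average, deviation = baseline
    if deviation == 0:
        return value != average
    return abs(value - average) / deviation > threshold


if __name__ == "__main__":
    # Hypothetical query response times in milliseconds on a normal day.
    normal_day = [100, 102, 98, 101, 99]
    baseline = build_baseline(normal_day)
    for observed in (101, 150):
        print(observed, "anomalous" if is_anomalous(observed, baseline) else "normal")
```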


It’s a marathon, not a sprint


IT pros want their databases to be in good health and to perform optimally. While both are equally important, it’s the end result that matters.


So, make sure you are looking at all criteria of database health and performance. If you are deploying the best practices and tools to help ensure the overall health and performance of your database, your stakeholders will thank you for it.


Find the full article on our partner DLT’s blog Technically Speaking.


Traveling With Joy

Posted by Leon Adato Expert Mar 12, 2018

Recently, two people I respect very much tweeted about travel, and how to remain positive and grateful while you do it. You can read those tweets here and here.


When I saw Jessica's first tweet, I wanted to respond, but thought, "She doesn't need my noise in her Twitter feed." But when Josh jumped in with his thoughtful response, I had to join in. If you prefer tweets, you can find the starting point here. For old-fashioned folks who still like correct spelling, complete sentences, and non-serialized thoughts, read on:


First, you need to understand that I have some very strong opinions about how someone should carry themselves if they are lucky enough to get to do "exciting" travel for work. When I say exciting travel, I mean:

  • Travel to some place that YOU find exciting
  • Travel that someone ELSE might find exciting


Here's why I feel so strongly:


As I've written before, my dad was a musician. His combination of talent, youth, and connections (mostly talent) gave him the opportunity to join a prestigious orchestra, one that traveled extensively from the time he joined (in 1963) until he retired 46 years later. My dad went everywhere. He was escorted through Checkpoint Charlie twice in the 60s. He wandered around cold-war, iron-curtain Moscow around the same time. He traveled to Australia, Mexico, all over Europe, and, of course, to almost every state in the United States.


It was a charmed life. To be sure, he worked hard to get where he was and made sacrifices along the way. But at the end of the day, he got to play great music with talented colleagues in front of sell-out audiences around the world. It was SO remarkable, that people sometimes had a hard time believing that was all he did.


Because I would "go to work" with him from time to time (which meant a lot of sitting in the green room, wandering backstage, and standing next to him during intermission when he'd come out for some fresh air), I was privy to him meeting audience members without really being part of their conversation, which would often follow a very specific pattern:


"So what do you do during the day?" they'd ask, figuring that he--like the musicians they probably knew--did this as a side gig while they worked an office job or plied a trade to pay the bills. When they found out that this was ALL he did, that he got paid a living wage to perform music, their sense of amazement increased. That's when they would begin asking (i.e. gushing) about the traveling. While some of these people were well-off, many were folks who often had never left the state where they were born, let alone the country, let alone been on a plane. That's when it became hard to watch.


He'd shrug and say, "I get on a plane, sleep, get off the plane, get on the bus, go to the hall, rehearse, eat, play the concert, get on a bus, go to the next town, sleep, get up, rehearse, eat, play. I could be in Timbuktu or Topeka."


From my fly-on-the-wall vantage point, I'd watch the other person deflate. They had hoped to feel a sense of wonder imagining the exotic, the special. Instead, they had the dawning recognition that they might as well have been talking to a plumber about the stores he visits. (No disrespect to plumbers. You folks rock.)


As I grew up and settled into a career in IT, I never thought I'd have the kind of work that would give me opportunities to travel the way my dad did. Which is why, years later, I stood crying under the Eiffel Tower. Not because of the wonder of the structure, but for the miracle that I was standing there AT ALL. I was overwhelmed by the sheer impossible magic of being in a role where traveling from Cleveland, Ohio to Paris was possible in any context other than a once-in-a-lifetime, piggy-bank-breaking vacation.


A three-month project in Brussels followed Paris. A year in Switzerland came after that. In between were shorter trips, no less inspiring for being closer to home. Just getting onto a plane and taking off was an adventure in itself.


And through it all were the people. As Jessica said in her tweet, "Thousands of unseen humans help me get to my destination." I was meeting these people, hearing their stories, and being asked to tell mine.


In those moments--in the Lyft on the way to the airport; checking in at the hotel; sitting next to someone on the shuttle to the car rental area--I'm reminded of those moments when I stood next to my dad during intermission. While there are many things about the man that I admire, he's not infallible, and there are definitely habits of his that I choose not to emulate. This is one of them.


So I try to write (sometimes more than is strictly required of me) when I go to new and different places. When I have the time and focus, I write before I go about what I hope to see/do/learn; and then I write again afterward, detailing what I saw, who I met, and how it went.


As Head Geek for SolarWinds, I write these essays partly because it's actually my job. (Best. Job. Ever.) But I also do it because I'm aware that jobs like mine are unique. I want to provide a vicarious experience for those who might want it, so that they can share a sense of wonder about the exotic, the special.


I also write so that someone who has chosen to forgo these types of opportunities, whether due to ambivalence, anxiety, or uncertainty, might find motivation, reassurance, or insight; that in reading about my experiences, they might realize they have more to gain than they thought.


Finally, I write about my travels for myself. To remind me that, like both Jessica and Josh said, in each trip, thousands of things go right and thousands of people are helping me get where I need to go. To remind me of the wonder, the exotic, the special.


And the blessing.

In the final blog of this series, we’ll look at ways to integrate Windows event logs with other telemetry sources to provide a complete picture of a network environment. The most common way of doing this is by forwarding event logs to a syslog server or SIEM tool.


The benefits of telemetry consolidation are:

  1. Scalability and performance – log collectors are built for and focused on collecting logs.
  2. False Positive Reduction – some events, even if they generate an alert, are not meaningful on their own. By combining them with other events in a query, the security analyst can determine if there was a compromise. For example, multiple login failures must be examined in conjunction with other events to distinguish a threat from driver error.
  3. Determination of the extent of a compromise – once an attack is detected and verified, the next step is to look for lateral movement, the route of entry to the asset initially compromised, any user-specific data gleaned from the activities, and either the failure of a security element such as a firewall or IPS to detect the issue or, conversely, a threat blocked at a specific point due to the successful application of the security policy. Visibility across the breadth of the organization is critical to incident response and remediation.
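To make the false-positive point concrete, here is a small sketch of one possible correlation: flag a successful logon only when it follows a burst of failures for the same user from the same source. Event IDs 4625 (failed logon) and 4624 (successful logon) are standard Windows IDs; the dictionary field names, the five-failure threshold, and the ten-minute window are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FAILED_LOGON = 4625       # Windows: an account failed to log on
SUCCESSFUL_LOGON = 4624   # Windows: an account was successfully logged on


def suspicious_logons(events, min_failures=5, window=timedelta(minutes=10)):
    """Return successful logons preceded by a burst of failures.

    Each event is a dict with hypothetical keys: time, event_id, user, source_ip.
    A success is flagged when at least min_failures failed logons for the same
    user and source occurred within the preceding window.
    """
    ordered = sorted(events, key=lambda e: e["time"])
    hits = []
    for index, event in enumerate(ordered):
        if event["event_id"] != SUCCESSFUL_LOGON:
            continue
        failures = [
            earlier for earlier in ordered[:index]
            if earlier["event_id"] == FAILED_LOGON
            and earlier["user"] == event["user"]
            and earlier["source_ip"] == event["source_ip"]
            and event["time"] - earlier["time"] <= window
        ]
        if len(failures) >= min_failures:
            hits.append(event)
    return hits
```

A lone mistyped password never trips this check, while a run of failures that ends in a success does, which is the kind of combined query that separates noise from something worth an analyst's time.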


Windows Event Logs to a Syslog Host and Beyond


The following is an example of forwarding Windows event logs to a syslog server and from there pushing these events to a basic SIEM tool. I’m showing SolarWinds Event Log Forwarder to Kiwi Syslog Server to ELK (Elasticsearch, Logstash, Kibana) because they are great tools for illustrating the process, and they are all free in their basic form, which means you can have some gratuitous fun testing things out.


Step 1: Configure the event log forwarder agent on the host that is collecting the Windows event logs (refer to last week’s blog for configuring forwarding and collection).


Define the transport to the syslog server.



Define the event log subscription, which is the list of events to be sent to the syslog server.


Step 2: The syslog server should be configured to listen on the correct port. It will receive those events defined in the subscription above.


Step 3: The syslog server can be configured to forward events to another device, such as a SIEM tool. The example below shows how to configure an action that will forward the Windows events from the syslog server via syslog to another host. The events may have an RFC 3164 syslog header added to them to indicate the original IP of the syslog server (useful if NAT may change the source address of the IP datagram), or you can send the syslog message using the IP of the original source of the event. Another option is to use just the original source IP address of the syslog host. This decision often depends on how the receiving host application processes and indexes events.
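For readers who have not looked at one up close, this sketch builds a message in the RFC 3164 layout: a priority value in angle brackets, a timestamp, the originating hostname, a tag, and the message text. The hostname, tag, and message are made-up examples, and this is not how Kiwi Syslog Server builds its headers internally.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_rfc3164(facility, severity, hostname, tag, message, when):
    """Build an RFC 3164 style syslog line: <PRI>Mmm dd hh:mm:ss HOST TAG: MSG."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    priority = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space rather than a zero.
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{priority}>{timestamp} {hostname} {tag}: {message}"


if __name__ == "__main__":
    # facility 16 (local0) and severity 6 (informational) give priority 134
    print(format_rfc3164(16, 6, "winhost01", "EventLog",
                         "4625 logon failure", datetime(2018, 3, 5, 9, 7, 3)))
```

Because the hostname travels inside the message itself, the receiver can still attribute the event correctly even when NAT has rewritten the source address of the packet.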



Step 4: Install and configure the SIEM tool, in this case Elasticsearch, Logstash, and Kibana, known as the ELK stack. There are some references for accomplishing this at the end of this blog.


The key concepts to bolt them together include defining a Logstash-simple.config file that takes an input (for example the TCP/514 events coming from your Syslog server), and outputs those to Elasticsearch which indexes your event data. Localhost:9200 is the default setting.


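input {
     # Listen for events forwarded by the syslog server
     tcp {
         port => 514
     }
}

output {
     # Index the events in the local Elasticsearch instance
     elasticsearch { hosts => ["localhost:9200"] }
}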



Once Kibana is installed it will be your user interface for viewing, indexing, searching and visualizing your events. By default it runs on localhost:5601.



Your Windows logs can then become part of an overall view of all the telemetry sources and types in your network, viewable and searchable through a single interface. This enables you to build queries across all your data types. By correlating events you increase the fidelity of your investigations by adding visibility.



Working example of a threat hunt


The following summarizes the types of information that can be gathered and analyzed from a single pane of glass provided by a log aggregator with good search and index capabilities or a SIEM tool or service.

In this case, the initial trigger is a potentially suspicious lateral movement within an organization. When investigating such an event, it’s important not to treat it as an isolated incident, even if you receive only one trigger or alert. Correlation is the key to eliminating false positives. Remember the goal is to rule out false positives, and if the threat is legitimate, you must understand the extent of the attack and when and where it began.




Indicators of Compromise


Detect unusual host-to-host activity
  • 528, 529, 4624, 4625: Type 3 (network) or 10 (RDP) login/logout
  • Network Information (collect calling workstation information): Name, Source Network Address (IP), Source Port (Port)

Verify Privilege Escalations
  • 552, 4648
  • Runas or privilege escalation
  • Account Whose Credentials Were Used: Account Domain: DOMAIN

Verify Scheduled Tasks
  • 602, 4698
  • Unusual task names scheduled and quickly deleted
  • Scheduled Task created: File Name: Name; Command: Cmd; Triggers: When run

Verify PsExec
  • 601, 4697
  • Remote code execution at CMD line following service installation
  • Attempt to install service: Service Name: Internal Svc Name; Service File Name: path/name; *Service Type: Code; *Service Start Type: Code

Check VirusScan logs on Hosts
  • Filenames, Process name, Hashes
  • Activities may have been attempted by other tools on the host and been detected and blocked.

Check Firewall Policy, Network access policies on AAA devices, Audit logs on other critical assets
  • Event Timestamps, IPs, Usernames
  • Determine if a FW or other security element should be modified to stop further attacks based on IP addresses, ports, or other IoCs

Pull Malicious File Hashes
  • SHA-256, etc. Submit to Sandbox or Analysis Tool
  • Derive other IoCs representative of this malware and search events for other occurrences and a better idea of the time the attack may have started.

Failure of rule-based element
  • Set of verifiable IoCs
  • Update rulesets, virus.dat’s, signature sets. Patch known vulnerabilities.

*The sc query command will show you information on the active services on a workstation
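If your log aggregator lets you script against exported events, the event IDs listed above can be turned into a simple lookup. The sketch below is one hedged way to do that; the event dictionary keys (event_id, logon_type) and the category names are hypothetical and would need to match whatever your tool actually exports.

```python
# Event IDs taken from the hunt summary above, grouped by what they help verify.
HUNT_EVENT_IDS = {
    "host_to_host_logon": {528, 529, 4624, 4625},
    "privilege_escalation": {552, 4648},
    "scheduled_task": {602, 4698},
    "service_install": {601, 4697},
}

# Logon type 3 is a network logon; type 10 is a remote interactive (RDP) logon.
REMOTE_LOGON_TYPES = {3, 10}


def classify_event(event):
    """Return the hunt category for an event dict, or None if it is not of interest."""
    event_id = event.get("event_id")
    for category, event_ids in HUNT_EVENT_IDS.items():
        if event_id not in event_ids:
            continue
        # Logon events only matter for lateral movement when they are remote.
        if (category == "host_to_host_logon"
                and event.get("logon_type") not in REMOTE_LOGON_TYPES):
            return None
        return category
    return None


if __name__ == "__main__":
    sample = [
        {"event_id": 4624, "logon_type": 3},
        {"event_id": 4624, "logon_type": 2},
        {"event_id": 4698},
    ]
    for event in sample:
        print(event, "->", classify_event(event))
```

Treat the output as a starting filter, not a verdict; each hit still needs to be correlated with the other artifacts described above.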


From this example you can see it’s a best practice to start small by reacting to the initial trigger and from here collect other important artifacts that will help you cast a wider net across the entire network. Some of these artifacts will also help you to become more proactive as IoCs can be mapped to security policies and rule sets and applied to key security elements.


Windows logs are an important tool in your attack detection toolbox. Hopefully this series has given you some useful information on best practices and deployment.


Recommended References:

I’m in Redmond this week for the Microsoft MVP Summit. This will be my ninth Summit, but I’m as excited as if it was my first. The opportunity to meet with the people that make and ship the bits, provide valued feedback on their products, and connect with other data professionals is something I treasure. Here’s hoping they keep me around for another year.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Here’s How Much Money Dropbox Saved by Moving Out of the Cloud

Probably none, because the article does not talk about how much money it costs Dropbox to manage the infrastructure themselves.


China's hypersonic aircraft would fly from Beijing to New York in two hours

That sounds cool and all, but, um, what about the G-force felt during acceleration, deceleration, and turns at that speed? Well, that sounds cool, too. Sign me up.


AI vs. Lawyers

OK, forget self-driving cars. The first thing I want AI to replace are lawyers. These results are encouraging.


The Man Who Claimed to Invent Bitcoin Is Being Sued for $10 Billion

Oh, good. With this case heading to court there’s a chance that under oath someone might unwittingly admit to how Bitcoin is more scam than currency. And I love how they want to be paid in Bitcoin. By the time this case is settled they may get enough to buy a cup of coffee, and the transaction will take six hours to process.


GitHub Hit by 1.35Tbps Memcached DDoS

“Hey, let’s just use GitHub for our source control! It’s FREE!”


New Study Shows 20% of Public AWS S3 Buckets are Writable

Proof that the cloud is just as secure as your own data center. People are going to misunderstand technology no matter where it is hosted.


The Deadlock Empire Slay dragons, master concurrency!

Because I’m a database geek and I want y’all to understand that deadlocks are caused by application code, nothing more. The next time you have a deadlock, don’t blame the DBA. Instead, take a look in the mirror.


Being able to hang out with fellow Microsoft Data Platform geeks for three days is the highlight of my year:


Thus far, we have gone over how to classify our disasters and how to have some of those difficult conversations with our organization regarding Disaster Recovery (DR). We've also briefly touched on Business Continuity, an important piece of disaster recovery. Now the time has come to gather all our information and put together something formal in terms of a Disaster Recovery plan. As easy as it sounds, it can be quite a daunting task once you begin. DR plans, just like their disasters, come in all forms, and you can go as broad or as detailed as you like. There is no real “set in stone” template or set of instructions for DR plan creation. For example, some DR plans may just cover how to get services back up and going at the 100-foot level, maybe focusing on more of a server level. Others may contain application-specific instructions for restoring services, while others cover how to recover from yet another disaster at your secondary site. The point is that it’s your organization's DR plan, so you can do as you like. Just remember that it might not be you, or even your IT department, executing the failover, so the more details the better. That said, I mentioned that once we begin to create our DR plan, it can become quite overwhelming. That is why I always recommend starting at that 100-foot level and circling back to input details later.


So, with all that said, we can conclude that our DR plans can be structured however we wish, and that’s true. A quick Google search will yield hundreds of different templates for DR plans, each unique in their own way. However, to have a legible, solid, successful DR plan, there are five sections it needs to contain.




Introduction


The introduction of a DR plan is as important as one found in a textbook. Basically, this is where you summarize both the objectives and the scope of the plan. A good introduction will include all the IT services and locations that are protected, as well as the RTOs and RPOs associated with each. Aside from the technical aspect, the introduction should also contain the testing schedule and maintenance scope for the plan, as well as a history of revisions that have been made to the plan.


Roles and Responsibilities


We have talked a lot in this series about including stakeholders and application owners outside of the IT department in our primary discussions. This is the section of the plan where you will formally list all your internal and external departments and personnel who are key to each DR process that has been covered in our DR plan. Remember, execution of this plan is normally run under the event of a disaster, so names are not enough. You need brief descriptions of their duties, contact information, and even alternate contact information to ensure that no one is left in the dark.


Incident Response(s)


This is where you will include how a disaster event is declared, who has the power to do so, and the chain of communication that shall immediately follow. Remember, we can have many different types of disasters, therefore we can also have many different types of disaster declarations and incident responses. For instance, a major fire will yield a different incident response than that of an attempted ransomware attack. We need to know who is making the declaration, how they are doing so, and who will be contacted, so on and so forth, down the chain of command.


DR Procedures


Once your disaster has been declared, those outlined within the Roles and Responsibilities can begin to act on steps to bring the production environment back up within your secondary location. This is where those procedures and instructions are laid out, step by step, for each service that is identified within the plan’s scope. A lot of IT departments will jump right into this step, and this is where our plan creation can tend to get out of control. A rule of thumb is to really start broad with your process, define any prerequisites, and then dive into details. Once you are done with that, you can circle back for yet another round of details.


For example, “Recover Accounting Services” may be a good place to start. You then can dive into the individual servers that support the service as a whole, listing out all the servers (names, IPs, etc.) you need to have available. You can then get into finer details about how to get each server up and running to support the service as a whole. Even further, you may need to make changes to the application for it to run at your secondary location (maybe you have a different IP scheme, different networks, etc.), or have support for external hardware, such as a fax server to send out purchase orders.




Appendix


This is where you place a collection of any other documents that may be of value to your organization in the event of a disaster. Vendor contacts, insurance policies, and support contracts can all go into an appendix. If there is a certain procedure to recover a server (for example, you use the same piece of software to protect all services), and you've already provided--in the DR Procedures section--an exhaustive list of instructions, you can always add it here as well, and simply reference it from within the DR plan.


With these five sections filled out, you should be certain that your organization is covered in the event of a disaster. A challenge, however, may be keeping your document up to date as your production environment changes. Today’s data centers are far from the static providers they once were. We are always spinning up new services, retiring old ones, moving things to and from the cloud. Every time that happens--to be successful in DR--we need to reassess that service within our DR plan. It needs to be a living document, right from its creation, and must always be kept up to date! And remember, it’s your DR plan, so include any other documents or sections that you or your organization wants to. At the end of the day, it’s better to have more information available than not enough, especially if you aren’t the person responsible for executing it! Also, please store a copy of this at your secondary location and/or in the cloud. I’ve heard too many stories of organizations losing their DR plan along with their production site.


I’d love to hear your thoughts about all this! How do you structure your DR plans? Are you more detailed or broader in terms of laying out the instructions to recover? Have you ever had to execute a DR plan you weren’t a part of? If so, how did that change your views on creating these types of procedures and documents? Thanks for reading!

In the age of exploration, cartographers used to navigate around the world and map the coastlines of unexplored continents. The coastline of IT, and moreover its inner landscapes and features, has become much more complex than it was a decade ago. The cost and effort needed to perform adequate mapping the old way have risen sharply, and manual mapping is no longer an affordable endeavor, let alone a productive one. Organizations and administrators need a solution to the problem, but where to start?


To continue on this analogy, explorers of old had a few things to help themselves: maps of the known world, navigation instruments and the stars. They also set sail to discover the vast world and uncover its riches, at the price that most of us know now. Back to our modern world: our goal is to understand which services are critical to a business service, and the reason why we want to understand this is clear. We want to ensure the delivery of IT services with the best possible uptime and performance, without disruptions if possible.


It’s essential to start from the business service view. We need to base ourselves, like explorers of old, on existing maps and features as a reference point. Each organization will have its own way of documenting (hopefully), but the most probable starting point would be a service Business Impact Assessment (BIA). The BIA would give a description of upstream and downstream dependencies of a given service, application platforms (and eventually named systems) involved in supporting the service. From there, we can eventually be led to documentation that describes an application, its components, architecture, and systems.


Creating and maintaining a catalog of business impact assessments diverges from the usual kind of work IT personnel do. It might not even be a purely IT endeavor, as compliance departments in larger organizations may own the process. Nevertheless, it is essential that IT is involved because a BIA is the ideal place to capture criticality requirements. It helps articulate how a given process or service impacts the organization’s ability to conduct business operations, assess how the organization is impacted in case of failure, and determine the steps to recover the service. Capturing adverse impact is a key activity because it helps to classify the criticality of the service itself in case of failure. Impact can be financial (loss of revenue, loss of business), reputational (loss of trust from investors/customers/partners, press scrutiny), or regulatory (loss of trust from regulatory bodies/legislative authorities, regulatory scrutiny, regulatory audits, and eventually even revocation of license to operate in a given country/region for regulated businesses).


The drawback of any BIA or written document is that it is a point-in-time description of a service, cast in stone until the next documentation revision date. It is therefore necessary to engage with the business process owners, and eventually with application teams, to understand if any changes were introduced. While this allows for a better view of the current state, it has the disadvantage of being a manual process with a lot of back-and-forth interactions. Another challenge we might encounter is that the BIA strictly covers a single process, without mentioning any of the upstream/downstream dependencies, or perhaps mentioning them, but without referring to any document (because there was no BIA done for another service, for example). It might also be impossible to even get one done, because a given process could rely on a third-party service or data source, over which we have no control.


There’s also another challenge looming: Shadow IT. Shadow IT broadly characterizes any IT systems that support an organization’s business objectives, but fall outside of IT scope either by omission or by a deliberate effort to conceal the existence of such systems from IT. Because these systems exist outside of a formally documented scope, or are not known to IT organizations, it is very difficult to assess their criticality, at least from an IT standpoint. Portions of business processes or entire business divisions may be leveraging external or third-party services, over which IT has no oversight or control, and yet IT would be held responsible in case of failure.


How can IT understand the criticality of a given application service in the context of a business service when the view is incomplete or even unknown?


  • From a business perspective, the organization leadership should assert or reassert IT’s role in the organization’s digital strategy, by making IT the one-stop shop for all IT-related matters. Roles and responsibilities must be well established, and the organization’s leadership (CIO / CTO) should take an official stance on how to handle shadow IT projects.
  • From a compliance perspective, clear processes must be established about services & systems documentation. The necessity to document business processes and underlying technical systems / platforms is evident; critical services from a business perspective should be documented via Business Impact Analysis and collected and regularly reviewed in the documentation that covers the organization’s business continuity strategy (usually a Business Continuity Plan).
  • From a technical perspective, the IT organization should be involved in compliance / documentation processes not only for review purposes but also to provide the technical standpoint and the necessary technical steps that fall under the Business Continuity/Disaster Recovery strategy.


To encompass these three perspectives, regular checkpoints, meetings, or reviews can help maintain the consistency of the view and the strategy. Is this sufficient, however? Unfortunately, not always. Those concepts work well with consistent and stateful processes and systems, but with the gradual advent of ephemeral workloads that can be spun up or scaled down on demand, it becomes difficult to keep full track.


While a well-defined documentation framework is necessary to establish processes that must be adhered to, and while documented processes with prioritization and criticality levels are essential, it is also necessary to complement this approach with a dynamic and real-time view of the systems.


Modern IT operations management tools should allow the grouping of assets not only by category or location, but also by logical constructs, such as an application view or even a process view. These capabilities have existed in the past, but were always performed manually. Advanced management platforms should leverage traffic flow monitoring capabilities to understand which systems are interacting together, and logically group them based on traffic types. This requires a certain level of intelligence built into the tool. For example, in a Windows-based environment, many systems will communicate with the Active Directory domain controllers, or with a Microsoft System Center Configuration Manager installation. The existence of traffic between multiple servers and these shared infrastructure servers doesn’t necessarily imply an application dependency. The same could be said of a Linux environment where traffic happens between many servers and an NTP server or a yum repository. On the other hand, traffic via other ports could hint at application relationships. A web server communicating with another server via port 3306 would probably mean a MySQL database is being accessed and would constitute plausible evidence of an application dependency.
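The reasoning above can be sketched as a tiny flow classifier: ignore traffic to shared infrastructure services, and treat traffic to well-known database ports as a hint of an application dependency. The flow tuple format and host names are hypothetical, the port lists cover only a few common defaults, and a real platform would use far richer signals than a port number.

```python
from collections import defaultdict

# Shared infrastructure services almost every host talks to (default ports).
INFRASTRUCTURE_PORTS = {53: "DNS", 88: "Kerberos", 123: "NTP", 389: "LDAP"}

# Default ports that hint at an application-level dependency.
APPLICATION_PORTS = {1433: "SQL Server", 3306: "MySQL", 5432: "PostgreSQL"}


def application_dependencies(flows):
    """Map each source host to the (destination, service) pairs it likely depends on.

    flows is an iterable of (source, destination, destination_port) tuples.
    Infrastructure traffic and unrecognized ports are skipped.
    """
    dependencies = defaultdict(set)
    for source, destination, port in flows:
        if port in INFRASTRUCTURE_PORTS:
            continue  # talking to NTP or a domain controller proves nothing
        service = APPLICATION_PORTS.get(port)
        if service:
            dependencies[source].add((destination, service))
    return dict(dependencies)


if __name__ == "__main__":
    observed = [
        ("web01", "dc01", 389),
        ("web01", "ntp01", 123),
        ("web01", "db01", 3306),
        ("app02", "dc01", 88),
    ]
    print(application_dependencies(observed))
```

In this made-up sample, only the web01 to db01 flow on port 3306 survives the filter, which matches the MySQL example in the paragraph above.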


Knowing which services are critical to a business service doesn’t require the use of a Palantir. It should be a wise blend of relying on solid business processes and on modern IT operations management platforms, with a holistic view of interactions between multiple systems and intelligent categorization capabilities.

IT organizations manage security in different ways. Some companies have formalized security teams with board-level interest. In these companies, the security team will have firm policies and procedures that apply to network gear. Some organizations appoint a manager or director to be responsible for security with less high-level accountability. Smaller IT shops have less formal security organizations with little security-related accountability. The security guidance a network engineer receives from within their IT organization can vary widely across the industry. Regardless of the direction a network engineer receives from internal security teams, there are reasonable steps he or she can take to protect and secure the network.


Focus on the Basics


Many failures in network security happen due to a lack of basic security hygiene. While this problem extends up the entire IT stack, there are basic steps every network engineer should follow. Network gear should have a consistent, templated configuration across your organization. Ad-hoc configurations, varying password schemes, and a disorganized infrastructure open the door for mistakes, inconsistencies, and vulnerabilities. A well-organized, rigorously implemented network is much more likely to be a secure network.
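One low-tech way to get templated configuration is to render every device's baseline from a single template. The sketch below uses Python's standard `string.Template`; the IOS-style lines, host name, and addresses are illustrative assumptions, not a recommended baseline.

```python
from string import Template

# Illustrative baseline; the syntax is IOS-like and the values are made up.
BASE_TEMPLATE = Template(
    "hostname $hostname\n"
    "ntp server $ntp_server\n"
    "logging host $syslog_server\n"
)


def render_config(hostname, site_defaults):
    """Render the baseline for one device; raises KeyError if a value is missing."""
    return BASE_TEMPLATE.substitute(hostname=hostname, **site_defaults)


site_defaults = {"ntp_server": "192.0.2.10", "syslog_server": "192.0.2.20"}
config = render_config("core-sw1", site_defaults)
print(config)
```

Using `substitute` rather than `safe_substitute` is deliberate: a missing value fails loudly instead of silently producing an incomplete configuration.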


As part of the standard configuration for your network, pay special attention to default passwords, SNMP community strings, and unencrypted access methods. Many devices ship with the standard SNMP communities "public" and "private." Change these immediately. Turn off any unencrypted access methods like Telnet or unsecured web access (HTTP). If your organization doesn't have a corporate password vault system, use a free password vault like KeePass to store enable passwords and other sensitive access information. Don't leave a password list lying around, stored on SharePoint, or unencrypted on a file share. Encrypt the disk on any computer that stores network configurations, especially engineer laptops, which can be stolen or accidentally left behind.
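These checks are easy to automate against saved configurations. The following Python sketch flags a few risky lines, assuming Cisco IOS-style syntax; the patterns and the `audit_config` function are illustrative and far from exhaustive (for example, `transport input all` also permits Telnet and is not caught here).

```python
import re

# Assumed IOS-style patterns; extend these for your own platforms.
CHECKS = [
    ("default SNMP community", re.compile(r"^snmp-server community (public|private)\b")),
    ("telnet enabled", re.compile(r"^transport input\b.*\btelnet\b")),
    ("unencrypted HTTP management", re.compile(r"^ip http server$")),
]


def audit_config(text):
    """Return (line_number, finding, line) for each risky configuration line."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((number, label, line))
    return findings


sample = """\
snmp-server community public RO
no ip http server
ip http secure-server
line vty 0 4
 transport input telnet ssh
"""
findings = audit_config(sample)
for number, label, line in findings:
    print(f"line {number}: {label}: {line}")
```

Run against the sample, this reports the default community on line 1 and Telnet on line 5, while leaving the `no ip http server` and `ip http secure-server` lines alone.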


To Firewall or Not to Firewall


While many hyperscalers don't use firewalls to protect their services, the average enterprise still uses firewalls for traffic flowing through its corporate network. It's important to move beyond the legacy layer 4 firewall to a next-generation, application-aware firewall. For outbound internet traffic, organizations need to build policy based on more than the 5-tuple (source and destination IP address, source and destination port, and protocol). Building policies based on username and application will make the security posture more dynamic without compromising functionality.
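To illustrate what "more than the 5-tuple" buys you, here is a small Python model of first-match policy evaluation that includes user group and application. The `Flow` and `Rule` classes, the group names, and the application labels are all hypothetical; real next-generation firewalls have their own policy languages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    user_group: str   # supplied by an identity source (assumed)
    application: str  # supplied by application inspection (assumed)


@dataclass(frozen=True)
class Rule:
    action: str
    user_group: Optional[str] = None   # None means "any"
    application: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, flow):
        pairs = (
            (self.user_group, flow.user_group),
            (self.application, flow.application),
            (self.dst_port, flow.dst_port),
        )
        return all(want is None or want == got for want, got in pairs)


def evaluate(rules, flow, default="deny"):
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(flow):
            return rule.action
    return default


rules = [
    Rule("allow", user_group="engineering", application="ssh"),
    Rule("allow", application="web-browsing", dst_port=443),
]
eng_ssh = Flow("10.0.0.5", "203.0.113.7", 443, "tcp", "engineering", "ssh")
fin_ssh = Flow("10.0.0.9", "203.0.113.7", 443, "tcp", "finance", "ssh")
print(evaluate(rules, eng_ssh), evaluate(rules, fin_ssh))
```

Both flows look the same to a port-based rule (TCP to port 443), yet the policy can allow SSH for one group and deny it for another because it matches on who is connecting and what the traffic actually is.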


Beyond the firewall, middle boxes like load balancers and reverse proxies have an important role in your network infrastructure. Vulnerabilities, weak ciphers, and misconfigurations can leave applications and services wide open to exploitation. There are many free web-based tools that can scan internet-facing hosts and report on weak ciphers and easy-to-spot vulnerabilities. Make use of these tools and then plan to remediate the findings.
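For a quick spot check between full scans, Python's standard `ssl` module can report the protocol and cipher a host negotiates. The sketch below is a rough illustration only: the weak-protocol and weak-cipher lists are assumptions, and because the default client context refuses legacy protocols, `probe` shows what a modern client negotiates rather than everything the server would accept. A dedicated scanner is still the right tool for a full inventory.

```python
import re
import socket
import ssl

# Assumed lists of what to flag; tune them to your own policy.
WEAK_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}
WEAK_CIPHER_TOKENS = {"RC4", "DES", "3DES", "NULL", "EXP", "MD5"}


def weaknesses(protocol, cipher_name):
    """Return human-readable reasons why a negotiated session looks weak."""
    reasons = []
    if protocol in WEAK_PROTOCOLS:
        reasons.append(f"legacy protocol {protocol}")
    tokens = set(re.split(r"[-_]", cipher_name))
    flagged = sorted(tokens & WEAK_CIPHER_TOKENS)
    if flagged:
        reasons.append("weak cipher component: " + ", ".join(flagged))
    return reasons


def probe(host, port=443, timeout=5.0):
    """Connect to host:port and return (protocol, cipher_name) as negotiated."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            name, _protocol, _bits = tls.cipher()
            return tls.version(), name


# Offline example with OpenSSL-style cipher names; no network access needed.
print(weaknesses("TLSv1", "DES-CBC3-SHA"))
print(weaknesses("TLSv1.3", "TLS_AES_256_GCM_SHA384"))
```

To check a live host you own, call `weaknesses(*probe("your-host.example"))`; the host name there is a placeholder.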


Keep a Lookout for Vulnerabilities


When we think of patch cycles and vulnerability management, servers and workstations are top of mind. However, vulnerabilities exist in our networking gear too. Most vendors have mailing lists, blogs, and social media feeds where they post vulnerabilities. Subscribe to the relevant notification streams and tune your feed for information that's relevant to your organization. Make note of vulnerabilities and plan upgrades accordingly.
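Once you are collecting advisories, matching them against your inventory can be scripted. Here is a minimal Python sketch, assuming you keep a simple inventory and record each advisory's first fixed version; the device names, platform labels, advisory IDs, and dotted-number version format are all made up for illustration, and real vendor version strings often need more careful parsing.

```python
def parse_version(text):
    """Turn a dotted version such as '9.10.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def affected_devices(inventory, advisories):
    """Return (device, advisory_id, fixed_in) for devices running a version below the fix."""
    hits = []
    for device in inventory:
        for advisory in advisories:
            if device["platform"] != advisory["platform"]:
                continue
            if parse_version(device["version"]) < parse_version(advisory["fixed_in"]):
                hits.append((device["name"], advisory["id"], advisory["fixed_in"]))
    return hits


# Hypothetical inventory and advisory data.
inventory = [
    {"name": "edge-fw1", "platform": "examplefw", "version": "9.8.2"},
    {"name": "edge-fw2", "platform": "examplefw", "version": "9.10.0"},
    {"name": "core-sw1", "platform": "exampleswitch", "version": "15.2.4"},
]
advisories = [
    {"id": "EXAMPLE-2018-001", "platform": "examplefw", "fixed_in": "9.10.0"},
]
hits = affected_devices(inventory, advisories)
print(hits)
```

Comparing parsed tuples rather than raw strings matters: as plain text, "9.8.2" sorts after "9.10.0", which would hide exactly the device that needs the upgrade.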


IT security is a broad topic that must be addressed throughout the entire stack. Most network engineers can't control the security posture of the endpoints or servers at their company, but they do control networking gear and middle boxes, which have a profound impact on IT security. In most instances, you can take practical, common-sense steps that will dramatically improve your network security posture.
