
Geek Speak

2,160 posts

Root Cause.png

 

I remember the largest outage of my career. Late on a Friday night, I received a call from my incident center saying that the entire development side of my VMware environment was down and that there was potential for a rolling outage that could, quite possibly, include my production environment.

 

What followed was a weekend of finger pointing and root cause analysis between my team, the virtual data center group, and the storage group. Our org had hired IBM as the first line of defense on these Sev-1 calls. IBM included EMC and VMware in the problem resolution process as issues went higher up the call chain, and still the finger pointing continued. By 7 am on Monday, we’d gotten the environment back up and running for our user community, and we’d been able to isolate the root cause and ensure that this issue would never occur again. Other issues would, certainly, but this one was not to recur.

 

Have you experienced similar circumstances at work? I imagine that most of you have.

 

So, what do you do? What may seem obvious to one may not be obvious to others. Of course, you can troubleshoot the way I do. Occam’s Razor or Parsimony are my courses of action. Try to apply logic, and force yourself to choose the easiest and least painful solutions first. Once you’ve exhausted those, you move on to the more illogical, and less obvious.

 

Early in my career, I was asked what I’d do as my first troubleshooting maneuver for a Windows workstation having difficulty connecting to the network. My response was to save the work that was open on the machine locally, then reboot. If that didn’t solve the connectivity issue, I’d check the cabling on the desktop, then the cross-connect before even looking at driver issues.

 

Simple parsimony, aka economy in the use of means to an end, is often the ideal approach.

 

Today’s data centers have complex architectures. Often, they’ve grown up over long periods of time, with many hands in the architectural mix. As a result, the logic behind why things were done the way they were has been lost, and troubleshooting application or infrastructure issues can be just as complex.

 

Understanding recent changes, patching, etc., can be an excellent way to focus your efforts. For example, patching Windows servers has been known to break applications. A firewall rule implementation can certainly break the ways in which application stacks can interact. Again, these are important things to know when you approach troubleshooting issues.

 

But what do you do if there is no guidance on these changes? Many monitoring applications can track key changes in the environment and point the troubleshooter toward potential issues. I am an advocate for integrating change management software into help desk software, and I would add a feed from a SIEM collection element into that operations view. The difficulty is the number of these components already in place at an organization. With that in mind, would the company want to replace those tools with an all-in-one solution, or try to cobble the pieces together? Given the nature of enterprise architectural choices, it is hard to find a single component that incorporates all of the choices made throughout the history of an organization.

 

Again, this is a caveat emptor situation. Do the research and find a solution that best solves your issues, determines an appropriate course of action, and comes closest to providing an overall solution to the problem at hand.

sqlrockstar

The Actuator - June 7th

Posted by sqlrockstar Employee Jun 7, 2017

Data security and privacy links take center stage this week. I didn't intend for that to happen, it just did. I'm guessing we are going to see an uptick in incidents being reported, which is different than saying there is an uptick in incidents as a whole. I believe people are more cognizant of data security and privacy matters and as a result we are seeing increased reporting.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Ransomware: Best Practices for Prevention and Response

A nice summary for you to convert into a checklist in an effort to minimize your risk of becoming a victim of ransomware.

 

Fireball Malware Infects 20% of Corporate Networks Worldwide
Interesting note here: Adware can spread just as malware would, but it isn’t considered illegal. And the result of not treating adware as a virus is things like Fireball.

 

The seven deadly sins of statistical misinterpretation, and how to avoid them
Because the future for data professionals is data analytics, and I want you to know about these simple mistakes that are all too common.

 

Building a Slack bot for channel topic detection using word embeddings
And I thought I was impressed when Outlook told me that I forgot an attachment to an email. This looks like a real value-add.

 

OneLogin: Breach Exposed Ability to Decrypt Data
This is why we can’t have nice things. It’s time to move away from the use of passwords.

 

International data privacy laws create inconsistent rules
It’s almost as if the lawyers are passing contradictory laws to make certain they have billable hours for the next ten years.

 

The next time you are frustrated with some piece of code, I want you to stop and think about how lucky you are that you never needed to look up the 10th character of a VIN many times a day:

IMG_4240.JPG

SAP® recently held their annual Americas’ SAP Users’ Group (ASUG) Sapphire Now celebration in Orlando, which attracted more than 35,000 executives, subject matter experts, sales and public relations personnel, as well as a whole bunch of SAP customers. They all converged on the Orlando Convention Center for four days to celebrate, collaborate, network, and innovate. Yours truly was a speaker for the “Using the Right SAP Support Services and Tools at the Right Time” session.

 

The Monday afternoon event, “A Call to Lead,” kicked off the conference with special guests former First Lady Michelle Obama and former President George W. Bush leading a discussion about diversity and equality in the workplace. (George Bush is hilarious, and, like the former first lady, a wonderful and charismatic speaker.) Tuesday morning’s keynote was delivered by SAP CEO Bill McDermott, who was joined on stage by Dell® Technologies founder and CEO Michael Dell. Bert and John Jacobs, brothers and co-founders of the Life is Good® clothing line, spoke in the afternoon, ending their presentation by throwing frisbees into the crowd. Wednesday morning, Hasso Plattner, co-founder and chair of SAP’s supervisory committee, presented, followed by appearances by Derek Jeter and Kobe Bryant on Thursday morning. That night, the British band Muse wrapped up the conference with a special performance.

 

When Hasso spoke, nearly everyone in the vast conference center stopped to listen. Hasso shared his thoughts on the future of IT, technology, and business, where it is all going, and how the driving forces behind these progressions are being shaped. While SAP ERP software runs on well-recognized technology, the conference did not focus solely on technology. SAPPHIRE targeted opportunities provided by the latest technological trends that drive businesses forward. SAP, vendors, and customers heard from people in human resources, finance, operations, supply chain, IT, and more. The vendor space was immense and crackling with energy. A wide range of vendors represented SAP HANA, cloud, integration services, managed services, and pretty much everything else you are pitched at every conference, including many that you THWACKsters will recognize: Microsoft®, VMware®, Dell, Cisco®, AWS®, Google® Cloud, and many more.

 

So why am I blogging about this? Like most THWACK® followers and contributors, I work in IT. I care about bits and bytes and blinking lights. Apparently, there are 22-25 fellow THWACKsters who have SAP running in their environment. And while the technology is critical to the IT professional, the experienced pros have learned that it is equally important to understand the company’s goals and how their work is aligned with them. Oddly enough, I sat in on several customer presentations, and SolarWinds® was featured on more than one slide deck (spelled incorrectly, as Solarwinds, every time! Grr!). SAP and SolarWinds share a common trait. They thrive on inspiring their customers to innovate and lead. SAP’s user group, ASUG, like THWACK, is a force to be reckoned with.

 

So will we see SolarWinds at Sapphire next year? Or maybe SAP’s more tech-y conference, TechEd? Here’s hoping!

cat-hiding-2872.jpg

 

 

 

 

Hey, guys! This week I’d like to share a very recent experience. I was troubleshooting, and the information I was receiving was great, but it was the context that saved the day! What I want to share is similar to the content in my previous post, Root Cause, When You're Neither the Root nor the Cause, but different enough that I thought I'd pass it along.

 

This tale of woe begins as they all do, with a relatively obscure description of the problem and little foundational evidence. In this particular case it was, “The internet wasn't working on the wireless, but once we rebooted, it worked fine.” How many of us have had to deal with that kind of problem before? Obviously, all answers lead to, “Just reboot and it’ll be fine." While that’s all fine and dandy, it is not acceptable, especially at the enterprise level, because it offers no real solution. Therefore, the digging began.

 

The first step was to figure out if I could reproduce the problem.

 

I had heard that it happened with some arbitrary mobile device, so I set up shop with my MacBook, an iPad, my iPhone and my Surface tablet. Once I was all connected, I started streaming content, particularly the live YouTube stream of The Earth From Space. It had mild audio and continuous video streaming that could not buffer much or for long.

 

The strangest thing happened in this initial wave of troubleshooting. I was able to REPRODUCE THE PROBLEM! That frankly was pretty awesome. I mean, who could ask for more than the ability to reproduce a problem! Though the symptoms were some of the stranger parts, if you want to play along at home, maybe you can try to solve this as I go. Feel free to chime in with something like, “Ha ha! You didn’t know that?" It's okay. I’m all for a resolution.

 

The weirdest part of this problem was that for devices connecting on the older wireless standards, 802.11a and 802.11n, things were working like a champ, or seemingly working like a champ. They didn’t skip a beat. I was able to reproduce it best with the MacBook connected at 802.11ac with the highest speeds available. But seemingly, when it would roam from one AP's channel to another AP on another channel, poof, I would lose internet access for five minutes. Later, it was proven to be EXACTLY five minutes (hint).

 

At the time though, like any problem in need of troubleshooting, there were other issues I needed to resolve because they could have been symptoms of this problem. Support even noted that these symptoms relate to a particular problem that was all fine and dandy when adjusted in the direction I preferred.  Alas, they didn’t solve my overwhelming problem of, “Sometimes, I lose the internet for EXACTLY five minutes.” Strange, right?

 

So, I tuned up channel overlap, modified how frequently devices roam to a new access point and find their new neighbor, cleaned up how much interference there was in the area, and got it working like a dream. I could walk through zones transferring from AP to AP over and over again, and life seemed like it was going great. But then, poof, it happened again. The problem would resurface, with its signature registering an EXACT five-minute timeout.

 

This is one of those situations where others might say, “Hey, did you check the logs?” That's the strange part. This problem was not in the logs. This problem transcended mere logs.

 

The breakthrough didn't come until I was having a conversation one day and said, “It’s the weirdest thing. The connection with a full wireless signal, with minimal to no interference and nothing erroneous showing in the logs, would just die for exactly five minutes.” My friend chimed in, “I experienced something similar once at an industrial yard. The problem would surface when transferring from one closet-stack to another closet-stack, and the tables for MAC Refresh were set to five minutes. You could shorten the MAC Refresh timeout, or simply tunnel these particular connections back to the controller."

 

That prompted an A-ha moment (not the band) and I realized, "OMG! That is exactly it." And it made sense. In the earlier phases of troubleshooting, I had noted that this was a condition of the problem occurring, but I had not put all of my stock in that because I had other things to resolve that seemed out of place. It’s not like I didn’t lean on first instincts, but it’s like when there’s a leak in a flooded basement. You see the flooding and tackle that because it’s a huge issue. THEN you start cleaning up the leak because the leak is easily a hidden signal within the noise.

 

In the end, not only did I take care of the major flooding damage, but I also took care of the leaks. It felt like a good day!

 

What makes this story particularly helpful is that not all answers are to be found within an organization and their tribal knowledge. Sometimes you need to run ideas past others, engineers within the same industry, and even people outside the industry. I can’t tell you the number of times I've talked through some arbitrary PBX problem with family members. Just talking about it out loud and explaining why I did certain things caused the resolution to suddenly jump to the surface.

 

What about you guys? Do you have any stories of woe, sacrifice, or success that made you reach deep within yourself to find an answer? Have you had the experience of answers bubbling to the surface while talking with others? Maybe you have other issues to share, or cat photos to share. That would be cool, too.

I look forward to reading your stories!

In this post, part of a miniseries on coding for non-coders, I thought it might be interesting to look at a real-world example of breaking a task down for automation. I won't be digging hard into the actual code but instead looking at how the task could be approached and turned into a sequence of events that will take a sad task and transform it into a happy one.

 

The Task - Deploying a New VLAN

 

Deploying a new VLAN is simple enough, but in my environment it means connecting to around 20 fabric switches to build the VLAN. I suppose one solution would be to use an Ethernet fabric that had its own unified control plane, but ripping out my Cisco FabricPath™ switches would take a while, so let's just put that aside for the moment.

 

When a new VLAN is deployed, it almost always also requires that a layer 3 (IP) gateway with HSRP be created on the routers and that the VLAN be trunked from the fabric edge to the routers. If I can automate this process, for every VLAN I deploy, I can avoid logging in to 22 devices by hand, and I can also hopefully complete the task significantly faster.

 

Putting this together, I now have a list of three main steps I need to accomplish:

 

  1. Create the VLAN on every FabricPath switch
  2. Trunk the VLAN from the edge switches to the router
  3. Create the L3 interface on the routers, and configure HSRP

 

Don't Reinvent the Wheel

 

Much in the same way that one uses modules when coding to avoid rewriting something that has been created already, I believe that the same logic applies to automation. For example, I run Cisco Data Center Network Manager (DCNM) to manage my Ethernet fabric. DCNM has the capability to deploy changes (it calls them Templates) to the fabric on demand. The implementation of this feature involves DCNM creating an SSH session to the device and configuring it just like a real user would. I could, of course, implement the same functionality for myself in my language of choice, but why would I? Cisco has spent time making the deployment process as bulletproof as possible; DCNM recognizes error messages and can deal with them. DCNM also has the logic built in to configure all the switches in parallel, and in the event of an error on one switch, to either roll back that switch alone or all switches in the change. I don't want to have to figure all that out for myself when DCNM already does it.

 

For the moment, therefore, I will use DCNM to deploy the VLAN configurations to my 20 switches. Ultimately it might be better if I had full control and no dependency on a third-party product, but in terms of achieving the goal rapidly, this works for me. To assist with trunking VLANs toward the routers, in my environment the edge switches facing the routers have a unique name structure, so I was also able to tweak the DCNM template so that if it detects that it is configuring one of those switches, it also adds the VLANs to the trunked list on the relevant router uplinks. Again, that's one less task I'll have to do in my code.

 

Similarly, to configure the routers (IOS XR-based), I could write a Python script based on the Paramiko SSH library, or use the Pexpect library to launch ssh and control the program's actions based on what it sees in the session. Alternatively, I could use NetMiko which already understands how to connect to an IOS XR router and interact with it. The latter choice seems like it's preferable, if for no other reason than to speed up development.

 

Creating the VLAN

 

DCNM has a REST API through which I can trigger a template deployment. All I need is a VLAN number and an optional description, and I can feed that information to DCNM and let it run. First, though, I need the list of devices on which to apply the configuration template. This information can be retrieved using another REST API call. I can then process the list, apply the VLAN/Description to each item and submit the configuration "job." After submitting the request, assuming success, DCNM will return the JobID that was created. That's handy because it will be necessary to keep checking the status of that JobID afterward to see if it succeeded. So here are the steps so far:

 

  • Get VLAN ID and VLAN Description from user
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)

 

Sound good? Wait; the script needs to log in as well. In the DCNM REST API that means authenticating to a particular URL, receiving a token (a string of characters), then using that token as a cookie in all future requests within that session. Also, as a good citizen, the script should log out after completing its requests too, so the list now reads:

  • Get VLAN ID and VLAN Description from user
  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM

 

That should work for the VLAN creation but I'm also missing a crucial step which is to sanitize and validate the inputs provided to the script. I need to ensure, for example, that:

 

  • VLAN ID is in the range 1-4094, but for legacy Cisco purposes perhaps, does not include 1002-1005
  • VLAN Description must be 63 characters or fewer, and the rules I want to apply will only allow [a-z], [A-Z], [0-9], dash [-] and underscore [_]; no spaces or odd characters

 

Maybe the final list looks like this then:

 

  • Get VLAN ID and VLAN Description from user
  • Confirm that VLANID and VLAN Description are valid
  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM

 

Configuring IOS XR

 

In this example, I'll use Python+NetMiko to do the hard work for me. My inputs are going to be:

 

  • IPv4 Subnet and prefix length
  • IPv6 Subnet and prefix length
  • VLAN ID
  • L3 Interface Description

 

As before, I will sanity check the data provided to ensure that the IPs are valid. I have found that IOS XR's configuration for HSRP, while totally logical and elegantly hierarchical, is a bit of a mouthful to type out, so to speak. As such, it is great to have a script take basic information like a subnet and apply some standard rules to it: the 2nd IP in the subnet is the HSRP gateway (e.g. .1 on a /24 subnet), the next address up (e.g. .2) goes on the A router, and .3 goes on the B router. For my HSRP group number, I use the VLAN ID. The subinterface number where I'll be configuring layer 3 will match the VLAN ID also, and with that information I can configure the HSRP BFD peer between the routers too. By applying some simple standardized templating of the configuration, I can take a bare minimum of information from the user and create configurations that would take much longer to create manually and that quite often (based on my own experience) would have mistakes in them.
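To make the addressing convention concrete, here is a small sketch using Python's standard ipaddress module. The function name and the dictionary keys are made up for illustration, and the minimum subnet size check is my own assumption. It only derives the values; it does not generate any IOS XR configuration.

```python
import ipaddress


def hsrp_address_plan(subnet, vlan_id):
    """Derive gateway and router addresses from a subnet using the rules above."""
    # ip_network raises ValueError for malformed input or if host bits are set,
    # which doubles as the sanity check on the subnet.
    network = ipaddress.ip_network(subnet)
    if network.num_addresses < 8:
        # Assumption: anything smaller cannot hold gateway, A, and B addresses cleanly.
        raise ValueError(f"{subnet} is too small for a gateway plus two routers")
    base = network.network_address
    return {
        "gateway": str(base + 1),    # HSRP virtual address, e.g. .1 on a /24
        "router_a": str(base + 2),   # A router, e.g. .2
        "router_b": str(base + 3),   # B router, e.g. .3
        "prefix_length": network.prefixlen,
        "hsrp_group": vlan_id,       # group number matches the VLAN ID
        "subinterface": vlan_id,     # subinterface number matches the VLAN ID
    }
```

For example, hsrp_address_plan("10.1.20.0/24", 120) gives a gateway of 10.1.20.1, with 10.1.20.2 and 10.1.20.3 for the A and B routers. The same function accepts an IPv6 subnet, although whether the same numbering rule suits IPv6 in a given environment is a design decision this sketch does not make.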

 

The process then might look like this:

 

  • Get IPv4 subnet, IPv6 subnet, VLAN ID and L3 interface description from user
  • Confirm that IPv4 subnet, IPv6 subnet, VLANID and interface description are valid
  • Generate templated configuration for the A and B routers
  • Create session to A router and authenticate
  • Take a snapshot of the configuration
  • Apply changes (check for errors)
  • Assuming success, log out
  • Rinse and repeat for B router

 

Breaking Up is Easy

 

Note that the sequences of actions above have been created without requiring any coding. Implementation can come next, in the preferred language, but if we don't have an idea of where we're going, especially as a new coder, it's likely that the project will go wrong very quickly.

 

For implementation, I now have a list of tasks which I can attack, to some degree, separately from one another; each one is a kind of milestone. Looking at the DCNM process again:

 

  • Get VLAN ID and VLAN Description from user

 

Perhaps this data comes from a web page, but for the purposes of my script, I will assume that these values are provided as arguments to the script. For reference, an argument is anything that comes after the name of the script when you type it on the command line. For example, in the command sayhello.py John, the program sayhello.py would see one argument, with a value of John.
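For anyone who has not seen arguments from the script's side, this is roughly what a sayhello.py could look like in Python. It is purely an illustration of how arguments arrive, not part of the VLAN script, and the usage message is my own wording.

```python
import sys


def main(argv):
    # argv[0] is the name of the script; everything after it is an argument.
    arguments = argv[1:]
    if len(arguments) != 1:
        return "usage: sayhello.py NAME"
    return f"Hello, {arguments[0]}!"


if __name__ == "__main__":
    print(main(sys.argv))
```

The VLAN script would read its VLAN ID and description from the same list, just with two arguments instead of one.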

 

  • Confirm that VLANID and VLAN Description are valid

 

This sounds like a perfect opportunity to write a function/subroutine which can take a VLAN ID as its own argument, and will return a boolean (true/false) value indicating whether or not the VLAN ID is valid. Similarly, a function could be written for the description, either to enforce the allowed characters by removing anything that doesn't match, or by simply validating whether what's provided meets the criteria or not. These may be useful in other scripts later too, so writing a simple function now may save time later on.
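As a sketch of what those two functions might look like in Python, using the rules listed earlier (VLAN 1-4094 excluding 1002-1005, and a description of up to 63 letters, digits, dashes, or underscores): the function names are illustrative, and treating an empty description as invalid is my assumption, since a script could simply skip the check when no description is given.

```python
import re

# Legacy Cisco VLAN IDs excluded by the rules above (1002 through 1005).
RESERVED_VLANS = range(1002, 1006)

# Letters, digits, dash, and underscore only; between 1 and 63 characters.
DESCRIPTION_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,63}")


def is_valid_vlan_id(vlan_id):
    """Return True if vlan_id is an integer in 1-4094, excluding 1002-1005."""
    if isinstance(vlan_id, bool) or not isinstance(vlan_id, int):
        return False
    return 1 <= vlan_id <= 4094 and vlan_id not in RESERVED_VLANS


def is_valid_description(description):
    """Return True if the description uses only allowed characters and fits the limit."""
    if not isinstance(description, str):
        return False
    return DESCRIPTION_PATTERN.fullmatch(description) is not None
```

This version takes the "simply validate" route rather than silently stripping bad characters, so the user finds out when their input was not what the script expected.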

 

  • Authenticate to DCNM and extract session token
  • Retrieve list of devices to which the template should be applied
  • Request a configuration job
  • Request job status until it has some kind of resolution (Success, Failed, etc)
  • Log out of DCNM

 

These five actions are all really the same kind of thing. For each one, some data will be sent to a REST API, and something will be returned to the script by the REST API. The process of submitting to the REST API only requires a few pieces of information:

 

  • What kind of HTTP request is it? GET / POST / etc?
  • What is the URL?
  • What data needs to be sent, if any, to the URL?
  • How to process the data returned. (What format is it in?)

 

It should be possible to write some functions to handle GET and POST requests so that it's not necessary to repeat the HTTP request code every time it's needed. The idea is not to repeat code multiple times if it can be more simply put in a single function and called from many places. This also means that fixing a bug in that code only requires it to be fixed in one place.
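One possible shape for such a helper, using only the Python standard library, is sketched below. Everything specific here is an assumption for illustration: the cookie name "session-token", the example URL, and the choice of JSON for both request and response are placeholders, not details taken from the DCNM API documentation.

```python
import json
import urllib.request


def build_request(method, url, token=None, payload=None, cookie_name="session-token"):
    """Build an HTTP request object; cookie_name is a hypothetical placeholder."""
    headers = {"Accept": "application/json"}
    body = None
    if payload is not None:
        # Assumption: the API accepts JSON-encoded request bodies.
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Send the session token back as a cookie on every request.
        headers["Cookie"] = f"{cookie_name}={token}"
    return urllib.request.Request(url, data=body, headers=headers, method=method)


def call_api(method, url, token=None, payload=None):
    """Send the request and decode the reply, assuming the reply is JSON."""
    request = build_request(method, url, token, payload)
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    return json.loads(text) if text else None
```

With this in place, a GET becomes call_api("GET", url, token) and a POST becomes call_api("POST", url, token, payload), and any fix to the HTTP handling happens in one function.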

 

For the IOS XR configuration, each step can be processed in a similar fashion, creating what are hopefully more manageable chunks of code to create and test.

 

Achieving Coding Goals

 

I really do believe that sometimes coders want to jump right into the coding itself before taking the time to think through how the code might actually work, and what the needs will be. In the example above, I've run through taking a single large task (Create a VLAN on 20 devices and configure two attached routers with an L3 interface and HSRP) which might seem rather daunting at first, and breaking it down into smaller functional pieces so that a) it's clearer how the code will work, and in what order; and b) each small piece of code is now a more achievable task. I'd be interested to know if you as a reader feel that the task lists, while daunting in terms of length, perhaps, seemed more accomplishable from a coding perspective than just the project headline. To me, at least, they absolutely are.

 

I said I wouldn't dig into the actual code, and I'll keep that promise. Before I end, though, here's a thought to consider: when is it right to code a solution, and when is it not? I'll be taking a look at that in the next, and final, article in this miniseries.

By Joe Kim, SolarWinds Chief Technology Officer

 

Because of the Internet of Things (IoT) we're seeing an explosion of devices, from smartphones and tablets to connected planes and Humvee® vehicles. So many, in fact, that IT administrators are left wondering how to manage the deluge, particularly when it comes to ensuring that their networks and data remain secure.

 

The challenge is significantly more formidable than the one posed by bring-your-own-device issues when administrators only had to worry about a few mobile operating systems. This pales in comparison to the potentially thousands of IoT-related operating systems that are part of an increasingly complex ecosystem that includes devices, cloud providers, data, and more.

 

How does one manage such a monumental task? Here are five recommendations that should help.

 

1. Turn to automation

 

Getting a grasp on the IoT and its impact on defense networks is not a job that can be done manually, which is why automation is so important. The goal is to create self-healing networks that can automatically and immediately remediate themselves if a problem arises. A self-healing, automated network can detect threats, keep data from being compromised, and reduce response time and downtime.

 

2. Get a handle on information and events

 

DoD administrators should complement their automation solutions with security information and event management processes. They are monitoring solutions designed to alert administrators to suspicious activity and security and operational events that may compromise the networks. Administrators can refer to these tools to monitor real-time data and provide insight into forensic data that can be critical to identifying the cause of network issues.

 

3. Monitor devices and access points

 

Device monitoring is also extremely important. Network administrators will want to make sure that the only devices that are hitting their networks are those deemed secure. Administrators will want to be able to track and monitor all connected devices by MAC and IP address, as well as access points. They should set up user and device watch lists to help them detect rogue users and devices in order to maintain control over who and what is using their networks.

 

4. Get everyone on board

 

Everyone in the agency must commit to complying with privacy policies and security regulations. All devices must be in compliance with high-grade security standards, particularly personal devices that are used outside of the agency. The bottom line is that it’s everyone’s responsibility to ensure that DoD information stays within its network.

 

5. Buckle up

 

Understand that while IoT is getting a lot of hype, we’re only at the beginning of that cycle. Analyst firm Gartner® once predicted that there would be 13 billion connected devices by 2020, but some are beginning to wonder if that’s actually a conservative number. Certainly, the military will continue to do its part to drive IoT adoption and push that number even higher.

 

In other words, when it comes to connected devices, this is only the beginning of the long road ahead. DoD administrators must prepare today for whatever tomorrow might bring.

 

Find the full article on Defense Systems.

Dez

Firewall Logs - Part Two

Posted by Dez Employee Jun 1, 2017

In Part One of this series, I dove into the issue of security and compliance. In case you don't remember, I'm reviewing this wonderful webcast series to stress the importance of the information presented in each. This week, I'm focusing on the firewall logs webcast.

 

I chose the Firewall Logs webcast for this week because it is a known and very useful way to prevent attacks. Now, my takeaway from this session is that SIEMs are fantastic ways to normalize your logs from a firewall and also your infrastructure. You guys don't need me to preach on that, I know. However, I feel like when you use health performance and network configuration management tools, you really have a better solution all the way around.

 

Everyone (I think) knows that I'm not one to tell you to buy only SolarWinds products! So please do NOT take this that way. I will preach about having some type of SIEM, network performance monitor (NPM), patch manager (PaM), and solid network configuration change management (NCM) within your environment. Let me give you some information to go along with this webcast on how I would personally tie these together.

 

  1. Knowing the health of your infrastructure allows you to see anomalies. When this session was discussing the mean time to detection I couldn't help but think about a performance monitor. You have to know what normal is and have a clear baseline before an attack.
  2. Think about the ACLs along with your VLANs and allowed traffic on your network devices. NCM allows you to use a real-time change notification to help you track whether any outside changes are being made and shows you what was changed. Also, using this with the approval system allows you to verify outside access and stop it in its tracks, because those are not approved network config changes. This is a huge win for security. When you also add in the compliance reports and scheduled email send-outs, you are able to verify your ACLs and access based on patterns you customize to your company's needs. This is vital for documentation, and also for validation if you have any type of change request ticketing.
  3. We all know we need to be more compliant and patch our stuff! Not only to be aware of vulnerabilities but also to protect our vested interests in our environment.

 

Okay, so the stage is set, and I hope you see why you need more than just a great SIEM like LEM to back, plan, and implement any type of security policies you may need. This webcast brings up great points on how to think about and secure those firewalls. IMHO, if you have LEM, Jamie's demo should help you guys strengthen your installation. Also, the way he presents this helps you to strengthen or validate any SIEM you may have in place currently.

 

I hope you guys are enjoying this series as much as I am. I think we should all at least listen to security ideas to help us strengthen our knowledge and skill sets. Trust me, I'm no expert or I would abolish these attacks, lol! What I am is a passionate security IT person who wants to engage different IT silos to have a simple conversation about security.

 

Thanks for your valuable time! Let me know what you think by posting a comment below, and remember to follow me @Dez_Sayz!

sqlrockstar

Data is a commodity

Posted by sqlrockstar Employee Jun 1, 2017

commodity_LI.jpg

 

Data is a commodity.

 

Don’t believe me? Let’s see how the Oxford dictionary defines “commodity.”

 

“A thing that is useful or has a useful quality.”

 

No good researcher would stop at just one source. Just for fun, let’s check out this definition from Merriam-Webster:

 

“Something useful or valued.”

 

Or, this one from Dictionary.com:

 

“An article of trade or commerce, especially a product as distinguished from a service.”

 

There’s a lot of data on the definition of the word “commodity.” And that’s the point, really. Data itself is a commodity, something to be bought and sold.

 

And data, like commodities, comes in various forms.

 

For example, data can be structured or unstructured. Structured data is data that we associate with being stored in a database, either relational or non-relational. Unstructured data is data that has no pre-defined data model, or is not organized in any pre-defined way. Examples of unstructured data include things like images, audio files, instant messages, and even this Word document I am writing now.

 

Data can be relational or non-relational. Relational data is structured in such a way that data entities have relationships, often in the form of primary and foreign keys. This is the nature of traditional relational database management systems such as Microsoft SQL Server. Non-relational data is more akin to distinct entities that have no relationships to any other entity. The key-value pairs found in many NoSQL database platforms are examples of non-relational data.
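
If the distinction feels abstract, here is a small Python sketch of the two shapes. It uses the standard library's in-memory SQLite for the relational side (SQLite stands in for a full RDBMS such as SQL Server) and a plain dictionary for the key-value side. The table names, keys, and sample rows are invented for illustration.

```python
import sqlite3

# Relational: entities are linked through primary and foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           item TEXT NOT NULL)"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'keyboard')")

# The relationship lets us join the two entities back together.
rows = conn.execute(
    "SELECT c.name, o.item FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# The foreign key also rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 'mouse')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Non-relational (key-value): each value is a self-contained entity.
# Nothing ties "order:10" to any other key, and nothing validates its contents.
kv_store = {"order:10": {"customer": "Ada", "item": "keyboard"}}

print(rows, orphan_rejected, kv_store["order:10"])
```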

 

And while data can come in a variety of forms, not all data is equal. If there is one thing I want you to remember from this article it is this: data lasts longer than code. Treat it right.

 

To do that, we now have Azure CosmosDB.

 

Introduced at Microsoft Build™, CosmosDB is an attempt to make data the primary focus for everything you do, no matter where you are. (Microsoft has even tagged CosmosDB as “planet-scale,” which makes me think they need to go back and think about what “cosmos” means to most people. But I digress.)

 

I want you to understand the effort Microsoft is putting into the NewSQL space here. CosmosDB is a database platform as a service that can store any data that you want: key-value pair, graph, document, relational, non-relational, structured, unstructured…you get the idea.

 

CosmosDB is a platform as a service, meaning the admin tasks that most DBAs would be doing (backups, tuning, etc.) are done for you. Microsoft will guarantee performance, transactional consistency, high availability, and recovery.

 

In short, CosmosDB makes storing your data easier than ever before. Data is a commodity and Microsoft wants as big a market share as possible.

 

I can’t predict the future and tell you CosmosDB is going to be the killer app for cloud database platforms. But I can understand why it was built.

 

It was built for the data. It was built for all the data.

sqlrockstar

The Actuator - May 31st

Posted by sqlrockstar Employee May 31, 2017

Home from Techorama in Belgium and back in the saddle for a short week before I head to Austin on Monday morning. I do enjoy visiting Europe, and Belgium in particular. Life just seems to move at a slower pace there.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

The big asks of British Airways

Last year I wrote about a similar outage with Delta, so here's some equal time for a similar failure with BA. Who knew that managing IT infrastructure could be so hard?

 

Facebook Building Own Fiber Network to Link Data Centers

I'm kinda shocked they don't already have this in place. But more shocking is the chart that shows internal traffic growth, mostly a result of Facebook having to replicate more pictures and videos of cats.

 

Who Are the Shadow Brokers?

Interesting thought exercise from Bruce Schneier about this group and what might be coming next.

 

Web Developer Security Checklist

Every systems admin needs a similar checklist to this one.

 

All the things we have to do that we don't really need to do: The social cost of junk science

A nice and quick reminder about the hidden costs of junk science. Or, the hidden costs of good science.

 

The Calculus of Service Availability

So the next time someone tells you they need 99.9% uptime for a system, you can explain to them what that really means.

 

How Your Data is Stored, or, The Laws of the Imaginary Greeks

This is a bit long, so set aside some time. But you'll learn all about the problems (and solutions) for distributed computing.
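
For the 99.9% uptime conversation mentioned in the service availability link above, the arithmetic is simple enough to sketch in a few lines of Python. The function name and the default 365-day year are my own assumptions. The linked article goes much deeper, into dependencies and error budgets.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

# Print the yearly downtime budget for a few common targets.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

At 99.9%, that works out to about 8.76 hours of downtime a year, or roughly 43 minutes in a 30-day month.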

 

One thing I love about Belgium is how they make shopping for the essentials easy:

IMG_7019.JPG

By Joe Kim, SolarWinds Chief Technology Officer

 

Federal IT professionals must consider the sheer volume and variety of devices connected to their networks, from fitness wearables to laptops, tablets, and smartphones. The Internet of Things (IoT) and the cloud also significantly impact bandwidth and present security concerns, spurred by incidents such as the Office of Personnel Management breach of 2014.

 

Despite this chaotic and ever-changing IT environment, for the Defense Department, network and data center consolidation is well underway, layering additional concerns on top of an already complex backdrop. Since 2011, the DoD has closed more than 500 data centers. That’s well below the goal the agency initially set forth, and it issued a directive last year to step up the pace. Subsequently, the Data Center Optimization Initiative was introduced to further speed efforts.

 

To be successful, federal IT professionals need a system that accounts for all of the data that soon will stream through their networks. They also need to get a handle on all the devices employees use and will use to access and share that data, all while ensuring network security.

 

Meeting the Challenges of Tomorrow Today

 

Network monitoring has become absolutely essential, but some solutions are simply not capable of dealing with the reality of today’s networks.

 

Increasingly, federal IT managers house some applications on-premises while others use hosted solutions, creating a hybrid IT environment that can be difficult to manage. Administrators will continue to go this route as they attempt to fulfill the DoD's ultimate goal: greater efficiency. Hybrid IT creates monitoring challenges, as it makes it difficult for administrators to “see” everything that is going on with the applications.

 

Going Beyond the Basics

 

This complexity will require network administrators to go beyond initial monitoring strategies and begin implementing processes that provide visibility into the entire network infrastructure, whether it’s on-premises or hosted. Hop-by-hop analysis lets administrators effectively map critical pathways and gain invaluable insight into the devices and applications using the network. It provides a complete view of all network activity, which will become increasingly important as consolidation accelerates.

 

At the very least, every IT organization should employ monitoring best practices to proactively plan for consolidation and ensuing growth, including:

 

  1. Adding dedicated monitoring experts who can provide holistic views of agencies’ current infrastructure and calculate future needs.
  2. Helping to ensure that teams understand the nuances of monitoring hardware, networks, applications, virtualization, and configurations and that they have access to a comprehensive suite of monitoring tools.
  3. Equipping teams with tools that address scalability needs. This will be exceptionally important as consolidation begins to truly take flight and data needs rapidly expand.

 

Looking Reality in the Eye

 

DoD network consolidation is a slow, yet major undertaking, and a necessity to help ensure efficiency. It comes with extreme challenges, particularly a much greater degree of network complexity. Effectively wrangling this complexity requires network administrators to go beyond simple monitoring and embrace a more comprehensive monitoring strategy that will better prepare them for their future.

 

Find the full article on Signal.

20170516_115950-2.jpg

Two weeks ago, I had the privilege of attending and speaking at ByNet Expo in Tel Aviv, Israel. As I mentioned in my preview article, I had hoped to use this event to talk about cloud, hybrid IT, and SolarWinds' approach to these trends, to meet with customers in the region, and to enjoy the food, culture, and weather.

 

I'm happy to report that the trip was a resounding success on all three fronts.

 

First, a bit of background:

 

Founded in 1975, ByNet (http://www.bynet.co.il/en/) is the region's largest systems integrator, offering professional services and solutions for networking, software, cloud, and more.

 

I was invited by SolarWinds' leading partner in Israel, ProLogic (http://prologic.co.il/), who, honestly, are a great bunch of folks. They not only know their stuff when it comes to SolarWinds, but are also amazing hosts and fantastic people to just hang out with.

 

Now you might be wondering what kind of show ByNet (sometimes pronounced "bee-naht" by the locals) Expo is. Is it a local user-group style gathering? A tech meet-up? A local business owners luncheon?

 

To answer that, let me first run some of the numbers:

  • Overall attendees: 4,500
  • Visitors to the SolarWinds/ProLogic booth: ~1,000
  • Visitors to my talk: ~150 (which was SRO for the space I was in)

 

The booth was staffed by Gilad, Lior, and Yosef, who make up part of the ProLogic team. On the SolarWinds side, I was joined by Adrian Burke out of our Cork office. That was enough to attract some very interesting visitors, including the Israeli Ministry of Foreign Affairs, Orbotec, Soreq, the Israeli Prime Minister's Office, Hebrew University, McAfee, and three different branches of the IDF.

 

We also got to chat with some of our existing customers in the region, like Motorola, 3M, the Bank of Israel, and Bank Hapoalim.

 

Sadly missing from our visitor list, despite my repeated invitations on Twitter, was Gal Gadot.

 

But words will only take you so far. Here are some pictures to help give you a sense of how this show measures up:

 

01_IMG_0004.JPG

 

01_IMG_0106.JPG

 

01_IMG_0841.JPG

 

01_IMG_0844.JPG

01scaled_IMG-20170516-WA0000.jpg

01scaled_IMG-20170516-WA0004.jpg

 

 

But those are just some raw facts and figures, along with a few flashy photos. What was the show really like? What did I learn and see and do?

 

First, I continue to be struck by the way language and culture inform and enrich my interactions with customers and those curious about technology. Whether I'm in the booth at a non-U.S. show such as CiscoLive Europe or ByNet Expo, or when I'm meeting with IT pros from other parts of the globe, the use of language, the expectations of where one should pause when describing a concept or asking for clarification, the graciousness with which we excuse a particular word use or phrasing - these are all the hallmarks of both an amazing and ultimately informative exchange. And also of individuals who value the power of language.

 

And every time I have the privilege to experience this, I am simply blown away by its power. I wonder how much we lose, here in the States, by our generally monolingual mindset.

 

Second, whatever language they speak, SolarWinds users are the same across the globe. Which is to say they are inquisitive, informed, and inspiring in the way they push the boundaries of the solution. So many conversations I had were peppered with questions like, "Why can't you...?" and "When will you be able to...?"

 

I love the fact that our community pushes us to do more, be better, and reach higher.

 

With that said, I landed on Friday morning after a 14-hour flight, dropped my bags at the hotel and - what else - set off to do a quick bit of pre-Shabbat shopping. After that, with just an hour or two before I - and most of the country - went offline, I unpacked and got settled in.

 

Twenty-four hours later, after a Shabbat spent walking a sizable chunk of the city, I headed out for a late night snack. Shawarma, of course.

 

Sunday morning I was joined by my co-worker from Cork, Adrian Burke. ProLogic's Gilad Baron spent the day showing us Jerusalem's Old City, introducing us to some of the best food the city has to offer, and generally keeping us out of trouble.

 

And just like that, the weekend was over and it was time to get to work. On Monday we visited a few key customers to hear their tales of #MonitoringGlory and answer questions. Tuesday was the ByNet Expo show, where the crowd and the venue rivaled anything Adrian and I have seen in our travels.

 

On my last day, Wednesday, I got to sit down in the ProLogic offices with a dozen implementation specialists to talk some SolarWinds nitty-gritty: topics like the product roadmaps, use cases, and trends they are seeing out in the field.

 

After a bit of last-minute shopping and eating that night, I packed and readied myself to return home Thursday morning.

 

Random Musings

  • On Friday afternoon, about an hour before sundown, there is a siren that sounds across the country, telling everyone that Shabbat is approaching. Of course nobody is OBLIGATED to stop working, but it is striking to me how powerful a country-wide signal to rest can be. This is a cultural value that we do not see in America.
  • It is difficult to take a 67-year-old Israeli taxi driver seriously when he screams into his radio at people who obviously do not understand him. Though challenging, I managed to hide my giggles.
  • Traveling east is hard. Going west, on the other hand, is easy.
  • You never "catch up" on sleep.
  • Learning another language makes you much more sensitive to the importance of pauses in helping other people understand you.
  • Everything in Jerusalem is uphill. Both ways.
  • On a related note: there are very few fat people in Jerusalem.
  • Except for tourists.
  • Orthodox men clearly have their sweat glands removed. Either that or they install personal air conditioners inside their coats. That's right. I said coats. In May. When it's 95 degrees in the sun.

 

01scaled_20170514_155616.jpg

01scaled_20170515_201928.jpg

01-Scaled_20170513_215610.jpg

 

01scaled_20170514_094222.jpg

scuff

Level up your automation skills

Posted by scuff May 26, 2017

If you’ve read any of my articles you’ll know I’m old school. Automation in my days was batch files, Kix32 scripts, then group policy and Microsoft Systems Management Server (before SMS was a popular messaging protocol). I’m fortunate to mingle with some very smart Enterprise tech people occasionally, and they are talking new automation languages. Infrastructure as Code. Chef. Puppet. Ansible. Wow. I’m going to pause for a minute in envy.

 

To start with, there’s the debate of “which one do you choose?” I’m going to leave that to anyone who has more knowledge of these products than me. Are you using one of those or an alternative product, to handle server and infrastructure configuration management?

 

Do we all need to be versioning our infrastructure or has this DevOps thing gone a little too far?

 

Or is there a tipping point – probably an organizational size – at which this makes way more sense than how we used to manage infrastructure? Does your organization slide under that, making you wonder what all the fuss is about and why you’d bother?

 

Meanwhile, back in my SMB world, PowerShell is nudging in more and more as “the way you do things.” In fact, many Office 365 administration tasks can’t be performed in the web GUI and require a PowerShell command, which also means you need to know how to install the required PowerShell components and issue the correct commands to connect to your Office 365 or Azure Active Directory tenant. Even if I ignored the operational efficiencies from going command line again (hello, Lotus Domino Server), I would still be dragged into the world of PowerShell when it’s the only way to do something.

 

If this is all new to you, or if you live and breathe this stuff, my next question is …. how do you start? Whether you’re resigned to needing this stuff on your CV now or whether you are genuinely excited about learning something new (or you might be somewhere in the middle), what are your go-to resources?

 

Product websites are always a good place to start. Many are offering videos and webinars as an alternative to drowning in text with screenshots.

Are you searching through GitHub for infrastructure as code samples and PowerShell scripts, or are you learning from peers on Reddit?

Maybe you’ve gone with a different learning resource altogether, like the courses at Pluralsight?

 

If automation is the new normal (or the old normal), how do we pick up new automation skills? Let me know what’s worked for you.

 

Disclaimer: I’m a Pluralsight author of one course that has nothing to do with the topics I’ve just written about. And there are no affiliate links here either.

Network performance monitoring feels a bit like a moving target sometimes.  Just as we normalize processes and procedures for our monitoring platforms, some new technology comes around that turns things upside down again. The most recent change that seems to be forcing us to re-evaluate our monitoring platforms is cloud computing and dynamic workloads. Many years ago, a service lived on a single server, or multiple if it was really big. It may or may not have had redundant systems, but ultimately you could count on any traffic to/from that box to be related to that particular service.

 

That got turned on its head with the widespread adoption of virtualization. We started hosting many logical applications and services on one physical box. Network performance to and from that one server was no longer tied to a specific application, but generally speaking, these workloads remained in place unless something dramatic happened, so we had time to troubleshoot and remediate issues when they arose.

 

In comes the cloud computing model, DevOps, and the idea of an ephemeral workload. Rather than have one logical server (physical or virtual), large enough to handle peak workloads when they come up and highly underutilized otherwise, we are moving toward containerized applications that are horizontally scaled. This complicates things when we start looking at how to effectively monitor these environments.

 

So What Does This Mean For Network Performance Monitoring?

 

The old way of doing things simply will not work any longer. Assuming that a logical service can be directly associated with a piece of infrastructure is no longer possible. We’re going to have to create some new methods, as well as enhance some old ones, to extract the visibility we need out of the infrastructure.

 

What Might That Look Like?

 

Application Performance Monitoring

This is something that we do today, and SolarWinds has an excellent suite of tools to make it happen. What needs to change is our perspective on the data that these tools are giving us. In our legacy environments, we could poll an application every few minutes because not a lot changes between polling intervals. In the new model of system infrastructure, we have to assume that the application is scaled horizontally behind load balancers and that any single poll touches only one of many deployed instances. Application polling and synthetic transactions will need to happen far more frequently to give us a broader picture of performance across all instances of that application.
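
As a rough illustration of why one poll isn't enough, here is a minimal Python sketch that probes every instance behind a service and summarizes the results. The instance names, the canned response times, and the `fake_probe` stand-in are all hypothetical. A real probe would make an HTTP request or run a synthetic transaction against each instance.

```python
from statistics import mean
from typing import Callable, Iterable

def poll_instances(instances: Iterable[str], probe: Callable[[str], float]) -> dict:
    """Probe every instance and summarize response times in milliseconds."""
    results: dict[str, float] = {}
    failed: list[str] = []
    for instance in instances:
        try:
            results[instance] = probe(instance)
        except Exception:
            # An instance that cannot be probed is reported, not silently skipped.
            failed.append(instance)
    return {
        "per_instance_ms": results,
        "failed": failed,
        "mean_ms": mean(list(results.values())) if results else None,
        "worst": max(results, key=results.get) if results else None,
    }

def fake_probe(instance: str) -> float:
    """Stand-in probe returning canned numbers (assumption, not real data)."""
    canned = {"web-1": 40.0, "web-2": 45.0, "web-3": 900.0}
    if instance not in canned:
        raise ConnectionError(instance)
    return canned[instance]

summary = poll_instances(["web-1", "web-2", "web-3", "web-4"], fake_probe)
print(summary["worst"], summary["failed"], summary["mean_ms"])
```

A single poll through the load balancer could easily have landed on `web-1` and reported a healthy service, while `web-3` was crawling and `web-4` was gone.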

 

Telemetry

Rather than relying on polling to tell us about new configurations/instances/deployments on the network, we need the infrastructure to tell our monitoring systems about changes directly. Push rather than pull works much better when changes happen often and may be transient. We see a simple version of this in syslog today, but we need far better automated intelligence to help us correlate events across systems and analyze the data coming into the monitoring platform. This data will then need to be associated with our traditional polling infrastructure to understand the impact of a piece of infrastructure going down or misbehaving. This likely will also include heuristic analysis to determine baseline operations and variations from that baseline. Manually reading logs every morning isn’t going to cut it as we move forward.
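
To give a feel for what "baseline and variations from that baseline" could mean in practice, here is a deliberately simple Python sketch. It compares each new sample to a trailing window of earlier samples and flags it when it sits more than a few standard deviations away. The window size, threshold, and latency numbers are assumptions for illustration. Real heuristic analysis in a monitoring platform will be considerably more sophisticated.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate sharply from the trailing baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]      # the previous `window` samples
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append((i, samples[i]))
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append((i, samples[i]))
    return anomalies

# Hypothetical latency samples in milliseconds, with one obvious spike.
latency_ms = [20, 22, 21, 19, 20, 21, 95, 20, 22]
print(find_anomalies(latency_ms))
```

Note that the spike itself pollutes the baseline for the next few samples, which is one reason production tools use more robust statistics than a plain rolling mean.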

 

Traditional Monitoring

This doesn’t go away just because we’ve complicated things with a new form of application deployment. We still will need to keep monitoring our infrastructure for up/down, throughput, errors/discards, CPU, etc.

 

Final Thoughts

Information Technology is an ever-changing field, so it makes sense that we’re going to have to adjust our methods over time. Some of these changes will be in how we implement the tools we have today, and some of them are going to require our vendors to give us better visibility into the infrastructure we’re deploying. Either way, these types of challenges are what makes this work so much fun.

Dez

Security vs Compliance - Part One

Posted by Dez Employee May 25, 2017

Today, I want to bring your attention to a great series of webcasts that are available here: Security Kung Fu Webcast Series

 

I will stress the importance of each one of these over the next few weeks as I review and reflect on what I learned from these webcasts.

 

That's right. I'm reviewing the webcast as a critic in this series because I deeply believe in security, and I want to make sure you guys are aware of the content provided in each webcast. Please follow me on this security adventure and dive into the importance of the information they covered. Also, I'll be mixing them up, so the reviews won't be presented in order. 

 

Takeaways

 

1. There is a difference in being secure versus compliant.

  • I can comply with regulations, but does that cover everything within my infrastructure?
  • I can secure my environment, but does that mean I am meeting my overall compliance needs?

 

These are questions that I like to ask whenever I'm involved with any security plan. This helps to make sure that my environment is fluid and being assessed by both sides of the argument.

 

2. Too many rules to follow! I just want to do my job!

  • News flash: Security is a business issue. It's NOT just for IT!
  • This webcast talks about the rules and compliance needs for different types of businesses. However, all levels of users need to focus on security. This means engaging with and training them at every opportunity.

 

The biggest issue that I see is a lack of solid security planning that is integral to an organization's overarching business strategy. This webcast offers insight on ways to use tools to help you complete security plans faster and strengthen your proactive and reactive security needs.

 

Summary

 

The Security vs Compliance webcast will help guide you toward implementing a solid security plan. I joined this webcast and offered some of my opinions on being secure vs compliant, so please feel free to let me know if you have more to add!

 

Remember, "Security is a very fluid dance. The music may change, but you have to keep dancing."

 

If there is something specific you guys want me to bring up, please let me know! I love talking security and how to use what you have to support any security plan. Leave me a security comment and I'll see if I can get this ramped up and answer in a future Geek Speak blog!

The-Hog-Ring-Auto-Upholstery-Community-Aerospace-Lancia-Beta-Trevi.jpg

 

We’ve all seen dashboards for given systems. A dashboard is essentially a quick view into a given system, and we are seeing them more and more often in monitoring. Your network monitoring software may present a dashboard of all switches and routers, down to individual ports, or all the way up to all ports in given WAN connections. For a large organization, this can be quite a cumbersome view to digest in a quick dashboard. Network monitoring is a great example of fully fleshed-out click-down views. Should any “Red” appear on that dashboard, a simple click into it, and then deeper and deeper into it, should help to discover the source of the problem wherever it may be.

 

Other dashboards now being created are less dynamic, and the useful information in them is harder to discern.

 

The most important thing to understand about a dashboard is that the important information should be presented so clearly that the person glancing at it does not need to know exactly how to fix the issue, only to understand what they are seeing. If a given system is presenting an error of some sort, the viewer should have the base level of understanding necessary to pick out the information that is relevant to them.

 

Should that dashboard be fluid or static? Fluidity is necessary for those doing the deep dive into the information at the time, but a static dashboard can be truly satisfactory for someone who is assigning the resolution to another person, which is more of a managerial or administrative view.

 

I believe that those dashboards of true significance have the ability to present either of these perspectives. The usability should only be limited by the viewer’s needs.

 

I’ve seen some truly spectacular dynamic dashboard presentations. A few that spring to mind are Splunk, the analytics engine for well more than just a SIEM, Plexxi, a networking company with outstanding deep dive capabilities into their dashboard with outstanding animations, and of course, some of the wonderfully intuitive dashboards from SolarWinds. This is not to say that these are the limits of what a dashboard can present, but only a representation of many that are stellar.

 

The difficulty with any fluid dashboard is this: how hard is it for a manager of the environment to create the functional dashboard the viewer needs? If my goal were to fashion a dashboard for spotting, say, network or storage bottlenecks, I would want to see, at least initially, a Green/Yellow/Red gauge indicating whether there were “HotSpots” or areas of concern. If I were in management and that was all I needed, I’d assign someone to look into it. If I were an administrator, I’d want to interact with that dashboard and dig down to see exactly where the issue existed and/or how to fix it.
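
The two views described here can come from the same data. As a minimal sketch in Python, assuming utilization percentages as input and made-up thresholds of 70% and 90%, one function colors each gauge and another rolls the results up: a single overall color for the manager, and the list of hotspots for the administrator to dig into. The resource names are hypothetical.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def gauge_color(utilization_pct, warn_at=70.0, critical_at=90.0):
    """Map a utilization percentage to a Green/Yellow/Red gauge (assumed thresholds)."""
    if utilization_pct >= critical_at:
        return "red"
    if utilization_pct >= warn_at:
        return "yellow"
    return "green"

def summarize(resources):
    """Return (overall color, sorted hotspot names) for a dict of name -> utilization."""
    colors = {name: gauge_color(pct) for name, pct in resources.items()}
    # Manager view: the single worst color across everything.
    overall = max(colors.values(), key=SEVERITY.get, default="green")
    # Administrator view: which resources are not green, for drill-down.
    hotspots = sorted(name for name, color in colors.items() if color != "green")
    return overall, hotspots

# Hypothetical network and storage utilization figures.
overall, hotspots = summarize({"core-1": 35.0, "wan-2": 93.5, "san-a": 72.0})
print(overall, hotspots)
```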

 

I’m a firm believer in the philosophy that a dashboard should provide useful information, but only what the viewer requires. Something with some fluidity always is preferable.
