
If you haven't read the earlier posts, here's a chance to catch up on the story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)

 

Now that you're up to speed with the chaotic lives of the two characters whose jobs we are following, here's the third installment of the story, by Tom Hollingsworth (networkingnerd).

 

 

The View From Above: James (CEO)

 

I got another call about the network today. This time, our accounting department told us that their End of Year closeout was taking much too long. They have one of those expensive systems that scans in a lot of our paperwork and uploads it to the servers. I wasn't sure if the whole thing was going to be worth it, but we managed to pay for it with the savings from no longer renting warehouse space to store huge file boxes full of the old paper records. That's why I agreed to sign off on it.

 

It worked great last year, but this time around I'm hearing nothing but complaints. This whole process was designed to speed things up and make everyone's job easier. Now I have to deal with the CFO telling me that our reports are going to be late and that the shareholders and the SEC are going to be furious. And I also have to hear comments in the hallways about how the network team still isn't doing their job. I know that Amanda has done a lot recently to help fix things, but if this doesn't get worked out soon the end of the year isn't going to be a good time for anyone.

 

 

The View From The Trenches: Amanda (Sr Network Manager)

 

Fresh off my recent issues with the service provider in Austin, I was hoping the rest of the year was going to go smoothly. Until I got a hotline phone call from James. It seems that the network was to blame for the end of year reporting issues that the accounting department was running into. I knew this was a huge issue after sitting in on the meetings about the records scanning program before I took over the network manager role. The arguments about the cost of that thing made me glad I worked in this department. And now it was my fault the thing wasn't working? Time to get to the bottom of this.

 

I fired up SolarWinds NPM and started checking the devices that were used by the accounting department. Thankfully, there weren't very many switches to look at. NPM told me that everything was running at peak performance; all the links to the servers were green, as was the connection between the network and the storage arrays. I was sure that any misconfiguration of the network would have shown up as a red flag here and given me my answer, but alas the network wasn't the problem. I could run a report right now to show to James to prove that the network was innocent this time.

 

I stopped short, though. Proving that it wasn't the network was not the issue; the issue was that the scanning program wasn't working properly. I knew that if this ended up being someone else's bigger issue, they were going to be on the receiving end of one of those conference room conversations that got my predecessor, Paul, fired. I knew that I had the talent to help get this problem fixed and help someone keep their job before the holidays.

 

So, if the network wasn't the problem, then what about the storage array? I called one of the storage admins, Mike, and asked him about the performance on the array. Did anything change recently? Was the firmware updated? Or out of date? I went through my standard troubleshooting questions for network problems. The answers didn't fill me with a lot of confidence.

 

Mike knew his arrays fairly well. He knew what kind they were and how to access their management interfaces. But when I started asking about firmware levels or other questions about the layout of the storage, Mike's answers became less sure. He said he thought maybe some of the other admins were doing something but he didn't know for sure. And he didn't know if there was a way to find out.

 

As if by magic, the answer appeared in my inbox. SolarWinds emailed me about a free trial of their Storage Resource Monitor (SRM) product. I couldn't believe it! I told Mike about it and asked him if he'd ever tried it. He told me that he had never even heard of it. Given my luck with NPM and keeping the network running, I told Mike we needed to give this a shot.

 

Mike and I were able to install SRM alongside NPM with no issues. We gave it the addresses of the storage arrays that the accounting data was stored on and let it start collecting information. It only took five minutes before I heard Mike growling on the other end of the phone. He was looking at the same dashboard I was. I asked him what he was seeing and he started explaining things.

 

It seems that someone had migrated a huge amount of data onto the fast performance storage tier. Mike told me that data should have been sitting around in the near-line tier instead. The data in the fast performance tier was using up resources that the accounting department needed to store their scanned data. Since that data was instead being written to the near-line storage, the performance hit looked like the network was causing the problem when in fact the storage array wasn't working like it should.

 

I heard Mike cup his hand over the phone receiver and start asking some pointed questions in the background. No one immediately said anything until Mike was able to point out the exact time and date the data was moved into the performance tier. It turns out one of the other departments wanted to get their reports done early this year and talked one of the other storage admins into moving their data into a faster performance tier so their reports would be done quicker. That huge amount of data had caused lots of problems. Now, Mike was informing the admin that the data was going to be moved back ASAP and they were going to call the accounting department and apologize for the delay.

 

Mike told me that he'd take care of talking to James and telling him it wasn't the network. I thanked him for his work and went on with the rest of my day. Not only was it not the network (again), but we found the real problem with some help from SolarWinds.

 

I wouldn't have thought anything else about it, but Mike emailed me about a week later with an update. He kept the SRM trial running even after we used it to diagnose the accounting department issue. The capacity planning tool alerted Mike that they were going to run out of storage space on that array in about six more weeks at the rate it was being consumed. Mike had already figured out that he needed to buy another array to migrate data and now he knew he needed a slightly bigger one. He used the tool to plan out the consumption rate for the next two years and was able to convince James to get a bigger array that would have more than enough room. It's time to convert that SRM trial into a purchase, I think; it's great value and I'm sure Mike will be only too happy to pay.

 

 

>>> Continue reading this story in Part 4

The “cloud” can mean so many things to different people. Depending on who you ask, it could mean SaaS (software as a service), like running Salesforce in the cloud, while another person may say it's running servers on AWS. The definition of cloud can be cloudy, but the transition to the cloud is the same regardless of what you're putting there.

 

When you make the decision to transition to the cloud, having a plan or toolkit is useful. It's very similar to the upgrade or deployment plan I blogged about last month on Geek Speak, called BACK TO BASICS TO HAVE A SUCCESSFUL UPGRADE. The same concept of project planning can be applied to transitioning to the cloud, with some minor tweaks and details to add.

 

Building your own “Cloud” Avengers…

 

If you want a smooth transition, it's always best to get all the players involved from the start. Yes, that means the networking, server, application, and security teams. Getting security involved from the start is key, because they can shoot down plans that don't meet some compliance standard, which then delays your transition. With security involved from the beginning, you're planning the right way from the start and there's less chance of security delaying your project. Getting everyone together, including the business (if applicable), gives everyone a chance to air their grievances about the cloud and work together to make it a success.

 

“Cloud” Avengers assemble…

 

Now that you have your core team of “Cloud” Avengers built, there are some common things you should ask about with every cloud plan.

 

Disaster Recovery - What is the DR plan for my cloud? If an application is being moved to the cloud, what are the DR plans for that application? Does the provider have DR plans for when their datacenter or servers decide to take a break and stop working? Are there DR plans for internet outages or DNS outages?


Backups - You should also be asking what your backup options are and what your recovery time is if you need a restore. Lawsuits are common, so how an e-discovery situation would be handled is another question to ask. Where are the backups retained, and for how long? How do you request data to be restored? Do the backup policies meet your in-house policies?

 

Data retention – Something often overlooked is data retention. How long does data stay in the cloud? Each industry and business is different, with different data retention periods, so you will need to see if the provider's periods meet your requirements. If there are differing data retention periods, how do they impact your in-house policies? Sometimes this may involve working with your legal and compliance teams to come up with the best solution. E-discovery could also impact data retention periods, so it's best to talk to legal and compliance to make sure you are safe.

 

Data security - We all want to make sure our data is secure, so this should be a standard question to ask. How is remote access handled, and how easily can someone request access to the data? Is it as simple as sending an email or filling out a form? Does the provider have other means of authenticating that the correct person is requesting access? If you are running servers in the cloud, you will want to know how the datacenters are secured. If you are using SaaS, you will also want to know how the data is protected by antivirus, and what the remediation plans are if data is compromised.

Back-out Plan - If you are planning to transition to the cloud, you should also have a back-out plan. Sometimes you may find out it's not all rainbows and sunny skies in the cloud and decide to come back to land. Asking the provider what your options are for backing out of the cloud is a good question to ask upfront, because depending on the answer this could impact your plan. You should also find out whether there are additional costs or fees for backing out. Something else to ask: if I want to leave the cloud and come back on premises, what happens to my data and backups (if any existed)? Who has the data, and can you get it back, or does it get swallowed up by the cloud?

 

The cloud is the way of the future. As we move more and more data to the cloud, it may become less foggy. Until then, plan as much as you can and ask all the questions you can (even the stupid ones).


Hey Siri, fix my PC.

Posted by scuff Nov 30, 2016

If the machines are taking over the world, are they coming for our jobs too?

 

“Automate all the things!” is the current trend in our industry. Chef, Puppet and Ansible scream that they are the solution to the end of monotonous work. We script all the things, ending the days of clicking Next, Next, Next, Finish. We’re using machines and machine languages to build, update and alter other machines. Right now, they still need us. They’re just making our lives easier.

 

Or are they enabling us to take an acceptable step towards outsourcing our tasks …. to them?

 

This year Zendesk dipped their toes in the water with Automatic Answers. The feature “uses machine learning capabilities to analyze customer and agent actions over time, learning which articles solve tickets associated with specific keywords and topics. If a customer indicates their inquiry has been solved successfully, the ticket is closed. For tickets that remain unsolved, they proceed to the customer service team as normal.”  It’s easy to think of that in a B2C scenario, say if I’ve emailed a company asking about the status of a product return. Automatic Answers could glean enough information from my email to check another system and reply with an answer, minus any human interaction. With in-house tech support, maybe that frees up the Helpdesk from questions like “how do I give someone else access to my calendar?” or “how do I turn on my out of office replies?” DigitalGenius chief strategy officer Mikhail Naumov confirms that customer service is easy because a history of recorded answers is a goldmine for machines to learn appropriate responses from.

 

At the other extreme, we now have robots that can heal themselves and this technology has been around for more than 12 months.

 

Somewhere between the two sit our software machines. Without physical moving robot parts, the technology that we interact with from our desktops or mobiles boils down to a bunch of code on a hardware base. If it all comes down to binary, will it one day be able to fix itself?

 

Software developers might start to get worried. Grab a cup of coffee and read this article about how we'll no longer write code to program machines; instead, we'll train them like dogs. Yippee, says the girl who hates coding.

 

A toe in the water example is Microsoft’s ‘Troubleshooter’ capability. Still initiated by a human, it will look for known common causes of problems with Windows Updates, your network connectivity or Windows Store Apps.  Yes, I know, your results may vary, but it’s a start.

 

IBM was playing around with Autonomic Computing back in 2003. They mention automatic load balancing as an example of self-optimization which I guess is a very rudimentary autonomic task.

 

Now we've built some intelligence into monitoring, diagnostics, and remote management. Some monitoring systems can attempt a pre-programmed resolution step, for example: if the service stops, try to restart it (a sketch of this follows below). There are even a few conferences on Cloud and Autonomic Computing: http://icac2016.uni-wuerzburg.de/ and http://www.autonomic-conference.org/iccac-2017/
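As a concrete illustration of that kind of pre-programmed resolution step, here is a minimal sketch in Python: watch a service and attempt a restart when it is found down. It assumes a Linux host running systemd, and the service name is a hypothetical placeholder.

```python
# Minimal sketch: "if the service stops, try to restart it" as a watcher loop.
import subprocess
import time

SERVICE = "example-app.service"  # hypothetical service name


def is_active(service: str) -> bool:
    """Return True if systemd reports the service as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", service],
        check=False,
    )
    return result.returncode == 0


def watch(service: str, interval_seconds: int = 60) -> None:
    """Poll the service and attempt a restart whenever it is found down."""
    while True:
        if not is_active(service):
            print(f"{service} is down, attempting restart")
            subprocess.run(["systemctl", "restart", service], check=False)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    watch(SERVICE)
```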

 

But autonomic computing of the future looks to build systems that can monitor, react, protect, and manage themselves without human intervention. Systems will be self-healing, self-configuring, self-protecting, and self-optimizing. We won't program automation anymore; we'll train the systems what to try when they are failing (or maybe train them to aid each other? Paging Dr. Server!).

 

I'm not sure if that's a future I'm really looking forward to or one that scares the heck out of me. When I get flashbacks to a server that won't boot and log in after a failed Microsoft patch, I'd gladly settle for one that correctly identifies the bad patch, reboots, uninstalls it, and actually returns itself to the previous good state, all automatically.

 

But maybe the service desk tickets and red dashboard icons are keeping me in a job? What would you do if the servers & networks could fix themselves?

I *may* have eaten my weight in turkey and stuffing last week. But the best part about the holiday was how I spent the better part of four days disconnected from just about everything. Disconnecting from time to time was the subject of a talk by adatole recently at a DevOpsDays event. Here's the video if you want to see Leon deliver a wonderful session on a topic that is important to everyone, inside and outside of IT.

 

Also, here's a bunch of other links I found on the Intertubz that you may find interesting, enjoy!

 

Great. Now Even Your Headphones Can Spy on You

As if I needed more reasons to be paranoid, apparently even my tinfoil hat won't help stop this threat.

 

Madison Square Garden, Radio City Music Hall Breached

A. Full. Year.

 

Shift Your Point of View to When America Was “Better”

Because I love data visualizations, and so should you.

 

How Long Did It Take To Make Food in Ancient Times?

Pretty sure if I had to wait 20 days to make some coffee my head would explode.

 

Oracle Bare Metal Cloud: Top Considerations and Use-Cases

The more I read pieces like this, the more I think Oracle is a sinking ship. Take this quote for example: "we begin to see that there is a market for public cloud consumption and the utilization of cloud services". Hey, Larry, it's 2016, most of us knew there was a market 10 years ago. And telling me that your Cloud will be better than other clouds because...why exactly?

 

Ransomware Result: Free Ticket to Ride in San Francisco

Get used to seeing more attacks like this one, but more disruptive. It wouldn't take much to shut down the trains altogether in exchange for a quick payout.

 

Fake News Is Not the Only Problem

A bit long, and politics aside, the takeaway for me here is the sudden realization by many that the Internet may not be the best source of news.

 

When your team is stuck, rub a little DevOps on your process and everything will be fine:

 

[Image: DevOps-lotion.jpg]


I’ve stated often that great database performance starts with great database design. So, if you want a great database design you must find someone with great database experience. But where does a person get such experience?

 

We already know that great judgment comes from great experience, and great experience comes from bad judgment. That means great database experience is the result of bad judgment repeated over the course of many painful years.

 

So I am here today to break this news to you. Your database design stinks.

 

There, I said it. But someone had to be the one to tell you. I know this is true because I see many bad database designs out in the wild, and someone is creating them. So I might as well point my finger in your direction, dear reader.

 

We all wish we could change the design or the code, but there are times when it is not possible to make changes. As database usage patterns push horrible database designs to their performance limits, database administrators are handed an impossible task: make performance better, but don't touch anything.

 

Imagine that you take your car to a mechanic for an oil change. You tell the mechanic they can’t touch the car in any way, not even open the hood. Oh, and you need it done in less than an hour. Silly, right? Well I am here to tell you that it is also silly to go to your database administrator and say: “we need you to make this query faster and you can’t touch the code”.

 

Lucky for us, the concept of "throwing money at the problem" is not new, as shown by this ancient IBM commercial. Of course, throwing money at the problem does not always solve the performance issue, usually because no one knows what the issue is to begin with. You don't want to be the one who spends six figures on new hardware to solve an issue with query blocking. And even after ordering the new hardware, it takes time for it to arrive, be installed, and for the issue to be resolved.

 

That's why I put together this list of things that can help you fix database performance issues without touching code. Use this as a checklist to research and take action upon before blaming code. Some of these items cost no money, but some items (such as buying flash drives) might. What I wanted to do was to provide a starting point for things you can research and do yourself.

 

As always: You’re welcome.

 

Examine your plan cache

If you need to tune queries, then you need to know what queries have run against your instance. A quick way to get such details is to look inside the plan cache. I've written before about how the plan cache is the junk drawer of SQL Server. Mining your plan cache for performance data can help you find improvements such as optimizing for ad-hoc workloads, estimating the correct cost threshold for parallelism, or identifying which queries are using a specific index.
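As a rough illustration, here is a minimal sketch of mining the cache for the top CPU consumers using Python and pyodbc. The connection string is a hypothetical placeholder; sys.dm_exec_query_stats and sys.dm_exec_sql_text are the standard SQL Server DMVs for this.

```python
# Minimal sketch: pull the top CPU-consuming statements from the plan cache.
# Assumes pyodbc is installed and you have VIEW SERVER STATE permission.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=master;Trusted_Connection=yes;"  # hypothetical server
)

QUERY = """
SELECT TOP (20)
       qs.execution_count,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       SUBSTRING(st.text, 1, 200)  AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(row.execution_count, row.total_cpu_ms, row.query_text)
```

Speaking of indexes…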

 

Review your index maintenance

I'm assuming you are doing this already, but if not, now is the time to get started. You can use maintenance plans, roll your own scripts, or use scripts provided by some Microsoft Data Platform MVPs. Whatever method you choose, make certain you are rebuilding, reorganizing, and updating statistics only when necessary. I'd even tell you to take time to review for duplicate indexes and get those removed.
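If you want a quick look at which indexes actually need attention, here is a hedged sketch that reads sys.dm_db_index_physical_stats and applies the commonly cited 5% / 30% reorganize/rebuild thresholds. Treat the thresholds as a starting point rather than a rule, and note that the connection string and database name are hypothetical.

```python
# Minimal sketch: list fragmented indexes and suggest REORGANIZE vs. REBUILD.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=AccountingDB;Trusted_Connection=yes;"  # hypothetical
)

QUERY = """
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent AS frag_pct
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5
  AND i.name IS NOT NULL;
"""

with pyodbc.connect(CONN_STR) as conn:
    for table_name, index_name, frag_pct in conn.cursor().execute(QUERY):
        action = "REBUILD" if frag_pct > 30 else "REORGANIZE"
        print(f"{table_name}.{index_name}: {frag_pct:.1f}% fragmented -> {action}")
```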

 

Index maintenance is crucial for query performance. Indexes help reduce the amount of data that is searched and pulled back to complete a request. But there is another item that can reduce the size of the data searched and pulled through the network wires…

 

Review your archiving strategy

Chances are you don't have any archiving strategy in place. I know because we are data hoarders by nature, and we are only now starting to realize the horrors of such things. Archiving data implies less data, and less data means faster query performance. One way to get this done is to consider partitioning. (Yeah, yeah, I know I said no code changes; this is a schema change to help the logical distribution of data on physical disk. In other words, no changes to existing application code.)

 

Partitioning requires some work on your end, and it will increase your administrative overhead. Your backup and recovery strategy must change to reflect the use of more files and filegroups. If this isn't something you want to take on, then you may instead want to consider…

 

Enable page or row compression

Another option for improving performance is data compression at the page or row level. The tradeoff for data compression is an increase in CPU usage. Make certain you perform testing to verify that the benefits outweigh the extra cost. For tables that have a low number of updates and a high number of full scans, data compression is a decent option. Here is the SQL 2008 Best Practices whitepaper on data compression, which describes in detail the different types of workloads and estimated savings.
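As a rough illustration, the sketch below calls the built-in sp_estimate_data_compression_savings procedure to see what PAGE compression might buy you before you turn it on. The schema and table names are hypothetical placeholders.

```python
# Minimal sketch: estimate PAGE compression savings for a single table.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=AccountingDB;Trusted_Connection=yes;"  # hypothetical
)

with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    # Samples the table and reports current size vs. estimated compressed size.
    cursor.execute(
        "EXEC sp_estimate_data_compression_savings "
        "@schema_name = ?, @object_name = ?, "
        "@index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';",
        ("dbo", "ScannedDocuments"),  # hypothetical table
    )
    for row in cursor.fetchall():
        print(row)
```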

 

But, if you already know your workload to that level of detail, then maybe a better option for you might be…

 

Change your storage configuration

Often this is not an easy option, if it's an option at all. You can't just wish for a piece of spinning rust on your SAN to go faster. But technologies such as Windows Storage Spaces and VMware's VSAN make it easier for administrators to alter storage configurations to improve performance. At VMworld in San Francisco I talked about how VSAN technology is the magic pixie dust of software-defined storage right now.

 

If you don’t have magic pixie dust then SSDs are an option, but changing storage configuration only makes sense if you know that disk is your bottleneck. Besides, you might be able to avoid reconfiguring storage by taking steps to distribute your I/O across many drives with…

 

Use distinct storage devices for data, logs, and backups

These days I see many storage admins configuring database servers to use one big RAID 10, or OBR10 for short. For a majority of systems out there the use of OBR10 will suffice for performance. But there are times you will find you have a disk bottleneck as a result of all the activity hitting the array at once. Your first step is then to separate out the database data, log, and backup files onto distinct drives. Database backups should be off the server. Put your database transaction log files onto a different physical array. Doing so will reduce your chance for data loss. After all, if everything is on one array, then when that array fails you will have lost everything.

 

Another option is to break out tempdb onto a distinct array as well. In fact, tempdb deserves its own section here…

 

Optimize tempdb for performance

Of course this is only worth the effort if tempdb is found to be the bottleneck. Since tempdb is a shared resource amongst all the databases on the instance, it can be a source of contention. But we operate in a world of shared resources, so finding that tempdb is a point of contention is not a surprise. Storage, for example, is a shared resource. So is the series of tubes that makes up your network. And if the database server is virtualized (as it should be these days), then you are already living in a completely shared environment. So why not try…

 

Increase the amount of physical RAM available

Of course, this only makes sense if you are having a memory issue. Increasing the amount of RAM is easy for a virtual machine when compared to having to swap out a physical chip. OK, swapping out a chip isn’t that hard either, but you have to buy one, then get up to get the mail, and then bring it to the data center, and…you get the idea.

 

When adding memory to your VM, one thing to be mindful of is whether your host is using vNUMA. If so, adding more memory could result in performance issues for some systems, so know what to look for.

 

Memory is an easy thing to add to any VM. Know what else is easy to add on to a VM?

 

Increase the number of CPU cores

Again, this is only going to help if you have identified that CPU is the bottleneck. You may want to consider swapping out the CPUs on the host itself if you can get a boost in performance speeds. But adding physical hardware such as a CPU, same as with adding memory, may take too long to physically complete. That’s why VMs are great, as you can make modifications in a short amount of time.

 

Since we are talking about CPUs, I should also mention examining the Windows power plan settings; this is a known issue for database servers. But even with virtualized servers, resources such as CPU and memory are not infinite…

 

Reconfigure VM allocations

Many performance issues on virtualized database servers are the result of the host being over-allocated. Over-allocation by itself is not bad. But over-allocation leads to over-commit, and over-commit is when you see performance hits. You should be conservative with your initial allocation of vCPU resources when rolling out VMs on a host. Aim for a 1.5:1 ratio of vCPU to logical cores and adjust upwards from there, always paying attention to overall host CPU utilization. For RAM, you should stay below 80% total allocation, as that allows room for growth and migrations as needed.
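Here is a trivial sketch of that sanity check, using the 1.5:1 vCPU and 80 percent RAM guidelines above; the host numbers in the example are made up.

```python
# Minimal sketch: sanity-check host allocation against the starting guidelines.
def check_host(logical_cores, host_ram_gb, vcpus_allocated, ram_allocated_gb):
    vcpu_ratio = vcpus_allocated / logical_cores
    ram_pct = 100 * ram_allocated_gb / host_ram_gb
    warnings = []
    if vcpu_ratio > 1.5:
        warnings.append(f"vCPU ratio {vcpu_ratio:.2f}:1 exceeds the 1.5:1 starting point")
    if ram_pct > 80:
        warnings.append(f"RAM allocation {ram_pct:.0f}% exceeds the 80% guideline")
    return warnings or ["Host allocation is within the initial guidelines"]

# Example: a 32-logical-core host with 512 GB of RAM (hypothetical numbers).
for message in check_host(logical_cores=32, host_ram_gb=512,
                          vcpus_allocated=56, ram_allocated_gb=400):
    print(message)
```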

 

You should also take a look at how your network is configured. Your environment should be configured for multi-pathing. Also, know your current HBA queue depth, and what values you want.

 

Summary

We've all had times where we've been asked to fix performance issues without changing code. The items listed above are options for you to examine and explore in your effort to improve performance before changing code. Of course, it helps if you have an effective database performance monitoring solution in place to help you make sense of your environment. You need to have performance metrics and baselines in place before you start turning any "nerd knobs"; otherwise, you won't know whether you are having a positive impact on performance, no matter which option you choose.

 

With the right tools in place collecting performance metrics you can then understand which resource is the bottleneck (CPU, memory, disk, network, locking/blocking). Then you can try one or more of the options above. And then you can add up the amount of money you saved on new hardware and put that on your performance review.

Last week, we talked about monitoring the network from different perspectives. By looking at how applications perform from different points in the network, we get an approximation of the users' experience. Unfortunately, most of those tools are short on the details surrounding why there's a problem or are limited in what they can test.

On one end of our monitoring spectrum, we have traditional device-level monitoring. This is going to tell us everything we need to know that is device-specific. On the other end, we have the application-level monitoring discussed in the last couple of weeks. Here, we're going to approximate a view of how the end users see their applications performing. The former gives us a hardware perspective and the latter gives us a user perspective. The perspective of the network as a whole lies somewhere in between.

Using testing agents and responders on the network at varying levels can provide that intermediate view. They allow us to test against all manner of traffic, factoring in network latency and variances (jitter) in the same.

Agents and Responders

Most enterprise network devices have built-in functions for initiating and responding to test traffic. These allow us to test and report on the latency of each link from the device itself. Cisco and Huawei have IP Service Level Agreement (SLA) processes. Juniper has Real-Time Performance Monitoring (RPM) and HPE has its Network Quality Analyzer (NQA) functions, just to list a few examples. Once configured, we can read the data from them via Simple Network Management Protocol (SNMP) and track their health from our favourite network monitoring console.
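As a rough example of reading those statistics, the sketch below polls the latest completion time of a Cisco IP SLA entry over SNMPv2c using the pysnmp library. The device address, community string, and SLA entry index are hypothetical, and the OID should be verified against the CISCO-RTTMON-MIB for your platform.

```python
# Minimal sketch: read one IP SLA statistic from a router via SNMP (pysnmp).
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

# rttMonLatestRttOperCompletionTime for SLA entry 10 (verify OID for your device).
OID = "1.3.6.1.4.1.9.9.42.1.2.10.1.1.10"

error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),      # SNMPv2c community (assumption)
        UdpTransportTarget(("10.1.1.1", 161)),   # hypothetical router address
        ContextData(),
        ObjectType(ObjectIdentity(OID)),
    )
)

if error_indication or error_status:
    print(error_indication or error_status.prettyPrint())
else:
    for name, value in var_binds:
        print(f"{name.prettyPrint()} = {value.prettyPrint()} ms")
```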

Should we be in the position of having an all-Cisco shop, we can have a look at SolarWinds' IP SLA Monitor and VoIP and Network Quality Manager products to simplify setting things up. Otherwise, we're looking at a more manual process if our vendor doesn't have something similar.

Levels

Observing test performance at different levels gives us reports of different granularity. By running tests at the organization, site and link levels, we can start with the bigger picture's metrics and work our way down to specific problems.

Organization

Most of these will be installed at the edge devices or close to them. They will perform edge-to-edge tests against a device at the destination organization or cloud hosting provider. There shouldn't be too many of these tests configured.

Site

Site-to-site tests will be configured close to the WAN links and will monitor overall connectivity between sites. The point of these tests is to give a general perspective on intersite traffic, so they shouldn't be installed directly on the WAN links. Depending on our organization, there could be none of these or a large number.

Link

Each network device has a test for each of its routed links to other network devices to measure latency. This is where the largest number of tests are configured, but is also where we are going to find the most detail.

Caveats

Agent and responder testing isn't passive. There's always the potential for unwanted problems caused by implementing the tests themselves.

Traffic

Agent and responder tests introduce traffic to the network for purposes of testing. While that traffic shouldn't be significant enough to cause impact, there's always the possibility that it will. We need to keep an eye on the interfaces and queues to be sure that there isn't any significant change.

Frequency and Impact

Running agents and responders on the network devices themselves is going to generate additional CPU cycles. Network devices as a whole are not known for having a lot of processing capacity. So, the frequency for running these tests may need to be adjusted to factor that in.

Processing Delay

Related to the previous paragraph, most networking devices aren't going to be performing these tests quickly. The results from these tests may require a bit of a "fudge factor" at the analysis stage to account for this.

The Whisper in the Wires

Having a mesh of agents and responders at the different levels can provide point-in-time analysis of latencies and soft failures throughout the network. But, it needs to be managed carefully to avoid having negative impacts to the network itself.

Thanks to Thwack MVP byrona for spurring some of my thinking on this topic.

Is anyone else building something along these lines?

For government agencies, network monitoring has evolved into something extremely important, yet unnecessarily complex. For instance, according to Gleanster Research, 62 percent of respondents use on average three separate monitoring tools to keep their networks safe and functioning properly.

 

Network monitoring tools have become an integral part of agencies’ IT infrastructures, as they allow administrators to more easily track overall network availability and performance. All of this can be handled in real-time and with accompanying alerts, making network monitoring a must for agencies seeking to bolster their security postures.

 

Below, we’ll break down three monitoring techniques that will help you get a handle on how effective network monitoring can solve numerous problems for your agency.

 

Slay Problems through IP SLA

 

IP SLA – short for Internet Protocol Service Level Agreement – sounds complex. But in reality its function is a simple one: ensuring the voice-over-IP (VoIP) environment is healthy. IP SLA allows IT administrators to set up certain actions to occur on a network device and have the results of that operation reported back to a remote server.

 

For example, the operation may include checking if a Web page or DNS server is responding, or whether a DHCP server is responding and handing out IP addresses. This is a huge asset because it uses the existing devices within the network infrastructure rather than requiring you to set up separate devices (or agents on existing PCs or servers) to run tests.

 

Trace the NetFlow of “Conversations”

 

NetFlow has the ability to capture network “conversations” for you. NetFlow data is captured by one or more routers operating near the center of the network.

 

Simply put, if DesktopComputer_123 is sending a file to Server_ABC via FTP, that is one conversation. The same PC browsing a webpage on the same server using HTTP is another conversation. NetFlow operates in the middle of these conversations to collect data so that the monitoring server can then aggregate, parse, and analyze the data.
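To illustrate the idea (not any particular collector's API), here is a tiny sketch that groups pre-parsed flow records into conversations keyed by source, destination, and protocol. The records themselves are made up.

```python
# Minimal sketch: aggregate flow records into "conversations".
from collections import defaultdict

flows = [
    # (source, destination, protocol, bytes) -- hypothetical, pre-parsed records
    ("DesktopComputer_123", "Server_ABC", "FTP",  48_000_000),
    ("DesktopComputer_123", "Server_ABC", "HTTP",    120_000),
    ("DesktopComputer_123", "Server_ABC", "FTP",  12_000_000),
]

conversations = defaultdict(int)
for src, dst, proto, nbytes in flows:
    # Same endpoints plus same protocol = one conversation.
    conversations[(src, dst, proto)] += nbytes

for (src, dst, proto), total_bytes in conversations.items():
    print(f"{src} -> {dst} [{proto}]: {total_bytes / 1_000_000:.1f} MB")
```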

 

Hook Into API Monitoring

 

Using a network monitoring Application Programming Interface (API) can be the murkiest of all of the techniques we've discussed. In essence, to understand how an API is used, you must realize that there are hooks built into applications that allow for data requests. Each time this type of request is received, a response is sent back to the monitoring software, giving you a better understanding of how your network is performing. Microsoft System Center Operations Manager (SCOM) is a proprietary example of a network monitoring API, while VMware's API is published and generally available.
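As a generic illustration only, the sketch below polls a hypothetical REST endpoint for component status. Real products such as SCOM or vCenter each have their own authentication schemes and data models, so treat every name here as a placeholder.

```python
# Minimal sketch: poll a monitoring API over HTTPS and print component health.
import requests

BASE_URL = "https://monitor.example.com/api/v1"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}    # hypothetical token

response = requests.get(f"{BASE_URL}/health", headers=HEADERS, timeout=10)
response.raise_for_status()

# Assumes the API returns JSON like {"components": [{"name": ..., "status": ...}]}
for component in response.json().get("components", []):
    print(component["name"], component["status"])
```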

 

Make no mistake — maintaining network security in today’s environment is more complex and crucial than ever. Having the tools in place – and understanding what tools are out there for federal government agencies – is a must.  But the good news is that these tools do exist.  And with less work than you may have expected, you can quickly understand and appreciate what you can do to crack the case of network security.

 

Find the full article on our partner DLT’s blog, TechnicallySpeaking.

Over the past five postings, I've talked about some trends that we have seen happening and gaining traction within the cloud space. I've spoken of:

 

  • Virtualization – Established trends toward virtualization, particularly VMware, have been challenged by a variety of newcomers whose market share continues to grow. Most notable here is OpenStack. VMware has answered the threat of Azure, AWS, and pure OpenStack by embracing them with a series of APIs meant to integrate the on-prem virtual data center with those peers in the hybrid space.

 

  • Storage – In the case of traditional storage, the trend has been toward faster: faster Ethernet or Fibre Channel as the interconnect, and of course, solid state becoming the norm in any reasonably high-I/O environment. But the biggest sea change is the move to object-based storage. Object really is a different approach, with replication, erasure coding, and redundancy built in.

 

  • Software-Defined Networking – SDN is eating quite drastically into the data center space these days. The complexities of routing tables and firewall rules are being addressed within the virtual data center by tools like ACI (Cisco) and NSX (VMware). While port reduction isn't quite the play here, the ability to segment a network via these rules far surpasses any physical switch's capacity. In addition, these rules can be rolled out effectively, accurately, and with easy roll-back. I find these two pieces truly compelling for maintaining and enhancing the elegance of the network while reducing the complexity laid onto the physical switch environment.

 

  • Containers – In the new world of DevOps, containers, a way to disaggregate the application from the operating system, have proven to be yet another compelling path into the future. DevOps calls for the ability to update parts and pieces of an application, while containers let you scale the application, update it, and deploy it wherever and whenever you want.

 

  • Serverless and Microservices – These also fall into the DevOps equation: small components, put together as building blocks, make up the entire application and keep it dynamic and modifiable. The "serverless" piece is somewhat of a misnomer (any workload must still reside on some compute layer), but these workloads are dynamic, movable, and far less tied to a particular hypervisor or location; they simply run wherever the underlying architecture resides.

 

So… what's next in data center infrastructure? We've seen tools that allow the data center administrator to easily deploy workloads to whatever destination they choose, gateways that bridge the gap from more traditional storage to object-based storage, orchestration tools that allow for the rapid, consistent, and highly managed deployment of containers in the enterprise/cloud space, and a truly cross-platform approach to serverless/microservice architectures that eases the adoption of a newer paradigm in the data center.

 

What we haven't seen is a truly revolutionary unifier. For example, when VMware became the juggernaut it is today, the virtualization platform became the tool that tied everything together. Regardless of your storage, compute (albeit x86 in particular), and network infrastructure, with VMware as a platform you had one reliable and practically bulletproof tool with which to deploy new workloads, manage existing platforms, and essentially scale up or down as required, all through the ease of a simple management interface. However, with all these new technologies, will we have that glue? Will we have the ability to build entire architectures and manage them easily? Will there be a level of fault tolerance, an equivalent to DRS or Storage DRS? As we seek the new brass ring and poise ourselves on the platforms of tomorrow, how will we approach these questions?

 

I’d love to hear your thoughts.

Part 2 of a 3-part series, which is itself a longer version of a talk I give at conferences and conventions.

You can find part 1 here.

I'd love to hear your thoughts in the comments below!

 

In the first part of this series, I made a case for why disconnecting, at times and for a significant amount of time, is important to our health and career. In this segment I pick up on that idea with specific things you can do to make going offline a successful and positive experience.

 

Don’t Panic!

If you are considering taking time to unplug, you probably have some concerns, such as:

  • how often and for how long should you unplug
  • how do you  deal with a workload that is already threatening to overwhelm you
  • how will your boss, coworkers, friends perceive your decision to unplug
  • how do you maintain your reputation as a miracle worker if you aren’t connected
  • how do you deal with pseudo medical issues like FOMO
  • what about sev1 emergencies
  • what if you are on-call

 

Just take a deep breath. This isn't as hard as you think.

 

Planning Is Key

"To the well-organized mind, death is but the next great adventure."

- Albus Dumbledore

 

As true as these words might be for Nicolas Flamel as he faces his mortality, they are even truer for those shuffling off the mortal coil of internet connectivity. Because, like almost everything else in IT, the decisions you make in the planning phase will determine the ultimate outcome. Creating a solid plan can make all the difference between experiencing boring, disconnected misery and relaxed rejuvenation.

 

The first thing to plan out is how long you want to unplug, and how often. My advice is that you should disconnect as often, and for as long per session, as you think is wise. Period. It's far more important to develop the habit of disconnecting and experience the benefits than it is to try to stick to some one-size-fits-most specification.

 

That said, be reasonable. Thirty minutes isn't disconnecting. That’s just what happens when you're outside decent cell service. You went offline for an hour? I call that having dinner with Aunt Frieda, the one who admonishes you with a “My sister didn't raise you to have that stupid thing out at the table." Haven't checked Facebook for two or three hours? Amateur. That's a really good movie, or a really, REALLY good date.

 

Personally, I think four hours is a good target. But that's just me. Once again, you have to know your life and your limits.

 

At the other end of the spectrum, unless you are making some kind of statement, dropping off the grid for more than a day or two could leave you so shell shocked that you'll avoid going offline again for so long you may as well have never done it.

 

One suggestion is to try a no-screens Sunday morning every couple of weeks, and see how it goes. Work out the bugs, and then re-evaluate to see if you could benefit from extending the duration.

 

It's also important to plan ahead to decide what counts as online for you. This is more nuanced than it might seem. Take this seemingly clear-cut example: You plan to avoid anything that connects to the outside world, including TV and radio. There are still choices. Does playing a CD count? If so, can you connect to your favorite music streaming service, since it's really just the collection of music you bought? What about podcasts?

 

The point here is that you don’t need to have the perfect plan. You just need to start out with some kind of plan and be open-minded and flexible enough to adjust as you go.

 

You also need to plan your return to the land of the connected. If turning back on again means five hours of hacking through email, twitter feeds, and Facebook messages, then all that hard won rest and recharging will have gone out the window. Instead, set some specific parameters for how you reconnect. Things like:

  • Limit yourself to no more than 30 minutes of sorting through email and deleting garbage
  • Another 30 to respond to critical social media issues
  • Decide which social media you actually HAVE to look at (Do you really need to catch up on Pinterest and Instagram NOW?)
  • If you have an especially vigorous feed, decide how far back (in hours) that you will scroll

 

As I said earlier, any good plan requires flexibility. These plans are more contingencies than tasks, and you need to adhere to a structure, but also go with the flow when things don't turn out exactly as expected.

 

Preparation is Key

Remember how I said that Shabbat didn't mean sitting in the dark eating cold sandwiches? Well, the secret is in the preparation. Shabbat runs from Friday night to Saturday night, but a common saying goes something like, "Shabbat begins on Wednesday.” This is because you need time to get the laundry done and food prepared so that you are READY when Friday night arrives.

 

An artist friend of mine goes offline for one day each week. I asked him what happens if he gets an idea in the middle of that 24-hour period. He said, "I make an effort all week to exhaust myself creatively, to squeeze out every idea that I can. That way I look at my day off as a real blessing. A day to recharge because I need it."

 

His advice made me re-think how I use my time and how I use work to set up my offline time. I ask myself whether the work I'm doing is the stuff that is going to tear my guts out when I'm offline if it's not done. I also use a variety of tools - from electronic note and to-do systems to physical paper - so that when it's time to drop offline, I have a level of comfort that I'm not forgetting anything, and that I'll be able to dive back in without struggling to find my place.

 

Good preparation includes communicating your intentions. I'm not saying you should broadcast it far and wide, but let key friends, relatives, and coworkers know that you will be “…out of data and cell range.”

 

This is exactly how you need to phrase it. You don’t need to explain that you are taking a day to unplug. That's how the trouble starts. Tell people that you will be out of range. Period.

 

If needed, repeat that phrase slowly and carefully until it sounds natural coming out of your mouth.

 

When you come back online, the opposite applies. Don't tell anyone that you are back online. Trust me, they'll figure it out for themselves.

 

In the next installment, I'll keep digging into the specifics of how to make going offline work for you. Meanwhile, if you have thoughts, suggestions, or questions, let me know in the comments below!


(image courtesy of Marvel)

 

...I learned from "Doctor Strange"

(This is part 1 of what will be a 4-part series. Enjoy!)

 

"When the student is ready, the teacher appears," is a well-known phrase, but I was struck recently by the way that sometimes the teacher appears in unexpected forms. It's not always the kindly and unassuming janitor, Mr. Miyagi, or the crazy old hermit, Ben Kenobi. Sometimes the teacher isn’t a person or a character, but an entire movie filled with lessons for ready students. 

 

I found myself in that situation recently, as I sat watching Dr. Strange, the latest installment in the Marvel cinematic universe.

 

There, hidden among the special effects, panoramic vistas, and Benedict Cumberbatch's cheekbones were some very real and meaningful IT career lessons, applicable to both acolytes and masters as they walk the halls of your own technological Kamar Taj. In fact, I discovered a series of lessons, more than I can fit into just one essay.

 

So, over the next couple of installments I'm going to share them with you, and I’d like to hear your thoughts and reactions in the comments below.

 

If it needs to be said, there are many spoilers in what follows. If you haven't seen the movie yet, and don't want to know what's coming, bookmark this page to enjoy later.

 

Know the essential tools of the trade

The movie introduces us to the concept of a sling ring, a magical device that allows a sorcerer to open a portal to another location. In the narrative arc of the movie, this appears to be one of the first and most basic skills sorcerers are taught. It was also the key to many of the plot twists and a few sight gags in the movie. In my mind, I equated the concept of the sling ring with the idea that all IT pros need to understand and master basic skills, such as IP subnetting, command line syntax, coding skills, and security.

 

Can you be a solid IT pro without these skills? Sure, but you'll never be a master, and odds are good that you'll find yourself hanging around the lower end of the career ladder far longer than you’d like.

 

Think creatively about how to use the technology you already have

In the movie, immediately after figuring out how to use a sling ring, we see the hero use it in non-standard ways. Instead of opening a portal for his whole body, he opens holes just big enough for his hands, so that he can borrow books from the library and avoid being detected by Wong the librarian. We see this again in the use of the Eye of Agamotto during Doctor Strange's face-off against Dormamu.

 

The great thing about essential IT skills is that they can be used in so many ways. Understanding network routing will allow you to build stronger and more secure environments in the cloud. A grasp of regular expressions will help you in coding, in using various tools, and more. Understanding the command line, rather than being trapped in the GUI all the time, allows you to automate tasks, perform actions more quickly, and extend functionality.

 

It's worth noting that here at SolarWinds we place great stock in enabling our users to think outside the box. We even have a SolarWinds User Group (SWUG) session on doing just that – called “Thinking Outside the Box”.

 

Don't let your desire for structure consume you

In the movie, Mordo began as an ally, and even a friend, of Stephen Strange, but displayed certain issues along the way. When he claimed he had conquered his demons, the Ancient One replied, "We never lose our demons. We only learn to live above them."

 

Mordo’s desire to both protect the natural order and remain steadfastly within its boundaries proved his undoing, with him leaving the sorcerers of Kamar Taj when he found that both the Ancient One and Doctor Strange had bent rules in order to save the world.

 

I find this relevant when I see seasoned IT pros forcing themselves to operate within constraints that don't exist, except in their own minds. When I hear IT pros proclaim that they would never run (name your operating system, software package, or hardware platform) in their shop, it's usually not for any sound business reason. And when those standards are challenged, I have watched more than a few seasoned veterans break rather than bend. It's not pretty, and it's also not necessary.

 

There are never too many sorcerers in the world

Mordo's reaction is extreme. He begins hunting down other practitioners of the magical arts and taking their power, proclaiming, "There are too many sorcerers in the world!"

 

There are times in IT when it feels like EVERYONE is trying to become a (again, fill in your technology or specialty here) expert. And it's true that when a whole crop of new folks come into a discipline, it can be tiresome watching the same mistakes being made, or having to explain the same concepts over and over.

 

But the truth is that there are never enough sorcerers, or in our case, specialists, in the world. There's plenty of work to go around. And the truth is that not everyone is cut out for some of these specialties, and they soon find themselves overwhelmed and leave – hopefully to find an area of IT that suits them better.

 

While I don't expect that anyone reading this will magically extract the IT power from their peers, I have watched coworkers shoot down or even sabotage the work of others just so they can maintain their own privileged status. I'm happy to say that this tactic rarely works, and never ends well.

 

Persistence often pays off

At one point in the movie, the Ancient One sends Strange on a trip through alternate dimensions, then asks, "Have you seen that at a gift shop?" When Strange begs her to teach him, her response is a firm “no.” Hours later, Strange is wailing at the door, begging to be let in.

 

At some point in your career, you may have an epiphany and realize that your career goals point you toward a certain technology or discipline. And, just your luck, there's a team that specializes in exactly that! So you go to the manager or team lead and ask if you can join up.

 

Your first request to join the team may fall on deaf ears. And your second. You may need to hang, like a sad puppy dog, around them in the lunchroom or around the water cooler for a while. Unlike Doctor Strange, it may take weeks or even months of persistence, rather than a few hours. But that doesn't mean it's not worth it.

 

Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for parts 2-4, coming in the following weeks.

The series is a general interest piece and is not related to SolarWinds products in any way, nor will it be used to promote SolarWinds products.

 

It will be hosted on THWACK.com, the free, open user community for monitoring experts.

 


Tomorrow is Thanksgiving here in the USA. I have much to be thankful for but these days I am most thankful that jennebarbour continues to let me write this series each and every week.

 

So, in that spirit, here's a bunch of links I found on the Intertubz that you may find appetizing, enjoy!

 

AOL is laying off 500 employees in a restructuring with focus on mobile, data and video

My first thought was, "AOL still has employees?"

 

How to eat as much food as humanly possible this Thanksgiving

For those of us in IT that don't already know how to eat way more than necessary, here's a list to help.

 

Nothing Personal but I'm Taking Your Job

I've said it before, and I will say it again: If you aren't trying to automate your job away, someone else will do it for you.

 

6 links that will show you what Google knows about you

If you were curious to see yourself as Google sees you.

 

How To Ask A Question At A Conference

After a busy event season this is a nice reminder on how to be polite when asking questions during a session. Nobody wants to see you strut (HT to datachick).

 

Live Streaming Web Cam Views from Around the World

I'm wondering how many of these webcams are meant to be public, and how many are simply the result of the owner having no idea.

 

Eat, Fry, Love

If you haven't seen this video yet, you should. It's a great PSA about the dangers of deep-frying a turkey.

 

It won't happen this year, but that won't stop me from dreaming about this:

[Image: turbaconducken.jpg]

 

Happy Thanksgiving!

Last week we talked about application-aware monitoring. Rather than placing our focus on the devices and interfaces, we discussed getting data that approximates our users' experiences. These users are going to be distributed around the organization at the least. They may even be scattered around the Internet, depending on the scope of our application. We need to examine application performance from different perspectives to get a complete picture.

Any way we look at it, we're going to need active remote probes/agents to accomplish what we're looking for. Those should be programmable to emulate application behaviour, so that we can get the most relevant data. At the least, having something that can measure basic network performance from any point on the network is necessary. There are a few options.

NetPath

Last week, I was invited to Tech Field Day 12 as a delegate and had the opportunity to sit in on the first session of Networking Field Day 13 as a guest. Coincidentally, SolarWinds was the first presenter. Even more coincidentally, they were showing off the NetPath feature of Network Performance Monitor (NPM) 12. This product, while not yet fully programmable to emulate specific applications, provides detailed hop-by-hop analysis from any point at which an agent/probe can be placed. In addition, it maintains a performance history for those times when we get notification of a problem well after the fact. For those of you working with NPM 12, I'm going to recommend you have a very close look at NetPath as a beginning for this sort of monitoring. One downside of the NetPath probes is the requirement to have a Windows Professional computer running at each agent location. This makes it a heavier and more costly option, but well worth it for the information that it provides. Hopefully, the SolarWinds folks will look into lightweight options for the probe side of NetPath in the future. We're only at 1.0, so there's a lot of room for growth and development.

Looking at lighter, though less full-featured, options, we have a few. They're mostly roll-your-own solutions, but this adds flexibility at the cost of ease.

Lightweight VMs and ARM Appliances

If there's a little bit of room on a VM host somewhere, that's enough space for a lightweight VM to be installed. Regular application performance probes can be run from these and report directly to a monitoring station via syslog or SNMP traps. These custom probes can even be controlled remotely by executing them via SSH.
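A minimal sketch of such a probe is below: it times an HTTP request against the application and reports the result to a monitoring station over syslog. The target URL and collector address are hypothetical.

```python
# Minimal sketch: time an HTTP check and report it to a syslog collector.
import logging
import logging.handlers
import time

import requests

TARGET = "https://app.example.com/login"     # hypothetical application URL
SYSLOG_HOST = ("monitor.example.com", 514)   # hypothetical syslog collector (UDP)

logger = logging.getLogger("app-probe")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.SysLogHandler(address=SYSLOG_HOST))

start = time.monotonic()
try:
    status = requests.get(TARGET, timeout=5).status_code
except requests.RequestException as exc:
    status = f"error: {exc}"
latency_ms = (time.monotonic() - start) * 1000

logger.info("probe target=%s status=%s latency_ms=%.1f", TARGET, status, latency_ms)
```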

In the absence of VM space, the same sort of thing can be run from a small ARM computer, like a Raspberry Pi. The probe device itself can even be powered by the on-board USB port of another networking device nearby.
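
As a concrete, minimal sketch of the kind of probe such a VM or Raspberry Pi could run, the snippet below times a TCP connect to a service and reports the result to a central monitoring station over syslog. The target host, collector address, and one-minute interval are placeholder assumptions, not anything taken from NPM or NetPath.

    import logging
    import logging.handlers
    import socket
    import time

    TARGET = ("intranet.example.com", 443)        # hypothetical service to probe
    COLLECTOR = ("monitoring.example.com", 514)   # hypothetical syslog collector

    # Report results to the central station via standard syslog over UDP/514.
    logger = logging.getLogger("netprobe")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=COLLECTOR))

    def probe_once():
        """Time a TCP connect to the target and log the result."""
        start = time.monotonic()
        try:
            with socket.create_connection(TARGET, timeout=5):
                latency_ms = (time.monotonic() - start) * 1000
            logger.info("probe target=%s:%d latency_ms=%.1f", TARGET[0], TARGET[1], latency_ms)
        except OSError as err:
            logger.warning("probe target=%s:%d failed error=%s", TARGET[0], TARGET[1], err)

    if __name__ == "__main__":
        while True:
            probe_once()
            time.sleep(60)   # one measurement per minute

The same script runs unchanged on a Pi or in a tiny VM, scheduled by cron or left running as a service.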

Going back to NetPath for a moment, one option for SolarWinds is to leverage Windows Embedded and/or Windows IoT as a lightweight option for NetPath probes. This is something I think would be worth having a look at.

On-device Containers

A few networking vendors (Cisco with its ISR 4K line, for example) have opened up the ability to run small custom VMs and containers on the device itself. This extends the availability of agents/probes to locations where there are no local compute resources available.

Built-in Router/Switch Functions

Thwack MVP byrona had a brilliant idea: implement IP SLA on his Cisco routers and have Orion collect the statistics, presumably via SNMP. This requires no additional hardware and minimal administrative overhead. Just set up the IP SLA process and read the statistics as they're generated.
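
For a rough illustration of the collection side, here is a sketch that polls an IP SLA entry's latest round-trip time over SNMP with the pysnmp library. The router address, community string, SLA index, and the CISCO-RTTMON-MIB OID are assumptions to verify against your own devices and MIBs; Orion handles all of this for you, this just shows the mechanics.

    # Sketch: poll IP SLA results over SNMPv2c with pysnmp (v4 hlapi).
    # Assumption: 1.3.6.1.4.1.9.9.42.1.2.10.1.1.<index> is the latest
    # completion time in CISCO-RTTMON-MIB -- verify against the MIB.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    ROUTER = "192.0.2.1"        # hypothetical router running IP SLA entry 10
    COMMUNITY = "public"
    LATEST_RTT_OID = "1.3.6.1.4.1.9.9.42.1.2.10.1.1.10"

    error_indication, error_status, _, var_binds = next(
        getCmd(SnmpEngine(),
               CommunityData(COMMUNITY, mpModel=1),                 # SNMPv2c
               UdpTransportTarget((ROUTER, 161), timeout=2, retries=1),
               ContextData(),
               ObjectType(ObjectIdentity(LATEST_RTT_OID))))

    if error_indication or error_status:
        print("SNMP poll failed:", error_indication or error_status.prettyPrint())
    else:
        for oid, value in var_binds:
            print(oid.prettyPrint(), "=", value.prettyPrint(), "ms")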

The Whisper in the Wires

NetPath is looking like a very promising approach to monitoring from different points of view. For most other solutions, we're unfortunately still at the roll-your-own stage. Still, we're seeing some promising options on the horizon.

What are you doing to get a look at your application performance from around the network?

devopsdays_ohio.png

 

I wanted to share some of the things I heard and saw during the incredible two days I spent with 300+ attendees at DevOps Days Ohio.

 

First, I have to admit that after more than a year of attending DevOpsDays around the country, I'm still working on my own definition of what DevOps is, and how it compares and contrasts with some of the more traditional operations. But this event helped gel a number of things for me.

 

What I realized, with the help of this article (which came out while I was at the conference), is that my lack of clarity is okay, because sometimes the DevOps community is also unclear on what they mean.

 

One of the ongoing points of confusion for me is the use of words I think I know, but in a context that tells me it means something else. Case in point: configuration management. In my world, that means network device configurations, specifically for backing up, comparing, auditing, and rolling out. But then I hear a pronouncement that, "Config management is code," and, "If you are working on configs, you are a developer now." And most confusingly, "To do config management right, you need to be on Git."

If this has ever struck you as strange, then you (and I) need to recognize that to the DevOps community, the server (and specifically the virtualized server) is king, and the config management they're talking about is the scripted creation of a new server in on-premises or cloud-based environments.

 

This led to some hilarious interactions for me, including a side conversation where I was talking about on-call emergencies and the other person said, "I don't know why on-call is even a thing any more. I mean, if a system is having a problem, you should just delete it and rebuild it from code, right? Humans don't need to be involved at all."

 

To which I replied, "Interesting idea, but to my knowledge it's very difficult to delete and re-build a router with a bad WIC using nothing but code."

 

The reply? "Oh, well, yeah, there's that."

 

The point of this is not that DevOps-focused IT pros are somehow clueless about the realities of the network, but that their focus is so intensely trained on optimizing the top end of the OSI model that we monitoring experts need to allow for it and adjust our dialogue accordingly.

 

I was honestly blown away to learn how far DevOps culture has made inroads, even into traditionally risk-averse environments such as banking. I worked at a bank between 2006 and 2009, right in the middle of the home mortgage crisis, and I could never have imagined something like DevOps taking hold there. But we heard from folks at Key Bank who spoke openly about the concerns, challenges, and ultimately the successes that their shift to DevOps has brought them, and I saw the value that cloud, hybrid IT, microservices, and agile development hold for businesses willing to consider them within the context of their industry and implement them rationally and thoughtfully.

 

I was also heartened to hear that monitoring isn't being overlooked. One speaker stated flat out that having monitoring in place is table stakes for rolling out microservices. This shows an appreciation for the skills we monitoring engineers bring to the table, and presages a potential new avenue for people who simply have monitoring as a bullet item on their to-do list to make the leap into a sub-specialization.

 

There is a lot of work to do, in the form of education, for monitoring specialists and enthusiasts. In one-on-one conversations, as well as in OpenSpace discussions, I found experienced DevOps folks conflating monitoring with alerting; complaining about alerts as noise, while demonstrating a lack of awareness that alerts could be tuned, de-duplicated, or made more sophisticated, and therefore more meaningful; and overlooking the solutions of the past simply because they believed new technology was somehow materially different. Case in point, I asked why monitoring containers was any harder or even different from monitoring LPARs on AIX, and got nervous chuckles from the younger folks, and appreciative belly laughs from some of the old timers in the room.

 

However, I came to the realization that DevOps does represent a radical departure for monitoring engineers in its "Cattle, not Pets" mentality. When an entire server can be rebuilt in the blink of an eye, the best response to a poorly behaving service may truly be to replace it rather than troubleshoot it. That attitude alone may take some getting used to for those of us mired in biases from the old days of bare-metal hardware and servers we named after the Brady Bunch or Hobbit dwarves.

 

Overall, I am excited for the insights that are finally gelling in my mind, and look forward to learning more and becoming a more fluent member of the DevOps community, especially during my upcoming talk at DevOpsDays Tel Aviv!

 

One final thing: I gave an Ignite talk at this conference and found the format (five minutes, 20 slides that auto-advance every 15 seconds), to be both exhilarating and terrifying. I'm looking forward to my next chance to give one.

Staying one step ahead of hackers trying to infiltrate an IT environment is challenging. It can be nearly impossible if those tasked with protecting that environment don’t have visibility across all of the systems and infrastructure components. Using unified monitoring software gives integrated cross-domain visibility and a solid view of the whole environment.

 

Let’s take a look at an attack scenario

Perhaps a hacker gains access through a web application using a SQL injection attack against a database server. The attack compromises the database, exfiltrates data, or harvests credentials.
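
As background on why that first foothold is so common, here's a minimal illustration (using Python and an in-memory SQLite database purely as a stand-in) of the difference between a query built by string concatenation, which is injectable, and a parameterized query, which is not.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

    # Attacker-supplied input typical of a login form.
    user_input = "' OR '1'='1"

    # Vulnerable: the input is concatenated into the SQL text, so the
    # injected OR '1'='1' clause matches every row in the table.
    rows = conn.execute(
        "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()
    print("concatenated query returned", len(rows), "row(s)")   # 1 row leaked

    # Safer: a parameterized query treats the input as a literal value,
    # so the injection attempt matches nothing.
    rows = conn.execute(
        "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
    print("parameterized query returned", len(rows), "row(s)")  # 0 rows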

 

With access to the local database or server, the attacker can drop malware that could hijack an administrative session and gain access to other parts of the infrastructure, including routers, switches, and firewalls. Evidence of the attack would likely be scattered across the environment; no single piece might trigger an alert, but taken together, these events clearly signal a problem.

 

Visibility leads to quick resolution

With comprehensive monitoring tools, clear insight, and consistent education across the IT team and all agency personnel, the task becomes less daunting.

 

The tools

First, make sure monitoring tools are in place to provide deep visibility. These include the following:

 

  • Endpoints – User device tracking will provide information about where devices are located, how they connect to the network, and who uses them.
  • Data – Put monitoring in place that will detect and block malicious file transfer activity, along with software designed to securely transfer and track files coming into and going out of the agency.
  • Patching – In large environments, something always needs to be updated, so it is important to use software that automatically patches servers and workstations.
  • Servers and applications – Always monitor server and application performance. This will help you find service degradation that could indicate an intrusion.
  • Databases – Create performance baselines for databases to ensure that any anomalies are registered (a simple sketch of this idea follows the list).
  • Systems – Deep visibility into virtual machines and storage devices can provide insight into the root cause of any performance change.
  • Networks – Traffic analysis, firewall and router monitoring, and configuration compliance and optimization are all critical to ensuring the integrity of a network.
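
One way to read the database baseline bullet above: keep a rolling history of a metric (query latency, connections per second, and so on) and register anything that drifts well outside it. A minimal sketch of that idea, with entirely made-up sample numbers:

    from statistics import mean, stdev

    def is_anomalous(history, value, threshold=3.0):
        """Flag a reading more than `threshold` standard deviations away
        from the rolling baseline built from `history`."""
        if len(history) < 10:        # not enough samples for a baseline yet
            return False
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > threshold

    # Illustrative query-latency samples in milliseconds (made-up numbers).
    baseline = [12, 14, 13, 15, 12, 13, 14, 16, 13, 12, 14, 15]
    print(is_anomalous(baseline, 14))    # False -- within the normal range
    print(is_anomalous(baseline, 95))    # True  -- register this anomaly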

 

The knowledge

Once these tools are monitoring what they should, the resulting data needs to be fed into a consolidated view where it can be correlated and analyzed as a whole. Doing so lets IT pros quickly and decisively identify potential threats and take action where needed.

 

The training

Finally, it is important to make sure that the people who work on the network receive detailed security training. Making everyone aware of the seriousness of an attack and the role each worker plays in practicing good cyber hygiene—from the IT team to finance and public affairs—can go a long way in creating a more secure agency.

 

There is no one-size-fits-all solution when it comes to security, and attacks are becoming harder to prevent. That said, implementing the right tools, combining insights across domains and providing in-depth, regular training can improve detection and response capabilities.

 

Find the full article on Signal.

For the last couple of years, the single hottest emerging trend in technology, the biggest topic of conversation, the biggest buzzword, and a key criterion for designing both hardware and application platforms has been the concept of containers.

 

At this point, we have approaches from Docker, Google's Kubernetes (k8s), Mesos, and, notably, Project Photon from VMware. While they differ on plenty of fronts, the concept is quite similar. The container, regardless of flavor, typically holds the packaged, migratable, complete (or component) parts of an application. These containers run as workloads in the cloud and let you take that packaged piece and run it practically anywhere.

 

This is in direct contrast to the idea of virtual machines, which can in some ways accomplish the same tasks but lack the portability to reside as-is on any platform. A VMware-based virtual machine can only reside on a VMware host. Likewise, Hyper-V, KVM, and OpenStack-based VMs are limited to their native platforms. Procedures for migrating these VMs to alternate platforms do exist, but they are somewhat involved. Ideally, you'd simply place your workload VMs in their target environment and keep them there.

 

That model is still necessary for many older application workloads. Many more modern environments, however, pursue a more granular and modular approach to application development, along the lines of microservices. These approaches also allow container-based functions to be packaged and repackaged, and deployments to be relocated essentially at will.

 

In a truly cloud-based environment, orchestration becomes the issue. As adoption grows, managing many containers by hand becomes a bit clumsy, or even overwhelming. The tools from Kubernetes (originally a Google project, later donated to the Cloud Native Computing Foundation) make managing its "pods" (the basic scheduling units) much less of a difficulty. The tools are regularly expanded and their functionality keeps growing as open source code. One benefit of this approach is that the community can reach the primitives through sites like GitHub and add to, optimize, and enhance them, so these capabilities are constantly being updated.

 

Open source is a crucial piece of the equation. If your organization is not pursuing the agile, crowd-sourced model of IT (which, in my opinion, is closed-minded), then this concept is really not for you. But if you have begun delivering your code in parts and pieces, then you owe it to yourself to pursue a container approach. Transitions present their own challenges, but the nice thing is that these new approaches can be adopted gradually, the learning curve can be tackled, there is no real outlay for the software, and, from a business perspective, the potential benefits on the journey to cloud, cloud-native, and agile IT are very real.

 

Do your research. This isn't necessarily the right approach for every IT organization, but it may be for yours. Promote the benefits, get yourself on https://github.com, and start learning how your organization can change its methods to adopt this approach to IT management. You will not be sorry you did.

 

Some considerations that must be addressed before making the decision to move forward:

  • Storage – Does your storage environment support containers? In the storage world, object-based storage is truly important.
  • Application – Is your app functional in a microservices/container-based form? Many legacy applications are far too monolithic to be supportable, while many newer DevOps-style applications are a much better fit.

I'm sure there are far more considerations.

This is a longer version of a talk I give at conferences and conventions. I would love to hear your responses, thoughts, and reactions in the comments below.

 

Do You Care About Being Constantly Connected?

For the next few minutes, I dare you to put down your phone, close up your laptop, and set aside your tablet. In fact, I double dog dare you. I've got $20 on the table that says you can't print this, find a quiet corner, and read it, away from any electronic interruptions in the form of beeps, pings, or tweets.

 

I. Triple. Dog. Dare. You.

tongue-to-pole1_commons.jpg

Separating ourselves from our devices (and, more broadly, from the internet that feeds the devices that have become a lifeline for most of us) has been a topic of conversation for some time now. Recently, Andrew Sullivan wrote a column for New York Magazine about coming to terms with his self-described addiction to technology. In "My Distraction Sickness: Technology Almost Killed Me", Sullivan provides sobering data for those of us who spend some (or most) of our days online:

  1. In just one minute, YouTube users upload 400 hours of video
  2. Tinder users swipe profiles over a million times
  3. Facebook users generate 1 billion likes every day
  4. In their regular SnapChatting career, a typical teen will post, receive, or re-post between 10,000 and 400,000 snaps
  5. A study published last year found that participants were using their phones for up to five hours a day…
  6. ... 85 separate times
  7. ... with most interactions lasting fewer than 30 seconds
  8. ... where users thought they picked up their phones half as often as they actually did
  9. Forty-six percent of the study subjects said they couldn't live without their phone

 

It's important to recognize that we've arrived at this point in less than a decade. The venerable iPhone, the smartphone that launched a thousand other smartphones, debuted in 2007. Four years later, one-third of all Americans owned one. Today, that number is up to two-thirds. If you only count young adults, that figure is closer to eighty-five percent.

 

This all probably comes as a surprise to no one reading this (likely on a smartphone, tablet, or laptop). Equally un-surprising is that, intellectually, we know what our screens are pulling us away from:

 

Life.

 

In his essay "7 Important Reasons to Unplug and Find Space", columnist Joshua Becker wrote:

 

"Life, at its best, is happening right in front of you. These experiences will never repeat themselves. These conversations are unfiltered and authentic."

 

In that same article, Mr. Becker quotes the progenitor of the smartphone and the digital patron saint of many IT pros, Steve Jobs, who said:

 

“We’re born, we live for a brief instant, and we die. It’s been happening for a long time. Technology is not changing it much – if at all.”

 

But it doesn't stop there. We should already understand what the studies keep showing.

 

I want to be clear, though: this article is NOT about how bad it is to be connected. It would be disingenuous for me, someone who spends the majority of his day in front of a screen, to write only about how bad it is to be connected. Not to mention, it wouldn't be particularly helpful.

 

My goal is to make it clear why disconnecting, at times and for a significant amount of time, is measurably important to each of us and can have a very real impact on the quality of our life, both online and off.

 

The Secret Society

You've probably read essays suggesting you take a technology cleanse or a data diet, as if the bits and packets of your network have gotten impacted and are now backing up the colon of your brain.

 

If you have heard such suggestions, you may have responded with, "What kind of crazy wing nut actually does that?"

 

Now I'd like to share a little secret with you: I belong to a whole group of wing nuts who do this every week. We call this crazy idea Shabbat, or the Sabbath in English, and it is observed by Jews across the world.

jew-jitsu.jpg

(image courtesy Yehoshua Sofer)

 

Before I go any further, you should know that Judaism is not big on converting people so I'm not going to try to get anyone to join the tribe. I'm also not going to ask you to sign up for Amway.

 

On Shabbat, which begins at sundown Friday night and ends at sundown Saturday, anything with an ON switch is OFF limits. It can't be touched, moved, or changed. Yes, leaving the television set to SportsChannel 24 and just happening to walk past it every 10 minutes is cheating. And no, you don't sit in the dark and eat cold sandwiches. But I’ll talk more about that later.

 

But Shabbat only comes into play if you are one of the roughly 600,000 Jews in the United States (or 2.2 million worldwide) who fully observe the Sabbath. Which begs the question: if I'm not going to try to get YOU to be Jewish, where am I going with this?

 

In addition to being part of that crazy group of wing nuts, I've also worked in IT for 30 years. For almost a decade now, I've disconnected every single week, rain or shine, regardless of my job title, company, or on-call rotation. That has given me a unique perspective on the tips, tricks, workarounds, and pitfalls.

 

So this is less of a you-should-unplug lecture, and more of a here’s HOW to unplug and not lose your job (or your marriage, or your mind) conversation.

 

Remember, this is just part one of a 3-part series. I'm looking forward to hearing your thoughts, suggestions, and ideas in the comments below!

The View From Above: James (CEO)

 

Another week, another network problem. On Tuesday morning I received an angry call from our CFO, Phyllis, who was visiting our Austin, TX site. The whole network is a mess, she told me, nothing is working properly and I can't do my job. I asked for more detail, but she just said the network was a nightmare and she couldn't even send emails. Great start to the day, especially as Austin is our main manufacturing plant, and if the network was as bad as Phyllis said it was, we were in for a bad week with our supply chain getting out of sync, which could negatively impact both our cashflow and our production output.

 

I called our new Senior Network Manager, Amanda, to let her know that the Austin office was down. She sounded surprised; apparently she had just been talking to the Inventory Management team, and they had been telling her that they were quite pleased with the performance of the company's inventory tool, especially given that it is based out of our data center in Raleigh, NC. I put her in touch with Phyllis and told her to figure out what was going on, because clearly things in Austin weren't going as great as she thought they were.

 

The View From The Trenches: Amanda (Sr Network Manager)

 

Two weeks have passed since I installed SolarWinds Network Performance Monitor, and so far things have been good. I should have guessed that the quiet wouldn't last long, however. I got a call from James around 10AM on Tuesday, and he was mad. Apparently Phyllis was on site in Austin, TX and told him that the network was broken. I knew it wasn't; I had just been talking to the Inventory Management team about a project to implement handheld (WiFi) scanners, and they've been testing their old wired scanners in parallel with the WiFi scanners. Both have been working just great, so both the wireless and wired networks appear to be functioning fine. Still, if Phyllis is upset, it's more than my job's worth to ignore her.

 

Phyllis is without question good at her job, but I get the impression that she would be happier using a large paper ledger and a pot of ink (and maybe even a feather quill pen). Computers are, in her eyes, an irritation, and trying to troubleshoot her problems over the phone is challenging to say the least. However, after a while I did manage to figure out what the problem was. It turns out that "everything is down" actually meant "my email is working intermittently." About nine months ago we moved our email to Microsoft's Office 365, so the mail servers are now accessed via the internet. I confirmed with Phyllis that she was able to access our intranet without issue, which confirmed that our site network was not the problem (I knew it!), but when she tried accessing the Internet -- including Office 365 -- she was having problems. It wasn't a total loss of connectivity, but things were slow, and she would sometimes lose her connection to the server altogether. Sounds like an Internet issue, but what, and where?

 

Time to fire up a browser and open NPM. I checked the basics, but all the network hardware seemed fine, including our Internet routers and edge firewalls, so maybe it was something on the Internet itself. Unfortunately, I know how these things work; if I can't prove where the problem is, the assumption is still that the network is at fault. As I stared at the screen, the phone rang; Phyllis was on the line. "I don't know why it took so long," she said, "but it looks like whatever you did worked. Finally I can get on with my day's work." And she hung up. Had she stayed on the line, I'm not sure I would have admitted that I'd done nothing, but at least the immediate pressure seemed to be off. But what caused the problem? Worse, now that the problem had cleared itself up, there weren't really any tests I could run to troubleshoot it. At this point, I remembered NetPath.

 

When I installed NPM, I installed a bunch of probes and set up some monitoring of a number of services to see what it would look like. My idea was that I'd be able to monitor network performance from a few sites, but I got so consumed with setting up device monitoring I pushed that aside for a bit. In the background however, the probes had been faithfully gathering data for me about their connectivity to a number of key sites including -- by incredible good fortune -- the email service. I started off by checking what the NetPath traffic graph looked like right now, when data was successfully flowing to Office365. NetPath had identified that traffic seemed to pass through one of three potential service providers between our Austin site's internet provider and the Office365 servers on the Internet, with the vast majority (around 80%) likely to be sent through TransitCo, a large provider in Texas and the South Central states. At the bottom of the screen was the Path History bar, and it was clear to see that while everything was now green, there was a large chunk of red showing on the timeline for both availability and latency. Time to wind the clock back.

 

Clicking on one of the red blocks, the NetPath display updated and ... whoa ... ok, that explains it. TransitCo's router was lit up in red (along with some attached links) and NetPath was reporting 90% packet loss through that path, and extremely high latency. No wonder Phyllis was having problems staying connected! Data in hand, I called up TransitCo to ask them about their service interruption and they confirmed that an interface had gone bad but the routing engine had for some reason kept on pumping traffic down that link. They had completed a reboot and an interface replacement around 30 minutes earlier, and service was restored. Amazing. Our own Internet provider wouldn't have reported this as it wasn't their direct problem, and there's no way we could sign up for alerts from every other provider just to keep abreast of the outages. If we hadn't had this tool, I'd still be scratching my head wondering what on earth had happened this morning. Still, while I find out a way to get a better handle on upstream provider problems, at least I can now go back and report on the cause and scope of the outage. And maybe I can sell my VP on funding a secondary Internet link out of Austin from another provider, just in case something like this happens again.

 

I've not even had it installed for a month, but SolarWinds NPM saved the day (or my reputation, at least). I think I'll be checking out what other products they have.

 

 

>>> Continue reading this story in Part 3

A successful help desk seeks to solve incidents quickly, find resolutions to persistent problems, and keep end-users happy. The help desk is the first line of defense triaging tickets and working with end-users directly to fix their technical problems, and this is no easy task.

 

In order to keep ticket queues low and morale high, help desk managers should consider these three key principles:

 

1)     Dedicated People

2)     Established Processes

3)     Centralized Information

 

Dedicated people is the first key principle.

 

A help desk doesn't necessarily need senior-level engineers with advanced degrees and 10 years' experience. Instead, a good first line of defense requires a solid team of hard workers who know how to locate information in internal repositories and how to Google solutions to weird Windows and printer issues. The key here is hard work and dedication. I don't mean dedication to showing up on time, necessarily, though that's certainly important. What I mean is a dedication to getting the issue at hand resolved.

 

For example, during my first year in IT, I worked on a help desk serving a large government agency. We had hundreds of new tickets in the queue every day. My co-worker, Don, made it his simple goal to close as many tickets per week as he could. Don was already in his 30s and had changed careers from restaurant management, so he didn’t have decades of experience along with advanced computer science degrees and industry certifications. What he did have was a sheer determination to figure out an issue and get the problem fixed. Our end-users loved him and often asked for him specifically. He browsed through our internal wikis and Googled his life away looking for a way to fix an issue, and nearly every time he eventually figured it out.

 

This is what a good help desk needs: people who know how to do basic online research and are dedicated to sticking with an issue until it’s resolved.

 

Having clear, established processes is the second key principle.

 

My friend Don would have had a much more difficult time resolving tickets without the processes in place to enable him to get the job done. For example, a service desk manager must determine how tickets will be logged and organized, how they will be triaged, how they will be escalated, and how to provide quick information to help desk technicians to solve new tickets as they come in.

In my experience this means first finding the right ticket management system. Whether it’s in the cloud or on local servers, a solid ticket management system will make it easy for end-users to submit tickets and for the service desk to organize, triage, and resolve them. I personally prefer a single source of truth in which the ticketing system is not only a way to organize tickets but also an information repository and a method to communicate with end-users. In this way technicians can log into one system and find everything they need to get the job done. Navigating multiple systems and many windows is a sure-fire way to forget (or ignore) tickets and spend way too much time looking up simple information such as license keys or asset locations.

 

Another important part of clear and established help desk processes is accountability. This must be built into the help desk processes and not just assumed. Tickets get lost, and sometimes they're ignored. This may be because the help desk is dealing with a huge number of tickets and too few people, but I've seen many tickets ignored because they were difficult, long-winded, or because the end-user was a well-known jerk.

 

Rather than have tickets come in from end-users into a general queue, consider having them all go first to a help desk manager or team lead who can quickly triage them and assign them to the appropriate technician. I have seen struggling service desks go from zero to hero by implementing just this one simple process.

 

A decent ticketing system will have escalation timers, auto-responders, and many other built-in tools to automate workflow, but don’t rely on the software alone to maintain some semblance of order. This is a top-down process beginning with help desk managers and team leads.

 

Maintaining a centralized, updated information repository is the last key principle.

 

Let’s face it, most companies use Windows computers for their end-users. Yes, I know there are exceptions, but even Apple devices and various flavors of Linux are not custom-built operating systems that no one has ever heard of. That means many end-user issues are not unique to any one company. What is unique is the company-specific knowledge.

 

What are the IP addresses of the domain controllers? Where is the installation file for the billing software kept? Does the new branch office use a Windows DHCP server or is it running off their core switch?

 

Having a centralized repository of information is priceless to a helpdesk technician. Better yet is when the repository is also the ticket management system, and even better yet is when it also contains documentation for how to solve recurring issues or how to install weird company software.

 

In my first job as a network engineer, I worked near the service desk, which sat in the next cubicle area. As the number of customers grew, so did the number of technicians, and so did the amount of information needed to resolve tickets. We used a great ticket management system and kept as much information as possible in it. We also used an internal wiki page, but to get to it you had to follow a link embedded in the ticketing system.

 

They were able to support several thousand end-users with a help desk of only three technicians and one service desk manager. So important were these principles that if anyone discovered that information that should have been in the database wasn't there, whoever was responsible for getting it in had to bring in donuts for the entire office. Yes, I brought donuts in a couple of times, and so did our service desk manager and even the owner of the company.

 

There are volumes that can be written on how to provide successful end-user support. These three principles may be broad, and I’ve seen them implemented in very different ways. However, so long as you have dedicated people, clear processes, and an updated information repository, the help desk will be the successful first line of defense every CIO and Director of IT dreams of. 

 


You're sitting back at the office getting work done, keeping the ship afloat, living the Ops side of the perception of DevOps, only to have your IT Director, VP, or CxO come and demand, "Why aren't we using containers?! It's all the rage at FillInTheBlankCon!" Then they start spouting off the container of the week: Kubernetes, Mesos, Docker, CoreOS, Rocket, Photon, Marathon, or some other endless parade of container products, accessories, and components. If it hasn't happened to you, that may be a future you'll be looking at. If it has happened to you, or you've already adopted some approach to containers in your environment, so much the better.

 

Screen Shot 2016-11-16 at 8.59.51 PM.png

As a VERY brief primer on the infinite world of containers for those of you who aren't aware, I'll oversimplify it here, using the image above as an example and comparing containers to virtualization. Typically, virtualization is hardware running a hypervisor that abstracts the hardware; you install an operating system on the VM and then install your applications into that. In most container scenarios, by contrast, you have hardware running some kind of abstraction layer that presents containers into which you install your applications, abstracting out the operating system.

 

That's quite possibly the most oversimplified version of it, because there are MANY moving parts under the covers that make this a reality. However, who cares how it works as much as how you can use it to improve your environment, right?!

 

That's kind of the key of things. Docker, one of the more commonly known container approaches (albeit Kubernetes is technically used more), has some really cool benefits and features. Docker officially supports running Docker containers on Microsoft servers, Azure, and AWS, and they have also released Docker for Windows clients and OSX! One particular benefit I like as a VMware user: PowerCLI Core is now available on Docker Hub (http://www.virtuallyghetto.com/2016/10/powercli-core-is-now-available-on-docker-hub.html)!

But they don't really care how you're going to use it, because all roads lead to DevOps and how you're supposed to implement things to make their lives better. In the event that you are forced down the road of learning a particular container approach, for better or worse, it's probably best to find a way to make it improve your life rather than become just another piece of infrastructure we're expected to understand even if we don't. I'm not saying one container is better than another; I'll leave it up to you to make that determination in the comments if you have container stories to share. I'm partial to Kubernetes when it comes to running cloud services on Google, but I really like Docker when it comes to running Docker for OSX (because I run OSX).

The applications are endless and continually growing, and the solutions are plentiful; some might say far too plentiful. What are some of the experiences you've had with containers, the good, the bad, and the ugly? Or is this an entirely new road you're looking at pursuing but haven't yet? We're definitely in the no-judgement zone!

As always, I appreciate your insight into how y'all use these technologies to better yourselves and your organizations as we all grow together!

 

Love it or hate it, Office 365 is here to stay. Most companies have not made the transition to Office 365 yet, and I say "yet" because it is only a matter of time before the majority of businesses are running email in the cloud. When you are planning to move email to the cloud, there are many considerations to weigh, and one of them is your network.

 

Your network is key when you want to live in the cloud. One of the first things you should do after you've made the decision to go to the cloud is review your network and estimate how much bandwidth you will use. Office 365 increases usage because of Outlook synchronization and template downloads. The number of users connecting to the cloud and the type of tasks they perform will affect your bandwidth. Network performance also depends on what users are doing; for instance, if everyone is streaming video or holding multiple video conference calls on your network, that will certainly consume bandwidth, which can impact your connectivity to cloud services.

 

Migrating to Office 365 is not the overnight task some may think. It can take weeks to months to be completely migrated to the cloud, and a lot of this depends on your network. It is highly recommended to test and validate your internet bandwidth, as this will impact your migration, and mailbox sizes will dictate how fast or slow the move to the cloud will be. Let's say your organization has about 100 TB of email in your on-premises environment and you want to migrate all of that to the cloud. It will not happen in days; it will be more like months. Even over a 100 Mbit/s internet connection running flat out, the raw transfer alone works out to roughly three months of continuous copying. Keep in mind that Microsoft throttles how much data you can pump into their network each night, and other outside factors will also affect speed and bandwidth, so a realistic estimate is more likely in the range of eight to twelve months.
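
A quick back-of-envelope check of those numbers (the 100 TB volume, the 100 Mbit/s link, and the 35% effective-throughput factor are illustrative assumptions, not measured values):

    # Back-of-envelope Office 365 migration-time estimate (illustrative only).
    TOTAL_BYTES = 100 * 10**12       # 100 TB of mailbox data
    LINK_BPS = 100 * 10**6           # 100 Mbit/s internet connection
    EFFECTIVE_UTILIZATION = 0.35     # throttling, protocol overhead, nightly windows

    raw_seconds = (TOTAL_BYTES * 8) / LINK_BPS
    real_seconds = raw_seconds / EFFECTIVE_UTILIZATION

    print(f"Raw transfer at line rate: {raw_seconds / 86400:.0f} days")
    print(f"With throttling/overhead:  {real_seconds / 86400:.0f} days "
          f"(~{real_seconds / (86400 * 30):.0f} months)")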

 

Slow internet means a slow migration and possible failures along the way. If you still have slow MPLS sites, these are not ideal for Office 365; however, Microsoft has partnered with a select few providers to offer ExpressRoute, a private connection to Microsoft Office 365. Microsoft also provides tools you can use to help estimate your network requirements. One of the important ones is the Exchange Client Network Bandwidth Calculator, which estimates the bandwidth required for Outlook, Outlook Web App, and mobile devices.

 

Once you have made it to the cloud, it does not stop there. Ongoing performance tuning may be needed to ensure that your users are happy and do not experience email "slowness." Given that Microsoft has published best-practice articles on using Office 365 over slow networks, I am pretty sure your network folks will be called on a lot to check network performance. Microsoft's recommendations include:

 

  • Upgrade to Outlook 2013 SP1 or later for substantial performance improvements over previous versions.
  • Outlook Web App lets you create offline messages, contacts, and calendar events that are uploaded when OWA is next able to connect to Office 365.
  • Outlook also offers an offline mode. To use this, you must first set up cached mode so that information from your account is copied down to your computer. In offline mode, Outlook will try to connect using the send and receive settings, or when you manually set it to work online.
  • If you have a smartphone, you can use it to triage your email and calendar over your phone carrier's network. (Yes, this is a real alternative…)

 

At the end of the day, it comes down to making sure your network is up to snuff when you make your way to the cloud, or you may be in for some headaches. Good luck!

Though we can sit around and talk about the threat of Skynet (as we have a little in comments on my previous posts) it seems the tech world is committed to the pursuit and enhancement of artificial intelligence. In fact, you’re almost not a leading tech company right now if I google your name + artificial intelligence and I get zero results. AI startups are also in hot demand. So what exactly are the technology leaders planning?

 

Microsoft: CEO Satya Nadella doesn't keep it a secret that AI is key for his company: "AI is at the intersection of our ambitions." Even with the socially failed Tay chatbot experiment, Microsoft learned that, at least in the USA, chatbots need to be built to be resilient to attacks. Most recently, Microsoft announced a partnership to support Elon Musk & Co's OpenAI non-profit AI research organization with Azure computing power. Microsoft is staying true to its corporate mission, democratizing AI so it's accessible to every person and every organization on the planet to help them achieve more.

 

Google: Google's research division has been hard at work on machine intelligence for years, boasting 623 publications to date in its library (which it is happy to share publicly). Parent company Alphabet also counts the neural network company DeepMind, acquired in 2014, in its collection. Within the last few days, Google has added Jia Li (head of research at Snapchat) and Fei-Fei Li (director of the AI lab at Stanford University) to lead a new group within the Google Cloud division.

 

Facebook: They’ve got access to your data and already have a reputation for serving you with targeted information. Facebook is focusing on how to scale, to deliver promises like “We’re trying to build more than 1.5 billion AI agents—one for every person who uses Facebook or any of its products.” Joaquin Candela, the head of the Applied Machine Learning group wins the award for my favorite AI quote though “We tend to take things like storage, networking, and compute for granted,” he says. “When the video teams builds live videos, people don’t realize the magnitude of this thing. It’s insane. The infrastructure team is just there, shipping magic—making the impossible possible. We need to do the same with AI. We need to make AI be so completely part of our engineering fabric that you take it for granted.” As an infrastructure junkie, I like anyone who calls my work ‘magic’.

 

Apple: Jumping on the AI bandwagon, Apple is kind of sad that their AI is so unobtrusive that people don’t even realize Apple’s in the AI game. And we’re not just talking about Siri.  Apple doesn’t have a dedicated machine learning department, but the capability underpins a lot of their product capabilities. They are certainly quieter than other brands about what they are working on behind the scenes. One interesting development is the enhancement of image processing software using AI, so physical lens hardware will no longer be the defining factor in camera capability.

 

Cisco: Not wanting to miss out either, Cisco has developed its own virtual assistant called Monica. I'd never heard of her except for the girl from Friends, but it's been a few years since I touched a corporate telepresence system. Monica is restricted to the office right now, but Cisco has plans to increase her usefulness. It could be handy to say, 'Monica, find me the PowerPoint that Jo presented last Thursday.' Back at its core business, Cisco has also been smart with AI company acquisitions, snapping up Cognitive Security, which uses AI techniques to detect advanced cyber threats.

 

IBM: The granddaddy of AI, IBM's Watson supercomputer has grown up a little from winning games of Jeopardy. At CES 2016, IBM CEO Ginni Rometty unveiled strategic partnerships with sportswear maker Under Armour, Softbank Robotics' Pepper, and more. IBM's Cognitive Business Solutions unit is banking on AI as the future of business, with smarts like: "When people ask how Watson is different than a search engine, I tell them to go on Google and type 'anything that's not an elephant.' What do you get? Tons of pictures of elephants. But Watson knows those subtle differences. It understands that when feet and noses run, those are very different things."

 

Elon Musk: To put this paragraph under just the title of Tesla would do the man a disservice. Musk and his business partners formed the OpenAI research company to stop AI from ruining the world.  

 

There you have it. The tech giants are determined to make this happen, whether we like it or not...

 

-SCuffy

Back home after a week in Redmond for the annual Microsoft MVP Summit. It was weird being there for the days before and after the election. In a way I felt as if we were in a bubble, focused on the MVP sessions all day long and isolated from what was happening elsewhere.

 

It was awesome.

 

Here's a bunch of links I found on the Intertubz that you may find interesting, enjoy!

 

Smart Light Bulb Worm Hops from Lamp to Lamp

As if I needed yet another thing to worry about, now light bulbs may be attacking me. I'm starting to think wearing a hat made from aluminum foil might be a smart fashion choice.

 

How to avoid becoming a part of a DDoS attack?

I suppose I could go live in a shack in Montana, but this list might be worth trying first.

 

More with SQL Server on Linux

Words cannot express how excited I am for SQL Server on Linux. Well, maybe these words: sudo yum install -y mssql-server

 

Employees are faster and more creative when solving other people's problems

Well, this possibly explains why some people feel the need to act as if they are the smartest person in the room, but I'm still going to just think they are being a jerk.

 

Because Every Country Is the Best at Something

How is it that Brazil is not the best at Brazil nuts?

 

The Best Ways to Get Rid of Unwanted Data

Setting aside the philosophical discussion about how information can neither be created nor destroyed, these are good tips for those of us that sometimes have the need to make an effort to destroy data.

 

By 2020, 92 Percent of All Data Center Traffic Will be Cloud

It's rare these days for me to find someone who still denies the Cloud is a thing, and I suspect that such perceptions will all but disappear by 2020.

 

Found outside my hotel room last Wednesday morning and my first thought was "is this where the Hadoop sessions are taking place?":

gop-hadoop - 1.jpg

The View From Above

Being a CEO is not the easy job some people think it is. As a CEO, I'm pulled in multiple directions and have to do my best to balance the needs of the business with the needs of the shareholders, deal with crises as they arise, and reassure both our investors and our employees that the company is strong and has a positive future. All of this gets a bit tricky when our websites -- where we make 60% of our revenue -- keep going down, causing us to lose sales. For the technical teams, the problem ends when the website starts working again, but for me, the ripples from each outage keep spreading for months by way of missed revenue targets, the impact on our supply chain as our order volume fluctuates, and the requests for interviews from analysts who are concerned that we will not make the numbers we anticipated if our customers can't buy our goods. One big outage can make my life a misery for weeks on end, so it's not surprising, perhaps, that I am less than impressed with our network, or the NOTwork as I have come to know it over the last six months. My home network stays up for months on end without interruption, so with all the money we spend on equipment and employees, I'd hope we could do the same. Apparently I'm wrong. If we don't fix this soon, I'm just going to instruct the CTO to move everything to the cloud, and we'll dump those useless network idiots. I fired the Senior Network Manager last month, and not a moment too soon. I only hope his replacement is better than he was. I'd like to get a good night's sleep once in a while, without worrying about whether the stock price will be plunging tomorrow.

The View From the Trenches

My first few weeks as the Senior Network Manager have been, well, challenging to say the least. My predecessor, Paul, was fired after a shouting match between him and the CEO over yet another major outage that was being blamed on the network. After our websites had been down for two hours, James (our CEO) stormed down and practically dragged Paul into a conference room and slammed the door behind him. So while I wasn't actually in the room when the showdown occurred, that didn't make much difference. Even through the closed door there was no mistaking who the CEO felt was responsible, despite Paul's protestations to the contrary.

Three weeks later, we're still not entirely sure how that outage started, and worse, we have no idea how it finally ended. Of course, it may be that somebody actually does know, but doesn't want to admit being the culprit, especially after hearing James losing his mind in that room. The next day, Paul didn't come to work and I was called into my VP's office where I was given the news that I was being promoted to his position, effective immediately. Did you ever get a gift you weren't sure you really wanted? Yeah. That.

So, now that you're caught up on how I got into this mess, I need to get back to figuring out what seems to be wrong with the network. I always thought Paul had his finger on the pulse of the network, but once I started spending more time looking at the network management systems, I began to wonder how he figured anything out. It seems that our availability monitoring was being accomplished by a folder full of home-grown perl and shell scripts which pinged the network equipment in the data center and would send an email to Paul when a device became unavailable. I mean, that sort of worked, but the scripts weren't logging anything, so there was no historical data we could use to calculate uptime. Plus, a ping could take up to a second before it would time out and be considered a failure, so even if the network or device performance was completely terrible, nobody would have known about it. What I realized is that when Paul was proudly telling the board that the network had "four-nines uptime," he must have been pulling that figure out of the air. I can't believe he got away with it for so long. He might have been right, but neither he nor I could prove it, and I refuse to lie about it now that my neck is on the line.

The first order of business, then, was to get some proper network management in place. I didn't inherit a huge budget, and I was in a hurry, so I used my corporate Amex to grab a copy of Solarwinds Network Performance Monitor. At least now I'm gathering some real data to work with and if (when!) the next outage occurs, maybe I'll see something happening that will give me a clue about what's going on. The executive team has finally put a woman in charge of the network, and I'm going to show them just what I'm capable of.

 

>>> Continue reading this story in Part 2

Transitioning portions of IT applications and services into the cloud is something that’s starting to become a part of daily life for IT professionals across government agencies. But it’s still a big departure from the traditional implementations.

 

Hybrid Cloud Scenarios

 

There are three general scenarios that will involve cloud-based and on-premises apps and services.

 

The first is an architected solution that uses components of on-premises data centers as well as cloud service providers. One example is application servers in the cloud while the backend database servers reside on-premises. This type of hybrid approach allows agencies to govern sensitive data in their data centers. Applications, however, need to be highly responsive to the number of potential clients connected through any given methodology, and cloud implementations can provide that type of flexibility and availability.

 

The second is a redundant implementation where an application is available in both an on-premises data center and in the cloud. This approach is good for disaster recovery and also enhances an agency’s ability to quickly pivot support for remote and mobile users. Data replication is a key consideration in this scenario, but replicating data among several distributed databases is geek chic these days.

 

The third is about scalability. An agency may have an on-premises application that performs just fine 95% of the time, but the other 5% of the time it finds itself resource-constrained due to peak loads or other demand spikes. The challenge for the agency is that the 5% doesn't justify investment in additional on-premises hardware that will sit idle the other 95% of the time. Cloud is the ideal solution to this problem, as it allows agencies to rapidly ramp up resources to accommodate periods of peak demand and pay only for what is actually used during that 5% of the time. The ability to quickly scale on demand is a key benefit of cloud.
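
As a sketch of the idea (not any particular cloud provider's API), the burst logic can be as simple as comparing current demand against on-premises capacity and renting supplemental capacity only during that 5% window. The capacity figures below are placeholders:

    # Toy sketch of burst-to-cloud sizing; the capacity numbers are placeholders.
    ON_PREM_CAPACITY = 1000          # requests/sec the local servers can absorb
    CLOUD_INSTANCE_CAPACITY = 250    # requests/sec each supplemental instance adds

    def cloud_instances_needed(current_load):
        """How many supplemental cloud instances the current load requires."""
        overflow = max(0, current_load - ON_PREM_CAPACITY)
        return -(-overflow // CLOUD_INSTANCE_CAPACITY)   # ceiling division

    print(cloud_instances_needed(800))    # 0 -- the 95% case, nothing extra to pay for
    print(cloud_instances_needed(1600))   # 3 -- peak window, rent just enough to cover it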

 

The key to successfully working through any one of these scenarios is in understanding the requirements of each approach and planning ahead, both logistically (IT Operations) and financially (Business Operations), to be able to deliver those requirements.

 

Key Strategies for Implementation

 

Regardless of which scenario fits your agency’s needs, you’ll want to partner with a cloud service provider that has achieved their Federal Risk and Authorization Management Program (FedRAMP) certification.

 

Scenario one is highly dependent on the data path from client to application server, as well as from application server to data server. The first key objective here is ensuring that dedicated bandwidth exists between the cloud and the on-premises data center.

 

Scenario two also has a dependency on the data path between the cloud and the on-premises data center. The key here is how much delay in the data replication between the two parallel environments can be tolerated. Important to the success of this implementation will be active monitoring of the data replication activities, and a proven contingency plan for when data replication is disrupted beyond acceptable tolerances.
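
In practice, "active monitoring of the data replication activities" can start as simply as comparing measured replication lag against the agreed tolerance and alerting when the contingency plan should kick in. A minimal sketch, with the five-minute tolerance and the replica timestamp left as assumptions:

    from datetime import datetime, timedelta, timezone

    MAX_LAG_SECONDS = 300   # assumed tolerance agreed with the business (5 minutes)

    def check_replication(last_applied_at, now=None):
        """Return (lag_seconds, alert_needed) given the timestamp of the last
        transaction applied at the secondary site."""
        now = now or datetime.now(timezone.utc)
        lag = (now - last_applied_at).total_seconds()
        return lag, lag > MAX_LAG_SECONDS

    # Illustrative check: a replica twelve minutes behind trips the alert and
    # should trigger the documented contingency plan.
    twelve_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=12)
    lag, alert_needed = check_replication(twelve_minutes_ago)
    print(f"lag={lag:.0f}s alert={alert_needed}")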

 

Scenario three is the most complicated to implement, and engaging with a qualified service provider who has experience in implementing an on-demand, scale-out type of environment is a good idea. The key aspect of this implementation is the transparency of inbound connections being rolled over, or rerouted, from the primary site in the on-premises data center to the supplemental resources in the cloud.

 

In Conclusion: Plan, Plan, Plan!

 

Regardless of which of these scenarios you might be contemplating, or possibly even a scenario not discussed here, it’s absolutely critical to have a plan, and involve both technology as well as business stakeholders in the development of that plan. Identify and analyze all contingencies, and define performance and QoS expectations for every aspect of the environment.

 

Find the full article on Federal Technology Insider.

1610_thwack_writing-challenge_W900H300.jpg

For the past five years, I've engaged in a yearly, month-long blogging challenge as a way of preparing for the upcoming new year. This challenge – write a blog entry whose theme is prompted by a new word each day – was decidedly introspective and somewhat spiritual in nature. It spurred me to consider themes of renewal, self-assessment, and refocusing in preparation for the coming year.

 

Last year I decided to take it up a notch. I wrote a daily blog post specifically with IT and technology in mind. You can read the series of posts here: http://www.adatosystems.com/tag/elul/

 

Several of those posts were graciously reviewed by our editorial team and appeared on Geek Speak.

 

This year, I thought I'd invite the entire THWACK community to participate (and earn some points in the process!). The value of these types of challenges (as long as they don't distract from our other work) is that the practice of writing begets more writing, and these short, free-form, introspective essays often give rise to ideas that can lead to real improvement in our environment, career, and even life.

 

So here's how it will work: Beginning December 1, we're going to post an "official" essay for each daily word (you can find them over here: Word-A-Day Challenge 2016 ). You'll hear from me, the other Head Geeks, and other SolarWinds luminaries.

 

You are then invited to post your entry in the comments for that word. It can be an essay, a poem, or even a picture. As a reward for your efforts, we'll gift you with 200 THWACK points per entry. That means you have a chance to claim 6,000 points if you participate for the entire month!

 

To give you a leg up, I'm publishing the word list NOW, so that you can get writing, curating, and creating. That way, when December 1 arrives, you will be able to claim all that wonderful THWACK point goodness.

 

We also hope this challenge will help us all build meaningful connections to other members of our community, and build insight into what makes each of us tick. It can't be ALL about our rabid love of SolarWinds and desire for THWACK swag, can it?

 

The daily word prompts are:

Day 1: Learn

Day 2: Act

Day 3: Search

Day 4: Understand

Day 5: Accept

Day 6: Believe

Day 7: Choose

Day 8: Hear

Day 9: Observe

Day 10: Count

Day 11: Trust

Day 12: Forgive

Day 13: Remember

Day 14: Rest

Day 15: Change

Day 16: Pray

Day 17: Awaken

Day 18: Ask

Day 19: Judge

Day 20: Fulfill

Day 21: Love

Day 22: End

Day 23: Begin

Day 24: Hope

Day 25: Intend

Day 26: Create

Day 27: Bless

Day 28: Give

Day 29: Return

Day 30: Celebrate

 

I'm looking forward to seeing what our community has to say and share on these themes. Remember you can find all the essays over here: Word-A-Day Challenge 2016

Over the next few posts, I'm going to explore some newer thinking in network monitoring. We start by designing centralized management stations and remote agents, but is this sufficient? We look at things from a device and interface perspective and establish baselines of how the network should operate. This works, but is something of a one-size-fits-all solution. Where do we look when we want more than this?

 

A Little Bit of SDN

 

Over the last few years, Software Defined Networking (SDN) has generated a lot of buzz. Ask 10 different people what it's all about and you'll likely get 10 different answers, none of which is really incorrect. It's a term whose definition tends to be too broad to be practically useful. Still, SDN maintains mind share because the components people tend to associate with it are desirable. These include, amongst others, centralized management and/or control, programmable configuration management, and application-aware networking. This last one is of the most immediate interest for our current topic.

 

The network's performance as it relates to the key applications running on it is immediately relevant to the business. This isn't just the performance of the application across a single device. This is a look at the gestalt of the application's performance across the entire network. It allows detection of problems, and measurement of performance, where they matter most. Drilling down to the specific devices and interfaces that have an impact can come later.

 

Network Tomography

 

Recently, I ran across a new term, or at least it was a new term to me: Network Tomography. This describes the gathering of a network's characteristics from endpoint data. The "tomography" part of the term comes from medical technologies like Magnetic Resonance Imaging (MRI), where the internal characteristics of an object are derived from the outside. Network tomography isn't really tomography, but the term conveys the meaning fairly well. The basic idea is to detect loss or delay over a path by using active probes from the endpoints and recording the results.

 

Monitoring loss and delay over a path is a beginning. Most application performance issues are going to be covered by this approach. We can track performance from many locations in the network and still report the results centrally, giving us a more complete picture of how the business is using the network.
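As a rough sketch of what endpoint-driven probing might look like, the Python below measures connection delay (treating timeouts as loss) from one endpoint toward a couple of targets and reports the results. The target hosts and the reporting mechanism (simply printing) are assumptions made for illustration.

    # Minimal sketch of endpoint-based path probing: measure connect delay
    # (and treat timeouts as loss) from this endpoint to a few targets, then
    # report the results so a central station can correlate them.
    import socket
    import time

    TARGETS = [("app01.example.com", 443), ("files.example.com", 445)]  # hypothetical

    def probe(host: str, port: int, timeout: float = 2.0):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (time.monotonic() - start) * 1000.0  # delay in milliseconds
        except OSError:
            return None  # counted as loss

    def run_probes():
        for host, port in TARGETS:
            delay = probe(host, port)
            status = f"{delay:.1f} ms" if delay is not None else "LOST"
            print(f"{host}:{port} -> {status}")

    if __name__ == "__main__":
        run_probes()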

 

Application Metrics

 

If we're going to look at network performance from the applications' view, we'll need a fingerprint of what those applications do. Many pieces of software use a single network access method, such as Hypertext Transfer Protocol (HTTP) and can be tracked easily. Others use many different data access methods and will need more complex definition. Either way, we need to monitor all of the components in order to recognize where the problems lie. Getting these details may be as simple as speaking to the vendor or more complex and requiring some packet analysis. Regardless, if we have problems with one aspect of the applications' communications but not another, the experience is still sub-par and we may not know why.
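One way to make such a fingerprint concrete is to describe every component an application uses in a simple data structure that the monitoring system can walk. The sketch below is hypothetical; the application name, protocols, and ports are invented purely for illustration.

    # Minimal sketch of an application "fingerprint": every protocol/port the
    # application uses, so monitoring covers all of its components.
    APP_FINGERPRINTS = {
        "records-scanning": [
            {"component": "web front end", "protocol": "HTTPS", "port": 443},
            {"component": "database",      "protocol": "TDS",   "port": 1433},
            {"component": "file upload",   "protocol": "SMB",   "port": 445},
        ],
    }

    def components_to_monitor(app: str):
        """Yield every component that needs its own loss/delay/error checks."""
        for item in APP_FINGERPRINTS.get(app, []):
            yield item["component"], item["protocol"], item["port"]

    for name, proto, port in components_to_monitor("records-scanning"):
        print(f"monitor {name}: {proto} on TCP/{port}")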

 

The Whisper in the Wires

 

We've all run into frustrated users claiming that "the network is slow" and have torn our hair out trying to find out the specifics of what that means. Ultimately, technical specifics aside, that is exactly what it means. We're just not used to looking at it that way.

 

What do we need to put this into practice? Smarter agents? Containers or VMs that allow us to test application performance either locally or remotely? How do we automate this? I'll be giving my own perspective over the next few weeks, but would like to hear your thoughts on it in the meantime.

We have long been in a space wherein virtualization has played a huge role within the data center. Certainly the concept existed in the mainframe world, and on LPARs, for quite a bit longer than it has on x86 commodity architecture, but it wasn't until VMware, under Diane Greene, Mendel Rosenblum, Scott Devine, and Edouard Bugnion, brought that concept to the Intel x86 world that the explosion of capacities in the data center made it mainstream.

 

I can remember doing a POC (Proof of Concept) back in 2004 for a beverage company, wherein we virtualized a file server in an effort to demonstrate the capabilities of vMotion after the customer claimed that it simply couldn't work. We scripted that file server to vMotion every 30 minutes. When we returned a month later and the customer re-emphasized their trepidation over the concept of vMotion, we showed them the script logs displaying the roughly 1,500 vMotions that had taken place unnoticed over the previous month, and that's when they realized the value of the product. So much has been accomplished over the following 12 years. Virtualization has become the de facto standard. So mainstream, in fact, that the question today is rarely "Should we virtualize, or should we virtualize first as standard operating procedure?" but "Should we move on from VMware as a virtualization platform to, say, Hyper-V, Amazon, Azure, or possibly private/public OpenStack?"

 

I'm not going to enter into that religious debate. I can certainly see places wherein all of these are valid questions. Again, as I've stated before, I stress that you should adequately evaluate all options before making a global decision regarding the platform on which you rest the bulk of your data center services. I will say, however, that these alternative choices have been making huge strides toward parity on many fronts of the virtualization paradigm. Some gaps do exist, and possibly always will. I won't enumerate those functional distinctions; rather, I'd impress on the customer the need to make educated choices. But I will say that if your decision to go one way or the other is based on money alone, then you're likely not looking at the full picture. What may cost less initially may come with unanticipated costs that go far beyond those that are immediately obvious. Caveat emptor, right?

 

Needless to say, virtualization is here. But what will happen, where will the new hot trends come from, and how is it changing? I have no crystal ball, nor are my tea leaves particularly legible or dependable for telling the future. What I can say, though, as I've said before, is that the decisions made today have implications for the future. Should you choose a platform that doesn't embrace the goals of the future, you may find yourself requiring a forklift upgrade not too far down the road.

 

It is clear that the API is the key to integration with OpenStack. If you choose a closed platform, your lock-in will be substantial. If you don't evaluate pieces like object storage, APIs, container integration, security roadmaps, etc., you'll be making choices in a vacuum. I truly don't recommend it.

 

I cannot stress enough how existing staffing requirements, including training, should enter into the budgetary decision-making process. Please understand, for example, that a decision to adopt OpenStack should not be made on cost alone; training and support must be part of the decision-making process.

Electronic Protected Health Information (ePHI) must flow from its source to many different recipients to efficiently support the mission of providing quality health care. However, HIPAA's data privacy and security standards limit the manner in which ePHI can be transmitted. With HIPAA now in full force - having expanded its reach officially to business associates after the Final Omnibus Rule - and Phase 2 audits currently in progress, now is the perfect time to review the controls and policies you have set in place to protect ePHI in transmission.

 

In many cases, transferring ePHI is embedded into the workflow with sender/receiver applications that connect directly via a secure API. In this scenario the authorization model is sometimes built into the software and sometimes managed manually, but security and privacy are embedded either way. What happens, though, when ePHI does not have a pre-existing, defined, secure electronic delivery mechanism? With some simple controls on the sender and receiver sides, Managed File Transfer (MFT) offers a means to create a flexible, embedded or ad hoc, HIPAA-safe data transfer mechanism.

 

The benefits of Managed File Transfer for ePHI are many, including:

  1. Hosted or on-premises capabilities
  2. Secure point-to-point transfer between Covered Entities (CEs) and Business Associates (BAs)
  3. Uses standard internet protocols that even small CEs and BAs can support with limited IT staff
  4. On demand, ad hoc, secure data exchange when unexpected data transfer needs arise

 

If you find your organization needs to use MFT for HIPAA, how do you know if the security of the system meets the requirements for transferring ePHI? Unlike the payment card industry, the HIPAA security guidelines are not prescriptive. They are derived from the HIPAA Security Rule, which was promulgated in its final form on March 26, 2013[1]. Fortunately, as the Security Rule has been put into practice, additional clarifications and guidelines have been made available from various sources, and Health and Human Services (HHS) shares them via their website.

 

With respect to file transfer in particular, HHS.gov points to guidelines developed by the Federal Trade Commission (FTC) under the FTC’s authority over consumer data privacy. The FTC guidelines for peer-to-peer transfer mechanisms[2], which are adaptable and relevant to managed file transfer of ePHI, include:

 

  1. Restrict the locations to which work files containing sensitive information can be saved or copied. For example, you can create designated, well-defended network servers to house these files, or use a file management program. These kinds of tools and techniques isolate sensitive information and may limit the extent to which peer-to-peer file sharing programs need to be banned.
  2. If possible, use application-level encryption to protect the information in your files. This type of encryption can help protect files that are shared inadvertently on peer-to-peer networks. If you use encryption, keep the passwords and encryption keys safe. Make sure they are not available in drives or folders designated for sharing.
  3. Use file naming conventions that are less likely to disclose the types of information a file contains. For example, it’s easy to spot terms like “ssn,” “tax,” or “medical” within a filename.
  4. Monitor peer-to-peer networks for sensitive information, either directly or by using a 3rd-party service provider. Because search terms can be viewed by others on peer-to-peer networks, be careful about the terms you use. Some search terms (such as those that include “ssn”) may increase the risk to sensitive information, while others (such as company or product names) likely will not.

 

Recall that the HIPAA Security Rule divides safeguards into administrative, physical, and technical controls. Each of the above recommendations should be mapped to the HIPAA safeguards. The following is a recommended mapping.

 

Administrative: § 164.308(a)(7)(ii)(E) Applications and Data Criticality Analysis

Include your MFT process in this part of your administrative controls analysis.

 

Physical: Device and Media Controls § 164.310(d)(1)

As an integral part of your ePHI data transmission process, care should be taken with any media components that may fail or need to be recycled. HIPAA has very strict data destruction guidelines[3] pointing to NIST SP 800-88[4]. If you are not an expert in data destruction techniques, have your policy point to a certified data destruction provider.

 

Finally, the relevant technical controls are summarized in this table.

 

Control: Restrict Locations

Technical safeguard: Access Control § 164.312(a)(1)

Servers should be configured according to policy with correct authentication, authorization, and security.

 

Control: Application-Level Encryption

Technical safeguards: Transmission Security § 164.312(e)(1); Encryption and Decryption § 164.312(a)(2)(iv)

Implement encryption standards according to your written policy, applying a well-vetted standard; consider NIST 800-52 or OWASP guidelines.

 

Control: File Naming Convention

Technical safeguard: Integrity Controls § 164.312(e)(2)(i)

Implement written standards to ensure file naming conventions that will assist in meeting integrity controls.

 

Control: Monitor Peer-to-Peer

Technical safeguard: Audit Controls § 164.312(b)

Implement continuous monitoring to ensure that no unencrypted data is transmitted. Use caution when storing audit log data to avoid accidental disclosure through the gathering and transmission of audit logs.

 

If the above seems daunting, consider a packaged Managed File Transfer solution rather than building your own. As a solution, MFT is easily deployable and can meet your HIPAA security and compliance needs by following the above guidelines. A well-designed MFT solution will include built-in capabilities to address the HIPAA security rule, including:

 

  1. Providing a secure, contained environment for ePHI sharing that offers access controls for authentication, authorization, and built-in security for the MFT that meets or exceeds HIPAA guidelines.
  2. Built-in certified encryption cipher suites which meet or exceed NIST 800-52 guidelines.
  3. Audit and logging mechanisms that can be used to help ensure proper file naming conventions and confidentiality of the ePHI processed through the MFT.

 

As with any decision involving HIPAA security, choosing an MFT should include input and requirements from your compliance representative, the IT and security administrators, and the business owner.

 


[1] https://www.gpo.gov/fdsys/pkg/FR-2013-01-25/pdf/2013-01073.pdf

[2] https://www.ftc.gov/tips-advice/business-center/guidance/peer-peer-file-sharing-guide-business

[3] http://www.hhs.gov/hipaa/for-professionals/breach-notification/guidance/index.html

[4] http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-88r1.pdf

That's a good question: what do self-driving vehicles have to do with our infrastructure? Is it that they're untested, untrusted, and unproven, and could result in death and fear-mongering? That's true enough, but what is the key distinction between "autonomous" and merely self-driving vehicles?


 

 

Telemetry is the watchword

 

Today's networks and systems are a huge ball of data: big data, helpful, insightful, useless, and befuddling piles of information. Left to its own devices, that information lives in its respective bubble, waiting for us to "discover" a problem and then start peeling back the covers to figure out what is going on. The difference between an autonomous vehicle and one that is merely self-driving is that the autonomous vehicle draws on many continuous, constant data streams, using that data to correlate events and understand conditions. Even in a primitive state it can be fairly effective, in a networked sense: imagine every vehicle on the road communicating with every other vehicle, constantly scanning and analyzing everything in front of it, behind it, and all around it. Compound that data with information collected from external sources such as road sensors, lights, and other conditions, and you have the power to automate traffic management. (Put weather stations into each of these vehicles and we get closer to predicting even more accurate weather patterns.)

 

But hey, whoa, what about my network? My systems?!

 

More and more, we're seeing solutions that have evolved far beyond simple point solutions. SIEMs don't just collect security and event information in a bubble. Syslogs aren't just an endless repository of arbitrary strings of "event" information. SNMP need not live caught in its own trap.

 

There are tools, solutions, frameworks, and suites of tools that aim to bring your NOC and SOC into the future, a future wholly unknown. There is no true panacea to tie everything together and be the end-all-be-all solution, though as time goes on, evolutions and consolidations of products have started to make that possible. At one time, when I ran a massive enterprise, we had "point" tools, each of which did an amazing job of keeping up on THEIR data and telemetry, though they were independent and not even remotely interdependent: monitoring VMware with vCOPS, monitoring the network with Orion and NPM, collecting some event data with ArcSight while separately collecting syslog information with Kiwi Syslog Server, SNMP traps flowing into SNMPc, and, lest we forget monitoring Microsoft, that's where System Center came in.

 

On the one hand that may seem like overkill, yet each product covered and fulfilled its purpose, doing the 80% it was designed for well, while unable to cover the remaining 20% of the spread. (Slight disclaimer: there were some 50+ more tools; those were just the "big" ones that we've all likely heard of.)

 

So as each of these solutions evolves, and as other products in the industry continue to evolve, they're taking what has effectively been the "cruise control" button in our cars (or something only slightly better than cruise control) and building the ability to provide real data, real analytics, and real telemetry, so that the network and our systems can work for us and with us, versus being little unique snowflakes that we need to care for, feed, and troubleshoot when things go wrong.

 

So what have you been using or looking at to help drive the next generation of infrastructure and systems telemetry? Are you running any network packet brokers, sophisticated "more than SIEM" products, or SolarWinds suites to tie many things together? Or has anyone looked at Intel's open-source telemetry framework, Snap?

 

Please share your experiences!

For years hospitals have been using IP-enabled carts to track the location of expensive medical equipment. For years manufacturing facilities have deployed large numbers of IP-enabled handheld scanners. And for years utility companies have been converting water meters and electric meters to IP-based platforms.  Most of these devices are part of some corporate network, but today the Internet of Things typically refers to the myriad of IP-enabled personal devices scattered throughout our homes and strapped to our bodies.

 

These devices are typically inexpensive, disposable, and seemingly innocent. But remember that the home typically isn’t a corporate network with professional security safeguards. Even the best network administrators and information security officers struggle with locking down their corporate networks, so how can the average non-technical person protect themselves, their personal information, and even their very safety in this world of ubiquitous and continual connectivity?

 

Security is the emerging concern for the Internet of Things, and we as technology professionals need to build an awareness of these issues with our non-technical friends.

 

The most common issues include:

 

1)  transmission of unencrypted data over the public internet

2)  access to device management interfaces that have minimal security mechanisms

3)  nothing in place to update and patch software and firmware

 

Whether it's a thermostat you control from an app, a baby video monitor you can stream on your computer, a pacemaker you can monitor from a website, or a residential front door you can unlock with your smartphone, the latest and most popular IoT devices affect us in very personal ways. These devices have very few, if any, security controls, and they typically use easy-to-use interfaces that aren't necessarily secure.

 

Sure, the details vary device by device and manufacturer by manufacturer, but these seem to be the most common themes. I don’t think anyone is overtly against securing their home networks and individual devices, but there isn’t much awareness among the non-technical population of how vulnerable these types of devices truly are.

 

First, we in the technology industry know right away that opening port 80 inbound to your baby monitor stream is bad news, but that’s how many of these devices have been designed. Manufacturers of IoT devices haven’t put in the time and effort to secure their IP-enabled products to provide security out of the box. IoT devices often receive and send data over the public internet using unencrypted and therefore completely insecure channels.

 

In a corporate network, data can be easily segregated and encrypted so that devices using HTTP and not HTTPS, for example, are surrounded with security boundaries to protect the rest of the network, and teleworkers typically use an encrypted remote access VPN solution. In home networks, there is normally no overall network security strategy to accommodate for unencrypted traffic containing personal information going to the public internet. For the home network, an easy solution might be to use a trusted VPN proxy service.

 

Next, many of these devices have minimal authentication mechanisms to control access to a device. Perhaps this is an effort to remove burdensome security controls from the end-user experience, or maybe it’s in order to reduce the cost of the product. In any case, access to common IoT devices is often controlled by a simple password, and sometimes, in worst-case scenarios, there is no authentication at all.

 

This vulnerability should be top of mind for many of us following technology news considering the recent denial of service attack on DynDNS using millions (perhaps tens of millions) of infected IoT devices from around the world. 

 

It's true that increasing password length, changing passwords frequently, and using two-factor authentication all add layers of work for the end-user of an IoT device, but this is likely the easiest way to add security to otherwise insecure devices.

 

Lastly, manufacturers should be providing the means to upgrade software and firmware from time to time in order to combat new security vulnerabilities. Ultimately, I believe this is something consumers have to demand, so we need to influence manufacturers to provide patches with their products along with step-by-step instructions for how to apply them. This is part of any decent corporate security program, and it should now be part of our personal security programs as well.

 

A huge diversity of IP-enabled devices on a corporate network isn’t anything new, but their proliferation in the home and strapped to our bodies is. As technology professionals, we need to build an awareness of these security issues with our non-technical friends. Some solutions are easy and relatively painless to implement, but I also believe that over time, this growing awareness will also influence manufacturers to change their designs. 

SolarWinds products have consistently excelled in their ability to report meaningful data quickly in order to identify and remediate IT issues. SolarWinds Virtualization Manager (VMAN) does this by providing the ability to create custom alerts and scripts in conjunction with 3rd-party performance software. SolarWinds and Infinio have created a technology partnership to leverage these abilities. VMAN has out-of-the-box functionality to alert on high VM latency or IOPS usage, which can kick off a custom Infinio script to improve performance. SolarWinds and Infinio will be presenting a joint webcast this afternoon to go over how Infinio was able to set up and integrate with Virtualization Manager in a single afternoon. Later I will add the script that VMAN used to kick off Infinio so you can have a template to start with.
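In the meantime, and not to be confused with the actual script referenced above, here is a generic sketch of what an alert-triggered action might look like: the monitoring tool passes the affected VM name as an argument, and the script calls out to an acceleration API. The endpoint URL and payload are hypothetical placeholders, not Infinio's or VMAN's actual interfaces.

    #!/usr/bin/env python3
    # Generic sketch of an alert-triggered action script. The acceleration
    # endpoint below is a hypothetical placeholder, not a real product API.
    import sys
    import json
    import urllib.request

    ACCEL_API = "https://accel.example.local/api/v1/accelerate"  # hypothetical endpoint

    def accelerate(vm_name: str) -> None:
        payload = json.dumps({"vm": vm_name}).encode("utf-8")
        req = urllib.request.Request(
            ACCEL_API, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"Acceleration requested for {vm_name}: HTTP {resp.status}")

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            sys.exit("usage: accelerate_vm.py <vm-name>")
        accelerate(sys.argv[1])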

 

https://attendee.gotowebinar.com/register/2537960187701946628

 

Jared

Now that Microsoft Windows Server 2016 is generally available as of October 12, 2016, everyone is ready to upgrade their servers, right? With every new operating system release there is obviously going to be a list of new features and improvements. However, is it worth being on the bleeding edge and upgrading before the next service pack? Let's take a quick look.

 

New and Improved

What's new, and probably a hot item, is support for containers. No, not your mom's Tupperware, but containers like Docker. With support for Docker, Microsoft is playing nice with open source, which it would normally consider a competitor. This will help the giant if it wants to be a key player in the public cloud space and compete with Amazon and Google. If you want to know how to get started with Docker on Windows Server 2016, check out Melissa Palmer's post: http://24x7itconnection.com/2016/10/11/getting-started-docker-container-platform-windows-2016/ .

Nano Server is another hot item. It's the headless server option that is slimmer, faster, and better than the traditional Windows server. Think of it like Server Core mode, but a lot better. You can install Nano Server from the Datacenter or Standard edition of Windows Server 2016. It's ideal for running as a compute host for Hyper-V virtual machines.

Windows Server 2016 also introduces the Host Guardian Service and shielded VMs. This "shields" virtual machines and protects the data on them from unauthorized access; you can even lock out your Hyper-V admin. There are also improvements to Active Directory Certificate Services, Active Directory Domain Services, Active Directory Federation Services, and Web Application Proxy, to name a few.

 

Are you Bleeding edge?

The new features sound great, don't they? But are you really going to jump on them right away? One could say, "I'm not using Windows Server for containers or as a Hyper-V host, so I should be OK." However, if you're on the bleeding edge and want to start updating your servers, you should know that there are a few things that have been removed from Server 2016.

The Share and Storage Management snap-in for MMC is no longer available in Windows Server 2016. So if you're thinking about managing servers running an older OS through Windows Server 2016, you won't be able to. You will need to log on to that older server and use the snap-in locally from that server.

Another change is that the Security Configuration Wizard has also been removed. Instead, security features are now turned on by default. If you want to manage those features, you can only do it through policy or the Microsoft Security Compliance Manager tool.

 

Since this has only been generally available for about a month, all the bugs and gotchas are starting to come out. Last week it was announced that there are issues with running Exchange 2016 CU3 on Windows Server 2016. If you're thinking about running Exchange 2016 on the latest version, well, you shouldn't. There are known issues with the IIS host process, and Microsoft says there's no workaround at this time, so save yourself a headache and stick with Windows Server 2012.

 

So my advice is test, test, test, and test again before you go into production if you are going to upgrade to Windows Server 2016. It's still very new, and over the next few weeks I am sure we will hear more rumblings of gotchas.


Hack your life with bots


Having recently dropped Cortana for Siri (I'm still mourning that, but I have apps!), I must admit I haven't integrated either into my life as a habit. I'm a writer at heart, so it still feels more natural for me to type and tap rather than talk to get things done. The next generation has never known a phone without a voice-activated assistant, though. To them, it's not a bot, it's a productivity tool.

 

The voiceless bots have their place, though. Chatbots on the web help us find information while feeling like there's a human replying. Services like swole.me purely process data and plan meals without us having to make all the decisions.

 

With Amazon's Alexa and Google Home, our voice-activated servants are growing, except in Australia, where you can't buy either. You can get creative acquiring one, though, and set Alexa's location to Guam, which shares the same time zone.

On their own, these bots are glorified searchers and data entry units – “What time is the game on TV?” “Set a reminder for tomorrow to send James a birthday card.”

 

Hook them up to connected things in your home and you are entering home automation territory. Combine a Raspberry Pi with Siri for an Apple-esque voice controlled home.

Use Alexa to dim your Philips Hue lights or turn on your Belkin WeMo switch. She will also place your Amazon orders (again, outside of Australia) unless you disable that for the safety of your credit card balance. You can also add Alexa to your If This Then That recipes for access to other online services.

 

As consumer cloud services have grown, people are more driven by functionality than by brand loyalty. It doesn't cost much (sometimes it's even free) to have multiple different service subscriptions, so you'll find people using Google Calendar and Dropbox instead of Google Drive. Connectivity of these services is another battleground, with bots like IFTTT and Zapier automating routine tasks between disconnected services, or even automating tasks within the same product set.

 

Microsoft recently entered the market, well, part of it anyway, with Flow. I say part of it because Microsoft's small list of connected services is heavily weighted toward business and enterprise use and less toward personal apps. I'm watching to see if that changes. Microsoft has extended this type of service with advanced conditions in its connectors and the ability to add gateways to on-premises data.

 

Behind all of this, the bots are collecting and processing data. We are giving them information to feed off, that they will hopefully keep private and only use to improve their services (and sometimes their recommendations). Even in our personal lives, we’re connected to the Big Data in the Cloud. Don’t think about that for too long or we’ll get into Skynet territory. That’s another article coming soon.

 

But it is another driver behind digital transformation and what companies are now dealing with. All of these electronic records of usage and purchases are driving how companies create, refine, and supply products and services to their customers. All of that ultimately has an impact on what the IT requirements are for an organisation.

 

Do the bots concern you or have you automated your life outside of the office? Let me know in the comments.

 

-SCuffy

In Redmond this week for the annual Microsoft MVP Summit. I enjoy spending time learning new things, meeting fellow MVPs, and exchanging ideas directly with the data platform product teams. Being able to provide some influence on future product direction, as well as features, is the highlight of my career right now. It's a wonderful feeling to see your suggestions reflected in the products and services over time.

 

Anyway, here's a bunch of links I found on the Intertubz that you might find interesting, enjoy!

 

Banks Look Up to the Cloud as Computer Security Concerns Recede

For those of you still holding out hope that the Cloud isn't a real thing to be taken seriously, this is the asteroid coming to wipe you and the rest of the dinosaurs out.

 

Minecraft: Education Edition officially launches

Looks like we've found a way to get kids to stop playing Minecraft: we've turned it into work.

 

Dongle Me This: Why Am I Still Buying Apple Products?

Because they are simple and (usually) just work. But yeah, the dongles are piling up in this house as well.

 

Two Centuries of Population, Animated

You know I love data visualizations, right? Well, here's another.

 

Practical Frameworks for Beating Burnout

Long but worth the read, and do your best to consume the message which is "do less, but do it well".

 

There is no technology industry

It's never about the tech, it's *always* about the data.

 

This ransomware is now one of the three most common malware threats

Excuse me for a minute while I take backups of everything I own and store them in three different locations.

 

Next week I am expecting to see a lot of screens like this one, installing SQL Server on Linux:

[Screenshot: installing SQL Server on Linux]

A network monitor uses many mechanisms to get data about the status of its devices and interconnected links. In some cases, the monitoring station and its agents collect data from devices. In others, the devices themselves report information to the station and agents. Our monitoring strategy should use the right means to get the information we need with the least impact to the network's performance. Let's have a look at the tools available to us.

Pull Methods

We'll start with pull methods, where the network monitoring station and agents query devices for relevant information.

SNMP

SNMP, the Simple Network Management Protocol, has been at the core of query-based network monitoring since the early 1990s. It began with a simple method for accessing a standard body of information called a Management Information Base, or MIB. Most equipment and software vendors have embraced and extended this body to varying degrees of consistency.

Further revisions (SNMPv2c, SNMPv2u, and SNMPv3) came along in the late 1990s and early 2000s. These respectively added bulk information retrieval functionality and improved privacy and security.

SNMP information is usually polled from the monitoring station at five-minute intervals. This allows the station to compile trend data for key device metrics. Some of these metrics will be central to device operation: CPU use, free memory, uptime, &c. Others will deal with the connected links: bandwidth usage, link errors and status, queue overflows, &c.

We need to be careful when setting up SNMP queries. Many networking devices don't have a lot of spare processor cycles for handling SNMP queries, so we should minimize the frequency and volume of retrieved information. Otherwise, we risk impact to network performance just by our active monitoring.
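As a minimal illustration of a low-impact poll, the sketch below queries a single OID (sysUpTime) over SNMPv2c, assuming the pysnmp library; the device address and community string are placeholders.

    # Minimal SNMPv2c poll sketch, assuming the pysnmp library (pip install pysnmp).
    # Host, community string, and the polled OID (sysUpTime) are placeholders.
    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    error_indication, error_status, error_index, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData("public", mpModel=1),          # SNMPv2c
            UdpTransportTarget(("192.0.2.10", 161)),     # device under management
            ContextData(),
            ObjectType(ObjectIdentity("SNMPv2-MIB", "sysUpTime", 0)),
        )
    )

    if error_indication or error_status:
        print(f"poll failed: {error_indication or error_status.prettyPrint()}")
    else:
        for name, value in var_binds:
            print(f"{name.prettyPrint()} = {value.prettyPrint()}")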

Query Scripts

SNMP is an older technology and the information we can retrieve through it can be limited. When we need information that isn't available through a query, we need to resort to other options. Often, script access to the device's command-line interface (CLI) is the simplest method. Utilities like expect, or scripting languages like python or go, allow the necessary data to be pulled out by filtering the CLI output.

Like SNMP, we need to be careful about taxing the devices we're querying. CLI output is usually much more detailed than an SNMP query and requires more work on the part of the device to produce it.
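A hedged sketch of this kind of CLI scraping is shown below, assuming the netmiko library for SSH access to a Cisco IOS device; the device details are placeholders, and the regex simply pulls the five-minute CPU figure out of the command output.

    # Sketch of CLI scraping, assuming the netmiko library (pip install netmiko).
    # Device details are placeholders; the regex targets Cisco IOS "show processes cpu".
    import re
    from netmiko import ConnectHandler

    device = {
        "device_type": "cisco_ios",
        "host": "192.0.2.20",       # placeholder
        "username": "monitor",      # placeholder
        "password": "change-me",    # placeholder
    }

    conn = ConnectHandler(**device)
    output = conn.send_command("show processes cpu | include CPU utilization")
    conn.disconnect()

    match = re.search(r"five minutes: (\d+)%", output)
    if match:
        print(f"5-minute CPU utilization: {match.group(1)}%")
    else:
        print("could not parse CPU utilization from CLI output")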

Push Methods

Push methods are the second group of information gathering techniques. With these, the device is sending the information to the monitoring station or its agents without first being asked.

SNMP

SNMP has a basic push model where the device sends urgent information to the monitoring station and/or agents as events occur. These SNMP traps cover most changes in most categories that we want to know about right away. For the most part, they trigger on fixed occurrences: interface up/down, routing protocol peer connection status, device reboot, &c.

RMON

RMON, Remote Network MONitoring was developed as an extension to SNMP. It puts more focus on the information flowing across the device than on the device itself and is most often used to define conditions under which an SNMP trap should be sent. Where SNMP will send a trap when a given event occurs, RMON can have more specific triggers based on more detailed information. If we're concerned about CPU spikes, for example, we can have RMON send a trap when CPU usage goes up too quickly.

Syslog

Most devices will send a log stream to a remote server for archiving and analysis. By tuning what is sent at the device level, operational details can be relayed to the monitoring station in near real time. The trick is to keep this information filtered at the transmitting device so that only the relevant information is sent.
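For illustration, the sketch below is a bare-bones UDP syslog receiver that keeps only messages at severity "error" (3) or worse and drops the rest; in practice, as noted above, the filtering is better done on the sending device, and the listening port here is an arbitrary unprivileged choice.

    # Minimal syslog (UDP) receiver sketch that only keeps messages at
    # severity "error" (3) or worse; everything else is dropped.
    import socket
    import re

    SEVERITY_THRESHOLD = 3  # 0=emerg ... 3=err ... 7=debug

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 5140))  # unprivileged port for the example

    while True:
        data, (src, _) = sock.recvfrom(4096)
        message = data.decode("utf-8", errors="replace")
        pri = re.match(r"<(\d+)>", message)
        if not pri:
            continue
        severity = int(pri.group(1)) % 8  # PRI = facility * 8 + severity
        if severity <= SEVERITY_THRESHOLD:
            print(f"{src}: {message.strip()}")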

Device Scripting

Some devices, particularly the Linux-based ones, can run scripts locally and send the output to the monitoring server via syslog or SNMP traps. Cisco devices, for example, can use Tool Command Language (TCL) scripts or Embedded Event Manager (EEM) applets to provide this function.

Your Angle

Which technologies are you considering for your network monitoring strategies? Are you sticking with the old tried and true methods and nothing more, or are you giving some thought to something new and exciting?

Evolution is not just for science conversations; it is a critical aspect of effective IT management. As federal technology environments become more complex, the processes and practices used to monitor those environments must evolve to stay ahead of -- and mitigate -- potential risks and challenges.

 

Network monitoring is one of the core IT management processes that demands growth in order to be effective. In fact, there are five characteristics of advanced network monitoring that signal a forward-looking, sophisticated solution:

 

  1. Dependency-aware network monitoring
  2. Intelligent alerting systems
  3. Capacity forecasting
  4. Dynamic network mapping
  5. Application-aware network performance

 

If you’ve implemented all of these, you have a highly evolved network.  If you have not, it might be time to start thinking about catching up.

 

1. Dependency-aware network monitoring

 

A sophisticated network monitoring system provides all dependency information: not only which devices are connected to each other, but also network topology, device dependencies and routing protocols. This type of solution then takes that dependency information and builds a theoretical picture of the health of your agency’s network to help you effectively prioritize network alerts.

 

2. Intelligent alerting system

 

The key to implementing an advanced network monitoring solution is having an intelligent alerting system that triggers alerts on deviation from normal performance based on dynamic baselines calculated from historical data. And an alerting system that understands the dependencies among devices can significantly reduce the number of alerts being escalated. This supports “tuned” alerts so that admins get only one ticket when there is a storm of similar events.
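A simple way to picture a dynamic baseline is a mean and standard deviation computed from historical samples, with an alert raised only when the current value deviates beyond some multiple of that spread. The sketch below uses invented sample data purely for illustration.

    # Sketch of baseline-deviation alerting: compute a dynamic baseline (mean and
    # standard deviation) from historical samples and flag values that deviate by
    # more than three standard deviations. Sample data is illustrative.
    from statistics import mean, stdev

    def is_anomalous(history: list, current: float, n_sigma: float = 3.0) -> bool:
        baseline = mean(history)
        spread = stdev(history) or 1e-9  # avoid divide-by-zero on a flat history
        return abs(current - baseline) > n_sigma * spread

    interface_util_history = [22.0, 25.5, 24.1, 23.8, 26.0, 24.4]  # percent, recent polls
    if is_anomalous(interface_util_history, 71.2):
        print("ALERT: interface utilization deviates from its dynamic baseline")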

 

3. Capacity forecasting

 

An agencywide view of utilization for key metrics, including bandwidth, disk space, CPU and RAM, plays two very important roles in capacity forecasting:

 

     1. No surprises. You must know what’s “normal” at your agency to understand when things are not normal. You can see trends over time and can be prepared in advance for changes that are happening on your network.

 

     2. Having the ability to forecast capacity requirements months in advance will give you the opportunity to initiate the procurement process in advance of when the capacity is needed.

 

4. Dynamic network mapping

 

Dynamic network mapping allows you to display the data on how devices are connected on your network on a single screen, with interactive, dynamic maps that can display link utilization, device performance metrics, and automated geolocation. This way, you can see how everything is connected and find the source of the slowdown.

 

5. Application-aware network performance

 

Application-aware network performance monitoring collects information on individual applications as well as network data and correlates the two to determine what is causing an issue. You can see if it is the application itself causing the issue or if there is a problem on the network.

 

Evolving your network monitoring solution will help keep you ahead of the technology curve and help meet budget challenges by providing more in-depth information to ensure your monitoring is proactive and strategic.

 

Find the full article on Government Computer News.

Amongst the key trending technologies moving forward in the enterprise and data center space is that of the virtualization of the network layer. Seems a little ephemeral in concept, right? So, I’ll explain my experience with it, its benefits, and limitations.

 

First, what is it?

NFV (Network Functions Virtualization) is intended to ease the requirements placed on physical switch layers. Essentially, the software for the switch environment sits on the servers rather than on the switches themselves. Historically, when implementing a series of physical switches, an engineer must use the language of the switch's operating system to create an environment in which traffic goes where it is supposed to, and doesn't go where it shouldn't. VLANs, routing tables, port groups, etc. are all part of these command sets. These operating systems have historically been command-line driven, arcane, and quite often difficult to reproduce consistently. A big issue is that the potential for human error, without disparaging the skills of the network engineer, can be quite high. It's also quite time-consuming. But, when it's right, it simply works.

 

Now, take that concept, embed the task into software that sits on standardized servers, and roll it out to the entire environment in a far more rapid, standardized, and consistent manner. In addition to that added efficiency and consistency, NFV can also reduce the company's reliance on physical switch ports, which lowers the cost of switch gear, the cost of heating/cooling, and the cost of data center space.

 

In addition to the ease of rolling out new sets of rules, with added consistency across the entire environment, there comes a new degree of network security. Microsegmentation is defined as the process of dividing a collision domain into various smaller segments, mainly to enhance the efficiency or security of the network. The microsegmentation performed by the switch reduces the collision domains until only two nodes remain in each.

 

So microsegmentation, probably the most important function of NFV, doesn't actually save the company money in a direct sense, but what it does do is allow for far more controlled traffic flow management. I happen to think that this security goal, coupled with the ability to roll these rules out globally and identically with a few mouse clicks, makes for a very compelling product stream.

 

One of the big barriers to entry in the category, at the moment, is the cost of the products, along with the differing approach of each product stream. Cisco's ACI, for example, while it attempts to address similar security and consistency goals, has a very different modus operandi than NSX from VMware. Of course, there are some differentiations, but one of the open issues is how the theoretical merging of both ACI and NSX within the same environment would work. As I understand it, the issues could be quite significant; a translation effort, or an API to bridge the gap, so to speak, would be a very good idea.

 

Meanwhile, the ability to isolate traffic, and do it consistently across a huge environment, could prove quite valuable to enterprises, particularly where compliance, security, and scale are concerns. I think about multi-tenant data centers, such as service providers, where the data being housed must be controlled, the networks must be agile, and changes must take place in an instant; these are absolutely key use cases for this category of product. However, I also think that healthcare, higher education, government, and other markets will see broad adoption of these technologies as well.

It has been more than 20 years since HIPAA was enacted on August 21, 1996, and in those two decades it has seen quite a bit of change – especially with regard to its impact on IT professionals. First came the privacy and security rules in the early 2000s, then the HITECH Act's breach notification requirements in 2009, and then the issuance of the Final Omnibus Rule in 2013. Each advancement served to define the responsibilities of IT organizations to protect the confidentiality, integrity, and availability of electronic protected health information (ePHI) – one of the central tenets of HIPAA – and stiffen the consequences for noncompliance. But, despite these changes, HIPAA's enforcement agency (the Office of Civil Rights, or OCR) did not begin to issue monetary fines against covered entities until 2011. And only in recent years, with the announcement of the OCR's Phase 2 HIPAA Audits, did they begin to set their sights on Business Associates (BAs), those 3rd-party providers of services which, due to their interaction with ePHI, are now legally required to uphold HIPAA.

 

2016, however, has seen increasing monetary fines, including a $5.55 million settlement by Advocate Health Care[1]. In 2013, the Advocate data breach released information about 4 million users. This breach occurred two years prior to the Anthem breach, disclosed in March 2015, which affected up to 80 million users, making it the largest health care data breach up to that point. Given the time it takes OCR to analyze and respond to data breaches, don’t expect to see any Anthem data breach analysis from OCR in 2016.

 

In the meantime, OCR is implementing their Phase 2 audits. This round of audits delves more deeply into organization policies with business associates, and examines some documents from live workflows. Some organizations have already received notification of their Phase 2 audits, which were sent on July 11, 2016. A total of 167 entities were notified of their opportunity to participate in a “desk audit,” which would examine their HIPAA implementation against the new Phase 2 guidelines. This initial audit will cover 200-250 entities, with most of them being completed via the desk audit process. A few entities will be selected for onsite audit[2].

 

What is a Phase 2 Audit?

First, Phase 2 audits cover both Covered Entities (CEs) and Business Associates. (Recall that the Final Omnibus Rule held CEs and BAs jointly and severally liable for compliance.) In the pilot phase audits, only Covered Entities were examined. Second, in this phase most of the audits will be completed via a "desk audit" procedure.

 

A desk audit is a nonintrusive audit where the CE or BA receives an information request from OCR. The information is then uploaded via a secure portal to a repository. The auditors will work from the repository to generate their findings.

 

Based on the updated Phase 2 audit guidelines, Phase 2 audits will cover the following areas.

 

Privacy Rule Controls

  1. Uses and Disclosures; Authorizations; Verifications
  2. Group Health Plans
  3. Minimum Necessary Standard; Training
  4. Notices; Personnel designations
  5. Right to Access; Complaints

 

Security Rule Controls

  1. Assigned Responsibility; Workforce Security
  2. Risk Analysis and Risk Management
  3. Incident Response, Business Continuity and Disaster Recovery
  4. Facility and BA controls
  5. Access Control and Workstation Security

 

Data Breach Controls

  1. Administrative Requirements
  2. Training; Complaints
  3. Sanctions
  4. Breach; Notice

 

What is different from previous audits is the format of the audit (desk audit vs. onsite) and the focus on reviewing not just policies, but worked examples of policy and procedure implementation[3], using actual samples of the forms and their contents, as well as the outcomes of the requests contained within the forms. The complete list of all the elements of these initial Phase 2 audits is extensive. You can read the complete list on the HHS website.

 

Phase 2 audits require data or evidence of the results of an exercise of some of the HIPAA policies or processes. From an entity perspective that takes the audit to a practical level.

 

Turning specifically to the audit of security controls, most IT and security pros who have been through an IT risk assessment or an ISO audit will be familiar with the HIPAA audit structure. The approach is traditional: comparing policies against evidence that each policy has been correctly implemented.

 

If you have not been through an audit before, don’t panic. Here are some basic rules.

 

  1. Be prepared. Review the evidence you will need to provide. Wherever possible, gather that evidence in a separate data room.
  2. Be respectful. Even if this is a desk audit being completed via portal, provide only the evidence asked for, in a legible format, within the time frame requested.
  3. Be honest. If you don’t have the evidence requested, notate that you don’t have it, yet. If you have implemented the control, but just don’t have the evidence, provide documentation of what you have implemented.
  4. Be consistent. To the extent possible, use the same format for providing evidence. If you need to pull logs, put them into a nicely formatted, searchable spreadsheet.
  5. Be structured. Make it easy for the auditor to find and examine your responses. If you need to provide policies, have them neatly formatted in the same font and structure. It’s especially nice if you are an auditor reading lots of documentation to have good section headings and white space between sections. PDF is the most advisable format, but make it searchable for ease of verification.

 

Recall that the purpose of the Security Rule is to ensure two main things:

  1. That ePHI is correct, backed up, and accessible.
  2. That only appropriate, authorized users and processes have access to ePHI.

 

You can expect that you will need to provide evidence, mostly logs, that cover the controls that ensure the purpose of the Security Rule is being met. For example:

  1. Who is the responsible security officer, and what is their job description?
  2. When was your risk assessment completed?
  3. Have you implemented any remediation or compensating controls identified in your risk assessment?
  4. Can you demonstrate evidence that security violations are being identified and remediated? Expect your incident response procedures to be examined.
  5. Can you demonstrate that the workforce is complying with your security policies and procedures, including security training?
  6. Can you demonstrate that those who need access to ePHI can do so, and that when authorization is revoked (due to change of status, termination, etc.), that the electronic access is changed as well?
  7. What evidence can you show of anti-malware compliance?
  8. Are your cryptographic controls in place and up to date?  [Hint: see our blog post on PCI DSS 3.2 updates for information on SSL/TLS]
  9. Are your disaster recovery and business continuity plans actionable and tested? Include facilities access plans, which should address physical unauthorized access attempts.
  10. Have you implemented all of the access controls and policies required to share data with BAs?
  11. Can you demonstrate compliance with de-identification disposal requirements of electronic media upon change or de-activation?
  12. Don’t forget contracts. Even though you may be responsible for mostly technical controls, the Security Rule does have requirements for your contracts with BAs.

 

As with most compliance schemes, a number of the requirements are well understood standard security best practices. As our panel at THWACKcamp [See the Shields Up Panel Recap] agreed, the old adage, “If you are secure you are probably compliant,” applies to HIPAA, too.

Have you had a recent audit experience you’d like to share?  Please comment on this post. We can all learn from each other’s experiences. Happy auditing!

 


[1] http://www.hhs.gov/about/news/2016/08/04/advocate-health-care-settles-potential-hipaa-penalties-555-million.html#

[2] http://www.hhs.gov/sites/default/files/OCRDeskAuditOpeningMeetingWebinar.pdf

[3] http://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/audit/protocol/

Getting Started

The network monitoring piece of network management can be a frightening proposition. Ensuring that we have the information we need is an important step, but it's only the first of many. There's a lot of information out there and the picking and choosing of it can be an exercise in frustration.

A Story

I remember the first time I installed an intrusion detection system (IDS) on a network. I had the usual expectations of a first-time user. I would begin with shining a spotlight on denizens of the seedier side of the network as they came to my front door. I would observe with an all-seeing eye and revel in my newfound awareness. They would attempt to access my precious network and I would smite their feeble attempts with.... Well, you get the idea.

 

It turns out there was a lot more to it than I expected and I had to reevaluate my position. Awareness without education doesn't help much. My education began when I realized that I had failed to trim down the signatures that the IDS was using. The floodgates had opened, and my logs were filling up with everything that had even a remote possibility of being a security problem. Data was flowing faster than I could make sense of it. I had the information I needed and a whole lot more, but no more understanding of my situation than I had before. I won't even get into how I felt once I considered that this data was only a single device's worth.

 

After a time, I learned to tune things so that I was only watching for the things I was most concerned about. This isn't an unusual scenario when we're just getting started with monitoring. It's our first jump into the pool and we often go straight to the deep end, not realizing how easy it is to get in over our heads. We only realize later on that we need to start with the important bits and work our way up.

The Reality

Most of us are facing the task of monitoring larger interconnected systems. We get data from many sources and attempt to divine meaning out of the deluge. Sometimes the importance of what we're receiving is obvious and relevant, e.g., a message with critical priority telling us that a device is almost out of memory. In other cases, the applicability of the information isn't as obvious. It just becomes useful material when we find out about a problem through other channels.

 

That obvious and relevant information is the best place to start. When the network is on the verge of a complete meltdown, those messages are almost always going to show up first. The trick is in getting them in time to do something about them.

Polling

Most network monitoring installations begin with polling devices for data. This may start with pinging the device to make sure it's accessible. Next comes testing the connections to the services on the device to make sure that none of them have failed. Querying the device's well-being with Simple Network Management Protocol (SNMP) usually accompanies this too. What do these have in common? The management station is asking the network devices, usually at five-minute intervals, how things are going. This is essential for collecting data for analysis and getting a picture of how things are going when everything is running. For critical problems, something more is needed.
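A stripped-down example of such a polling pass might look like the sketch below: an ICMP reachability check (via the system ping command, Linux-style flags assumed) plus a TCP service check, repeated on a five-minute interval. The hosts and services are placeholders.

    # Sketch of a basic polling pass: ping each device, then check a TCP service,
    # every five minutes. Hosts, services, and the ping flags (Linux) are assumptions.
    import socket
    import subprocess
    import time

    HOSTS = ["192.0.2.30", "192.0.2.31"]                   # placeholder device addresses
    SERVICES = [("192.0.2.30", 22), ("192.0.2.31", 443)]   # placeholder services

    def ping(host: str) -> bool:
        return subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
        ).returncode == 0

    def tcp_check(host: str, port: int) -> bool:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            return False

    while True:
        for host in HOSTS:
            print(f"{host} reachable: {ping(host)}")
        for host, port in SERVICES:
            print(f"{host}:{port} up: {tcp_check(host, port)}")
        time.sleep(300)  # five-minute polling interval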

Alerting

This is where syslog and SNMP traps come into play. This data is actively sent from the monitored devices as events occur. There is no waiting for five minute intervals to find out that the processor has suddenly spiked to 100% or that a critical interface has gone down. The downside is that there is usually a lot more information presented than is immediately necessary. This is the same kind of floodgate scenario I ran into in my earlier example. Configuring syslog to send messages at the "error" level and above is an important sanity-saving measure. SNMP traps are somewhat better for this as they report on actual events instead of every little thing that happens on the device.

The Whisper in the Wires

Ultimately, network monitoring is about two things:

 

  1. Knowing where the problems are before anyone else knows they're there and being able to fix them.

  2. Having all of the trend data to understand where problems are likely to be in the future. This provides the necessary justification to restructure or add capacity before they become a problem.

 

The first of these is the most urgent one. When we're building our monitoring systems, we need to focus on the critical things that will take our networks down first. We don't need to get sidetracked by the pretty pictures and graphs... at least not until that first bit is done. Once that's covered, we can worry about the long view.

 

The first truth of RFC 1925, "The Twelve Networking Truths," is that it has to work. If we begin there, we're off to a good start.

The PCI Data Security Standards define the security practices and procedures that govern the systems, services, and networks that interact with cardholder or sensitive payment authentication data. The environment in which cardholder data flows is defined as the cardholder data environment (CDE) and comprises the “people, processes, and technologies that store, process, or transmit cardholder data or sensitive authentication data”. 

 

While some PCI deployments are simple, such as a single point-of-sale terminal directly connected to a merchant authority, others are more complicated: deployments that interact with older systems, deployments with store-and-forward needs, or use cases where an acquirer of cardholder data needs to transmit or share data with another service provider. You may find yourself needing a solution that allows you to transfer cardholder data while maintaining PCI compliance.

 

When you need to move PCI data, whether within the CDE or for further processing outside of the CDE, you can use a managed file transfer (MFT) solution to accomplish this task. In this situation, you need to ensure that the MFT complies with all aspects of PCI DSS.

 

The main requirement governing data transfer is Requirement 4, which states that cardholder data must be encrypted when transmitted across open, public networks. More specifically, the encryption implementation must ensure:

 

1. Only trusted keys and certificates are accepted.

2. The protocol in use only supports secure versions and configurations.

3. The encryption strength is appropriate for the encryption methodology in use.

 

For file transfer, the usual transports are either FTP over SSL (FTPS), which runs the traditional FTP protocol tunneled through an SSL/TLS session, or HTTP running over SSL/TLS. Occasionally SSH2-based transfer is used, typically in situations where it is not possible to set up bi-directional secure transfers or when only an interim transfer is needed.
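
As a simple illustration of the FTP-over-SSL case, here is a minimal scripted upload using Python's standard ftplib. The host name, credentials, and file name are placeholders, and a real MFT workflow would add certificate validation, retries, and logging on top of this.

    from ftplib import FTP_TLS

    def upload_report(filename, host="mft.example.com", user="batchuser", password="changeme"):
        """Upload a file over explicit FTPS (FTP over TLS) with the data channel encrypted."""
        with FTP_TLS(host) as ftps:
            ftps.login(user=user, passwd=password)
            ftps.prot_p()  # switch the data connection to TLS, not just the control channel
            with open(filename, "rb") as f:
                ftps.storbinary(f"STOR {filename}", f)

    upload_report("eoy-closeout.csv")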

 

A properly configured managed file transfer solution will enable users to:

 

1. Automatically transfer cardholder data for further processing

2. Support ad hoc secure transfers

3. Generate one-time-use secure transfer links

 

However, care must be taken to adhere to new PCI DSS 3.2 authentication and encryption requirements, as well as to ensure cardholder data is kept only for the time necessary to achieve the legitimate business need. We will address each of the new PCI requirements to ensure you can safely continue to use your managed file transfer solution.

 

Multifactor Authentication

PCI DSS 3.2 clarifies that any administrative, non-console access to the cardholder data environment must support multi-factor authentication. This means multiple passwords, or passwords plus security questions, are no longer valid authentication methods.

 

For years, web application and even SSH access has relied upon simple security questions, or just a user ID and password, to identify users to systems. Unfortunately, as seen in the recent Yahoo data breach disclosure, security questions may be stored in the clear, and such questions are often chosen from a standard list.

 

From a PCI managed file transfer authentication perspective, the 3.2 multifactor authentication requirement only impacts user-to-server initiated transfers and administrative access to a server located in the cardholder data environment. If you are currently using either of these two scenarios with only password authentication, you should plan for migration by February 2018. You can read more about the new PCI authentication requirements in the PCI 3.2 changes blog post.

 

Encryption

The changes in PCI 3.2 regarding encryption are more extensive than the authentication requirements. The transport layer encryption most commonly used for managed file transfer depends on the SSL/TLS protocols to secure data in motion. Early versions of SSL/TLS have known vulnerabilities that make them unsuitable for ongoing use in managed file transfer under the new PCI standard. Although the 3.2 requirements permit the use of early SSL/TLS if properly patched, configured, and managed, there is no need to use these older versions in a managed file transfer environment; most systems and browsers have supported TLS 1.2 for some time. That said, even after configuring your server to accept only TLS 1.2 and above, there is still the matter of which cipher suites to select. TLS 1.2 supports over 300 cipher suites, and not all of them are acceptable for use with cardholder data.

 

PCI DSS 3.2 does not directly specify the cipher suites to use with TLS, leaving the implementer with the requirement that "the encryption strength is appropriate for the encryption methodology in use." PCI does provide additional guidance and points to NIST publication 800-52, which was last updated in April 2014. However, since that publication date several critical vulnerabilities have been found in the implementations of certain cipher suites used by SSL/TLS, and additional vulnerabilities have been found in OpenSSL, a commonly used library. These include:

 

- FREAK, which forces a downgrade to weak, export-grade RSA keys

- DROWN, which relies upon a server supporting SSLv2 to compromise a client using TLS

- Five critical vulnerabilities in the OpenSSL implementation reported September 16, 2016

 

From NIST 800-52, the following cipher suites are recommended for TLS 1.2 servers:

 

• TLS_RSA_WITH_AES_256_GCM_SHA384

• TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256

• TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

• TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256

• TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

 

Care must be taken to ensure that null ciphers and lower-grade encryption ciphers are not enabled by default, as these ciphers can be used in man-in-the-middle attacks. To mitigate this risk, OWASP recommends a whitelist approach: either limit your server to only the ciphers you trust, such as those specified above, or, if you cannot whitelist your cipher suites, ensure that you explicitly disable weak cipher suites.
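
As a rough sketch of what a whitelist can look like in practice, here is how a Python-based service might pin its TLS configuration to TLS 1.2 and above with a short list of strong ciphers. The certificate paths and the exact cipher names are assumptions to adapt to your own policy; many MFT products expose this only through their admin console rather than code.

    import ssl

    def build_server_context(certfile="server.pem", keyfile="server.key"):
        """Build a TLS server context restricted to TLS 1.2+ and a short cipher whitelist."""
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3, TLS 1.0, and TLS 1.1
        # Whitelist: ECDHE key exchange with AES-GCM only; anything else is rejected.
        context.set_ciphers(
            "ECDHE-ECDSA-AES128-GCM-SHA256:"
            "ECDHE-ECDSA-AES256-GCM-SHA384:"
            "ECDHE-RSA-AES128-GCM-SHA256")
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
        return context

A quick way to verify the result is to point an external TLS scanner (or openssl s_client) at the server and confirm that the weak suites are actually refused.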

 

The cipher suite is not the only cryptographic element of your managed file transfer solution. The SSL/TLS server also needs a private key and an X.509 (PKI) certificate, which should be issued by a known Certificate Authority. Furthermore, in order to be PCI compliant, your certificate and keys should meet NIST SP 800-57 key management requirements. From a practical perspective, OWASP recommends that server certificates should (a quick validation sketch follows this list):

 

1. Use a key size of at least 2048 bits

2. Not use wild card certificates

3. Not use SHA-1

4. Not use self-signed certificates

5. Use only fully qualified DNS names
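
To make those recommendations a little more concrete, here is a small, assumption-heavy check script using the third-party cryptography package. The file name, and the idea that your certificate sits in a local PEM file, are placeholders; it only mirrors the OWASP list above and is nowhere near a full audit.

    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives.asymmetric import rsa

    def check_certificate(path="server.pem"):
        """Run a handful of sanity checks against a PEM-encoded server certificate."""
        with open(path, "rb") as f:
            cert = x509.load_pem_x509_certificate(f.read())

        findings = []
        public_key = cert.public_key()
        if isinstance(public_key, rsa.RSAPublicKey) and public_key.key_size < 2048:
            findings.append("RSA key size below 2048 bits")
        algo = cert.signature_hash_algorithm
        if algo is not None and algo.name == "sha1":
            findings.append("certificate is signed with SHA-1")
        if cert.issuer == cert.subject:
            findings.append("certificate appears to be self-signed")
        common_names = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)
        if any(cn.value.startswith("*.") for cn in common_names):
            findings.append("wildcard certificate in use")
        return findings or ["no obvious problems found"]

    for finding in check_certificate():
        print(finding)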

 

NIST 800-57 provides detailed guidance on protecting private keys. From a PCI perspective, the important elements of key management are:

 

Ensuring the integrity of the private key by protecting it from:

1. Accidental or intentional reuse, modification, compromise

2. Exceeding the relevant cryptographic period (how long a private key is expected to be in use)

3. Incorrect configuration of a private key

 

It may seem like overkill to be so focused on encryption protocols, cipher suites, and private keys; however, if the private key is compromised, as it was with the Sony PlayStation 3, your entire system is vulnerable.

 

Storing Cardholder Data

While there are no changes to the requirements around storing cardholder data in PCI 3.2, if you use managed file transfer you are storing cardholder data. Along with the technical guidelines on storing cardholder data, consider how you are going to mitigate the risk of accidental disclosure by removing any files containing cardholder data as soon as possible after the business use is complete. Having a policy that establishes data retention and secure destruction, and logging the execution of these activities, will help you maintain PCI compliance.

 

There are other requirements associated with any system or solution that operates under PCI, but the new requirements in PCI 3.2 focus on authentication and encryption. By working with your IT staff in advance and detailing your PCI use cases and requirements, with a focus on authentication and encryption, you can confidently deploy managed file transfer in your PCI environment.

 

Do you use file transfer solutions today?  Are you comfortable with the security they provide for Personally Identifiable Information?

Infrastructure automation is nothing new. We’ve been automating our server environments for years, for example. Automating network devices isn’t necessarily brand new either, but it’s never been nearly as popular as it has been in recent days.

 

Part of the reason network engineers are embracing this new paradigm is because of the potential time-savings that can be realized by scripting common tasks. For example, I recently worked with someone to figure out how to script a new AAA configuration on hundreds of access switches in order to centralize authentication. Imagine having to add those few lines of configuration one switch at a time – especially in a network in which there were several different platforms and several different local usernames and passwords. Now imagine how much time can be saved and typos avoided by automating the process rather than configuring the devices one at a time.
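
For a sense of what that kind of script can look like, here is a minimal sketch using the third-party Netmiko library. The switch list, credentials, and the exact AAA commands are placeholders I've made up for illustration; a real job would pull the inventory from a source of truth, handle the different platforms, and clean up the old local accounts afterward.

    from netmiko import ConnectHandler

    # Hypothetical inventory; in practice this would come from a source of truth.
    SWITCHES = ["10.10.1.11", "10.10.1.12", "10.10.1.13"]

    # Placeholder AAA commands -- the real ones depend on your TACACS+/RADIUS design.
    AAA_COMMANDS = [
        "aaa new-model",
        "aaa authentication login default group tacacs+ local",
        "tacacs server CENTRAL-AAA",
        " address ipv4 10.10.0.5",
        " key MySharedSecret",
    ]

    for host in SWITCHES:
        device = {
            "device_type": "cisco_ios",
            "host": host,
            "username": "netadmin",
            "password": "changeme",
        }
        try:
            with ConnectHandler(**device) as conn:
                conn.send_config_set(AAA_COMMANDS)
                print(f"{host}: configuration pushed")
        except Exception as exc:  # keep going if one switch fails, but record it
            print(f"{host}: FAILED - {exc}")

Testing a sketch like this against one or two lab switches first, and excluding anything already configured, is exactly the kind of guardrail the rest of this story argues for.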

 

That’s the good.

 

However, planning the pseudocode alone became a rabbit hole in which we chased modules on GitHub, snippets from previous scripts, and random links in Google trying to figure out the best way to accommodate all the funny nuances of this customer’s network. In the long run, if this was a very common task, we would have benefited greatly from putting in all the time and effort needed to nail our script down simply because it would then be re-usable and shareable with the rest of the community. However, by the time I checked in again with some more ideas, my friend was already well underway configuring the switches manually simply because it was billable time and he needed to get the job done right away. There’s a balance between diminishing returns and long-term benefits to writing code for certain tasks.

 

That’s the bad.

 

We had some semblance of a script going, however, and after some quick peer review we wanted to use it on the remaining switches. Rather than modify the code to remove the switches my friend already configured, we left it alone because we assumed it wouldn’t hurt to run the script against everything.

 

So we ran the script, and several hundred switches became unreachable on the management network. Nothing went hard down, mind you, but we weren’t able to get into almost the entire access layer. Thankfully this was a single campus of several buildings using a lot of switch stacks, so with the help of the local IT staff, the management access configuration on all the switches was rolled back the hard way in one afternoon. This happened as a result of a couple guys with a bad script. We still don’t really know what happened, but we know that this was a human error issue – not a device issue.

 

That’s the ugly.

 

Network automation seeks to decrease human error, but the process requires skill, careful peer review, and maybe even a small test pool. Otherwise, the blast radius of a failure could be very large and impactful. There is also great automation software out there with easy-to-use interfaces that can enable you to save time without struggling to learn a new programming language.

 

But don't let that dissuade you from jumping in with both feet, learning Python, and experimenting with scripting common tasks. There are even methods for preventing scripting misconfigurations. Just remember that along with the good, there can be some bad, and if ignored, that bad could get ugly.


 

The Cloud! The Cloud! Take us to the Cloud! It's cheaper than on-premises. Why? Because someone in marketing told me so! No, but seriously: cloud is a great fit for a lot of organizations, a lot of applications, a lot of a lot of things! But just spitting "Cloud" into the wind doesn't make it happen, nor does it always make it a good idea. But hey, I'm not here to put cloud down (I believe that's called Fog), nor am I going to tout it unless it's a good fit. However, I will share some experiences, and hopefully you'll share your own, because this has been a particular area of interest lately, at least for me, but I'm weird about things like deep tech and cost-benefit models.

 

The example I'll share is one that is particularly dear to my heart. It's dear because it's about a Domain Controller! Domain Controllers are, for all intents and purposes, machines that typically MUST remain on at all times, yet they don't necessarily require a large amount of resources. So a domain controller running on-premises, say as a virtual machine in your infrastructure, carries an aggregated cost taken as a share of your infrastructure, licensing, allocated resources, and O&M costs for power/HVAC and the rest. So how much does a Domain Controller running as a virtual machine cost inside your data center? If you were not to say, "It depends," I might be inclined not to believe you, unless you do detailed chargeback for your customers.

 

Yet we've stood up that very same virtual machine inside of Azure, let's say a standard single-core, minimal-memory A1-Standard instance, to act as our Domain Controller. Microsoft Azure pricing for our purposes was pretty much on the button, coming in at around ~$65 per month, which isn't too bad. I always like to look at three years at a minimum for the sustainable life of a VM, just to contrast it with the cost of on-premises assets and depreciation. So while $65 a month sounds pretty sweet, or ~$2,340 over three years, I also have to consider other costs I might not normally be looking at: egress network bandwidth, and the cost of backup (let's say I use Azure Backup; that adds another $10 a month, so what's another $360 for this one VM?).
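
For anyone who wants to redo the back-of-the-napkin math, here it is as a tiny sketch. The Azure figures come from the numbers above; the on-premises figure is a deliberately made-up placeholder, because that number is exactly the "it depends" part.

    MONTHS = 36  # three-year comparison window

    azure_vm_per_month = 65        # A1-Standard instance, approximate
    azure_backup_per_month = 10    # Azure Backup for this one VM, approximate
    onprem_three_year_cost = 2000  # placeholder -- substitute your own fully loaded number

    azure_three_year_cost = (azure_vm_per_month + azure_backup_per_month) * MONTHS
    print(f"Azure, 3 years:   ${azure_three_year_cost}")   # $2700
    print(f"On-prem, 3 years: ${onprem_three_year_cost}")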

 

The cost benefits can absolutely be there if I am under or over a particular threshold, or if my workloads are historically more sporadic and less ‘always-on, always-running’ kind of services.

An example of this: we have a workload that normally takes LOTS of resources and LOTS of cores and runs until it finishes. We don't have to run it too often (quarterly), and while allocating those resources and obtaining the assets is great, they're not used every single day. So we spin up a bunch of compute- or GPU-optimized instances, and where it might have taken days or weeks in the past, we can get it done in hours or days. We get our results, dump out our data, and release the resources.

 

Certain workloads will tend to be better suited than others to being kept on-premises or hosted exclusively in the cloud, whether sporadically or all the time. That really comes down to what matters to you, your IT, and your support organization.

 

This is where I'm hoping you, my fellow IT pros, can share your experiences (good, bad, ugly) about workloads you have moved to the cloud. I'm partial to Azure, Google, or Amazon, as they've really driven things down to commoditized goods and battle amongst themselves, whereas an AT&T, Rackspace, or other "hosted" facility-type cloud can skew the costs or benefits when contrasted with the "Big Three."

 

So what has worked well for you? What have you loved and hated about it? How much has it cost you? Have you done a full shift, taking ALL your workloads to a particular cloud or clouds? Have you said "no more!" and taken workloads OFF the cloud and back on-premises? Share your experiences so that we may all learn!

 

P.S. We had a set of workloads hosted off-premises in Azure that were brought wholly back in house, because the high-performance yet persistent, always-on nature of the workloads was costing 3x-4x more than if we had simply bought the infrastructure and hosted it internally. (Not every workload will be a winner.)

 

Thanks guys and look forward to hearing your stories!


Normalcy is boring, or is it?

          Something I have been working on is helping to come up with a baseline security plan for an IT team and their infrastructure.  What I have run into is that having a basic template and starting point really helps.  Fantastic, right?  Well, when I start off by giving them credit for monitoring, they look at me peculiarly, as in, why would monitoring be a starting point?  To be fair and accurate, a few high-five me as they are like SAWEETNESS (meant to be spelled wrong as that literally is how I speak, ok back to the blog ), check that off the list of things to come!  Today, I'm going to go over this one portion of the plan and show why "knowing normal" is actually the starting point for great security best practices and policies.

 

     First things first, my favorite quote: "If you don't know what's normal, how the heck do you know when something's wrong?"  A baseline and accurate monitoring history will show you what's normal.  It will also show you how your infrastructure handles new applications and loads, so monitoring isn't just for up/down; that's honestly just a side perk.

 

Ok, now once you know what normal is, the following will help you see issues more easily and stay aware.  So remember, the list below assumes you have already monitored and understand the normalcy of the devices you're monitoring.

 

Monitoring security features

  • Node - up/down
    • This will show you if there is a DoS under way, or a configuration error, when you lose the ability to ping a device.
    • It will show you which areas of your monitored environment may be under attack.
    • It gives you a clear audit trail of the events taking place, which you and your team can use for management reporting and assessments.
  • Node - CPU/Memory/Volume
    • CPU will show you a sudden spike, which helps you track down what increased or what caused a spike that never went away.
    • Memory lets you know when there is a spike; obviously something is holding it hostage, and you need to address, prevent, or resolve it.
    • Volume: if you see a drive's capacity increase OR decrease quickly, and you are alerted to it, you may be able to stop things like ransomware quickly.  The trick is to be monitoring AND have alerts set up to make you aware of drastic changes (see the small baseline-deviation sketch after this list).
  • Interface - utilization
    • Utilization will show you if a sudden surge of data is transferring into or out of an interface.
  • Log file monitoring
    • Know when AD authentication attempts are failing.
      • This is something I see a lot, and the person monitoring just states, "yes, but it's just an old app making the request, no biggy."  Ok, to me I'm like, fix the old application so this is no longer NOISE; then, when these failures come from anything other than that app, you are far more inclined to investigate and shut it down.
    • Encryption: know if files are being encrypted on your volumes.
    • Directory changes: if directory/file changes are happening, you need to be aware.  Period.
  • Configuration monitoring
    • Real-time change notification that compares against the baseline config is vital to make sure no one is changing configurations outside of your team.  Period, end OF STORY.  (I preach this a lot, I know.  #SorryNotSorry)
  • Port monitoring
    • Rogue devices plugging into your network need to be known, when and by whom, immediately.
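
As a toy illustration of "knowing normal," here's a small sketch that flags a new sample when it strays too far from a historical baseline. The sample values and the three-standard-deviation threshold are arbitrary assumptions; a monitoring platform does this for you with far more nuance (dynamic baselines, time-of-day awareness, and so on).

    from statistics import mean, stdev

    def is_abnormal(history, new_value, sigmas=3.0):
        """Flag new_value if it deviates from the baseline by more than `sigmas` standard deviations."""
        spread = stdev(history) or 1e-9  # guard against a perfectly flat baseline
        return abs(new_value - mean(history)) > sigmas * spread

    # Example: free disk space (GB) sampled over the last week, then a sudden drop.
    disk_free_history = [120, 118, 121, 119, 117, 120, 118]
    print(is_abnormal(disk_free_history, 45))   # True  -- investigate (ransomware? runaway logs?)
    print(is_abnormal(disk_free_history, 119))  # False -- within the normal range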

 

          This is obviously not all the ways you can use normalcy, but it's once again a start.  Understanding normal is vital to setting up accurate alerts, reports, and monitoring features.  As you hone your skills at assessing what you are monitoring and alerting on, you'll see some things drop off while others increase within your environment.

 

          Don't be shy about asking questions like: why is this important?  I saw this article on an attack; how can we be alerted in the future if this happens to us?  Some of the best monitoring I've seen came from looking through THWACK and reading articles on what's going on in the mainstream.  Bring this knowledge to your monitoring environment and begin crafting an awesome arsenal against, well, the WORLD.

 

HTH

~Dez~

  

Don’t you love an intriguing headline? I’m not going to talk about our beloved IT jobs just yet, but some other industries are in for a wake-up call.

 

Microsoft Research made an interesting announcement recently. Tasked with improving digital speech recognition, Microsoft's chief speech scientist Xuedong Huang made a prediction 12 months ago. "Speech technology is so close to the level of human performance," he said. "I believe in the next three years we'll reach parity." They've done it in less than a year.

 

Microsoft’s speech recognition system is now making the same or fewer errors than professional human transcriptionists. The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task. “We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is an historic achievement.” Deep neural networks with large amounts of data were key to this technology breakthrough. Does it spell the end for human transcriptionists? Services like Facebook and YouTube already offer automatic captioning and their technology is only going to get better.

 

We’re on the brink of coding computers to make better recommendations than we do.

 

The Xero financial software is a great example of this. At this year’s Xerocon event in Brisbane, Australia, CEO Rod Drury showed off plans to remove the option for adding accounting codes when entering invoices. Why remove this fundamental accounting task feature? Because humans are stuffing it up and machines can do it better. In Xero’s tests, it took a very small amount of time for the machines to learn which things to code to which accounts, resulting in a lower error rate than when a human inputted the data. Watch out, bookkeepers. The technology enhancements in the financial services industry are just getting started.

 

Again, the key is access to data. Machines can hold significantly more historical data than the human brain can. They are faster at analyzing it, and they are better at identifying unexpected patterns. Have you ever run Insights across data in Microsoft's Power BI? The question in the air is whether this analysis will see bots become better financial advisers than our accountants, based on historical trends and current economic analysis.

 

Just as IT pros are told not to ignore the rise of cloud computing, other industries should be careful not to sleep while machine learning and AI are on the rise. This is the real digital disruption for the companies we provide IT support for.

 

Are we prepared for it?

 

-SCuffy

As 2016 comes to an end, I've looked back and, wow, it has been the year of upgrades for me at my day job. While they have all been successful (some took longer than expected), there were bumps, tears, and even some screaming to get to the finish line. My team and I are seasoned IT professionals, but that didn't stop us from making mistakes and assumptions. What I've learned after doing five major upgrades this year: never assume, and always prepare for the worst while hoping for the best.

 

As you embark on the annual journey of upgrades, there are many factors to consider to make sure it is successful. While it may seem trivial at times, depending on the upgrade, it never hurts to go through a basic upgrade run-through, like a playbook, or, if you have a PMO, to work with a Project Manager. Project Managers can be lifesavers! But you do not need a Project Manager if you take the time to gather all the information and requirements as part of your planning.

 

After looking back through all the upgrades I've done this year, I decided to write this post in the hope that we can all learn a little something from the lessons we learned, so others can avoid the same mistakes.

Let’s get back to basics…

Once we start talking upgrades, let's go back to the basics, answer the five Ws, and get some requirements: WHAT, WHY, WHO, WHERE, and WHEN. Understanding those basic requirements goes a long way. It provides the foundation for understanding the situation and everything that needs to be done.


WHAT - Let's first ask what we are upgrading. Is this a server operating system upgrade or an application upgrade? Determining the type of upgrade is vital because this will affect the answers to your other requirements. Once you know what you are upgrading, you will need to determine whether your current environment can support the upgrade. Depending on what you are upgrading, it can feel like opening a can of worms as you discover other systems that need to be upgraded to stay compatible with the upgrade you are trying to complete. You may also find that the upgrade reaches beyond your realm of responsibility and crosses over into other departments and functions. A "simple" application upgrade can end up costing millions if your current environment does not support all components.

 

Some examples questions to ask:

  1. If you're doing an application upgrade, do your current hardware specs meet the recommendations for the newer version? If they do not, you may need to invest in new hardware.
  2. Is this an operating system upgrade?
  3. Is this an in-place upgrade or parallel environment?
  4. Or a complete server replacement?
  5. Can you go direct to the new version or will you need to install CU’s to get current?
  6. Can your current network infrastructure support the upgrade? Does it require more bandwidth?
  7. If you are using Load Balancers or Proxy Servers, do those support the upgrade?
  8. Are there client applications that connect to your systems and are you running supported versions of the client applications?
  9. Do you have Group Policies that need to be modified?
  10. What other applications that "connect" may be impacted?
  11. Are there any legacy customizations in the environment that will be impacted?
  12. Will there be licensing impacts or changes with the upgrade?

 

Sample upgrade scenario:

 

An application like Exchange Server has impacts reaching far beyond the application itself. If an Exchange DAG is implemented, the network must meet certain requirements for successful replication between the databases across datacenters. Working with your network team ensures those requirements are met. You may also need the storage team if you are using a SAN for your storage, which may in turn require new hardware, and we all know that upgrading a SAN can be a project in itself.

 

An often overlooked item is the client connection to Exchange. What version of Outlook are users running to get to their email? If users are on an unsupported version of Outlook, they may have issues connecting to email, which we all know would be a nightmare to deal with. Let's look at the impact of Outlook versions on an Exchange upgrade. If your Outlook versions are not supported, you will need to work with the desktop teams to get everyone onto a supported version. This can be costly, from supporting, implementing, and deploying the Outlook upgrade, to the fact that, depending on how your Microsoft Office is licensed, you may need to buy additional licenses, and we all know that isn't cheap.

 

WHY - Let's ask why we are doing the upgrade. Is the upgrade needed to address an issue, or is it to stay current? Once this has been identified, you can determine which new features, if any, you will be getting and what value they bring to the table.

 

Additional questions to ask:

 

  1. Will any new features impact my current environment?
  2. If I am addressing an issue with the upgrade, what is it fixing and are there other workarounds?
  3. Will the upgrade break any customizations that may be in the environment?
  4. Can the upgrade be deferred?

 

WHO - Once you've figured out the "WHAT," you will know "WHO" needs to be involved. Getting all the key players together will help make sure you have your ducks in a row.

 

  1.     What other teams will you need to have involved?

 

      • Network team
      • Security Team
      • Storage Team
      • Database Team
      • Server Team
      • Application Team
      • Desktop Support
      • Help Desk
      • Project Management Team
      • Testing and certification Team
      • Communications team to inform end users
      • Any other key players – external business partners if your systems are integrated

 

In certain cases, you may even need a technology partner to help you do the upgrade. This can get complicated, as you will need to determine who is responsible for each part of the upgrade. Having a partner do the upgrade is convenient, as they can assume overall responsibility for its success while you watch and learn from them. Partners can bring value, as they are often "experts" who have done these upgrades before and should know the pitfalls and what to watch out for. If you are using a partner, I recommend you do your own research in addition to the guidance and support the partner provides, because sometimes the ball can be dropped on their end as well. Keep in mind they are human and may not know everything about a particular application, especially if it's very new.

 

WHEN - When are you planning to do the upgrade? Most enterprises do not like disruptions, so you will need to determine whether this must be done on a weekend or whether you can do the upgrade during the week without impacting users in production.

 

The timing of your upgrade can impact other activities that may be going on in your network. For example, you probably do not want to be doing an application upgrade like Skype for Business or Exchange the same weekend the network team is replacing or upgrading the network switches. This could have you barking up the wrong tree when there's no need to.

 


WHERE - This may seem like an easy question to answer, but depending on what you're upgrading you may need to make certain arrangements. Let's say you're replacing hardware in the datacenter; you will certainly need someone in the datacenter to perform the swap. If your datacenter is hosted, perhaps you will need a hands-on tech to reboot the physical servers in the event a remote reboot doesn't work.

 

I've been in situations where the reboot button doesn't work and the power cord had to be pulled to bring the server back online, which meant getting someone in the datacenter to do it. Depending on your setup and processes, this may require you to open support tickets in advance and coordinate with the datacenter hosting team. Who wants to sit around waiting several hours for a server reboot just to progress to the next step in an upgrade?

 

 

HOW - How isn't really a W, but it is an important step. Sometimes the HOW can be answered by the WHAT, but sometimes it can't, so you must ask, "HOW will this get upgraded?" Documenting the exact steps to complete the upgrade, whether it's in-place or a parallel environment, will help you identify any potential issues or key steps missing from the plan. Once you have the steps outlined in detail, it's good to walk through the plan with all involved parties so expectations are clear and set. This also helps prevent any scope creep that could appear along the way. Having a detailed, documented step-by-step plan will also help during the actual upgrade in the event something goes wrong and you need to troubleshoot.

 

Proper Planning keeps the headaches at bay…

 

It would seem like common sense, almost a standard, to answer the five Ws when doing upgrades, but you would be surprised by how often these questions are not asked. Too often we get comfortable in our roles, overlook the simple things, and make assumptions. Assumptions can lead to tears and headaches if they cause a snag in your upgrade. However, lots of ibuprofen can be avoided if we plan as best we can and go back to the basics of asking the five Ws (and the one H) when gathering information.

Home for a week after the PASS Summit before heading back out to Seattle on Sunday for the Microsoft MVP Summit. It's the one week a year where I get to attend an event as an attendee rather than working a booth or helping to manage the event. That means for a few days I get to put my learn on and immerse myself in all the new stuff coming to the Microsoft Data Platform.

 

As usual, here's a bunch of links I found on the Intertubz that you might find interesting, enjoy!

 

AtomBombing: The Windows Vulnerability that Cannot be Patched

I've been digging around for a day or so on this new threat and from what I can tell it is nothing new. This is how malware works, and the user still needs to allow for the code to have access in the first place (click a link, etc.). I can't imagine who among us falls for such attacks.

 

This is the email that hacked Hillary Clinton’s campaign chief

Then again, maybe I can imagine the people that fall for such attacks.

 

Apple's desensitisation of the human race to fundamental security practices

And Apple isn't doing us any security favors, either.

 

Mirai Malware Is Still Launching DDoS Attacks

Just in case you thought this had gone away for some reason.

 

Earth Temperature Timeline

A nice way of looking at how Earth temperatures have fluctuated throughout time.

 

Surface Studio zones in on Mac's design territory

We now live in a world where Microsoft hardware costs more than Apple hardware. Oh, and it's arguably better, too, considering the Surface still has the escape and function keys.

 

Swarm of Origami Robots Can Self Assemble Out of a Single Sheet

Am I the only one that's a bit creeped out by this? To me this seems to be getting close to having machines think, and work together, and I think we know how that story ends.

 

Management in ten tweets

Beautiful in its simplicity, these tweets could serve as management 101 for many.

 

I wasn't going to let anyone use the SQL Sofa at PASS last week until I had a chance to test it first for, um, safety reasons.


It is a good time to remember that improving agency IT security should be a yearlong endeavor. Before gearing up to implement new fiscal year 2017 IT initiatives, it is a best practice to conduct a security audit. The audit establishes a baseline and serves as a point of comparison for thinking about how the agency's infrastructure and applications should change, and what impact that will have on IT security throughout the year.

 

Additionally, security strategies, plans and tactics must be established and shared so that IT security teams are on the same page for the defensive endeavor.

 

Unique Security Considerations for the Defense Department

 

Defense Department policy requires agencies follow NIST RMF to secure information technology that receives, processes, stores, displays, or transmits DOD information. I’m not going to detail the six-step process—suffice it to say, agencies must implement needed security controls, then assess whether they were implemented correctly and monitor effectiveness to improve security.

 

That brings us back to the security audit: A great way to assess and monitor security measures.

 

Improving Security is a Year-Round Endeavor

 

The DOD has a complex and evolving infrastructure that can make it tricky to detect abnormal activity, determine whether it is a threat, and still avoid blocking legitimate traffic. Tools such as security information and event management (SIEM) platforms automate some of the monitoring to lessen the burden.

 

The tools should automate the collection of data and analyze it for compliance, long after audits have been completed.

 

It should also be easy to demonstrate compliance using automated tools. Automated tools should help to quickly prove compliance, and if the tools come with DISA STIGs and NIST FISMA compliance reports, that’s another huge time-saver.

 

Performance monitoring tools also improve security posture by identifying potential threats based on anomalies. Network, application, firewall and systems performance management and monitoring tools with algorithms that highlight potential threats effectively ensure compliance and security on an ongoing basis.

 

Five additional best practices help ensure compliance and overall secure infrastructure throughout the year:

 

  • Remove the need to be personally identifiable information (PII) compliant unless it's absolutely critical. For example, don't store stakeholder PII unless required by agency processes. Not storing the data removes the risk and responsibility of securing it.

 

  • Remove stored sensitive information that isn’t needed. Understand precisely what and how data is stored and ensure what is kept is encrypted, making it useless to attackers.

 

  • Improve network segmentation. Splitting the network into discrete “zones” boosts performance and improves security, a win-win. The more a network is segmented, the easier it will be to improve compliance and security.

 

  • Eliminate passwords. Think about all the systems and applications that fall within an audit zone, and double check proper password use. Better yet, eliminate passwords and implement smart cards, recognized as an industry best practice.

 

  • Build a relationship with the audit team. A close relationship with the audit team ensures they can be relied upon for best practices and other recommendations.

 

  Find the full article on Signal.
