
Geek Speak


This week's Actuator comes to you from the month of May. As you know, May is located right next to June. That means 2018 is almost half over. Now is a good time to check in on your goals for the year and mark your progress.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Amazon cloud business just keeps rolling along

Hard to believe a company that just made $5 billion in revenue for Q1 is the same company that felt the need to jack up its Prime subscription price by 20%, from $99 to $119.

 

GDPR Kills the American Internet: Long Live the Internet!

May is here, and on the 25th, GDPR compliance becomes a thing. Here, Cringely talks about how ICANN is not currently GDPR compliant, and raises an interesting discussion about the data privacy rights of individuals.

 

Transcription Service Leaked Medical Records

Business suffers a ransomware attack and then follows up by exposing patient data online. Saying “sorry” isn’t enough anymore, folks. We need to start imposing penalties for this lack of basic security. We need to do something to force people and companies to treat data as if lives depended on it.

 

$35 Million Penalty for Not Telling Investors of Yahoo Hack

Yes, exactly like this, except that is too small a fine for a company that size.

 

MacOS monitoring the open source way

I enjoy it when tech companies openly disclose how they tackle the same problems we all face. Here, Dropbox shares their approach, allowing everyone else to compare and contrast methods. In the end, it is through this open disclosure and knowledge sharing that we all get a little better at administering our own enterprises.

 

You know how HTTP GET requests are meant to be idempotent?

This sounds like something Patrick would discover.

 

Bacoin, the First-Ever Cryptocurrency Backed by the Gold Standard of Bacon

Finally, a cryptocurrency I can sink my teeth into.

 

I need one of these for my front yard:

 

Here is an interesting article from my colleague Joe Kim, in which he explores database performance management.

 

Databases are complex, multifaceted, and vital to the health of every agency. They are the heart of every data center and arguably one of the most important components of an agency’s technology infrastructure, whether on-premises, in the cloud, or within a hybrid IT environment.

 

For these reasons, optimizing database performance is critical to enabling an optimized data center. There are five things a federal IT manager can do to meet this goal:

 

  1. Ensure that databases are healthy
  2. Gain visibility into data and metrics
  3. Put data and metrics in context
  4. Track optimization plans
  5. Create and maintain a performance baseline

 

Let’s look at each of these steps individually to understand how they all fit together:

 

Ensure database health

 

When it comes to databases, health and performance are two different things; databases must be healthy before they can be optimized. Signifiers of database health include things like CPU utilization, I/O statistics, and memory pressure. Collectively, these metrics can indicate whether a database can perform well.
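As a rough illustration of gathering those host-level signals, here is a minimal Python sketch using the psutil library. The thresholds are arbitrary placeholders rather than recommendations, and a real deployment would also pull counters from the database engine itself.

```python
# Minimal health-check sketch using psutil (pip install psutil).
# Thresholds are illustrative placeholders, not tuning guidance.
import psutil

def sample_health(cpu_limit=85.0, mem_limit=90.0):
    cpu = psutil.cpu_percent(interval=1)      # % CPU over a 1-second sample
    mem = psutil.virtual_memory().percent     # % physical memory in use
    io = psutil.disk_io_counters()            # cumulative read/write counters

    warnings = []
    if cpu > cpu_limit:
        warnings.append(f"CPU utilization high: {cpu:.0f}%")
    if mem > mem_limit:
        warnings.append(f"Memory pressure high: {mem:.0f}% in use")

    return {
        "cpu_percent": cpu,
        "memory_percent": mem,
        "read_bytes": io.read_bytes,
        "write_bytes": io.write_bytes,
        "warnings": warnings,
    }

if __name__ == "__main__":
    print(sample_health())
```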

 

Gain visibility

 

The next step is to begin the performance optimization process, ensuring that queries can execute quickly and throughput can be maximized. This starts with gaining full visibility into the data and metrics needed to assess database performance. For example, the ability to drill down into granular metrics like resource contention and a database’s workload is key in helping to identify and mitigate the root cause of a performance problem.

 

Put data in context

 

Agencies should be sure that the data delivered by their monitoring tool is structured and presented in a way that gives the team the real insights necessary to fix problems and optimize performance. Specifically, the data should help quickly identify and resolve the root cause of performance issues, not lead you down a “rat hole” of unnecessary, second-level research and analysis.

 

Track optimization plans

 

There are things your team will do to test and optimize performance, including running optimization queries, for example. Make sure all queries and tests are tracked, and that results are carefully correlated with the tests being performed.

 

Create and maintain a performance baseline

 

It’s nearly impossible to tell when a database is underperforming if you don’t have a daily baseline “normal” to measure against. The best approach is to implement a comprehensive series of management and monitoring tools.
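To make "baseline and compare" concrete, here is a hedged Python sketch that computes a mean and standard deviation from a window of historical samples and flags today's reading if it drifts too far. The sample data and the two-sigma rule are assumptions for the example, not a recommendation.

```python
# Baseline sketch: flag a metric that drifts beyond two standard deviations
# from its recent history. Sample data and the 2-sigma rule are illustrative.
from statistics import mean, stdev

def is_anomalous(history, current, sigmas=2.0):
    baseline = mean(history)
    spread = stdev(history)
    return abs(current - baseline) > sigmas * spread, baseline, spread

# e.g., average query latency (ms) collected once per day for two weeks
history_ms = [12, 14, 11, 13, 15, 12, 13, 14, 12, 11, 13, 14, 12, 13]
flagged, baseline, spread = is_anomalous(history_ms, current=24)

if flagged:
    print(f"Latency 24 ms is outside baseline {baseline:.1f} ± {2 * spread:.1f} ms")
```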

 

Make sure your tooling also has the ability to view down, up, and across; it should let you drill down into the database, across database technologies, and across deployment methods (including cloud). And, finally, make sure the tools you choose allow you to establish a historical record of performance metrics.

 

All this information, coupled with the ability to create a baseline, will help ensure that your IT teams have the tools they need to optimize the health and performance of their database.

 

To learn more about a tool that can help ensure that your database is performing at optimal levels, watch the Federal Webinar: Getting to know SolarWinds® Database Performance Analyzer.

 

Find the full article on Federal Technology Insider.

When it comes to technology, device management resources have come a long way, just as the technology in our devices has. As a network or systems admin, you can probably relate to that statement in one way or another. Network admins have likely used device profiling at one time or another, and server admins have probably pushed out a few changes with Group Policy. Devices that are known to your IT environment are not the issue anymore. Nowadays, the applications and other resources available to IT staff allow us to make changes to devices on a large scale, usually with one condition: the devices must be under our control.

 

The Front Door: The Network

The network often provides the first level of protection against BYOD devices. It is the first thing users with outside devices will commonly connect to upon arrival. Proper segmentation is a basic way to secure the network when dealing with BYOD devices. This can include firewalls or network access lists to control what these outside devices are able to access. Short, sweet, and to the point. This is a basic way that some companies choose to handle BYOD devices: they simply give these devices web access and restrict access to internal company resources.

 

What happens the first time a vendor comes on-site and needs access to your network to fix a device or an application, though? You will not have control of their device, nor will a simple internet connection suffice. They will need permissive, yet secure, access to the internal network in one way or another. On the network side, device profiling can give both wired and wireless users an individualized access control list based on their user credentials or device, for starters.

 

BYOD Devices And The Software They Bring

BYOD devices come in many different brands and models, and with that they carry a wide range of software as well. Some of these devices are more secure than others. The goal of server admins is the same, though: to keep the internal systems and applications secure. One way this can be done is with device posturing. Device posturing is the process of ensuring that devices that come onto the network meet predetermined security standards. If they do not, they are not allowed to connect. Server admins are commonly tasked with ensuring that devices under their control are up to date with the latest security updates and free of malware. Device posturing allows admins to ensure that the security standards they set are upheld both by company employees with corporate assets and by visitors bringing their own devices on-site.
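To make the idea concrete, here is a minimal Python sketch of a posture check: it compares a self-reported device profile against a policy before deciding whether to grant access. The field names and policy values are hypothetical; real posture assessment is performed by NAC products against signals they collect themselves.

```python
# Hypothetical posture check: field names and policy values are illustrative
# only; real NAC platforms gather these signals directly from the endpoint.
POLICY = {
    "min_os_build": 17134,        # minimum acceptable OS build number
    "max_patch_age_days": 30,     # most recent security update must be newer
    "antivirus_required": True,
}

def posture_ok(device):
    checks = [
        device.get("os_build", 0) >= POLICY["min_os_build"],
        device.get("patch_age_days", 999) <= POLICY["max_patch_age_days"],
        device.get("antivirus_enabled", False) or not POLICY["antivirus_required"],
    ]
    return all(checks)

visitor_laptop = {"os_build": 17134, "patch_age_days": 45, "antivirus_enabled": True}
print("allow" if posture_ok(visitor_laptop) else "quarantine")  # -> quarantine
```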

 

The Users: Who They Are and What They Need

The other area that brings both challenge and control to admins concerning BYOD is the users themselves. When users are looked at in a granular sense, security can be heightened very quickly. Too often, access to internal systems is controlled based on nothing more than the wireless network someone is on or the VLAN they are assigned to, and everyone in that subnet shares a similar set of firewall rules and permissions. Getting more granular starts with managing security based on user accounts and security groups. Users can be given permissions to resources based on their position in the company, a team they are working on, or an application they, as a vendor, are assigned to support. Going a step further brings us back to the individualized access policies mentioned earlier, which work on a per-user basis. Regardless, one common theme that is repeated whether you are in networking, server support, or desktop support is that users should only be given the access that is required, and nothing more.

 

Those three topics are some of the common things that come up whenever BYOD is discussed. Discussing and developing a plan around them will ensure that you are putting the needed focus on such a sensitive topic. While this is not a complete guide to handling BYOD in your network, these three areas of focus are a good start to securing BYOD devices in your IT environment.

It’s that time of year again. SolarWinds 2018 IT Trends Report is out! See the SolarWinds IT Trends Index for the full story; I've covered the top takeaways below.

 

Cloud computing and hybrid IT remain IT professionals' top priority for the next five years because these elements meet today's business needs while serving as the foundation for constructs like machine learning and AI.

  • 94 percent of IT professionals surveyed indicate that cloud and/or hybrid IT is one of the top five most important technologies in their IT organization's technology strategy today, with 51 percent listing it as their number one most important technology.
  • IT professionals ranked cloud and hybrid IT as the most important technologies (by weighted rank) over the next three to five years in digital transformation and as technologies that have the greatest potential to provide productivity/efficiency benefits and ROI.

 

At the same time, IT professionals are prioritizing internal investments in containers as a proven solution to the challenges of cloud computing and hybrid IT, and a key enabler of innovation.

  • 44 percent of respondents ranked containers as the most important technology priority today, and 38 percent of respondents ranked containers as the most important technology priority three to five years from now.
  • Concurrently, AI and ML investments are expected to increase over the next three to five years.
  • 37 percent of respondents indicate AI is the biggest priority and 31 percent of respondents indicate ML is the biggest priority three to five years from now (compared to 29 percent and 21 percent today, respectively).

 

 

The results of the IT Trends Survey suggest a dissonance between the views of IT professionals and their senior managers on priorities for IT investment over the next three to five years.

  • On the weighted list of technologies IT professionals believe are needed for an IT organization's digital transformation over the next three to five years, AI did not make the top five.
    • This contrasts with a recent CEO survey, which found that 81 percent of CEOs consider AI and machine learning to be a priority for their business, up from just 54 percent in 2016 (Fortune).

 

 

While IT professionals continue prioritizing cloud computing and hybrid IT, adoption of these technologies has made it challenging to optimize performance of their systems and applications.

  • 58 percent of IT professionals surveyed indicated that by weighted rank, cloud/hybrid IT presents the greatest challenges when it comes to implementation, roll-out and day-to-day performance
  • Nearly half (47 percent) of all IT professionals surveyed think that their IT environments are not operating at optimal levels
    • Over half of all IT professionals surveyed spend less than 25 percent of their time proactively optimizing performance
    • Nearly half of IT professionals spend 50 percent or more of their time reactively maintaining and troubleshooting their IT environment

 

Many IT professionals cite a lack of organizational strategy and inadequate investment in areas such as user training as the most common barriers to system optimization.

  • Of IT professionals indicating their environments are not optimized, 43 percent ranked inadequate organizational strategy as one of the top three barriers to achieving optimization
    • To achieve true performance and work toward a successful digital transformation, IT professionals require deeper strategic collaboration with business leaders.

 

Do these takeaways mirror your organizational priorities and challenges? Let me know in the comment section below.

Who hasn't spent more than a weekend going nearly blind because something in your <replace with your beloved system, not necessarily IT> didn't work as it should, and produced thousands of lines of error messages, almost all of which were the same? The key words here are "almost the same."

 

A syslog system can save you precious time, and your family will be thankful for it, but it's up to you to find a way to filter the lines that are "almost" like the others, the ones that differ by a single character and make your day.

 

Syslog is a tool. Actually, it is the first logging tool. Put a screwdriver beside your PC and you won't be disappointed when it doesn't disassemble the machine by itself. A tool leverages your skills; it doesn't replace them.

 

There are several tricks to filter the lines produced by a syslog. The key is to find the right pattern.

 

Today there are many GUI tools to perform these operations, but the core is the same: match a pattern against the bundle of lines and keep the ones that will help solve your problem.
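As a tiny example of that workflow, here is a hedged Python sketch that filters a syslog file with a regular expression and collapses near-duplicate lines so the one that differs stands out. The log format, regex, and file path are assumptions for the illustration.

```python
# Pattern-filtering sketch: the log format, regex, and path are illustrative.
import re
from collections import Counter

# Classic BSD-style syslog line: "May  1 10:15:02 host service[pid]: message"
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<service>[^:\[\s]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)

def summarize(path, service="sshd"):
    counts = Counter()
    with open(path) as f:
        for line in f:
            m = PATTERN.match(line)
            if m and m.group("service").startswith(service):
                # Collapse digits so lines differing only by a number group together
                counts[re.sub(r"\d+", "N", m.group("msg"))] += 1
    for msg, n in counts.most_common(10):
        print(f"{n:6d}  {msg}")

# summarize("/var/log/syslog", service="sshd")
```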

Taking a step back: during syslog configuration, you can set it up to write the logs in a format that makes your patterns easier to build.

 

You could put the timestamp at the beginning of the line, or maybe the severity, or the service involved. Which you choose isn't important. What is important is that you do it in a way that suits you.

 

So, the pattern. This is the critical point, the difference between success and wasted nights. As with a Google search, you need to find the right keyword to get the best result.

There are several free tools to parse logs, and the same goes for patterns. There are many databases of them, but maybe your issue is more specific? Anyway, a good starting point is log parsing.

 

Today there are many advanced utilities born from syslog, a number of which are open source: rsyslog, syslog-ng, and logwatch, just to name a few. The main difference is that rsyslog applies filters to the logs produced by syslog in order to perform actions: for example, if the word "localhost" is present, it sends an email to someone@domain.com, or if the source IP 10.10.10.10 is present, it writes the line to the file "10101010".

 

Syslog-ng is a more complex utility that not only filters, but also correlates and classifies. Logwatch is interesting, and I’ll cover it in a future post.

 

All of the tools use a set of precompiled patterns, in many cases modifiable. Let me say that it's quite unusual to not find the right filter for your very specific requirements.

Besides these tools, there's another subset to consider when talking about logs: SNMP traps and polling. Usually they're used for monitoring rather than analysis, but the core concept is the same: one tool writes lines that are constantly filtered by another tool, which either sends traps or waits to be read by a poller that raises an alert. The first part of the process is the same (logging), and the second is similar too (filtering).

 

So, enjoy, try, install, reinstall, destroy, but above all, keep logging! It can save you a lot of precious free time, and it can help your peers as well, even those from other parts of the world.

March 31 was World Backup Day. If you are like me, you probably spent most of your day burning old CDs to tape storage.

 

I often see forum posts from accidental administrators who want to know how to recover data without a backup. The short answer is, “Now is a good time to work on your resume.” The longer answer is, “Recreate all your data.”

 

But the truth is that you shouldn’t ever be in this position. The number one job for any administrator is recovery. If you can’t recover, you can’t keep your job.

 

So, here are six ways for you to help protect your backups and your job.

 

Know What You Need

Many of those forum posts share a common thread, which is this: the contributor clearly does not know about the system he is tasked with recovering. So, the first step is to start making a list of all your servers and applications. Ask people the simple question, “What are the critical systems and applications you work with every day, week, and month?” Don’t forget to answer these questions yourself. Make a list, use it as a reference, and keep it updated. This is where monitoring tools that have auto-discovery are your best friend.

 

Configure Your Backups

This seems like something Captain Obvious would tell you, but yeah, configure the backups. It's not enough to just know what needs to be backed up; you need to make sure backups are in place. Take care to note data volume here, as you may not want all your backups happening at the same time and flooding the network. Or, worse yet, having your backups run longer than 24 hours, causing your backup software to start a new day before the previous day is complete. Good times.

 

Verify Backups Are Happening

You must build a process to ensure that the backups are happening. My preference here is to make sure I have three pieces of information. The first is that the backup job ran without error. The second is that the backup media is available. The third is that the backups remain consistent with our RTO and RPO requirements.
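A hedged sketch of what those three checks might look like in Python, assuming the backup job drops a log file and a backup file in known locations. The paths, the log convention, and the RPO value are hypothetical.

```python
# Verification sketch: paths, log convention, and RPO are hypothetical.
from datetime import datetime, timedelta
from pathlib import Path

RPO = timedelta(hours=24)  # assumed recovery point objective

def verify_backup(log_path, backup_path):
    issues = []

    # 1. The backup job ran without error (assumes the job writes a status line)
    log = Path(log_path)
    if not log.exists() or "completed successfully" not in log.read_text().lower():
        issues.append("backup job did not report success")

    # 2. The backup media is available
    backup = Path(backup_path)
    if not backup.exists() or backup.stat().st_size == 0:
        issues.append("backup file missing or empty")

    # 3. The most recent backup still satisfies the RPO
    if backup.exists():
        age = datetime.now() - datetime.fromtimestamp(backup.stat().st_mtime)
        if age > RPO:
            issues.append(f"latest backup is {age} old, older than the RPO")

    return issues or ["all checks passed"]

# print(verify_backup("/var/log/backup.log", "/backups/db_full.bak"))
```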

 

Test Your Recovery

Backups are valuable, but restores are priceless. You should be testing your recovery process on a frequent basis. Many companies do DR testing once or twice a year. I find that the volume of data grows far too much in that length of time, making DR exercises difficult. I advocate frequent testing of the recovery process to verify that the backup media is good, and that the RTO and RPO requirements are being met.

 

Protect Your Backups

For database backups, I like using passwords and encryption. Anything you can do to add an extra layer of security to protect that data is worth your time. You should approach your backups with a very simple concept: assume they will be lost or stolen. If that happens, make sure you have minimized your risk by protecting the backups in some way.
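For file-level backups, one way to add that layer is to encrypt the file before it leaves the server. Here is a minimal sketch using the third-party cryptography package; the file names are placeholders, and database-native backup encryption plus proper key management are a separate topic.

```python
# Encrypt a backup file at rest; assumes `pip install cryptography`.
# File names are placeholders; store the key somewhere other than the backup!
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # keep this in a secrets manager, not next to the backup
fernet = Fernet(key)

with open("db_full.bak", "rb") as src:
    ciphertext = fernet.encrypt(src.read())

with open("db_full.bak.enc", "wb") as dst:
    dst.write(ciphertext)

# Restore path: fernet.decrypt(ciphertext) with the same key.
```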

 

Consider Extra Copies

If your data is critical, you want to consider having extra copies of your backups. I like the idea of using a mix of offsite tape storage and a cloud backup provider. That way I reduce my risk by storing different formats in different locations. Just make certain that you have defined an RPO and RTO for each method being used.

 

Summary

Backups are necessary for your business continuity planning. It is often easier to build the recovery plan first, because that will dictate your backup strategy. Whatever backup strategy you deploy, these six steps will help ensure that your next disaster does not result in a resume-generating event.

This week's Actuator comes to you from spring, where, for the first time in months, it is not snowing as I write. I've already brought out the patio furniture and I'm hoping in the next week or two to move into the outdoor office. It's going to feel nice to sit outside and enjoy the sunshine.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

AWS Lambda Still Towers Over the Competition, but for How Much Longer?

AWS is poised to own 70% of all future serverless applications. Not bad for a book store that recently decided to sell groceries.

 

Titus, the Netflix container management platform, is now open source

If your company is just getting started with containers and you have questions about how to manage them, check out what they do at Netflix. Chances are Netflix might know a thing or two about things like scalability and DevOps. You can benefit from that knowledge. For free.

 

Attackers exfiltrated a casino’s high-roller list through a connected fish tank

Ocean’s Eleven, no doubt.

 

Microsoft launches a phishing attack simulator and other security tools

It’s wonderful to see Microsoft taking such a proactive approach to security. We’ve spent decades building apps with performance and convenience prioritized ahead of security. It’s nice to see security being placed first, where it belongs.

 

Facebook Admits to Tracking Non-Users Across the Internet

So, if you don’t want Facebook to track you, all you need to do is join Facebook and tell them to stop. This is why Zuckerberg is worth billions and I’m still tuning queries.

 

No boundaries for Facebook data: third-party trackers abuse Facebook Login

I’m shocked, shocked to discover these third-party applications may be a risk to my data privacy. Wait a minute. No, I am not shocked. This is expected behavior from people and companies that don’t mind abusing the principle of informed consent in order to earn a dollar.

 

Meet Boston Dynamics' Family Of Robots

We’re doomed.

 

Remember when you could buy software inside the Apple store? Here's the only software they sell now:

Monitoring has always been a loosely defined and somewhat controversial term in IT organizations. IT professionals have very strong opinions about the tools they use, because monitoring and alerting are key components of keeping systems online, performing well, and delivering IT’s value proposition to the business. However, as the world has shifted more toward a DevOps mindset, especially in larger organizations, the mindset around monitoring systems has also shifted. You are not going to stop monitoring systems, but you might want to rationalize which metrics you track in order to give them better focus.

 

What To Monitor?

 

While operating systems, hardware, and databases all expose a litany of metrics that can be tracked, collecting that many performance metrics can make it hard to pay attention to the critical things that may be happening in your systems. Such deep-dive analysis is best reserved for troubleshooting one-off problems, not day-to-day monitoring. One approach to consider is classifying systems into broad categories and applying specific indicators that allow you to evaluate the health of each system.

 

User-facing systems like websites, e-commerce systems, and of course everyone’s most important system, email, have availability as their most important metric. Latency and throughput are secondary metrics, but for customer-facing systems they are just as important.

 

Storage and Network Infrastructure should emphasize latency, availability, and durability. How long does a read or write take to complete, or how much throughput is a given network connection seeing?

 

Database systems, much like front-end systems, should be focused on end-to-end latency, but also on throughput: how much data is being processed and how many transactions are happening per time period.

 

It is also important to think about which aspect of each metric you want to alert on (page an on-call operator). I like to approach this with two key rules: any page should be about something actionable (service down, hardware failures), and always remember that there is a human cost to paging, so if you can have automation respond to a page and fix something with a shell script, that’s all the better.

 

It is important to think about the granularity of your monitoring. For example, the availability of a database system might only need to be checked every 15 seconds or so, but the latency of the same system should be checked every second or more often to capture all query activity. You will want to think about this at each layer of your monitoring. This is a classic tradeoff: a higher volume of data collection in exchange for more detailed information.

 

Aggregation

 

In addition to doing real-time monitoring, it is important to understand how your metrics look over time. This can give you key insights into things like SAN capacity and the health of your systems over time, and it lets you identify anomalies and hot spots (e.g., end-of-month processing) as well as plan for peak loads. This leads to another point: you should treat collected metrics as distributions of data rather than averages. For example, if most of your storage requests are answered in less than 2 milliseconds, but you have several that take over 30 seconds, those anomalies will be hidden by an average. By using histograms and percentiles in your aggregation, you can quickly identify out-of-bounds values in an otherwise well-performing system.
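Here is a quick, hedged illustration in Python of why the average hides those outliers while percentiles expose them; the numbers are made up to mirror the 2 ms vs. 30 second example above.

```python
# Mean vs. percentiles on a skewed latency distribution (values in milliseconds).
# Synthetic data mirrors the example above: mostly ~2 ms, a handful at ~30 s.
import numpy as np

latencies = np.concatenate([
    np.full(10_000, 2.0),    # typical requests: ~2 ms
    np.full(5, 30_000.0),    # a few pathological requests: 30 s
])

print(f"mean   : {latencies.mean():8.1f} ms")                  # ~17 ms, looks "fine"
print(f"p50    : {np.percentile(latencies, 50):8.1f} ms")      # 2 ms
print(f"p99    : {np.percentile(latencies, 99):8.1f} ms")      # still 2 ms
print(f"p99.99 : {np.percentile(latencies, 99.99):8.1f} ms")   # exposes the 30 s tail
```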

 

 

Standardize

 

Defining a few categories of systems and standardizing your data collection allows for common definitions and can drive toward common service level indicators. This lets you build a common template for each indicator and work toward a common goal of higher service levels.
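A sketch of what such a template could look like in Python; the categories follow the ones above, and the target numbers and check intervals are invented placeholders rather than recommended service levels.

```python
# Illustrative SLI templates per system category; targets are placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class SLI:
    name: str
    unit: str
    target: float          # e.g., 99.9 (% availability) or 250.0 (ms latency)
    check_interval_s: int  # how often the indicator is sampled

TEMPLATES = {
    "user_facing": [
        SLI("availability", "%", 99.9, 15),
        SLI("latency_p99", "ms", 250.0, 1),
    ],
    "storage_network": [
        SLI("availability", "%", 99.99, 15),
        SLI("io_latency_p99", "ms", 5.0, 1),
        SLI("durability", "%", 99.999999, 3600),
    ],
    "database": [
        SLI("end_to_end_latency_p99", "ms", 20.0, 1),
        SLI("transactions_per_sec", "tps", 500.0, 15),
    ],
}

for category, slis in TEMPLATES.items():
    print(category, [(s.name, s.target, s.unit) for s in slis])
```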

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he discusses the impact of artificial intelligence on cybersecurity.

 

Agencies are turning to artificial intelligence (AI) and machine learning to bolster the United States’ cybersecurity posture.

 

Agencies are dealing with enormous amounts of data and network traffic from many different sources, including on-premises and from hosted infrastructures—and sometimes a combination of both. Humans can’t sift through this massive amount of information, which makes managing security a task that cannot be exclusively handled manually.

 

AI alleviates many of these challenges. Machines can automatically comb through millions of packets of information and detect suspicious behavior. The more data these machines analyze, the more intelligent they become, and the better they are at noticing, predicting, and preventing security breaches.

 

But while AI offers many great benefits, it should not be considered a replacement for human intervention or existing network monitoring tools. Instead, AI should complement and support the people and tools that agencies are already using to keep their networks safe.

 

The human factor remains critical

 

The cyber threat landscape continues to change rapidly, and some aspects of that landscape require human intervention now more than ever before. Respondents to our Federal Cybersecurity Survey indicated a wide range of threat sources, from foreign governments to hackers, terrorists, and beyond.

 

The biggest threat, though, appears to come from careless or untrained insiders, with 54 percent of respondents listing them as their top concern. This point exemplifies why people still very much matter when it comes to cybersecurity. Even though machines and systems can be highly effective at preventing suspicious behavior, they are not great at training staff to adhere to agency policies or practice strong overall security hygiene.

 

Of course, AI can certainly help prevent malicious or careless insiders from doing damage. Automatic detection of suspicious activity and immediate alerts can help managers respond more quickly to potential threats. It can also be used to fill in gaps resulting from the lack of human resources or security training, and significantly decrease the time it takes to analyze data. As such, AI can reduce attack identification and response times from days to hours or even minutes.

 

Even so, humans will still be needed to react to and implement those responses. They remain a critical piece of the cybersecurity puzzle.

 

Traditional monitoring solutions are still vital

 

Just as humans will continue to play an important role in network security in the age of AI, tools such as security information and event management (SIEM) systems, network configuration management and user device monitoring programs should remain a foundational element of agencies’ initiatives. These solutions supplement AI by extracting information from the constant noise, allowing managers to focus on truly critical issues and pinpoint security threats.

 

Like AI tools, traditional network monitoring programs can analyze huge amounts of data. They complement this ability with continuous monitoring of user activity and network devices and provide automated threat intelligence alerts along with contextual information to help managers act on that information. Indeed, our survey indicated that these tools continue to play a significant role in keeping networks protected; for example, 44 percent of respondents using some form of device protection solution stated they are able to detect rogue devices within minutes.

 

In short, while AI is extremely useful, it should not be used exclusively. Instead, agencies should plan on augmenting existing best practices and the abilities of their staff with AI. Because although AI is good and here to stay, it’s the use of tried and true resources that will continue to lift up the machines as they rise.

 

Find the full article on SIGNAL.

Recently, ITWorld asked me to share some thoughts on "IT's Worst Addictions (And How to Cure Them)" (https://www.itworld.com/article/3268305/it-strategy/worst-it-addictions-and-how-to-cure-them.html). While I had shared a number of thoughts on the topic, space and format restricted the post so that only a couple of my ideas were printed. I wanted to share a more complete version with you here.

 

Sensitivity First

The tone of the original article was fairly light, using the word "addiction" in its informal, rather than medical, context. This is understandable, and in that framework it's easy to lapse into AA-style thinking and language that conflates “IT addictions” with true addictive behaviors and issues. I think doing so would be unfair to individuals (and their families, friends, and coworkers) who are dealing with the very real and very serious impact of actual addictions every day. I want to avoid trivializing something that has caused so much real trauma and pain, stolen years, and lost lives.

 

At the same time, I recognize that the obsessive behaviors we’re discussing can be remarkably similar to true addiction. Therefore, traditional conversations about addiction may be a source of guidance and wisdom for us.

 

Throughout this post, I'm treading that line carefully, and I hope it's clear that I'm not making light of a serious topic.

 

That said, over the course of my career I have noticed there are certain behavioral traps and anti-patterns that IT professionals fall into.

 

Let’s start with the IT pro obsessions that everyone thinks of that I have no desire to talk about, because they are well-known and have been chewed over thoroughly:

  • Everything to do with your phone (duh)
  • Communication channels (email, slack, work IM, etc.) (duh)
  • Coffee (duh)

 

Those are the obvious ones. Now let's look at some that are not so obvious:

 

Checking that screen one more time

What “that screen” is differs for each IT pro, but we all have that one thing we compulsively check. It could be the NOC dashboard; it could be the performance tracker for our “baby” system; it could be the cloud statistics. One would hope that for many, it’s the monitoring dashboard.

 

The latest and greatest

This refers to the compulsive need to update, whether we can make a valid financial justification for it or not. Again, the specific manifestation varies. It could be the latest phone, tablet, or laptop, the newest phone service (Google Fi, anyone?), the fastest home internet service, or pro-sumer grade equipment.

 

Monitors

(The hardware kind. I wouldn't ever say you could have too many SolarWinds monitors!)

There are very few IT pros who would say "no" to adding one (or four) more screens to their system, if they had the option. Better still, this desire does not hinge on how many screens one already has. More is always better.

 

Training/Certifications

As strange as it sounds, some IT pros have to be on top of the latest learning. That means lifetime subscriptions to online courses, obsessively upgrading certifications, and more.

 

News

Many IT pros are hopeless news junkies. It may manifest in a single area (politics, sports, tech trends, entertainment) or a combination of those, but the upshot is that we want to know the latest updates, whether they come on our mobile device, the third screen of our main computer, or good old fashioned wood pulp dropped at our front door each morning.

 

Collectibles

Once again, this obsession has a nearly infinite number of variations, including LEGO sets, watches, comic books, figurines, and more. Many IT pros have “that thing” that they go out of their way (and often break their budget) for.

 

(It should be noted that SolarWinds, with our ever-expanding array of buttons and stickers sporting unique ideas, happily feeds into this obsession.)

 

Community

Contrary to the stereotype of the nerdy loner, IT pros tend to be very dedicated to building and being part of a community (or several). While these communities often have an online component, most focus on (and culminate in) an IRL meet-up where they can share stories, offer support, and just bask in the glow of like-minded folks. These communities might be vendor-supported (SWUG, CiscoLive, Microsoft Ignite, etc.); vendor-agnostic but professionally oriented (SQL Saturdays, DevOpsDays, PHP.ug, etc.); non-professional but infinitely geeky (D&D conventions and Comic Cons rank high on this list, but are by no means the only examples); or otherwise focused on cultures, medical challenges, car ownership, and more. The point is that IT pros often become deeply (some might say obsessively) involved in these communities and in seeing them thrive.

 

The sharing corner

So what are YOUR compulsive IT distractions? Let me (and the rest of us) know in the comments below. Based on feedback, I may even pull together some thoughts on how we all can address the negative aspects of these behaviors and become better for the effort.

One of the biggest draws of the public cloud is services like managed Kubernetes or serverless functions. Managed services like these let IT organizations consume higher-level services and focus their efforts on opportunities to create business value from technology.

 

Configuration management tools like Chef, Puppet, and Ansible are central to modern cloud deployments. These tools enable an automated and consistent configuration of instances. This allows administrators to utilize cloud-native practices like immutable infrastructure, in which instances or servers are treated as low-value objects that can be easily recreated, as opposed to long-living servers that are carefully maintained.

 

Each of the popular configuration management tools utilizes a server/agent construct in which the agent, or node, is managed by the configuration management server and pulls its configuration from that server. This introduces long-lived infrastructure in the form of the configuration management servers, which must be maintained. Creating automation cookbooks or modules is challenging enough without also having to provision and maintain the infrastructure required to facilitate the automation.

 

The benefits of managed configuration management are:

  1. Quickly test new versions - One of the challenges with configuration management tools is keeping up with the release cycle of the software. Utilizing a managed solution enables IT teams to quickly spin up a configuration management server and rapidly test new features with little hassle.
  2. Simplify upgrades - Once a new version of the configuration management tool has been tested and deemed production-ready, the process of upgrading the server infrastructure begins. This requires a considerable amount of time and effort from the engineer. With a fully managed solution, all of that time and effort is given back to the engineers.
  3. Enable isolated automation development environments - The ability to provision a production-like environment along with the configuration management platform gives automation engineers an isolated environment to test their automation changes, with greater assurance that they won't break a shared environment.
  4. Scalable - Building the configuration management infrastructure that scales properly as the environment reaches thousands and tens of thousands of nodes is incredibly complex and makes things like upgrades that much more painful. The ability to utilize a single solution for ten nodes or ten thousand nodes is incredibly valuable.

 

Managed Deployment

The following solutions are managed deployments. This means the configuration management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

Chef Automate

The Chef Automate platform is a configuration management platform created by Chef Software that provides an end-to-end solution for automation engineers to develop, test, and deploy their cookbooks. AWS and Azure both offer a marketplace listing of Chef Automate that can be deployed and ready to use within minutes.

 

Puppet Enterprise

Puppet Enterprise is a configuration management platform created by Puppet that provides a standard set of configuration management functionality. Both AWS and Azure offer a marketplace deployment option.

 

Fully Managed

The following solutions are fully managed configuration management solutions such that the cloud provider manages your configuration management platform on your behalf and allows engineers to focus on automation cookbooks or modules.

 

AWS OpsWorks

AWS OpsWorks is a fully managed solution that completely abstracts the server infrastructure related to configuration management. This allows organizations to take full advantage of configuration management tools like Chef and Puppet without the administrative overhead of managing a server.
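For example, the OpsWorks for Chef Automate / Puppet Enterprise API can be queried with boto3 to see the state of a managed configuration management server. This is a minimal sketch, assuming AWS credentials and a region are already configured; there are no server names hard-coded here, and the output fields shown are the ones I'd expect from the describe call.

```python
# Minimal sketch: list OpsWorks-managed Chef/Puppet servers and their status.
# Assumes AWS credentials and a default region are configured for boto3.
import boto3

opsworks_cm = boto3.client("opsworkscm")

response = opsworks_cm.describe_servers()
for server in response.get("Servers", []):
    print(server["ServerName"], server["Engine"], server["Status"])
```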

 

Azure Automation

Azure Automation utilizes PowerShell DSC (Desired State Configuration) for managed configuration management, which fits perfectly in line with Microsoft's vision of managing all things with PowerShell.

 

Ultimately the value proposition for configuration management tools is in the consistent automated configuration of the instances and not in managing the infrastructure to support the configuration management tools. Future posts will delve into other core operational aspects that are critical to cloud environments but in and of themselves don't provide much value.

I've been responsible for Disaster Recovery and Business Continuity Preparedness for my company for nearly eight years. In my role as Certified Business Continuity Professional, I have conducted well over a dozen DR exercises in various scopes and scale. Years ago, I inherited a complete debacle. Like almost all other Disaster Recovery Professionals, I am always on the lookout for better means and methods to further strengthen and mature our DR strategy and processes. So, I roll my eyes and chuckle when I hear about all of these DRaaS solutions or DR software packages.

 

My friends, I am here to tell you that when it comes to overseeing a mature and reliable DR program, the devil is in the details. The bad news is that there is no quick-fix, one-size-fits-all, magic-wand solution available that will allow you to put that proverbial check in the "DR" box. Much more is needed. Just about all the DR checklists and white papers I've ever downloaded, at the risk of being harassed by the sponsoring vendor, pretty much give the same recommendations. What they neglect to mention are the specifics, the intangibles, the details that will make or break a DR program.

 

First, test. Testing is great, important, and required. But before you schedule that test and ask the IT department, as well as many members of your business units, to participate and give up a portion of their weekend, you darn well better be ready. Remember, it is your name that is on this exercise. You don’t want to have to go back however many months later and ask your team to give up another weekend to participate. Having to test processes only to fail hard after the first click will quickly call your expertise into question.

 

Second, trust but verify. If you are not in direct control of the mission critical service, then you audit and interview those who are responsible, and do not take their word for it when they say, "It'll work." Ask questions, request a demonstration, look at screens, walk through scenarios and always ask, "What if...?"

 

Third, work under the assumption that the SMEs aren't always available. Almost every interview uncovers a Single Point of Failure (SPoF) by the third "What if?" question.

"Where are the passwords for the online banking interfaces stored?"

"Oh! Robert knows all of them," answered the Director of Accounts Payable.

"What if Robert is on vacation on an African safari?"

"Oh!" said the director. "That would be a problem."

"What if we didn't fulfill our financial obligations for one day? Two or three days. A week?" I asked.

"Oh! That would be bad. Real bad!"

Then comes the obligatory silence as I wait for it. "I need to do something about that." Make sure you scribble that down in your notes and document it in your final summary.

 

Fourth, ensure the proper programming for connectivity, IP/DNS, and parallel installations. This is where you will earn your keep. While the DRaaS software vendors will boast of simplicity, the reality is that they can only simplify connectivity so much. Are your applications programmed to use IP addresses and not FQDNs? Does your B2B use FTP via public IP or DNS? And do they have redundant entries for each data center? The same question can be applied to VPNs. And don't forget parallel installations, including devices such as load balancers and firewalls. Most companies must manually update the rules for both PRD and DR. I've yet to meet a disciplined IT department that maintains both instances accurately.

 

Fifth, no one cares about DR as much as you do. This statement isn't always true, but always work under the assumption that it is. Some will care a whole lot. Others will care a little. Most will hardly care at all. It is your job to sell the importance of testing your company's DR readiness. I consistently promote our company's DR readiness, even when the next exercise isn't scheduled. My sales pitch is to remind IT that our 5,000+ employees are counting on us. People's mortgage payments, health insurance, children’s tuition all rely on paychecks. It is our duty to make sure our mission-critical applications are always running so that revenue can be earned and associates can receive those paychecks. This speech works somewhat because, let's face it, these exercises are usually a nuisance. While many IT projects push the business ahead, DR exercises are basically an insurance policy.

 

Sixth, manage expectations. This is pretty straightforward, but keep in mind that each participant has his or her own set of expectations, whether it be the executives, the infrastructure teams and service owners, or the functional testers. For example, whenever an executive utters the words "pass" or "fail," immediately correct them by saying "productive," reminding them that there is no pass/fail. Three years ago I conducted a DR exercise that came to a dead stop the moment we disconnected the data center's network connectivity. Our replication software was incorrectly configured: the replicators in DR needed to be able to talk to the master in our production data center. All the participants were saying that the exercise was a failure, which triggered a certain level of panic. I corrected them and said, "I believe this was our finest hour!" Throughout your career, you should be prepared to correct people and help manage their expectations.

 

Seventh, delegate and drive accountability. Honestly, this isn't my strong suit. With every exercise that I have conducted, the lead-up and prep often find dozens of gaps and showstoppers. What I need to be better at is holding the service owners accountable and delegating the responsibility of remediation when a gap or showstopper is identified. Instead, I often fall back on my 20+ years of IT background and try to fix it myself. This consumes my time AND lets the service owners off the hook. For example, while prepping for my most recent exercise, I learned that a 2TB disk drive that contains critical network shares had stopped replicating months ago. The infrastructure manager told me that the drive was too big and volatile, and that it was consuming bandwidth and causing other servers to miss their RPO. Once I got over my urge to scream, I asked what space threshold needed to be reached before the replication could be turned back on. I then asked him what he could do to reduce disk space. He shrugged and said, "I don't know what is important and what isn't." So, I took the lead, identified junk data, and reduced disk space by 60 percent. I should have made him own the task, but instead I took the path of least resistance.

 

Eighth, documentation. Very few organizations have it. And those who do have documentation usually find that it is obsolete; the moment it is written down, some detail has changed. Also, what I have learned is that very few people refer to documentation after it is created.

 

So, there you have it. I have oodles more, but this article is long enough already. I hope you find what I shared useful in some capacity. And remember, when it comes to DR exercises, the devil is in the details.

When designing the underlying storage infrastructure for a set of applications, several metrics are important.

 

First, there’s capacity. How much storage do you need? This is a metric that’s well understood by most people. People see GBs and TBs on their own devices and subscription plans on a daily basis, so they’re well aware of it.

 

There’s also performance, which is a bit more difficult. People tend to think in terms of “slow vs. fast," but these are subjective metrics. For storage, the most customer-centric metric is response time. How long does it take to process a transaction? Response time is, however, the product of a few other metrics, including I/O operations per second, the size of an I/O, and the queue depth of other I/O in front of you.
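As a back-of-the-envelope illustration of how these relate, Little's Law ties average outstanding I/Os (queue depth), throughput (IOPS), and response time together. The numbers below are invented for the example.

```python
# Little's Law sketch: outstanding I/Os = IOPS * response time,
# so average response time = queue depth / IOPS. Numbers are illustrative only.
def avg_response_time_ms(queue_depth: float, iops: float) -> float:
    return (queue_depth / iops) * 1000.0

# 32 outstanding I/Os sustained against a system delivering 16,000 IOPS:
print(f"{avg_response_time_ms(32, 16_000):.1f} ms")   # -> 2.0 ms

# The same queue depth against an overloaded system doing 2,000 IOPS:
print(f"{avg_response_time_ms(32, 2_000):.1f} ms")    # -> 16.0 ms
```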

 

Sizing a storage system

If you size a storage system to meet both capacity and peak performance requirements, you will generally have low response times. Capacity is easy: I need X terabytes. Ideally, you’d also have some performance numbers to base the size of your system on, including expected IOPS, I/O size, and read:write ratio, to name a few. If you don’t have these performance requirements, a guesstimate is often the closest you can get.

 

With this information, and an idea of which response time you’re aiming for, it’s possible to configure a system that should be in the sweet spot. Small enough to make it cost effective, yet large enough that you can absorb some growth and/or unexpected peaks in performance and capacity. Depending on your organization and budget, you might undersize it to only cover the 95th percentile peak performance, or you might oversize it to facilitate growth in the immediate future.

 

Let it grow, let it grow… and monitor it!

Over time though, your environment will start to grow. Data sets increase and more users connect to it. Performance demands grow in step with capacity. This places additional demands on the system; demands that it wasn’t sized for initially.

 

Monitoring is crucial in this phase of the storage system lifecycle. You need to accurately measure the capacity growth over time. Automated forecasts will help immensely. Keep an eye on the forecasting algorithms and the statistics history. If the algorithm doesn’t use enough historical data, it might result in extremely optimistic or pessimistic predictions!
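To show the idea (and the sensitivity to how much history you feed the algorithm), here is a hedged Python sketch that fits a straight line to daily capacity samples and estimates when the array fills up. Real forecasting tools use more robust models; the sample data and the 100 TB limit are made up.

```python
# Naive linear capacity forecast; sample data and the 100 TB limit are made up.
import numpy as np

days = np.arange(90)                                        # 90 days of history
used_tb = 40 + 0.15 * days + np.random.normal(0, 0.5, 90)   # ~0.15 TB/day growth

slope, intercept = np.polyfit(days, used_tb, 1)             # least-squares fit
capacity_tb = 100.0
days_until_full = (capacity_tb - (slope * days[-1] + intercept)) / slope

print(f"growth ≈ {slope:.2f} TB/day, full in ≈ {days_until_full:.0f} days")
```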

 

Similarly, performance needs to be guaranteed throughout the life of the array. The challenge with performance monitoring is that it’s usually a chain of components that influence each other. Disks connect to busses, which connect to processors, which connect to front-end ports, and you need to monitor them all. Depending on the component that’s overloaded, you might be able to upgrade it. For example, connect additional front-end ports to the SAN or upgrade the storage processors. At some point though, you’re going to hit a limit. Then what?

 

Failure domain

Fewer, larger systems have several advantages over multiple smaller arrays. There are fewer systems to manage, which saves you time in monitoring and day-to-day maintenance. Plus, there is less waste, as siloed systems tend not to be fully utilized.

 

One important aspect to consider, though, is the failure domain. What's the impact if a system or component fails? Sure, you could grow your storage system to the largest possible size. But if it fails, how long would you need to restore all that data? In a multi-tenancy situation, how many customers would be impacted by a system failure? Licenses for larger systems are sometimes disproportionately more expensive than their smaller cousins; does this offset the additional hassle of managing multiple systems? Multiple approaches are possible. Let me know which direction you’d choose: fewer, bigger systems, or multiple smaller systems!

This week's Actuator comes to you from an unseasonably cold spring here in New England. It snowed this week, making for a chilly Boston Marathon. Here's hoping we get rewarded later with a few extra weeks of summer warmth through October.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

In a Leaked Memo, Apple Warns Employees to Stop Leaking Information

Apple, one of the least security focused manufacturers on the planet, has issues keeping secrets. Sounds about right.

 

Strong feedback loops make strong software teams

Not just software, but all teams. An important part of the feedback loop is how a person handles the feedback. Different people have different tolerance levels for criticism of their work. It’s important to remember that when building teams, or assigning tasks.

 

Verizon 2018 Data Breach Investigations Report: Tales of dirty deeds and unscrupulous activities

Verizon published their annual data breach report, showing us details about who is behind the organized attacks, their motivations, and the industries targeted. This report should be required reading for everyone working in IT.

 

Waymo seeks permission to test fully driverless cars in California

It’s been a while since I talked about autonomous cars, so here’s a quick update. If you didn’t know, Waymo is a division of Alphabet, which is Google’s parent company. So, Waymo is Google. Let that sink in and think about the data that Google will collect about you and your travel habits.

 

Mark Zuckerberg's Congressional testimony showed that a bedrock principle of online privacy is a complete and utter fraud

Informed consent is the loophole that internet companies have exploited for decades. Maybe now we can work on closing that loophole.

 

User Privacy Isn't Solely a Facebook Issue

A nice reminder that data security and privacy is a bigger issue. Your ISP has the most control over your security and privacy online.

 

'Dear Mark, this is why I hate you.' An open letter to Zuckerberg

To opt-out of Facebook infringing on your privacy, you must first sign up to use Facebook. I can’t understand why more people aren’t outraged to the point that we just shut Facebook down completely.

 

Speaking of spring, now's a good time to get started on that re-wiring project you've been putting off for a while.

 

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

There’s a lot of attention being paid to innovative and groundbreaking new technologies like machine learning (ML), artificial intelligence (AI), and blockchain. But does the reality surrounding these technologies match the hype?

 

That’s one of the key questions featured in the Public Sector version of the SolarWinds IT Trends Report 2018: The Intersection of Hype and Performance. We surveyed more than 100 IT practitioners, managers, and directors from public sector organizations. Our goal was to gauge their perspectives on the technologies that are making the biggest differences for their agencies.

 

Below are some of the top findings from this year’s report. Complete details can be found here.

 

Hybrid IT and Cloud Remain Top Priorities

 

Despite the excitement surrounding emerging technologies, IT professionals are continuing to prioritize investments related to hybrid IT and cloud computing. In fact:

 

  • 97 percent of survey respondents listed hybrid IT/cloud among the top five most important technologies to their organization’s IT strategies
  • 50 percent listed hybrid IT/cloud as their most important technologies

 

Automation, the Internet of Things, Big Data, and Software-Defined “Everything” are Gaining Steam

 

According to the respondents, the rest of the top five most important tools include:

 

  • Automation (#2)
  • Big data analytics (#3)
  • The Internet of Things (#4)
  • Software-defined everything (#5)

 

Automation and big data analytics scored highly in the category “most important technologies needed for digital transformation over the next three to five years.” Respondents were also bullish on the productivity and efficiency benefits of automation and big data analytics, as well as their potential to deliver a high ROI. Still, hybrid IT/cloud continued to set the pace, taking the top spot in all categories.

 

Reality Hasn’t Caught Up with the Hype around AI, Machine Learning, Blockchain, and Robotics

 

While there’s been a great deal of media attention and interest in these four technologies, the reality is, for now, IT professionals are focusing their attention on proven technologies that can deliver an immediate value in their predominantly complex hybrid IT environments. Respondents do not deny the importance of AI, machine learning, blockchain, and robotics technologies. They are simply not designating them as “mission critical”—at least, not yet.

 

However, these solutions are expected to gain increasing prominence over the next few years. Namely:

 

  • 34 percent of respondents believe that AI will be the primary technology priority over the next three to five years
  • 36 percent feel the same about machine learning
  • Other technologies mentioned include robotics (by 14 percent of respondents) and blockchain (7 percent)

 

Can’t Contain Containers

 

Conversely, open source containers turned out to be big movers this year. Forty-four percent of respondents ranked containers as one of their most important technology priorities today. That’s a huge jump over last year’s report (the 2017 IT Trends Report), in which just 16 percent of IT professionals indicated they were working on developing containerization skills.

 

Containers can help organizations increase agility while addressing some challenges introduced by hybrid IT/cloud environments. As noted by respondents, those challenges include:

 

  • Environments that are not optimized for peak performance (46 percent of respondents)
  • Significant time spent reactively maintaining and troubleshooting IT environments (45 percent of respondents)

 

Technology Adoption Barriers Remain

 

Survey respondents also identified some key challenges standing in the way of technology adoption. In addition to IT environments not operating at optimal levels, many IT professionals cite a lack of organizational strategy and inadequate investment in areas like user training.

 

  • 44 percent of respondents ranked inadequate organizational strategy as one of the top three obstacles to better optimization
  • User and technology training was also considered a barrier by 43 percent of respondents

 

More information on the SolarWinds annual IT Trends Report and the complete public sector results can be found here.

 

Are these findings consistent with what you’re seeing in your organization? Or, are you experiencing something completely different? Share your perspective in the comments.
