
Geek Speak


Monitoring has always been a loosely defined and somewhat controversial term in IT organizations. IT professionals have very strong opinions about the tools they use, because monitoring and alerting is one of the key components of keeping systems online, performing, and delivering IT’s value proposition to the business. However, as the world has shifted more toward a DevOps mindset, especially in larger organizations, the mindset around monitoring systems has also shifted. You are not going to stop monitoring systems, but you may want to rationalize which metrics you are tracking in order to give them better focus.

 

What To Monitor?

 

While operating systems, hardware, and databases all expose a litany of metrics that can be tracked, trying to watch that many performance metrics makes it hard to pay attention to the critical things that may be happening in your systems. Such deep-dive analysis is best reserved for troubleshooting one-off problems, not day-to-day monitoring. One approach to consider is classifying systems into broad categories and applying specific indicators that allow you to evaluate the health of a system.

 

User Facing Systems like websites, e-commerce systems, and of course everyone’s most important system, email, have availability as their most important metric. Latency and throughput are secondary metrics, though for customer-facing systems they can be just as important.

 

Storage and Network Infrastructure should emphasize latency, availability, and durability. How long does a read or write take to complete, or how much throughput is a given network connection seeing?

 

Database systems, much like front-end systems, should be focused on end-to-end latency, but also throughput--how much data is being processed, how many transactions are happening per time period.
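To make those categories easier to act on, it can help to capture them as data that your dashboards and alert rules can share. Here is a minimal Python sketch of that idea; the names and groupings are purely illustrative, not a feature of any particular tool.

```python
# A minimal sketch (hypothetical names) of the indicator sets described above,
# expressed as data so dashboards and alert rules can use the same definitions.
SERVICE_LEVEL_INDICATORS = {
    "user_facing": ["availability", "latency", "throughput"],
    "storage_network": ["latency", "availability", "durability"],
    "database": ["end_to_end_latency", "throughput", "transactions_per_period"],
}


def indicators_for(system_class: str) -> list[str]:
    """Return the indicators to collect and review for a given class of system."""
    return SERVICE_LEVEL_INDICATORS.get(system_class, ["availability"])


if __name__ == "__main__":
    print(indicators_for("database"))
```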

 

It is also important to think about which aspects of each metric you want to alert on (that is, page an on-call operator for). I like to approach this with two key rules: any page should be actionable (service down, hardware failure), and always remember there is a human cost to paging, so if you can have automation respond to the page and fix the problem with a shell script, that’s all the better.
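As a quick illustration of that second rule, here is a rough Python sketch of an alert handler that tries an automated fix first and only pages a human when the automation fails. The page_oncall hook and the service name are hypothetical, and it assumes a systemd host; treat it as a sketch, not a finished remediation framework.

```python
import subprocess


def page_oncall(message: str) -> None:
    """Placeholder for a real paging integration (PagerDuty, Opsgenie, etc.)."""
    print(f"PAGE: {message}")


def handle_service_down(service: str) -> None:
    """Attempt an automated restart; page a human only if it does not work."""
    restart = subprocess.run(
        ["systemctl", "restart", service], capture_output=True, text=True
    )
    healthy = subprocess.run(["systemctl", "is-active", "--quiet", service])
    if healthy.returncode == 0:
        print(f"{service} restarted automatically; no page sent")
    else:
        page_oncall(f"{service} is down and automated restart failed: {restart.stderr.strip()}")


if __name__ == "__main__":
    handle_service_down("nginx")  # hypothetical service name
```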

 

It is important to think about the granularity of your monitoring. For example, the availability of a database system might only need to be checked every 15 seconds or so, but the latency of the same system should be checked every second or even more frequently to capture all query activity. You will want to think about this at each layer of your monitoring. It is a classic tradeoff: a higher volume of data collection in exchange for more detailed information.
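To put rough numbers on that tradeoff, here is a small Python sketch using the example intervals above to show how much more data the finer-grained check generates per day.

```python
# A rough sketch of the granularity tradeoff: per-check intervals (seconds)
# and the resulting number of data points collected per system per day.
CHECK_INTERVALS = {
    "db_availability": 15,  # a reachability probe every 15 seconds is usually enough
    "db_query_latency": 1,  # latency sampled every second to catch short-lived spikes
}

SECONDS_PER_DAY = 24 * 60 * 60

for metric, interval in CHECK_INTERVALS.items():
    samples = SECONDS_PER_DAY // interval
    print(f"{metric}: every {interval}s -> {samples} samples/day")
```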

 

Aggregation

 

In addition to doing real-time monitoring, it is important to understand how your metrics look over time. This can give you key insights into things like SAN capacity and the health of your systems over time. It also lets you identify anomalies and hot spots (e.g., end-of-month processing) and plan for peak loads. This leads to another point: you should consider collecting metrics as distributions of data rather than averages. For example, if most of your storage requests are answered in less than 2 milliseconds, but you have several that take over 30 seconds, those anomalies will be masked in an average. By using histograms and percentiles in your aggregation, you can quickly identify when you have out-of-bounds values in an otherwise well-performing system.
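Here is a small Python sketch of that exact scenario, using made-up latency numbers, showing how the average alone gives no hint of the 30-second requests while a high percentile surfaces them immediately.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a list of values."""
    ordered = sorted(values)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]


# 1,000 storage requests: nearly all complete in ~2 ms, a handful take 30 seconds.
latencies_ms = [2.0] * 995 + [30000.0] * 5

mean = sum(latencies_ms) / len(latencies_ms)
print(f"average: {mean:.1f} ms")  # one number; it never tells you some requests took 30 s
print(f"p95:     {percentile(latencies_ms, 95):.1f} ms")
print(f"p99.9:   {percentile(latencies_ms, 99.9):.1f} ms")  # the outliers jump out here
```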

 

 

Standardize

 

Defining a few categories of systems and standardizing your data collection allows for common definitions and can drive toward common service level indicators. This lets you build a common template for each indicator and work toward a common goal of higher levels of service.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he discusses the impact of artificial intelligence on cybersecurity.

 

Agencies are turning to artificial intelligence (AI) and machine learning to bolster the United States’ cybersecurity posture.

 

Agencies are dealing with enormous amounts of data and network traffic from many different sources, including on-premises and from hosted infrastructures—and sometimes a combination of both. Humans can’t sift through this massive amount of information, which makes managing security a task that cannot be exclusively handled manually.

 

AI alleviates many of these challenges. Machines can automatically comb through millions of packets of information and detect suspicious behavior. The more data these machines analyze, the more intelligent they become, and the better they are at noticing, predicting, and preventing security breaches.

 

But while AI offers many great benefits, it should not be considered a replacement for human intervention or existing network monitoring tools. Instead, AI should complement and support the people and tools that agencies are already using to keep their networks safe.

 

The human factor remains critical

 

The cyber threat landscape continues to change rapidly, and some aspects of that landscape require human intervention now more than ever before. Respondents to our Federal Cybersecurity Survey indicated a wide range of threat sources, from foreign governments to hackers, terrorists, and beyond.

 

The biggest threat, though, appears to come from careless or untrained insiders, with 54 percent of respondents listing them as their top concern. This point exemplifies why people still very much matter when it comes to cybersecurity. Even though machines and systems can be highly effective at preventing suspicious behavior, they are not great at training staff to adhere to agency policies or practice strong overall security hygiene.

 

Of course, AI can certainly help prevent malicious or careless insiders from doing damage. Automatic detection of suspicious activity and immediate alerts can help managers respond more quickly to potential threats. It can also be used to fill in gaps resulting from the lack of human resources or security training, and significantly decrease the time it takes to analyze data. As such, AI can reduce attack identification and response times from days to hours or even minutes.

 

Even so, humans will still be needed to react to and implement those responses. They remain a critical piece of the cybersecurity puzzle.

 

Traditional monitoring solutions are still vital

 

Just as humans will continue to play an important role in network security in the age of AI, tools such as security information and event management (SIEM) systems, network configuration management and user device monitoring programs should remain a foundational element of agencies’ initiatives. These solutions supplement AI by extracting information from the constant noise, allowing managers to focus on truly critical issues and pinpoint security threats.

 

Like AI tools, traditional network monitoring programs can analyze huge amounts of data. They complement this ability with continuous monitoring of user activity and network devices and provide automated threat intelligence alerts along with contextual information to help managers act on that information. Indeed, our survey indicated that these tools continue to play a significant role in keeping networks protected; for example, 44 percent of respondents using some form of device protection solution stated they are able to detect rogue devices within minutes.

 

In short, while AI is extremely useful, it should not be used exclusively. Instead, agencies should plan on augmenting existing best practices and the abilities of their staff with AI. Because although AI is good and here to stay, it’s the use of tried and true resources that will continue to lift up the machines as they rise.

 

Find the full article on SIGNAL.

Recently, ITWorld asked me to share some thoughts on "IT's Worst Addictions (And How to Cure Them)" (https://www.itworld.com/article/3268305/it-strategy/worst-it-addictions-and-how-to-cure-them.html). While I had shared a number of thoughts on the topic, space and format restricted the post so that only a couple of my ideas were printed. I wanted to share a more complete version with you here.

 

Sensitivity First

The tone of the original article was fairly light, using the word "addiction" in its informal, rather than medical, context. This is understandable, and in that framework it's easy to lapse into AA-style thinking/language that conflates “IT addictions” with true addictive behaviors and issues. I think doing so would be unfair to individuals (and their families, friends, and coworkers) who are dealing with the very real and very serious impact of real addictions every day. I want to avoid trivializing something that has caused so much real trauma and pain, stolen years, and lost lives.

 

At the same time, I recognize that the obsessive behaviors we’re discussing can be remarkably similar to true addiction. Therefore, traditional conversations about addiction may be a source of guidance and wisdom for us.

 

In this post, I hope it's clear that I'm treading this line sensitively and that I'm not making light of a serious topic.

 

That said, over the course of my career I have noticed there are certain behavioral traps and anti-patterns that IT professionals fall into.

 

Let’s start with the IT pro obsessions that everyone thinks of but that I have no desire to talk about, because they are well-known and have been chewed over thoroughly:

  • Everything to do with your phone (duh)
  • Communication channels (email, slack, work IM, etc.) (duh)
  • Coffee (duh)

 

Those are the obvious ones. Now let's look at some that are not so obvious:

 

Checking that screen one more time

What “that screen” is differs for each IT pro, but we all have that one thing we compulsively check. It could be the NOC dashboard; it could be the performance tracker for our “baby” system; it could be the cloud statistics. One would hope that for many, it’s the monitoring dashboard.

 

The latest and greatest

This refers to the compulsive need to update, whether we can make a valid financial justification for it or not. Again, the specific manifestation varies. It could be the latest phone, tablet, or laptop, the newest phone service (Google Fi, anyone?), the fastest home internet service, or pro-sumer grade equipment.

 

Monitors

(The hardware kind. I wouldn't ever say you could have too many SolarWinds monitors!)

There are very few IT pros who would say "no" to adding one (or four) more screens to their system, if they had the option. Better still, this desire does not hinge on how many screens one already has. More is always better.

 

Training/Certifications

As strange as it sounds, some IT pros have to be on top of the latest learning. That means lifetime subscriptions to online courses, obsessively upgrading certifications, and more.

 

News

Many IT pros are hopeless news junkies. It may manifest in a single area (politics, sports, tech trends, entertainment) or a combination of those, but the upshot is that we want to know the latest updates, whether they come on our mobile device, the third screen of our main computer, or good old fashioned wood pulp dropped at our front door each morning.

 

Collectibles

Once again, this obsession has a nearly infinite number of variations, including LEGO sets, watches, comic books, figurines, and more. Many IT pros have “that thing” that they go out of their way (and often break their budget) for.

 

(It should be noted that SolarWinds, with our ever-expanding array of buttons and stickers sporting unique ideas, happily feeds into this obsession.)

 

Community

Contrary to the stereotype of the nerdy loner, IT pros tend to be very dedicated to building and being part of a community (or several). While these communities often have an online component, most focus on (and culminate in) an IRL meet-up where they can share stories, offer support, and just bask in the glow of like-minded folks. These communities might be vendor-supported (SWUG, CiscoLive, Microsoft Ignite, etc.); vendor-agnostic but professionally oriented (SQL Saturdays, DevOpsDays, PHP.ug, etc.); non-professional but infinitely geeky (D&D conventions and Comic Cons rank high on this list, but are by no means the only examples); or otherwise focused on cultures, medical challenges, car ownership, and more. The point is that IT pros often become deeply (some might say obsessively) involved in these communities and in seeing them thrive.

 

The sharing corner

So what are YOUR compulsive IT distractions? Let me (and the rest of us) know in the comments below. Based on feedback, I may even pull together some thoughts on how we all can address the negative aspects of these behaviors and become better for the effort.

One of the biggest draws of the public cloud is services like managed Kubernetes and serverless functions. Managed services like these enable IT organizations to consume higher-level services, which allows them to focus their efforts on opportunities to create business value from technology.

 

Configuration management tools like Chef, Puppet, and Ansible are central to modern cloud deployments. These tools enable an automated and consistent configuration of instances. This allows administrators to utilize cloud-native practices like immutable infrastructure, in which instances or servers are treated as low-value objects that can be easily recreated, as opposed to long-lived servers that are carefully maintained.

 

Each of the popular configuration management tools utilizes a server/agent construct in which the agent or node is managed by the configuration management server and pulls its configuration from the server. This introduces long-lived infrastructure in the form of the configuration management servers that must be maintained. Creating automation cookbooks or modules is challenging enough without having to provision and maintain the infrastructure required to facilitate the automation.

 

The benefits of managed configuration management are:

  1. Quickly test new versions - One of the challenges with configuration management tools is keeping up with the release cycle of the software. Utilizing a managed solution enables IT teams to quickly spin up a configuration management server and rapidly test new features with little hassle.
  2. Simplify upgrades - Once a new version of the configuration management tool has been tested and deemed production-ready, the process of upgrading the server infrastructure begins. This requires a considerable amount of time and effort from the engineer. With a fully managed solution, all of that time and effort is given back to the engineers.
  3. Enable isolated automation development environments - The ability to provision a production-like environment along with the configuration management platform gives automation engineers an isolated environment to test their automation changes, with greater assurance that they won't break a shared environment.
  4. Scalable - Building configuration management infrastructure that scales properly as the environment reaches thousands or tens of thousands of nodes is incredibly complex and makes things like upgrades that much more painful. The ability to utilize a single solution for ten nodes or ten thousand nodes is incredibly valuable.

 

Managed Deployment

The following solutions are managed deployments. This means the configuration management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

Chef Automate

The Chef Automate platform is a configuration management platform created by Chef Software that provides an end-to-end solution for automation engineers to develop, test, and deploy their cookbooks. Both AWS and Azure offer a marketplace offering of Chef Automate that can be deployed and ready to use within minutes.

 

Puppet Enterprise

Puppet Enterprise is a configuration management platform created by Puppet that provides a standard set of configuration management functionality. Both AWS and Azure offer a marketplace deployment option.

 

Fully Managed

The following solutions are fully managed configuration management offerings, in which the cloud provider manages the configuration management platform on your behalf and allows engineers to focus on automation cookbooks or modules.

 

AWS OpsWorks

AWS OpsWorks is a fully managed solution that completely abstracts the server infrastructure related to configuration management. This allows organizations to take full advantage of configuration management tools like Chef and Puppet without the administrative overhead of managing a server.
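For a sense of what "fully managed" looks like in practice, here is a rough boto3 sketch of spinning up an OpsWorks for Chef Automate server. Treat it as an illustration only: the ARNs, names, and instance size are placeholders, and the required parameters and response shape can vary by region and API version, so check the current AWS documentation before relying on it.

```python
import boto3

# Sketch only: placeholder account IDs, role names, and server name.
client = boto3.client("opsworkscm", region_name="us-east-1")

response = client.create_server(
    Engine="ChefAutomate",
    ServerName="automation-lab-chef",  # hypothetical name
    InstanceType="m5.large",
    InstanceProfileArn="arn:aws:iam::123456789012:instance-profile/aws-opsworks-cm-ec2-role",
    ServiceRoleArn="arn:aws:iam::123456789012:role/aws-opsworks-cm-service-role",
    BackupRetentionCount=5,
)

# Response shape assumed from the boto3 documentation; verify before use.
print(response["Server"]["Status"])
```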

 

Azure Automation

Azure Automation utilizes PowerShell DSC (Desired State Configuration) for managed configuration management, which fits perfectly in line with Microsoft's vision of PowerShell managing all things.

 

Ultimately the value proposition for configuration management tools is in the consistent automated configuration of the instances and not in managing the infrastructure to support the configuration management tools. Future posts will delve into other core operational aspects that are critical to cloud environments but in and of themselves don't provide much value.

I've been responsible for Disaster Recovery and Business Continuity Preparedness for my company for nearly eight years. In my role as Certified Business Continuity Professional, I have conducted well over a dozen DR exercises of various scopes and scales. Years ago, I inherited a complete debacle. Like almost all other Disaster Recovery Professionals, I am always on the lookout for better means and methods to further strengthen and mature our DR strategy and processes. So, I roll my eyes and chuckle when I hear about all of these DRaaS solutions or DR software packages.

 

My friends, I am here to tell you that to oversee a mature and reliable DR program, the devil is in the details. The bad news is that there is no real quick fix, one-size-fits-all, magic wand solution available that will allow you to put that proverbial check in the "DR" box. Much more is needed. Just about all the DR checklists and white papers I've ever downloaded, at the risk of being harassed by the sponsoring vendor, pretty much give the same recommendations. What they neglect to mention are the specifics, the intangibles, the details that will make or break a DR program.

 

First, test. Testing is great, important, and required. But before you schedule that test and ask the IT department, as well as many members from your business units, to participate and give up a portion of their weekend, you darn well better be ready. Remember, it is your name that is on this exercise. You don’t want to have to go back however many months later and ask your team to give up another weekend to participate. Having to test processes only to fail hard after the first click will quickly call your expertise into question.

 

Second, trust but verify. If you are not in direct control of the mission critical service, then you audit and interview those who are responsible, and do not take their word for it when they say, "It'll work." Ask questions, request a demonstration, look at screens, walk through scenarios and always ask, "What if...?"

 

Third, work under the assumption that the SMEs aren't always available. Almost every interview reveals a Single Point of Failure (SPoF) by the third "What if?" question.

"Where are the passwords for the online banking interfaces stored?"

"Oh! Robert knows all of them." answered the Director of Accounts Payable.

"What if Robert is on vacation on an African safari?"

"Oh!" said the director. "That would be a problem."

"What if we didn't fulfill our financial obligations for one day? Two or three days. A week?" I asked.

"Oh! That would be bad. Real bad!"

Then comes the obligatory silence as I wait for it. "I need to do something about that." Make sure you scribble that down in your notes and document it in your final summary.

 

Fourth, ensure the proper programming for connectivity, IP/DNS, and parallel installations. This is where you will earn your keep. While the DRaaS software vendors will boast of simplicity, the reality is that they can only simplify connectivity so much. Are your applications programmed to use IP and not FQDNs? Does your B2B use FTP via public IP or DNS? And do they have redundant entries for each data center? The same question can be applied to VPNs. And don't forget parallel installations, including such devices as load balancers and firewalls. Most companies must manually update the rules for both PRD and DR. I've yet to meet a disciplined IT department that maintains both instances accurately.

 

Fifth, no one cares about DR as much as you do. This statement isn't always true, but always work under the assumption that it is. Some will care a whole lot. Others will care a little. Most will hardly care at all. It is your job to sell the importance of testing your company's DR readiness. I consistently promote our company's DR readiness, even when the next exercise isn't scheduled. My sales pitch is to remind IT that our 5,000+ employees are counting on us. People's mortgage payments, health insurance, children’s tuition all rely on paychecks. It is our duty to make sure our mission-critical applications are always running so that revenue can be earned and associates can receive those paychecks. This speech works somewhat because, let's face it, these exercises are usually a nuisance. While many IT projects push the business ahead, DR exercises are basically an insurance policy.

 

Sixth, manage expectations. This is pretty straightforward, but keep in mind that each participant has his/her own set of expectations, whether it be the executives, the infrastructure teams and service owners, or the functional testers. For example, whenever an executive utters the words "pass" or "fail," immediately correct them by saying, "productive," reminding them that there is no pass/fail. Three years ago I conducted a DR exercise that came to a dead stop the moment we disconnected the data center's network connectivity. Our replication software was incorrectly configured. The replicators in DR needed to be able to talk to the Master in our production data center. All the participants were saying that the exercise was a failure, which triggered a certain level of panic. I corrected them and said, "I believe this was our finest hour!" Throughout your career, you should be prepared to correct people and help manage their expectations.

 

Seventh, delegate and drive accountability. Honestly, this isn't my strong suit. With every exercise that I have conducted, the lead-up and prep often find dozens of gaps and showstoppers. What I need to be better at doing is holding the service owners accountable and delegating the responsibility of remediation when a gap or showstopper is identified. Instead, I often fall back on my 20+ year IT background and try to fix it myself. This consumes my time AND lets the service owners off the hook. For example, while prepping for my most recent exercise, I learned that a 2TB disk drive that contains critical network shares had stopped replicating months ago. The infrastructure manager told me that the drive was too big and volatile and that it was consuming bandwidth and causing other servers to fail their RPO. Once I got over my urge to scream, I asked what space threshold needed to be reached to be able to turn the replication back on. I then asked him what he could do to reduce disk space. He shrugged and said, "I don't know what is important and what isn't." So, I took the lead, identified junk data, and reduced disk space by 60 percent. I should have made him own the task, but instead took the path of least resistance.

 

Eighth, documentation. Very few organizations have it. And those who do have documentation usually have only what is obsolete. The moment it is written down, some detail has changed. Also, what I have learned is that very few people refer to documentation after it is created.

 

So, there you have it. I have oodles more, but this article is long enough already. I hope you find what I shared useful in some capacity. And remember, when it comes to DR exercises, the devil is in the details.

When designing the underlying storage infrastructure for a set of applications, several metrics are important.

 

First, there’s capacity. How much storage do you need? This is a metric that’s well understood by most people. People see GBs and TBs on their own devices and subscription plans on a daily basis, so they’re well aware of it.

 

There’s also performance, which is a bit more difficult. People tend to think in terms of “slow vs. fast," but these are subjective metrics. For storage, the most customer-centric metric is response time. How long does it take to process a transaction? Response time is, however, a function of a few other metrics, including I/O operations per second, the size of an I/O, and the queue depth of other I/O in front of you.

 

Sizing a storage system

If you size a storage system to meet both capacity and peak performance requirements, you will generally have low response times. Capacity is easy: I need X terabytes. Ideally, you’d also have some performance numbers to base the size of your system on, including expected IOPS, I/O size, and read:write ratio, to name a few. If you don’t have these performance requirements, a guesstimate is often the closest you can get.
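As a back-of-the-envelope example of turning those inputs into a size, here is a tiny Python sketch using made-up numbers for expected IOPS, average I/O size, and read:write ratio.

```python
# Back-of-the-envelope sizing sketch; all numbers are placeholders.
expected_iops = 20_000
avg_io_size_kb = 16
read_ratio = 0.7  # 70:30 read:write

throughput_mb_s = expected_iops * avg_io_size_kb / 1024
read_iops = expected_iops * read_ratio
write_iops = expected_iops - read_iops

print(f"Required throughput: ~{throughput_mb_s:.0f} MB/s")
print(f"Read IOPS:  {read_iops:.0f}")
print(f"Write IOPS: {write_iops:.0f}")
```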

 

With this information, and an idea of which response time you’re aiming for, it’s possible to configure a system that should be in the sweet spot. Small enough to make it cost effective, yet large enough that you can absorb some growth and/or unexpected peaks in performance and capacity. Depending on your organization and budget, you might undersize it to only cover the 95th percentile peak performance, or you might oversize it to facilitate growth in the immediate future.

 

Let it grow, let it grow… and monitor it!

Over time though, your environment will start to grow. Data sets increase and more users connect to it. Performance demands grow in step with capacity. This places additional demands on the system; demands that it wasn’t sized for initially.

 

Monitoring is crucial in this phase of the storage system lifecycle. You need to accurately measure the capacity growth over time. Automated forecasts will help immensely. Keep an eye on the forecasting algorithms and the statistics history. If the algorithm doesn’t use enough historical data, it might result in extremely optimistic or pessimistic predictions!
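A forecast doesn't have to be fancy to be useful. Here is a minimal Python sketch (using NumPy and synthetic data) that fits a straight line to daily used-capacity samples and projects 90 days out; feed it too little history and the slope, and therefore the prediction, swings around just as described above.

```python
import numpy as np

# Synthetic history: six months of daily used-capacity samples in TB,
# growing ~0.05 TB/day with a little noise.
days = np.arange(180)
used_tb = 40 + 0.05 * days + np.random.normal(0, 0.5, size=days.size)

# Fit a simple linear trend and project 90 days beyond the last sample.
slope, intercept = np.polyfit(days, used_tb, 1)
forecast_90d = slope * (days[-1] + 90) + intercept

print(f"Growth: ~{slope * 30:.1f} TB/month")
print(f"Projected usage in 90 days: {forecast_90d:.1f} TB")
```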

 

Similarly, performance needs to be guaranteed throughout the life of the array. The challenge with performance monitoring is that it’s usually a chain of components that influence each other. Disks connect to busses, which connect to processors, which connect to front-end ports, and you need to monitor them all. Depending on the component that’s overloaded, you might be able to upgrade it. For example, connect additional front-end ports to the SAN or upgrade the storage processors. At some point though, you’re going to hit a limit. Then what?

 

Failure domain

Fewer, larger systems have several advantages over multiple smaller arrays. There are fewer systems to manage, which saves you time in monitoring and day-to-day maintenance. Plus, there is less waste, as silos tend not to be fully utilized.

 

One important aspect to consider, though, is the failure domain. What's the impact if a system or component fails? Sure, you could grow your storage system to the largest possible size. But if it fails, how long would you need to restore all that data? In a multi-tenancy situation, how many customers would be impacted by a system failure? Licenses for larger systems are sometimes disproportionally more expensive than their smaller cousins; does this offset the additional hassle of managing multiple systems? There are multiple possible approaches. Let me know which direction you’d choose: fewer, bigger systems, or multiple smaller systems!

This week's Actuator comes to you from an unseasonably cold spring here in New England. It snowed this week, making for a chilly Boston Marathon. Here's hoping we get rewarded later with a few extra weeks of summer warmth through October.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

In a Leaked Memo, Apple Warns Employees to Stop Leaking Information

Apple, one of the least security focused manufacturers on the planet, has issues keeping secrets. Sounds about right.

 

Strong feedback loops make strong software teams

Not just software, but all teams. An important part of the feedback loop is how a person handles the feedback. Different people have different tolerance levels for criticism of their work. It’s important to remember that when building teams, or assigning tasks.

 

Verizon 2018 Data Breach Investigations Report: Tales of dirty deeds and unscrupulous activities

Verizon published their annual data breach report, showing us details about who is behind the organized attacks, their motivations, and the industries targeted. This report should be required reading for everyone working in IT.

 

Waymo seeks permission to test fully driverless cars in California

It’s been a while since I talked about autonomous cars, so here’s a quick update. If you didn’t know, Waymo is a division of Alphabet, which is Google’s parent company. So, Waymo is Google. Let that sink in and think about the data that Google will collect about you and your travel habits.

 

Mark Zuckerberg's Congressional testimony showed that a bedrock principle of online privacy is a complete and utter fraud

Informed consent is the loophole that internet companies have exploited for decades. Maybe now we can work on closing that loophole.

 

User Privacy Isn't Solely a Facebook Issue

A nice reminder that data security and privacy is a bigger issue. Your ISP has the most control over your security and privacy online.

 

'Dear Mark, this is why I hate you.' An open letter to Zuckerberg

To opt-out of Facebook infringing on your privacy, you must first sign up to use Facebook. I can’t understand why more people aren’t outraged to the point that we just shut Facebook down completely.

 

Speaking of spring, now's a good time to get started on that re-wiring project you've been putting off for a while.

 

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

There’s a lot of attention being paid to innovative and groundbreaking new technologies like machine learning (ML), artificial intelligence (AI), and blockchain. But does the reality surrounding these technologies match the hype?

 

That’s one of the key questions featured in the Public Sector version of the SolarWinds IT Trends Report 2018: The Intersection of Hype and Performance. We surveyed more than 100 IT practitioners, managers, and directors from public sector organizations. Our goal was to gauge their perspectives on the technologies that are making the biggest differences for their agencies.

 

Below are some of the top findings from this year’s report. Complete details can be found here.

 

Hybrid IT and Cloud Remain Top Priorities

 

Despite the excitement surrounding emerging technologies, IT professionals are continuing to prioritize investments related to hybrid IT and cloud computing. In fact:

 

  • 97 percent of survey respondents listed hybrid IT/cloud among the top five most important technologies to their organization’s IT strategies
  • 50 percent listed hybrid IT/cloud as their most important technologies

 

Automation, the Internet of Things, Big Data, and Software-Defined “Everything” are Gaining Steam

 

According to the respondents, the rest of the top five most important tools include:

 

  • Automation (#2)
  • Big data analytics (#3)
  • The Internet of Things (#4)
  • Software-defined everything (#5)

 

Automation and big data analytics scored highly in the category “most important technologies needed for digital transformation over the next three to five years.” Respondents were also bullish on the productivity and efficiency benefits of automation and big data analytics, as well as their potential to deliver a high ROI. Still, hybrid IT/cloud continued to set the pace, holding the top spot in all categories.

 

Reality Hasn’t Caught Up with the Hype around AI, Machine Learning, Blockchain, and Robotics

 

While there’s been a great deal of media attention and interest in these four technologies, the reality is that, for now, IT professionals are focusing their attention on proven technologies that can deliver immediate value in their predominantly complex hybrid IT environments. Respondents do not deny the importance of AI, machine learning, blockchain, and robotics technologies. They are simply not designating them as “mission critical”—at least, not yet.

 

However, these solutions are expected to gain increasing prominence over the next few years. Namely:

 

  • 34 percent of respondents believe that AI will be the primary technology priority over the next three to five years
  • 36 percent feel the same about machine learning
  • Other technologies mentioned include robotics (by 14 percent of respondents) and blockchain (7 percent)

 

Can’t Contain Containers

 

Conversely, open source containers turned out to be big movers this year. Forty-four percent of respondents ranked containers as one of their most important technology priorities today. That’s a huge jump over last year’s report, in which just 16 percent of IT professionals indicated they were working on developing containerization skills (2017 IT Trends Report).

 

Containers can help organizations increase agility while addressing some challenges introduced by hybrid IT/cloud environments. As noted by respondents, those challenges include:

 

  • Environments that are not optimized for peak performance (46 percent of respondents)
  • Significant time spent reactively maintaining and troubleshooting IT environments (45 percent of respondents)

 

Technology Adoption Barriers Remain

 

Survey respondents also identified some key challenges standing in the way of technology adoption. In addition to IT environments not operating at optimal levels, many IT professionals cite a lack of organizational strategy and inadequate investment in areas like user training.

 

  • 44 percent of respondents ranked inadequate organizational strategy as one of the top three obstacles to better optimization
  • User and technology training was also considered a barrier by 43 percent of respondents

 

More information on the SolarWinds annual IT Trends Report and the complete public sector results can be found here.

 

Are these findings consistent with what you’re seeing in your organization? Or, are you experiencing something completely different? Share your perspective in the comments.

Enterprise networks and IT environments can be unique organizations to work with. No matter which division is involved, change management can be stressful for an IT environment if not handled correctly. With proper planning, changes can go smoothly! Regardless, there are some nuances you want to keep in mind and problems to be sure to avoid.

 

Other Teams Within The Change Management Process

 

When it comes to change management, the team making the change generally has all of their ducks in a row. They have the change thought through, tested, and planned out. When it comes to making that change, they know who is doing what and exactly what needs to happen. Then the curveballs get thrown. How many times have you come across a situation like this:

 

The network team wants to make a change bringing down the edge internet routers while the server admins are doing a mail server migration to the cloud for a group of users. The loss of an edge connection causes the mailbox upload to stop and the server admins to exceed their allotted downtime window to complete the migration.

 

This scenario is probably not uncommon for people working in enterprise IT environments. All too often, certain divisions become narrow-minded and show a lack of regard for other teams under the greater scope of the IT organization as a whole. Those narrow-minded teams choose to focus only on their own changes and projects. Breakdowns in communication like this have a tendency to escalate into larger, more difficult issues. The reality of the situation is that the teams involved may not even be part of your organization and may include providers like cloud vendors, for example. All of these teams, both internal and external, need to be included in the communication process when it comes to planning your changes.

 

Know What You Are Affecting Downstream

 

It’s no surprise that enterprise IT structures can be very complex topologies with many different technologies in play. When it comes to change management, so many devices rely on each other that serious thought needs to be given when planning upcoming technical changes, whether they are on the network, server, or desktop side of things. Take network changes, for instance. Simple additions of routes can fix issues with certain network devices while breaking end-to-end connectivity for others. Dynamic routing protocols can amplify these minor changes as they are shared between devices. Server environments can have this issue as well. Virtual datacenter changes can affect multiple physical hosts containing a wide range of virtual servers. Again, even minor changes can be amplified to affect a large number of devices and users. The due diligence that goes into planning IT changes ensures that you, as the admin, are fully aware of all devices that will be affected when the changes are made.

 

Be Aware of your Hybrid Environment

 

Local IT changes are one thing. You can make your changes and always have local "console" access if something goes wrong. In hybrid IT environments, this may not always be the case. Remotely hosted servers such as web servers or cloud-hosted domain controllers need special consideration when it comes to the administration process. On-premises processes, such as restoring from a backup, can be very different when taking place on a device hosted in the cloud. Being aware of the affected devices for a change, their location, and the details of their management becomes ever more important as hybrid IT environments become more common.

 

Preparing a Strong Change Management Plan

 

My personal strategy for change management is made up of four particular steps that I always make sure to follow to ensure a smooth change process.

 

  1. Have a documented scope of work.
  2. Communicate the process to all affected parties ahead of time: on-premises and remote.
  3. Complete prep work beforehand where possible.
  4. Always have a backup plan and/or a rollback process.

 

A documented scope of work ensures that everyone is on the same page and that all of the steps that need to be accomplished during the change are laid out ahead of time, so required tasks get taken care of and not forgotten. Once this plan is developed, you can effectively communicate the process to all affected parties. As long as it is communicated in advance, other affected users and teams can send any questions or concerns they may have. With maintenance windows getting smaller and smaller, prep work can be very beneficial when it comes to the change management process. This can include scripting changes, downloading updates ahead of time, and even scheduling automated tasks. Handling these tasks ahead of time can save you valuable time and effort when the window for your change arrives. Lastly, always have a backup plan. This could be as easy as a simple configuration or data backup, or a bit more involved, like a full rollback process. Either way, make sure that if things go south, you have a process laid out that you can follow. This guarantees you are never stuck in an "I don't know what to do" moment, and that's what is important.
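As one example of the kind of prep work and backup plan described above, here is a rough Python sketch (using the netmiko library, with placeholder device details) that captures a timestamped copy of each device's running configuration before the maintenance window, so you have a rollback reference if things go south.

```python
from datetime import datetime

from netmiko import ConnectHandler  # assumes netmiko is installed

# Placeholder inventory; replace with your own devices and credential handling.
DEVICES = [
    {"device_type": "cisco_ios", "host": "10.0.0.1",
     "username": "admin", "password": "secret"},
]


def backup_running_configs() -> None:
    """Save a timestamped copy of each device's running config to disk."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    for device in DEVICES:
        conn = ConnectHandler(**device)
        config = conn.send_command("show running-config")
        conn.disconnect()
        filename = f"{device['host']}-{stamp}.cfg"
        with open(filename, "w") as handle:
            handle.write(config)
        print(f"Saved {filename}")


if __name__ == "__main__":
    backup_running_configs()
```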

 

The change management process does not need to be a difficult thing no matter what size your organization might be. With a little bit of planning and some attention to detail, you can ensure that your maintenance windows are stress free and go off without a hitch!

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he explores the impact of software defined networking on agency networks.

 

It’s ironic, but true: software-defined networking (SDN) is tough to define. Perhaps that’s why agency network administrators are still trying to wrap their minds around the concept of SDN, despite its known benefits. They know that the tools will likely make their lives as network managers much easier and provide greater agility, security, and cost-savings. But still they ask: How do I approach SDN and make it work for my agency?

 

SDN implementation can pose significant challenges. Millions of dollars’ worth of legacy network equipment, accumulated over the years and well-integrated into the IT infrastructure, needs to be replaced. But there is the potential for a huge payoff on the other side. Adopting SDN sets your agency up for a more efficient future, and lays the groundwork for greater innovation at less cost.

 

Figuring out if now is the time to begin building toward that future should be undertaken in the same manner as other major technology initiatives: through testing and analysis. Before diving into the SDN waters, it’s a good idea to set up a test environment, if possible. Simulate production so you can gain a better understanding of whether SDN is appropriate for your agency.

 

Monitor network performance for better quality of service

 

Two of the big reasons agencies are implementing SDN are to remove the potential for human error and to simplify network management. The idea is to make networks more automated to deliver faster, more reliable, and overall better quality of service.

 

Monitoring the network during testing will provide some insight into whether this goal will be achievable through SDN. Tag what is affected by SDN and what is not, and closely track availability and uptime. Use network performance and configuration monitoring tools (which you may already have in your arsenal) to assess your SDN deployment. If you see that SDN is positively impacting uptime, you’ll feel more comfortable making the move.

 

Understand that SDN will cost you, but it could also save you

 

The cost of migrating toward SDN will, of course, vary depending on the agency and the scope of its needs, but one thing is certain: it’s going to be expensive. In addition to requiring additional employee hours, SDN requires a deep analysis to determine necessary hardware updates. Layered on top of that is the purchase price of the SDN solutions themselves. Over time, costs can easily run into the hundreds of thousands of dollars, stretching federal IT budgets.

 

Run some numbers before embarking on a full-scale migration. Ascertain whether the cost of managing SDN is less than the costs incurred managing a manual network environment. Include the cost of set-up time and the processes that are being automated in this analysis. If the numbers work in SDN’s favor, you have a good bottom-line reason for taking the SDN plunge.
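Here is a trivial Python sketch of that bottom-line comparison. Every figure in it is a placeholder; the point is simply to annualize the SDN costs (licensing plus amortized migration effort) and compare them against the hours spent running the network manually.

```python
# All figures below are placeholders to illustrate the comparison, not real pricing.
engineer_hourly_rate = 85.0

manual_hours_per_year = 2_600   # hands-on changes, troubleshooting, rework
sdn_hours_per_year = 900        # fewer manual changes once automated
sdn_setup_hours = 1_200         # one-time migration effort, amortized over 3 years
sdn_licensing_per_year = 60_000.0

manual_cost = manual_hours_per_year * engineer_hourly_rate
sdn_cost = (sdn_hours_per_year + sdn_setup_hours / 3) * engineer_hourly_rate \
    + sdn_licensing_per_year

print(f"Manual: ${manual_cost:,.0f}/yr   SDN: ${sdn_cost:,.0f}/yr")
print("SDN favorable" if sdn_cost < manual_cost else "Manual favorable")
```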

 

Understand the risks so you can be prepared for them

 

Even after you’ve decided to move forward, know that SDN migrations can be fraught with risk. Therefore, they should not be done in a wholesale, “big bang” type of manner, but accomplished using a piecemeal, highly thoughtful approach. This makes the migration easier while helping to preserve uptime as much as possible.

 

Even so, sooner or later changes on the core network will inevitably impact certain services, such as switching or routing. When this happens, there will be some downtime — unavoidable when one turns their network over to SDN and begins to rely on changes being made without human intervention. Knowing this in advance, however, can help you plan for it, making it easier to navigate these bumps in the road heading to greater agility and nimbleness.

 

Adopt additional best practices to optimize your SDN

 

There are other best practices you should consider adopting after you’ve begun implementing SDN in your agency. Teams should get certified on SDN or get functional training on how to work in an SDN environment, which is much different than what most professionals are accustomed to managing. Establish a protocol of backing up policies on a regular basis, as opposed to just backing up configurations of network devices. Employ monitoring as an ongoing discipline, continuously and automatically analyzing your network for potential issues that could prove harmful so that you can react to them quickly.

 

Above all, do not be afraid to experiment. Understand that mistakes will inevitably occur, and you will probably fail at least some of the time. Learn from these times. Improve upon processes. Make things better.

 

After all, that’s how we achieve progress. Perhaps for you, that progress will include a future network that is more agile, secure, reliable, and software-defined.

 

Find the full article on our partner DLT’s blog Technically Speaking.

A practice leader is an IT professional who can not only walk the walk and talk the talk, but also build and lead teams to do the same. Practice leaders have to be the calm within the ever-changing storm. Everyone is talking about technologies and forgetting that friction comes from two sources – people and process.

 

Tech is easy, but people are non-linear differential equations that are hard to solve because they change over time. People influence processes. Think of processes as forces that can be directed positively or negatively by people. People pollute processes with political motives, selfish decisions, and personal biases. These influences introduce friction into processes and further affect future personal interactions.

 

Leadership sets the edge for an organization’s culture. It is the driving force for either good or bad. Where will you lead your organization? And will it retain and attract people that move the organization’s culture forward while continuing to win business? Leaders need to have answers to these questions.

 

Becoming a practice leader is one path an IT professional can take in their career. It combines the technical expertise and experience to deliver services with the ability to communicate and lead through complex scenarios. Is practice leadership a path that you are taking? What are some of the things that have worked for you and your teams? Conversely, what are some of the things that have not worked? Let me know in the comment section.

The battle of the legends has come to an end.

Though we started with 33, only one could ascend.

Our winner is a beast who fights fire with flames.

Puff, Maleficent, Smaug, & Toothless are a few of the famed.

Hundreds of you jumped on the bandwagon,

The winner of your votes was none other than the Dragon!

 

Dragon won the final round claws down!

Nessie only managed to win 22% of the vote, and was a clear underdog from the start of this battle royal.

Dragon was a force of nature throughout this bracket and easily extinguished the competition each round.

 

Here’s a look back at Dragon’s other bracket victories:

Fairy Tales Round 1: Dragon vs. Unicorn

Fairy Tales Round 2: Leprechaun vs. Dragon

Fairy Tales Round 3: Dragon vs. Phoenix

Gruesomes vs. Fairy Tales Round 4: Kraken vs. Dragon

 

What are your final thoughts on this year’s bracket?

 

Do you have any bracket theme ideas for next year?

 

Tell us below!

This week's Actuator comes to you from Austin, where I am visiting HQ for a few days in an effort to escape the cold Spring we are having in New England. Here's hoping the cold doesn't follow me like last time when I brought that ice storm.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Let's Stop Giving Retailers a Free Pass on Data Breaches

This. Companies will never change until they have a financial incentive to change.

 

Stop Talking About IoT Security And Do Something About It

Maybe we could stop writing blog posts about IoT security and start providing lists of companies for us to avoid (see above).

 

Microsoft plans to invest $5 billion in IoT over the next 4 years globally

And there is the first shoe dropping, as a major cloud provider pushes a truckload of cash into IoT. I am certain that Microsoft will spend dollars on improving IoT security.

 

The space race is over and SpaceX won

This is great news for Elon, and it’s likely going to serve as a distraction from the current financial status of Tesla.

 

MIT is making a device that can 'hear' the words you say silently

This is a horrible idea. Someone needs to tell MIT to shut this down. Maybe if we all think it, they will get the message.

 

Facebook reported in 7 countries for breaking European privacy law

And just when Zuck thought things couldn’t get worse.

 

T-Mobile Austria is OK with Storing Passwords Partly in Clear Text

Just when you thought we had moved beyond corporations doing dumb things in public, T-Mobile in Austria wants you to know that they are also amazingly good at social media.

 

I like this approach to GDPR compliance:

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he points out the shift in responsibilities caused by hybrid IT.

 

Moving to a hybrid environment, where part of your infrastructure is in the cloud while the rest of it remains on-premises, may require a far greater shift in responsibilities for the federal IT team than anticipated.

 

In a traditional on-premises environment, the federal IT manager needs three things to be successful: responsibility, accountability, and authority.

 

In a hybrid IT environment, the federal IT manager is still responsible and accountable. However, part of the cloud dynamic is that a manager’s level of authority and control will vary depending on the cloud provider and its offerings. But what about authority over the network you use to access cloud resources? A carrier’s network usually won’t give you the authority to make changes.

 

Here’s where the issue of visibility comes in. The only way to mitigate the loss of pure authority over a hybrid network is to have visibility into the details of its performance and health. This visibility is key when troubleshooting and dealing with service providers and carriers who, when service is slow or has failed, often revert to the default answer, “Everything looks fine on this end.”

 

The importance of visibility

 

Why is visibility key? Because carriers are not always up front about what they’re seeing, or whether they’re even looking into a slowdown within your infrastructure.

 

Let’s say your service is not responding. A data center manager can check the internal network, and it will probably be just fine. That manager can then call the Software as a Service (SaaS) provider, who will likely say everything is fine on their end as well. The next step is to initiate a support call to the internet service provider (ISP) to find out whether or not the problem is somewhere in the middle, within the service provider’s realm.

 

The support person will likely say, “Everything looks fine here,” which is a challenge. Exacerbating the challenge is the federal IT manager’s inability to see into the service provider’s network.

 

The federal IT manager must be able to see any latency introduced by any device as packets flow through it. This information, both for the current state and historical usage, will show where the packets are going once they leave your premises, as well as how fast they’re traveling.

 

Complete visibility—a necessity for a successful hybrid IT transition—comes in the form of IT monitoring tools that provide a view of your entire environment: on-premises, in the cloud, and everything in between. These tools must be able to show a variety of device types (routers, load balancers, storage, servers, etc.) from a range of vendors.

 

Two last pieces of advice. First, be sure the IT monitoring tools you choose account for the virtual layer, whether it’s virtual servers or virtual networking, as much as the physical layer. Second, because your IT environment will only grow larger and more complex as it extends further into the cloud, the tools you select must be able to scale with the number and type of devices.

 

Find the full article on Federal Technology Insider.

I was off last week to celebrate Pesach / Passover so I thought it would be a good time to offer you a taste of an upcoming eBook I'm working on, "The Four Questions of Monitoring," which uses that holiday both as its inspiration and as a thematic framework. I'll be publishing snippets of it here and there.

 

**************************

(image courtesy of Manta)

 

Once a year, Jews around the world gather together to celebrate Pesach (also known as "Passover,” "The Feast of Matzah,” or even "The Feast of the Paschal Lamb”). More a ceremonial meal than actual "feast,” this gathering of family and friends can last until the wee hours of the morning. The dinnertime dialogue follows a prescribed order (or "seder,” which actually means "order" in Hebrew) that runs the gamut from leader-led prayers to storytelling to group singalongs to question-and-answer sessions and even—in some households—a dramatized retelling of the exodus narrative replete with jumping rubber frogs, ping-pong ball hail stones, and wild animal masks.

 

At the heart of it all, the Seder is designed to do exactly one thing: to get the people at the table to ask questions. Questions like, "Why do we do that? What does this mean? Where did this tradition come from?" To emphasize: the Seder is not meant to answer questions, but rather provoke them.

 

As a religion, Judaism seems to love questions as much (or more) than the explanations, debates, and discussions they lead to. I'm fond of telling co-workers that the answer to any question about Judaism begins with the words, "Well, that depends..." and ends two hours later when you have three more questions than when you started.

 

The fact that I grew up in an environment with such fondness for questions may be what led me to pursue a career in IT, and to specialize in monitoring. More on that in a bit.

 

But the ability to ask questions is nothing by itself. An old proverb says, "One fool can ask more questions than seven wise men can answer." And that brings me back to the Pesach Seder. Near the start of the Seder meal, the youngest person at the table is invited to ask the Four Questions. They begin with the question, "Why is this night different from all other nights?" The conversation proceeds to observe some of the ways that the Pesach meal has taken a normal mealtime practice and changed it so that it's off-kilter, abnormal, noticeably (and sometimes shockingly) different.

 

Like many Jewish traditions, there is a simple answer to the Four Questions. At the surface, it's done to demonstrate to children that questions are always welcome. It's a way of inviting everyone at the table to take stock of what is happening and ask about anything unfamiliar. But it doesn't stop there. If you dig just a bit beneath that easy surface reasoning you'll find additional meaning that goes surprisingly deep.

 

In Yeshivah — a day-school system for Jewish children that combines secular and religious learning — the highest praise one can receive is, "Du fregst un gut kasha," which translates as, "You ask a good question.”

 

This is borne out in a story told by Rabbi Abraham Twersky, a deeply religious psychiatrist. He says that when he was young, his teacher would relish challenges to his arguments. In his broken English, the teacher would say, “You right! You 100 prozent right!! Now, I show you where you wrong!”

 

The impact of this culture of questioning does not limit itself to religious thinking. Individuals who study in this system find that it extends to all areas of life, including the secular.

 

When asked why he became a scientist, Isidor I. Rabi, the Nobel laureate in physics, answered,

''My mother made me a scientist without ever intending it. Every other mother in Brooklyn would ask her child after school, 'So? Did you learn anything today?' But not my mother. She always asked me, 'Did you ask a good question today?' That difference—asking good questions—made me become a scientist!''

 

The lesson for us, as monitoring professionals, is twofold. First, we need to foster that same sense of curiosity, that same willingness to ask questions, even when we think the answers may be a long time in coming. We need to question our own assumptions. We need to relish the experience of asking so that it pushes us past the inertia of owning an answer, which is comfortable. And second, we need to find ways to invite questions from our colleagues, as well. Like the Seder, we may have to present information in a way that is shocking, noticeable, and engaging, so that people are pushed beyond their own inherent shyness (or even apathy) to ask, "What is THAT all about?”

 

The deeper message of the Passover seder speaks to the core nature of questions, and the responsibility of those who attempt to answer. "Be prepared,” it seems to say. "Questions can come from anywhere, about anything. Be willing to listen. Be willing to think before you speak. Be willing to say, 'I don't know, but let's find out!' You must also be willing to look past trite answers. Be ready to reconsider, and to defend your position with facts. Be prepared to switch, at a moment’s notice, from someone who answers, to someone who asks."

 

Once again, I believe that being exposed to this tradition of open honesty and curiosity is what makes the discipline of monitoring resonate for me.
