AI Never Sleeps in the Data Center

A data center typically employs between 30 and 200 people, depending on its size and intended use. But only a fraction of those folks run the IT side. Besides swapping gear (the infamous “remote hands”), there isn’t much to do as long as there aren’t any outages, and rumor has it a data center is a nice place to sleep: it’s dark, cool, and the deep humming noise can be soothing to some.
In fact, the cost of human resources around a data center is mostly negligible, compared to the other running costs like energy and connectivity.

As data centers are used to operating with a low headcount, automation is already advanced. And while many traditional businesses are still exploring what to automate, in a data center it’s a bit simpler, as there are clearly defined KPIs, like lowering the cycle time, increasing the frequency of deployments, and reducing the average handle time for individual processes.

Artificial Intelligence Promises Further Cost Reductions

But how, and where? The short answer is probably “in cutting expenses and optimizing spending,” but let’s look at a few things.

For the obvious tasks, solutions exist already: artificial intelligence (AI) in data centers helps speed up root cause analysis in case of failure, but it’s also capable of applying predictive analysis to prevent hardware failures, or to be more precise, it will point out when to replace gear before the probability of a failure reaches a threshold.
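The "replace gear before the probability of a failure reaches a threshold" idea can be sketched in a few lines. This is only an illustration: the FailureEstimate type, the device names, the probabilities, and the 0.2 threshold are all made up, and a real system would get its estimates from a trained model rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class FailureEstimate:
    """Hypothetical model output for one device."""
    device_id: str
    failure_probability: float  # estimated chance of failure in the planning window


def devices_to_replace(estimates, threshold=0.2):
    """Return IDs of devices at or above the threshold, most at-risk first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    flagged = [e for e in estimates if e.failure_probability >= threshold]
    flagged.sort(key=lambda e: e.failure_probability, reverse=True)
    return [e.device_id for e in flagged]


# Made-up example data, not real measurements.
estimates = [
    FailureEstimate("disk-a", 0.05),
    FailureEstimate("disk-b", 0.35),
    FailureEstimate("psu-c", 0.22),
]
print(devices_to_replace(estimates))  # ['disk-b', 'psu-c']
```

Where to set the threshold is a business decision: a lower value means replacing gear earlier and more often, a higher value means accepting more risk of an outage.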

Infrastructure utilization benefits a lot from AI. A machine can optimize utilization much better than a human because it can look at more values in less time, and the same applies to features like capacity planning.
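As a rough illustration of capacity planning, the sketch below fits a straight line to daily usage figures and estimates how many days remain until capacity is reached. It assumes linear growth, which is a simplification, and the function name and sample numbers are invented for this example. Real tools would use richer models and many more signals.

```python
def days_until_full(daily_usage, capacity):
    """Estimate days left until usage reaches capacity, using a least-squares line.

    daily_usage: one measurement per day, oldest first (same unit as capacity).
    Returns None when usage is flat or shrinking, since the line never reaches capacity.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope  # day index where the line hits capacity
    return max(0.0, day_full - (n - 1))        # measured from the latest data point


# Made-up example: storage grows by 2 TB per day toward a 100 TB limit.
print(days_until_full([50, 52, 54, 56, 58], capacity=100))  # 21.0
```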

Service management can be a mixed bag. Some tasks, like the initial provisioning and ongoing orchestration, are highly automated already and can be fully assigned to an AI with little to no concern. Other tasks, like configuration and patch management, might sound simple at first but are in fact quite involved. Many variables make it a case-by-case decision when and what to patch, and that’s not easy for a machine to grasp.

In the medium term, AI could run patches in a test environment and simulate real user behaviour against it to assess whether a patch would have a negative impact on production use, but we’re not quite there yet.

Somewhere in between sit special use cases such as health and performance monitoring. Machine learning is already applied there, so we can call it observability, and proper AI is the next step up.
AIOps is the buzzword on this topic.

Given the size of the market, there aren’t many solutions available, but it’s still a complex task to find the right one. Many organizations still seek “the best” instead of taking a step back and asking themselves what they want to achieve. But the problem is, even evaluating an AI solution can be a costly project, so an AI-in-a-box system using some form of swarm intelligence is probably better than developing something from scratch just to have a bespoke version, even if there’s a framework available.
It’s definitely beneficial to have proper monitoring in place even during the evaluation phase of the AI solution.

When There’s Light, There’s Shadow

As with every new technology or new use case, there are valid concerns but also unnecessary resistance to overcome.
For automation, the concerns mostly centered on the lack of time to develop and test automation scripts, particularly if there wasn’t enough expertise available.
Trust was an additional concern, and that one is shared with deploying AI. Most of us have probably said “If I want a job done right, I have to do it myself” to ourselves at least once, and this applies when delegating tasks to a machine as well.

But with AI, there are suddenly additional fears around losing control. We’re not in The Matrix, but trust issues are real.

Integration Woes

Often, the complexity of integrating a solution and linking it with what’s already there is a challenge. It’s more than beneficial to have developers at hand, even beyond the deployment phase.
An interesting technology to explore is Low or No Code. The latest innovations allow a non-dev admin to create simple applications for further customization without a deep dive into coding.

Some building blocks already make it possible to bridge infrastructure and service management with nothing more than a flow-chart user interface. This will open the road for further automation, and it won’t take long until we see a Low-Code-AI-interface. Now that’s a new buzzword I just made up.

The challenge for data center management is to assess the overall cost of the low-code platform, plus the time the network or infrastructure administrator wouldn’t be available for their usual tasks, and compare it with the cost of a real developer on a temporary contract basis.
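That comparison is simple arithmetic once the inputs are known. Here is a minimal sketch in which every figure (license cost, hours, hourly rates) is a made-up placeholder, and the administrator’s lost time is valued at their hourly rate, which is itself a simplifying assumption.

```python
def low_code_cost(license_cost, admin_hours, admin_hourly_rate):
    """Platform cost plus the value of the admin time diverted from usual tasks."""
    return license_cost + admin_hours * admin_hourly_rate


def contractor_cost(dev_hours, dev_hourly_rate):
    """Cost of a developer on a temporary contract."""
    return dev_hours * dev_hourly_rate


def cheaper_option(low_code_total, contractor_total):
    """Name the cheaper route; a tie goes to the contractor arbitrarily."""
    return "low-code" if low_code_total < contractor_total else "contractor"


# Placeholder numbers for illustration only.
low_code_total = low_code_cost(license_cost=5000, admin_hours=80, admin_hourly_rate=60)
contractor_total = contractor_cost(dev_hours=120, dev_hourly_rate=95)
print(low_code_total, contractor_total)                   # 9800 11400
print(cheaper_option(low_code_total, contractor_total))   # low-code
```

The numbers alone don’t settle it: ongoing maintenance, license renewals, and what happens when the contractor leaves all belong in the assessment too.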

  • There has been, and will no doubt continue to be, a growing pace of tooling that delivers a low-code/no-code framework for many IT tasks. These purpose-built process automation applications exist already, but as you indicated, adoption has been a slow and gradual affair. That could be attributed to many reasons: trust, knowledge, cost, and fear have all come up in my experience when talking to people.

    Baby steps are perfectly possible in this sector, where AI does not need to mean full automation. It does not necessarily mean that the Orion product identifies an issue, triggers a call to a solution that performs automatic remediation, and then lets you sit back and watch Orion confirm the issue no longer exists. AI comes into things at the monitoring layer, where it takes multiple values and conditions, makes sense of what is going on, and provides the root cause. The remediation layer does not need to be fully automated; it can simply harvest more data to bring clarity and focus to the root cause, or send an intelligent notification.

    I for one am looking forward to AI bringing many improvements to monitoring and to the activities that follow from monitoring data.
