Geek Speak

Garry Schmidt first got involved in IT Service Management almost 20 years ago. Since he became the manager of the IT Operations Center at SaskPower, ITSM has been one of his main focuses. Here's part 1 of our conversation.

 

Bruno: How has the support been from senior leadership, your peers, and the technical resources that have to follow these processes?

Garry: It varies from one group to the next, from one person to the next, as far as how well they receive the changes we’ve been making to processes and standards. There have been additional steps and additional discipline in a lot of cases. There are certainly those entrenched in the way they’ve always done things, so there’s always the organizational change challenges. But the formula we use gets the operational people involved in the conversations as much as possible—getting their opinions and input on how we define the standards and processes and details, so it works for them. That’s not always possible. We were under some tight timelines with the ITSM tool implementation project, so often, we had to make some decisions on things and then bring people up to speed later on, unfortunately. It’s a compromise.

 

The leadership team absolutely supports and appreciates the discipline. We made hundreds of decisions to tweak the way we were doing things and the leadership team absolutely supports the improvement in maturity and discipline we’re driving towards with our ITSM program overall.

 

Often, it’s the people at the ground floor executing these things on a day-to-day basis that you need to work with a little bit more, to show them this is going to work and help them in the long run rather than just adding extra steps and more visibility.

 

People can get a little nervous if you have more visibility into what’s going on, more reporting, more tracking. It exposes what’s going on within the day-to-day activities within a lot of teams, which can make them nervous sometimes, until they realize the approach we always try to take is, we’re here to help them be successful. It’s not to find blame or point out issues. It’s about trying to improve the stability and reputation of IT with our business and make everybody more successful.

 

Bruno: How do you define procedures so they align with the business and your end customers? How do you know what processes to tweak?

Garry: A lot of it comes through feedback mechanisms like customer surveys for incidents. We have some primary customers we deal with on a day-to-day basis. Some of the 24/7 groups (e.g., the grid control center, distribution center, and outage center) rely heavily on IT throughout their day. We’ve worked to establish a relationship with some of those groups and get involved with them, talking to them on a regular basis about how they experience our services.

 

We also get hooked in with our account managers and some of the things they hear from people. The leadership team within Technology & Security (T&S) talks to people all the time and gets a sense of how they’re doing. It’s a whole bunch of different channels to try and hear the voice of the people consuming our services.

 

Metrics are super important to us. Before our current ITSM solution, we only had our monitoring tools, which gave us some indication of when things were going bad. We could track volumes of alarms but it’s difficult to convert that into something meaningful to our users. With the new ITSM tool, we’ve got a good platform for developing all kinds of dashboards and reports. We spend a fair amount of time putting together reports reflective of the business impact or their interests. It seems to work well. We still have some room to grow in giving the people at the ground level enough visibility into how things are working. But we’ve developed reports available to all the managers. Anybody can go look at them.

 

Within our team here, we spend a fair amount of time analyzing the information we get from those reports. Our approach is to not only provide the information or the data, but to look at the picture across all the different pieces of information and reports we’re getting across all the operational processes. Then we think about what it means and what we should do to improve our performance. We send a weekly report out to all the managers and directors summarizing last week’s results and the things we should focus on to improve.
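To make the reporting idea concrete, here's a minimal Python sketch of that kind of weekly roll-up: summarize a week of incident records for managers and directors. The CSV export and the field names ("resolved_at", "assignment_group", "priority") are assumptions for illustration, not the schema of any particular ITSM tool.

```python
# Summarize the last seven days of incident records for a weekly report.
import csv
from collections import Counter
from datetime import datetime, timedelta

def weekly_summary(path: str) -> dict:
    cutoff = datetime.now() - timedelta(days=7)
    by_group, by_priority, total = Counter(), Counter(), 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            resolved = datetime.fromisoformat(row["resolved_at"])
            if resolved < cutoff:
                continue  # only count incidents resolved this week
            total += 1
            by_group[row["assignment_group"]] += 1
            by_priority[row["priority"]] += 1
    return {
        "total_incidents": total,
        "busiest_teams": by_group.most_common(3),
        "by_priority": dict(by_priority),
    }

# Example usage with a hypothetical export file:
# print(weekly_summary("incidents_export.csv"))
```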

 

Bruno: Have these reports identified any opportunities for improvement that have surprised you?

Garry: One of the main thrusts we have right now is to improve our problem management effectiveness. Those little incidents that happen all over the place, I think they have a huge effect on our company overall. What we’ve been doing is equating that to a dollar value. As an example, we’re now tracking the amount of time we spend on incident management versus problem management. You can calculate the cost of incidents to the company beyond the cost of the IT support people.

 

I remember the number for August. We spent around 3,000 hours on incident management. If you use an average loaded labor rate of $75 per person per hour, it works out to roughly $220,000 for August. That’s a lot of money. We’re starting to track the amount of time we’re spending on problem management. In August, two hours were logged against problem management. So, $220,000 versus the 150 bucks we spent trying to prevent those incidents. When you point it out that way, it gets people’s attention.
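To show the arithmetic Garry is describing, here's a minimal Python sketch using the example figures from the conversation; the loaded labor rate and hour counts are just the numbers quoted above, not additional SaskPower data.

```python
# Reactive (incident) vs. proactive (problem) spend, using the figures quoted
# above. 3,000 hours at $75/hour is roughly $225K; the ~$220K figure in the
# conversation reflects "around 3,000 hours" being an approximation.

LOADED_LABOR_RATE = 75.0  # dollars per person-hour (example figure)

def monthly_cost(hours_logged: float, rate: float = LOADED_LABOR_RATE) -> float:
    """Convert hours logged against a process into a dollar figure."""
    return hours_logged * rate

incident_cost = monthly_cost(3000)  # August hours spent on incident management
problem_cost = monthly_cost(2)      # August hours spent on problem management

print(f"Incident management: ${incident_cost:,.0f}")
print(f"Problem management:  ${problem_cost:,.0f}")
print(f"Spent {incident_cost / problem_cost:,.0f}x more reacting than preventing")
```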

Good morning! By the time you read this post, the first full day of Black Hat in London will be complete. I share this with you because I'm in London! I haven't been here in over three years, but it feels as if I never left. I'm heading to watch Arsenal play tomorrow night, come on you gunners!

 

As always, here's a bunch of links I hope you find interesting. Cheers!

 

Hacker’s paradise: Louisiana’s ransomware disaster far from over

The scary part is that the State of Louisiana was more prepared than 90% of other government agencies (HELLO BALTIMORE!), just something to think about as ransomware intensifies.

 

How to recognize AI snake oil

Slides from a presentation I wish I'd created.

 

Now even the FBI is warning about your smart TV’s security

Better late than never, I suppose. But yeah, your TV is one of many security holes found in your home. Take the time to help family and friends understand the risks.

 

A Billion People’s Data Left Unprotected on Google Cloud Server

To be fair, it was data curated from websites. In other words, no secrets were exposed. It was an aggregated list of information about people. So, the real questions should now focus on who created such a list, and why.

 

Victims lose $4.4B to cryptocurrency crime in first 9 months of 2019

Crypto remains a scam, offering an easy way for you to lose real money.

 

Why “Always use UTC” is bad advice

Time zones remain hard.

 

You Should Know These Industry Secrets

Saw this thread in the past week and many of the answers surprised me. I thought you might enjoy them as well.

 

You never forget your new Jeep's first snow.

Omar Rafik, SolarWinds Senior Manager, Federal Sales Engineering

 

Here’s an interesting article by my colleague Jim Hansen about improving security by leveraging the phases of the CDM program and enhancing data protection by taking one step at a time.

 

The Continuous Diagnostics and Mitigation (CDM) Program, issued by the Department of Homeland Security (DHS), goes a long way toward helping agencies identify and prioritize risks and secure vulnerable endpoints.

 

How can a federal IT pro more effectively improve an agency’s endpoint and data security? The answer is multi-fold. First, incorporate the guidance provided by CDM into your cybersecurity strategy. Second, in addition to CDM, develop a data protection strategy for an Internet of Things (IoT) world.

 

Discovery Through CDM

 

According to the Cybersecurity and Infrastructure Security Agency (CISA), the DHS sub-agency that released CDM, the program “provides…Federal Agencies with capabilities and tools to identify cybersecurity risks on an ongoing basis, prioritize these risks based on potential impacts, and enable cybersecurity personnel to mitigate the most significant problems first.”

 

CDM takes federal IT pros through four phases of discovery:

 

What’s on the network? Here, federal IT pros discover devices, software, security configuration settings, and software vulnerabilities.

 

Who’s on the network? Here, the goal is to discover and manage account access and privileges; trust determination for users granted access; credentials and authentication; and security-related behavioral training.

 

What’s happening on the network? This phase discovers network and perimeter components; host and device components; data at rest and in transit; and user behavior and activities.

 

How is data protected? The goal of this phase is to identify cybersecurity risks on an ongoing basis, prioritize these risks based upon potential impacts, and enable cybersecurity personnel to mitigate the most significant problems first.
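As a rough illustration (not an official CISA artifact), the four phases can be treated as a checklist an agency tracks coverage against. The phase and capability names in this minimal Python sketch are paraphrased from the list above.

```python
# The four CDM discovery phases expressed as a simple checklist structure,
# so coverage can be tracked per phase. Names are paraphrased from this
# article, not taken from an official CISA document.

CDM_PHASES = {
    "What's on the network?": [
        "device inventory", "software inventory",
        "security configuration settings", "software vulnerabilities",
    ],
    "Who's on the network?": [
        "account access and privileges", "trust determination",
        "credentials and authentication", "security behavior training",
    ],
    "What's happening on the network?": [
        "network and perimeter components", "host and device components",
        "data at rest and in transit", "user behavior and activities",
    ],
    "How is data protected?": [
        "identify risks on an ongoing basis", "prioritize by potential impact",
        "mitigate the most significant problems first",
    ],
}

def coverage_report(capabilities_in_place: set) -> None:
    """Print how many capabilities are covered in each CDM phase."""
    for phase, items in CDM_PHASES.items():
        done = sum(1 for item in items if item in capabilities_in_place)
        print(f"{phase:<35} {done}/{len(items)} capabilities in place")

coverage_report({"device inventory", "software inventory", "data at rest and in transit"})
```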

 

Enhanced Data Protection

 

A lot of information is available about IoT-based environments and how best to secure that type of infrastructure. In fact, there’s so much information it can be overwhelming. The best course of action is to stick to three basic concepts to lay the groundwork for future improvements.

 

First, make sure security is built in from the start as opposed to making it an afterthought or an add-on. This should include the deployment of automated tools to scan for and alert staffers to threats as they occur. This kind of round-the-clock monitoring and real-time notification helps the team react more quickly to potential threats and mitigate damage more effectively.

 

Next, assess every application for potential security risks. There’s a seemingly inordinate number of external applications that track and collect data. It requires vigilance to ensure these applications are safe before they’re connected, rather than finding vulnerabilities after the fact.

 

Finally, assess every device for potential security risks. In an IoT world, there’s a whole new realm of non-standard devices and tools trying to connect. Make sure every device meets security standards; don’t allow untested or non-essential devices to connect. And, to be sure agency data is safe, set up a system to track devices by MAC and IP address, and monitor the ports and switches those devices use.
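Here's a minimal, hypothetical Python sketch of that last idea: keep an allowlist of approved devices keyed by MAC address and flag anything unexpected that shows up in discovery data. The device names, addresses, and tuple format are illustrative assumptions, not output from any specific tool.

```python
# Flag devices that appear on the network but aren't on the approved list.

APPROVED_DEVICES = {
    "00:1a:2b:3c:4d:5e": "conference-room sensor",
    "00:1a:2b:3c:4d:5f": "badge reader, lobby",
}

def audit_devices(discovered):
    """discovered: iterable of (mac, ip, switch_port) tuples from monitoring."""
    for mac, ip, port in discovered:
        if mac.lower() not in APPROVED_DEVICES:
            print(f"UNKNOWN device {mac} at {ip} on {port}: investigate or block")

audit_devices([
    ("00:1a:2b:3c:4d:5e", "10.0.8.21", "sw1/0/12"),  # approved
    ("de:ad:be:ef:00:01", "10.0.8.77", "sw1/0/18"),  # not on the allowlist
])
```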

 

Conclusion

 

Security isn’t getting any easier, but there are an increasing number of steps federal IT pros can take to enhance an agency’s security posture and better protect agency data. Follow CDM guidelines, prepare for a wave of IoT devices, and get a good night’s sleep.

 

Find the full article on Government Technology Insider.

 

The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

Garry Schmidt first got involved in IT service management almost 20 years ago. Since he became the manager of the IT Operations Center at SaskPower, ITSM has been one of his main focuses.

 

Bruno: What are your thoughts on ITSM frameworks?

Garry: Most frameworks are based on best practices developed by large organizations with experience managing IT systems. The main idea is they’re based on common practice, things that work in the real world.

 

It’s still a matter of understanding the pros and cons of each of the items the framework includes, and then making pragmatic decisions about how to leverage the framework in a way that makes sense to the business. For another organization, it might be different depending on the maturity level and the tools you use to automate.

 

Bruno: Did SaskPower use an ITSM framework before you?

Garry: There was already a flavor of ITSM in practice. Most organizations, even before ITIL was invented, did incident management, problem management, and change management in some way.

 

The methods were brought in by previous service providers who were responsible for a lot of the operational processes. A large portion of IT was outsourced, so they just followed their own best practices.

 

Bruno: How has ITSM evolved under your leadership?

Garry: We were able to make progress on the maturity of our processes. But by putting focused effort into reviewing and updating our processes and some of the standards we use, and then automating those standards in our ITSM tool, we were able to take a big step forward in the maturity of our ITSM processes.

 

We do periodic maturity assessments to measure our progress. Hopefully, we’ll perform another assessment not too far down the road. Our maturity has improved since implementing the ITSM tool because of the greater integration and the improvements we’ve made to our processes and standards.

 

We’ve focused mostly on the core processes: incident management, problem management, change, event, knowledge, configuration management. Some of these, especially configuration management, rely on the tools to be able to go down that path. One of the things I’ve certainly learned through this evolution is it’s super important to have a good ITSM tool to hook everything together.

 

Over the years, we’ve had a conglomerate of different tools but couldn’t link anything together. There was essentially no configuration management database (CMDB), so you have stores of information distributed out all over the place. Problem management was essentially run out of a spreadsheet. It makes it really tough to make any significant improvements when you don’t have a centralized tool.

 

Bruno: What kind of evolution do you see in the future?

Garry: There’s a huge opportunity for automation, AIOps, and machine learning—something we can feed all our events into and use to proactively identify things that are about to fail.

 

More automation is going to help a lot in five years or so.

 

Bruno: What kind of problems are you hoping a system like that would identify?

Garry: We’ve got some solid practices in place as far as how we handle major incidents. If we’ve got an outage of a significant system, we stand up our incident command here within the ITOC. We get all the stakeholder teams involved in the discussion. We make sure we’re hitting it hard. We’ve got a plan of how we’re going to address things, we’ve got all the right people involved, we handle the communications. It’s a well-oiled machine for dealing with those big issues, the severe incidents. Including the severe incident review at the end.

 

We have the biggest opportunity for improvement in those little incidents taking place all over the place, all the time. Trying to find the common linkages and the root cause of those little things that keep annoying people all the time. I think it’s a huge area of opportunity for us, and that’s one of the places where I think artificial intelligence or machine learning technology would be a big advantage. Just being able to find those bits of information that might be related, where it’s difficult for people to do that manually.
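As a toy illustration of the "common linkages" idea (and only that; real AIOps tooling goes much further), grouping minor incident descriptions by shared keywords already makes recurring themes stand out:

```python
# Group minor incident descriptions by shared keywords so recurring themes
# surface as candidates for a problem record.

from collections import defaultdict

STOPWORDS = {"the", "a", "to", "is", "on", "in", "for", "not", "and", "over", "after"}

def recurring_themes(descriptions, min_hits=3):
    groups = defaultdict(list)
    for text in descriptions:
        for word in set(text.lower().split()):
            if word not in STOPWORDS and len(word) > 2:
                groups[word].append(text)
    # keep only keywords shared by several incidents
    return {word: hits for word, hits in groups.items() if len(hits) >= min_hits}

themes = recurring_themes([
    "VPN disconnects for user in outage center",
    "VPN slow after password reset",
    "Cannot reach file share over VPN",
    "Printer offline on floor 3",
])
print(themes.keys())  # 'vpn' surfaces as a theme worth investigating
```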

 

Bruno: When did you start focusing on a dedicated ITSM tool versus ad hoc tools?

Garry: In 2014 to 2015 we tried to put in an enterprise monitoring and alerting solution. We thought we could find a solution with monitoring and alerting as well as the ITSM components. We put out an RFP and selected a vendor with the intent of having the first phase be to implement their monitoring and alerting solutions, so we could get a consistent view of all the events and alarms across all the technology domains. The second phase would be implementing the ITSM solution that could ingest all the information and handle all our operational processes. It turned out the vision for a single, integrated monitoring and alerting solution didn’t work. At least not with their technology stack. We went through a pretty significant project over eighteen months. At the end of the project, we decided to divest from the technology because it wasn’t working.

 

We eventually took another run at just ITSM with a different vendor. We lost four or five years going down the first trail.

 

Bruno: Was it ITIL all the way through?

Garry: It was definitely ITIL practices or definitions as far as the process separation, but my approach to using ITIL is you don’t want to call it that necessarily. It’s just one of the things you use to figure out what the right approach is. You often run into people who are zealots of one framework or another, but the frameworks all overlap to a certain extent. As soon as you start going at it with the mindset that ITIL will solve all our problems, you’re going to run into trouble. It’s a matter of taking the things that work from all those different frameworks and making pragmatic decisions about how it will work in your organization.

“Everyone’s in the cloud,” they said, “and if you’re not there, your business will die.”

That was, when, 2010?

As it turned out, there were too many moving parts, too many applications weren’t ready, bandwidth was too expensive and too unreliable, and, oh, storing sensitive data elsewhere? Move along, Sir, nothing to see here.

It took us ten years, which is like a century in tech-years (2010: iPhone 4, anyone?) to get to the current state.

 

2020 is approaching, and everyone is in the cloud.

 

Just in a different way than expected. And not dead, anyway.

While there are loads of businesses operating exclusively in the cloud, the majority are following the hybrid approach.

Hybrid is easier for us folks here in EMEA, where each country still follows its own laws while at the same time sitting under the GDPR umbrella.
But let’s not overcomplicate things. Hybrid is simply the best of both worlds.

 

There are loads of cloud providers, and according to a survey we ran recently with our THWACK® community, Microsoft® Azure® is the most popular one in our customer base. Around 53% of you are using Azure, one way or the other.

We added monitoring support for Azure into the Orion® Platform with Server & Application Monitor (SAM) version 6.5 in late 2017, and since then, it’s been possible to monitor servers deployed in the cloud and to retrieve other supporting stats, for example, storage.

BTW, the iPhone 8 was the latest tech back then—to put things into perspective.

Earlier in the year, we added Azure SQL support to Database Performance Analyzer (DPA). DPA was our first product available in the Azure Marketplace—it’s been out there for a while.

 

The question of deploying the Orion Platform in Azure came up frequently.


We heard some interesting stories—true pioneers who ran the Orion Platform in the cloud, completely unsupported and “at their own risk,” but hey, it worked.

So, the need was there, and we heard you: Azure deployment became officially supported in 2018, and in early 2019 we added some bits and pieces and created the documentation.

But we identified further room for improvement, partnered with Microsoft, and a few weeks after the iPhone 11 Pro release…I mean, since our latest 2019.4 release, the Orion Platform became available in the Azure Marketplace, and Azure SQL Managed Instances were officially supported in the Orion database.

The next step is obvious. It doesn’t matter where your workloads are located anymore, so why would you be concerned with the location of your monitoring system?

 

Let’s have a look at the present…

 

The speed and usability of your first Orion Platform deployment are improved through the marketplace, as the applications are pre-packaged and offer so much more than the plain executable you would run on a local server, or even an ISO file.

They’re tested and validated instances with all the necessary dependencies, offering options for customization and flexibility.

Traditionally you’d need to request host resources, clone a VM, right-size it a couple of times, make sure the OS is ready, install dependencies, and install the Orion Platform.

While none of it is rocket science, it’ll take some time, and there’s a risk of something going wrong. You know it’s going to happen.

But not anymore!

 

And in the future…

 

If one thing’s consistent, it’s change. And the change could very well mean you’re migrating the Orion Platform into Azure.
Previously, it was a semi-complex task that probably started with using our own Virtualization Manager and its sprawl feature to make sure all machines involved were right-sized before migration, to prevent costly surprises.

The next step is dealing with the infrastructure, the OS, double-clicking the executable again…I wouldn’t call it a grind, but it’s a grind.

And now?
A few clicks in the Azure Marketplace and the whole thing is deployed, and really all that’s left to do is to take care of moving the database. Alright, I agree, that’s a bit of a simplified view, but you know where I’m going with this. It’s easier. Big time.

 

Keyword: database.
It’s my least favourite topic, and I’m usually glad when Tom is around to discuss it. Checking the current requirements, SQL Server 2014 is the minimum, but please consider that some Orion Platform modules require SQL Server 2016 SP1 because of the columnstore feature.

Running anything older than this version doesn’t make a lot of sense.
Getting new licenses and deploying a new SQL version while at the same time paying attention to possible dependencies could be a roadblock preventing you from upgrading Orion modules, and even this is improved with Azure.

 

Also, it’s easier to test a new version, try a new feature, or even another module.
Why not run a permanent lab environment in the cloud? Just a thought!

Do you want to know more? Sure you do!

 

Here’s the next step for you, while I’m on a date with my next cup of coffee.

Omar Rafik, SolarWinds Senior Manager, Federal Sales Engineering

 

Here’s an interesting article by my colleague Brandon Shopp where he discusses the Army’s new training curriculum and the impacts on network and bandwidth.

 

The U.S. Army is undergoing a major technology shift affecting how soldiers prepare for battle. Core to the Army’s modernization effort is the implementation of a Synthetic Training Environment (STE) combining many different performance-demanding components, including virtual reality and training simulation software.

 

The STE’s One World Terrain (OWT) concept comprises five phases, from the initial point of data collection to final application. During phase four, data is delivered to wherever soldiers are training. Raw data is used to automatically replicate digital 3-D terrains, so soldiers can experience potential combat situations in virtual reality through the Army’s OWT platform before setting foot on a battlefield.

 

Making One World Terrain Work

 

For the STE to work as expected, the Army’s IT team should consider implementing an advanced form of network monitoring focused specifically on bandwidth optimization. The Army’s objective with OWT is to provide soldiers with as accurate a representation of actual terrain as possible, right down to extremely lifelike road structures and vegetation. Transmitting so much information can create network performance issues and bottlenecks. IT managers must be able to continually track performance and usage patterns to ensure their networks can handle the traffic.

 

With this practice, administrators may discover which areas can be optimized to accommodate the rising bandwidth needs presented by the OWT. For example, their monitoring may uncover other applications, outside of those used by the STE, unnecessarily using large amounts of bandwidth. They can shut those down, limit access, or perform other tasks to increase bandwidth allocation, relieve congestion, and improve network performance, not just for STE resources but across the board.
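A minimal sketch of that kind of analysis, assuming hypothetical flow-record fields and an arbitrary 10% threshold: aggregate traffic by application and flag the heaviest consumers.

```python
# Aggregate flow records by application and report the top bandwidth consumers.

from collections import defaultdict

def top_talkers(flow_records, threshold_pct=10.0):
    """flow_records: iterable of dicts with 'application' and 'bytes' keys."""
    usage = defaultdict(int)
    for rec in flow_records:
        usage[rec["application"]] += rec["bytes"]
    total = sum(usage.values()) or 1
    return [
        (app, b, 100.0 * b / total)
        for app, b in sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
        if 100.0 * b / total >= threshold_pct
    ]

for app, b, pct in top_talkers([
    {"application": "STE terrain sync", "bytes": 8_000_000_000},
    {"application": "video streaming", "bytes": 5_000_000_000},
    {"application": "email", "bytes": 400_000_000},
]):
    print(f"{app}: {b / 1e9:.1f} GB ({pct:.0f}% of observed traffic)")
```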

 

Delivering a Consistent User Experience

 

But potential hidden components found in every complex IT infrastructure could play havoc with the network’s ability to deliver the desired user experience. There might be multiple tactical or common ally networks, ISPs, agencies, and more, all competing for resources and putting strain on the system. Byzantine application stacks can include solutions from multiple vendors, not all of which may play nice with each other. Each of these can create their own problems, from server errors to application failures, and can directly affect the information provided to soldiers in training.

 

To ensure a consistent and reliable experience, administrators should take a deep dive into their infrastructure. Monitoring database performance is a good starting point because it allows teams to identify and resolve issues causing suboptimal performance. Server monitoring is also ideal, especially if it can monitor servers across multiple environments, including private, public, and hybrid clouds.

 

These practices should be complemented with detailed application monitoring to provide a clear view of all the applications within the Army’s stack. Stacks tend to be complicated and sprawling, and when one application fails, the others are affected. Gaining unfettered insight into the performance of the entire stack can ward off problems that may adversely affect the training environment.

 

Through Training and Beyond

 

These recommendations can help well beyond the STE. The Army is clearly a long way from the days of using bugle calls, flags, and radios for communication and intelligence. Troops now have access to a wealth of information to help them be more intelligent, efficient, and tactical, but they need reliable network operations to receive the information. As such, advanced network monitoring can help them prepare for what awaits them in battle—but it can also support them once they get there.

 

Find the full article on Government Computer News.

 

The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

For some, artificial intelligence (AI) can be a scary technology. There are so many articles on the web about how AI will end up replacing X% of IT jobs by Y year. There’s no reason to be afraid of AI or machine learning. If anything, most IT jobs will benefit from AI/machine learning. The tech world is always changing, and AI is becoming a big driver of change. Lots of people interact with or use AI without even realizing it. Sometimes I marvel at the video games my kids are playing and think back to when I played Super Mario Brothers on the original Nintendo Entertainment System (which I still have!). I ask myself, “What kind of video games will my kids be playing 10 or 20 years from now?” The same applies to the tech world now. What kinds of things will AI and machine learning have changed in 10 years? What about 20 years?

 

Let’s look at how AI is changing the tech world and how it will continue to benefit us in decades to come.

 

Making IoT Great

The idea of the Internet of Things focuses on all the things in our world capable of connecting to the internet. Our cars, phones, homes, dishwashers, washing machines, watches, and yes, even our refrigerators. The internet-connected devices we use need to do what we want, when we want, and do it all effectively. AI and machine learning help IoT devices perform their services effectively and efficiently. Our devices collect and analyze so much data, and they rely more and more on AI and machine learning to sift through and analyze all the data to make our interactions better.

 

Customer Service

I don’t know a single human being who gets a warm and fuzzy feeling when thinking about customer service. No one wants to deal with customer service, especially when it comes to getting help over the phone. What’s our experience today? We call customer service and we likely get a robotic voice (IVR) routing us to the next most helpful robot. I find myself yelling “representative” over and over until I get a human… really, it works!

 

AI is improving our experience with IVR systems by making them easier to interact with and get help from. IVR systems also use AI to analyze input from callers to better route their calls to the right queue based on common trending issues. AI also helps ensure 24/7 customer service, which can be helpful with off-hours issues. You don’t have to feed AI-enhanced IVR systems junk food and caffeine to get through the night!
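As a deliberately simple sketch of the routing idea (real AI-backed IVR systems use trained language models, not keyword lists), mapping words in a transcribed request to a queue might look like this:

```python
# Toy intent-based call routing: match keywords in a transcribed request
# to a support queue, falling back to a human when nothing matches.

ROUTES = {
    "billing": ("bill", "invoice", "charge", "refund"),
    "outage": ("down", "outage", "offline", "no service"),
    "account": ("password", "login", "username"),
}

def route_call(transcript: str) -> str:
    text = transcript.lower()
    for queue, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return queue
    return "general"  # hand off to a human representative

print(route_call("My internet has been down since this morning"))  # -> outage
```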

 

Getting Info to the User Faster

Have you noticed the recommendations you get on YouTube? What about the “because you watched…” on Netflix? AI and machine learning are changing the way we get info. Analytical engines pore through user data and match users with the info they most want, and quickly. On the slightly darker side of this technology, phones and smart speakers have become hot mics. If you’re talking about how hungry you are, your phone or speaker hears you, then sends you an email or pops up an ad on your navigation system for the nearest restaurant. Is that good? I’m not sold on it yet—it feels a little invasive. Like it or not, the way we get our data is changing because of AI.

 

Embrace It or Pump the Brakes?

For some, yeah you know who I’m talking about (off-gridders), AI is an invasive and non-voluntary technology. I wouldn’t be surprised if you had to interact with AI at least once in your day. We can’t get around some of those moments. What about interacting with AI on purpose? Do you automate your home? Surf the web from your fridge door? Is AI really on track with helping us as humans function easier? It’s still up for debate, and I’d love to hear your response in the comments.

Not too long ago, a copy of Randall Munroe’s “Thing Explainer” made its way around the SolarWinds office—passing from engineering to marketing to development to the Head Geeks, and even to management.

 

Amid chuckles of appreciation, we recognized Munroe had struck upon a deeper truth: as IT practitioners, we’re often asked to describe complex technical ideas or solutions. However, it’s often for an audience requiring a simplified explanation. These may be people who consider themselves “non-technical,” but just as easily, it could be for folks deeply technical in a different IT discipline. From both groups (and people somewhere in-between) comes the request to “explain it to me like I’m five years old” (a phrase shortened to “Explain Like I’m Five,” or ELI5, in forums across the internet).

 

There, amid Munroe’s mock blueprints and stick figures, were explanations of complex concepts in hyper-simplified language achieving the impossible alchemy of being amusing, engaging, and accurate.

 

We were inspired. And so, for the December Writing Challenge 2019, we hope to do for IT what Randall Munroe did for rockets, microwaves, and cell phones: explain what they are, what they do, and how they work in terms anyone can understand, and in a way that may even inspire a laugh or two.

 

At the same time, we hope to demonstrate a simple idea best explained by a man who understood complicated things:

“If you can’t explain it simply, you don’t understand it well enough.” – Albert Einstein

 

Throughout December, one writer—chosen from among the SolarWinds staff and THWACK MVPs—will be the lead writer each day. You—the THWACK community—are invited to contribute your own thoughts each day, both on the lead post and the word itself. In return, you’ll receive friendship, camaraderie, and THWACK points. 200 to be precise, for each day you comment on a post.*

 

You’ll find each day’s post on the December Writing Challenge 2019 forum. Take a moment now to visit it and click "Follow" so that you don't miss a single post. As in past years, I’ll be writing a summary of the week and posting it over on the Geek Speak forum.

 

In the spirit of ELI5, your comments (and indeed, those of the lead writers as well) can be in the form of prose, poetry, or even pictures. Whatever you feel addresses the word of the day and represents a way to explain a complex idea simply and clearly.

 

To help you get your creative juices flowing, here’s the word list in advance.

 

Everyone here on THWACK is looking forward to reading your thoughts!

  1. Monitoring
  2. Latency
  3. Metrics
  4. NetFlow
  5. Logging
  6. Observability
  7. Troubleshoot
  8. Virtualization
  9. Cloud Migration
  10. Container
  11. Orchestration
  12. Microservices
  13. Alert
  14. Event Correlation
  15. Application Programming Interface (API)
  16. SNMP
  17. Syslog
  18. Parent-Child
  19. Tracing
  20. Information Security
  21. Routing
  22. Ping
  23. IOPS
  24. Virtual Private Network (VPN)
  25. Telemetry
  26. Key Performance Indicator (KPI)
  27. Root Cause Analysis
  28. Software Defined Network (SDN)
  29. Anomaly detection
  30. AIOps
  31. Ransomware

 

* We’re all reasonable people here. When I say “a comment,” it needs to be meaningful. Something more than “Nice” or “FIRST!” or “Gimme my points.” But I’m sure you all knew that already.

It’s time to stop seeing IT as just a black hole to throw money into and instead show how it gives back to the business. Often, the lack of proper measurement and standardization is the problem, so how do we address this?

 

We’ve all tried to leverage more funds from the organization to undertake a project. Whether replacing aging hardware or out-of-support software, without the business seeing the value the IT department brings, it’s hard to obtain the financing required. That’s because the board typically sees IT as a cost center.

 

A cost center is defined as a department within a business not directly adding profit, but still requiring money to run. A profit center is the inverse, whereby its operation directly adds to overall company profitability.

 

But how can you get there? You need to understand that everything the IT department does is a process, and can therefore be measured and improved upon. Combining people and technology, along with other assets, and passing them through a process to deliver customer success or business results is a strategic outcome you want to enhance. If you can improve the technology and processes, then you’ll improve the strategic outcomes.

 

By measuring and benchmarking these processes, you can illustrate the improvements made due to upgrades. Maybe a web server can now support 20% more traffic, or its loading latency has been reduced by four seconds, which over a three-month period has led to a 15% increase in web traffic and an 8% rise in sales. While I’ve just pulled these figures from the air, the finance department evaluates spending and can now see a tangible return on the released funds. The web server project increased web traffic, sales, and therefore profits. When you approach the next project you want to undertake, you don’t have to just say “it’s faster/newer/bigger than our current system” (although you should still explain how the faster, newer, bigger piece of hardware or software will improve the overall system). However, without data, you have no proof, and the proof is in the pudding, as they say. Nothing will make a CFO happier than seeing a project with milestones and KPIs (well, maybe three quarters exceeding predictions). So, how do we measure and report all these statistics?
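Turning the (admittedly invented) figures above into a before-and-after report is straightforward; here's a minimal Python sketch, with all numbers hypothetical:

```python
# Compare baseline metrics against post-upgrade metrics and report the deltas
# in a form the finance team can read. The figures mirror the invented example
# above: +15% traffic, a four-second latency drop, and +8% sales.

def pct_change(before: float, after: float) -> float:
    return 100.0 * (after - before) / before

baseline      = {"weekly_visits": 100_000, "page_load_s": 6.5, "weekly_sales": 250_000}
after_upgrade = {"weekly_visits": 115_000, "page_load_s": 2.5, "weekly_sales": 270_000}

for metric in baseline:
    delta = pct_change(baseline[metric], after_upgrade[metric])
    print(f"{metric}: {baseline[metric]} -> {after_upgrade[metric]} ({delta:+.1f}%)")
```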

 

If you think of deployments in terms of weeks or months, you’re trying to deploy something monolithic yet composed of many moving and complex parts. Try to break this down. Think of it like a website. You can update one page without having to do the whole site. Then you start to think “instead of the whole page, what about changing a .jpg, or a background?” Before long, you’ve started to decouple the application at strategic points, allowing for independent improvement. At this stage, I’d reference the Cloud Native Computing Foundation Trail Map as a great way to see where to go. Their whole ethos on empowering organizations running modern scalable applications can help you with transformation.

 

But we’re currently looking at the measurement aspect of any application deployment. I’m not just talking about hardware or network monitoring, but a method of obtaining baselines and peak loads, and of being able to predict when a system will reach capacity and how to react to it.

 

Instead of being a reactive IT department, you suddenly become more proactive. Being able to assign your ticketing system to your developers allows them to react faster to any errors from a code change and quickly fix the issue or revert to an earlier deployment, thereby failing quickly and optimizing performance.

 

I suggest if you’re in charge of an application or applications, or support one on a daily basis, start to measure and record anywhere you can on the full stack. Understand what normal looks like, or how it’s different from “rush hour,” so you can say with more certainty it’s not the network. Maybe it’s the application, or it’s the DNS (it’s always DNS) leading to delays, lost revenue, or worse, complete outages. Prove to the naysayers in your company you have the correct details and that 73.6% of the statistics you present aren’t made up.
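A minimal sketch of "understand what normal looks like": compare the latest response time against a recent baseline and flag anything well outside it. The three-sigma threshold is a common starting point, not a rule.

```python
# Flag response times that fall well outside the recent baseline.

import statistics

def is_anomalous(history, current, sigmas=3.0):
    """history: recent response times (seconds); current: latest sample."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1e-9  # avoid division-free zero spread
    return abs(current - mean) > sigmas * stdev

normal_day = [0.42, 0.45, 0.40, 0.44, 0.43, 0.41, 0.46]
print(is_anomalous(normal_day, 0.47))  # False: within the usual range
print(is_anomalous(normal_day, 2.80))  # True: check the app, the network... or DNS
```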

Continuing from my previous blog post, Meh, CapEx, I’m going to take a cynical look at how and why Microsoft has killed its perpetual licensing model. Now don’t get me wrong, it’s not just Microsoft – other vendors have done the same. I think a lot of folks in IT can say they use at least one Microsoft product, so it’s easily relatable.

 

Rant

Let’s start with the poster child for SaaS done right: Office 365. Office 365 isn’t merely a set of desktop applications like Word and Excel with such cool features as Clippy anymore.

Clippy

No, it’s a full suite of services such as email, content collaboration, instant messaging, unified communications, and many more, but you already knew that, right? With a base of 180 million active users as of Q3 2019¹ and counting, it’d be silly for Microsoft not to invest their time and effort into developing the O365 platform. Traditional on-premises apps, though, are lagging in feature parity or, in some cases, have changed in a way that, to me at least, seems like a blatant move to push people towards Office 365. Let’s look at the minimum hardware requirements for Exchange 2019 as an example: 128GB of memory required for the mailbox server role². ONE HUNDRED AND TWENTY-EIGHT! That’s a 16-times increase over Exchange 2016³. What’s that about then?

 

To me, it seems like a move to guide people down the path of O365. People without the infrastructure to deploy Exchange 2019 likely have a small enough mail footprint to easily move to O365.

 

Like I said in my Meh, CapEx blog post, it’s the extras bundled in with the OpEx model that make it even more attractive. Microsoft Teams is one such example of a great tool that comes with O365 and O365 only. Its predecessors, Skype for Business and Lync on-premises, are dead.

 

Now, what about Microsoft Azure? Check out this snippet from the updated licensing terms as of October 1, 2019⁴:

Beginning October 1, 2019, on-premises licenses purchased without Software Assurance and mobility rights cannot be deployed with dedicated hosted cloud services offered by the following public cloud providers: Microsoft, Alibaba, Amazon (including VMware Cloud on AWS), and Google.

So basically, no more perpetual license on one of the big public cloud providers for you, Mr./Mrs. Customer.

 

Does this affect you? I’d love to know.

 

I saw some stats from one of the largest Microsoft distributors in the U.K.: 49% of all deployed workloads in Azure that are part of a CSP subscription they’ve sold are virtual machines. I’d be astonished if this license change doesn’t affect a few of those customers.

 

Wrap It Up

In my cynical view, Microsoft is leading you down a path where subscription licensing is more favorable. You only get the cool stuff with a subscription license, while traditional on-premises services are being made to look less favorable one way or another. And guess what—they were usually licensed with a perpetual license.

 

It’s not all doom and gloom though. Moving to services like O365 also removes the headache of having to manage services like Exchange and SharePoint. But you must keep on paying, every month, to continue to use those services.

 

 

¹ Microsoft third quarter earnings call transcript, page 3 https://view.officeapps.live.com/op/view.aspx?src=https://c.s-microsoft.com/en-us/CMSFiles/TranscriptFY19Q3.docx?version=0e85483a-1f8a-5292-de43-397ba1bfa48b

 

² Exchange 2019 system requirements https://docs.microsoft.com/en-us/exchange/plan-and-deploy/system-requirements?view=exchserver-2019

 

³ Exchange 2016 system requirements https://docs.microsoft.com/en-us/exchange/plan-and-deploy/system-requirements?view=exchserver-2016

 

⁴ Source, Microsoft licensing terms for dedicated cloud https://www.microsoft.com/en-us/licensing/news/updated-licensing-rights-for-dedicated-cloud

I am back in Orlando this week for Live 360, where I get to meet up with 1,100 of my close personal data friends. If you're attending this event, please find me--I'm the tall guy who smells like bacon.

 

As always, here are some links I hope you find interesting. Enjoy!

 

Google will offer checking accounts, says it won’t sell the data

Because Google has proved itself trustworthy over the years, right?

 

Google Denies It’s Using Private Health Data for AI Research

As I was just saying...

 

Automation could replace up to 800 million jobs by 2035

Yes, the people holding those jobs will transition to different roles. It's not as if we'll have 800 million people unemployed.

 

Venice floods: Climate change behind highest tide in 50 years, says mayor

I honestly wouldn't know if Venice was flooded or not.

 

Twitter to ban all political advertising, raising pressure on Facebook

Your move, Zuck.

 

California man runs for governor to test Facebook rules on lying

Zuckerberg is doubling down with his stubbornness on political ads. That's probably because Facebook revenue comes from such ads, so ending them would kill his bottom line.

 

The Apple Card Is Sexist. Blaming the Algorithm Is Proof.

Apple, and their partners, continue to lower the bar for software.

 

Either your oyster bar has a trough or you're doing it wrong. Lee & Rick's in Orlando is a must if you are in the area.

 

Omar Rafik, SolarWinds Senior Manager, Federal Sales Engineering

 

Here’s an interesting article by Mark Hensch about when my colleague Arthur Bradway spoke at a conference about gamifying cybersecurity to improve results.

 

Can gaming significantly improve how governments protect their cybersecurity by making their employees more careful about how they use their IT?

 

According to one cybersecurity expert, the answer might be yes. Arthur Bradway says turning security training into a game can help public servants remember tips for keeping their agency’s data safe.

 

“A big topic lately is gamifying the security training,” Bradway said during GovLoop’s virtual summit. “We all like games. We all like to win.”

 

Bradway is a senior government sales engineer at SolarWinds, a software provider specializing in network, systems, and IT management.

 

According to Bradway, many agencies use dull training lectures, presentations, and videos that don’t help their employees retain cybersecurity knowledge.

 

“A lot of these methods aren’t engaging to the end user,” he said. “You spend time to get your users there and they don’t remember anything. By making it more engaging, they’ll retain more of the information.”

 

Bradway said creating games can also help agencies establish, teach, and enforce IT controls for their workforces.

 

IT controls consist of the procedures and policies to help ensure technologies are being used for their intended purposes in a reasonable manner.

 

Examples of some general controls used for essential IT processes include risk and change management, security, and disaster recovery.

 

When it comes to IT controls, government employees are often unaware of what their agencies expect from them in terms of cybersecurity.

 

“End users are our weakest links in all of this,” he said. “The majority of them don’t know anything about security. They’re used to being constantly connected anywhere they want on their devices. They assume they’ll be able to do the same thing at the office.”

 

Unfortunately, governments can’t take cybersecurity concerns lightly because of the sensitive data they often handle.

 

Governments that fail to protect their data can lose the trust of their citizens, suffer financial damage, and even endanger national security.

 

Bradway said, however, gaming can help prevent cybersecurity incidents by teaching public servants about the topic in an entertaining way.

 

For example, he continued, gaming can educate people about the different cyberthreats currently menacing agencies.

 

Bradway suggested one game where players assume the role of such cyberthreats as hostile foreign governments to learn how they act.

 

“When people are playing the role of the bad guy, they realize, ‘Wait, there’s more than one type of bad guy in the world?’” he said. “They realize more is going on and they need to start paying attention to it.”

 

Gamifying security training could resonate with public servants—especially younger ones—who are used to playing games on their mobile devices.

 

“Everyone is used to doing something on their phones and getting some little reward,” he said. “We know the end users are the problem. A lot of this highlights the trainings, policies, and procedures in place.”

 

Find the full article on GovLoop.

 

The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

“The reports of my death are greatly exaggerated.” – The On-Premises Data Center (with apologies to Mark Twain)

 

As I mentioned in my previous post, the hype cycle has written off the on-premises data center. Public cloud is the future and there’s no turning back. But when you step back and investigate the private cloud space, you might be surprised.

 

When the order comes down to move to the public cloud, sometimes you discover the application or workload isn’t suitable for migration. How many times have you been in an environment where an important line-of-business application was written by an employee who left 15 years ago and left no documentation behind? Is the application even running on an operating system one of the public cloud providers supports? While proper planning and refactoring can provide a path to move your application to the cloud, occasionally you’ll run into unavoidable physical limitations. In manufacturing environments, you’ll often find the need for a custom interface for a machine or a USB license dongle for an application to function.

 

Sometimes applications can’t move to the cloud despite planning. But not everyone takes the time to plan their cloud migrations. Failure to plan leads to many issues with cloud adoption (which I’ll discuss in my next post). What about your connectivity to the internet and cloud? Whether it’s uptime or bandwidth, the network can prove to be a cloud roadblock. When these migrations fail, where does the workload go? Back to the on-premises data center!

 

Looking at purchasing new data center hardware or moving to the cloud often includes decision-makers outside of the IT department. While we’ll leave the deep fiscal details to the financial experts, Chief Financial Officers and business leaders often weigh the benefits of Operating Expenditures versus Capital Expenditures, commonly referred to as OpEx vs. CapEx. While the ability to quickly scale your cloud resources up and down based on demand might be a blessing to an application administrator, the variations in cost accompanying it can prove difficult for the accounting department. The ability to make a one-time purchase every three to five years and amortize the cost over those years is a tried-and-true financial tactic.
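As a back-of-the-envelope illustration of the two spending models (all figures hypothetical, and real comparisons would also factor in power, cooling, staff time, and variable cloud charges):

```python
# Compare a flat, amortized CapEx purchase with consumption-based OpEx billing.

def capex_monthly(purchase_price: float, lifespan_years: int) -> float:
    """Straight-line amortization of a one-time hardware purchase."""
    return purchase_price / (lifespan_years * 12)

def opex_monthly(base_fee: float, usage_hours: float, hourly_rate: float) -> float:
    """Consumption-based billing: a predictable base plus variable usage."""
    return base_fee + usage_hours * hourly_rate

print(f"CapEx: ${capex_monthly(180_000, 5):,.0f}/month, flat for five years")
print(f"OpEx (quiet month): ${opex_monthly(2_000, 500, 4.0):,.0f}/month")
print(f"OpEx (busy month):  ${opex_monthly(2_000, 2_500, 4.0):,.0f}/month")
```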

 

Speaking of finances, the fiscal performance of hardware vendors must certainly be a bellwether of cloud adoption. Sales of data center hardware have to be falling with the ever-growing adoption of public cloud, right? Wrong. As announced over the past year, vendors such as Dell Technologies, Pure Storage, and Lenovo have reported record earnings and growth. In May 2019, Dell Technologies announced 2% growth year over year. May also brought a revenue announcement from Lenovo of 12.5% growth year over year. August 2019 saw Pure Storage announce a whopping 28% year-over-year revenue growth. These companies are just a small sample. Clearly, physical data center hardware is still in high demand.

 

Looking at many factors, it’s easy to say the on-premises data center is far from dead. Continuing the “Battle of the Clouds” series, we’ll dive into why “just put it in the cloud” isn’t always the best solution. 

 

 

Few relationships sail along without ever having to cross stormy seas. Even the best marriages need to batten down the hatches occasionally. A combination of genuine involvement, good communication, and various tools used by all involved is the best way to extend the time between encountering tempests. Business and professional relationships are no different.

 

Calming the Waters

The relationship between an IT department and the business it supports is one of provider and customer. A real relationship thrives when all participants are genuinely involved and prospers with good communication. Set up adequately and then maintained, IT Service Management (ITSM) is designed to create an environment where the relationship flourishes.

 

What Is ITSM?

The set of activities involved in planning, designing, creating, delivering, operating, and managing IT services for the business.

 

What Does This Mean?

Most people are familiar with the paradigm where IT systems management focuses on managing specific technology silos such as network, storage, and compute. Success is measured in uptime, availability, utilization, and how often the email inbox doesn’t get flooded. IT service management focuses on the processes to meet employee needs and business requirements. Success is measured by employee satisfaction and quality of experience. The processes for ITSM maintain a keen focus and a strong motivation for continuous improvement. An email or chat notification is an opportunity to jump into action and enhance the relationship.

 

Benefits of a Healthy Relationship

Great things can happen when IT is in a healthy relationship with the broader organization and employees. ITSM can deliver benefits for IT and the business.

 

Benefits Realized by IT:

  • A better understanding of business requirements
  • Well-defined processes
  • Proactive problem resolution
  • Improved tracking of repeat problems
  • Stats to measure and improve performance
  • Defined roles and responsibilities
  • Happier employees

 

Benefits Realized by the Business:

  • Increased IT service availability translates to increased employee and business productivity
  • Increased value and cost-efficiency
  • Reduced risk as the business complies with industry regulations
  • Adoption of new IT technology at a quicker pace
  • Increased competitive advantage

 

One Size Does Not Fit All

An IT department can be set up in many ways, and a business can consume the services offered by IT in many ways. Because of these variables, there are myriad types of relationships between the business and IT. It only makes sense that there are multiple ITSM frameworks to choose from.

 

Some Common Frameworks:

  • Business Process Framework (eTOM)
  • Control Objectives for Information and Related Technologies (COBIT)
  • FitSM
  • ISO/IEC 20000
  • ITIL
  • Lean
  • Six Sigma
  • The Open Group Architecture Framework (TOGAF)

 

Supporting the business and helping it achieve its goals is true north for any IT department. An ITSM framework should, therefore, be chosen with the business in mind and for business reasons. It’s possible to build your own by adopting the most relevant parts from some or all of the frameworks listed above. It’s also possible to incorporate homegrown processes into one of the well-known frameworks. This mixing and matching allows complete customization of the way IT offers its services.

 

What’s in a Framework?

The success or failure of any system can usually be traced back to people, processes, and tools, in that order of importance. While the IT department supplies the people and the tools, the ITSM framework delivers the processes. All processes are designed to shift the habits of IT from being operationally focused on systems to being employee-focused and service-oriented. There are conventional processes contained in each of the different frameworks. The broad categories for these processes are:

 

  • Service Desk
  • Incident Management
  • Problem Management
  • Change Management
  • Asset Management
  • Knowledge Management

 

The Core of ITSM

A core benefit of ITSM is how it can positively change the relationship between the business and IT. When IT takes on a more employee-focused and service-oriented approach to delivering its services, IT can transform from being viewed merely as a cost center to a trusted business partner. There will always be a need for IT to be tactical in certain situations. By adopting and being genuinely involved in ITSM processes, IT can spend more time being strategic. IT can be a business enabler.

We’ve all heard it before.

 

“The cloud is the future!”

“We need to move to the cloud!”

“The on-premises data center is dead.”

 

If you believe the analysts and marketing departments, public cloud is the greatest thing to happen to the data center since virtualization. But is it true? Could public cloud be the savior of the IT department? While many features of the public cloud make it an attractive infrastructure replacement, failure to adequately plan for its use can prove to be a costly mistake.

 

Moving past the marketing, the cloud is simply “someone else’s computer.” Yes, it’s more complicated than that, but when you boil it down to the basics, it’s a data center maintained by a third-party with proprietary software on top to provide an easy-to-use dashboard for provisioning and monitoring. When you move to the cloud, you’re still running an application on a server. Many of the same problems you have with your application running on-premises can persist in the cloud.

 

In a public cloud environment, the added complexity of multi-tenancy on the underlying resources can complicate things. Now you have to think about regulatory compliance, too. And after all, the public cloud is still a data center subject to human error. This has been made evident over and over, famously by the Amazon Web Services S3 outage of February 2017.* The wide adoption of public clouds such as AWS and Microsoft Azure has also opened the door to more instances of shadow IT: rogue devs, admins, and end users who either don’t have the patience to wait or have been denied resources open cloud accounts with their own credit cards and put corporate data at risk. And we have yet to take into consideration the consumption-based billing model.

 

Even with the above listed “issues” (I put quotes around issues as some of the problems can be encountered in the private cloud or worked around), public cloud can be an awesome tool in the IT administrator’s toolbox. Properly architected cloud-based applications can alleviate performance issues and can be developed with robust redundancies to avoid downtime. The ability to quickly scale compute up and down based on demand provides the business amazing agility not before seen in the standard data center procurement cycle. And, the growing world of SaaS products provides an easy gateway to enter the cloud (yes, I’m going to take the stance that as-a-Service qualifies as cloud). The introduction of cloud technologies has also opened a world of new application deployment models such as microservices and serverless computing. These amazing ways of looking at infrastructure weren’t possible until recently.

 

Is there hype around public cloud? For sure! Is some of it warranted? Absolutely! Is it the be-all and end-all technology of the future? Not so fast. In the upcoming series of posts I’m calling “Battle of the Clouds,” we’ll look at public cloud versus private cloud, going past the hype to dive into the state of on-premises data centers, what it takes for a successful cloud implementation, and workload planning around both solutions. I look forward to hearing your opinions on this topic as well!

 

*Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region
