Called to Account


As IT professionals who have a special interest in monitoring (because why ELSE would you be here on THWACK, except for maybe the snuggies and socks), we understand why monitoring - and more importantly, automation in response to monitoring events - is intrinsically useful and valuable to an organization. We understand that automation creates efficiencies, saves effort, creates consistency, and most of all, saves money in all its forms - labor, downtime, lost opportunity, and actual cost.

To us, the benefits of automation are intuitively obvious.

And that's a problem. When I was in university and a professor used the phrase "it's intuitively obvious" that meant one thing: run like hell out of the class because:

A) the professor didn't understand it themselves, and

B) it was going to be on the exam

The idea that things that we see as intuitive are the same things we find either difficult or unnecessary to explain was driven home to me the other day when I solicited examples of the measurable business value of monitoring automation.

I said,

"I’m putting together some material that digs into the benefit of automated responses in monitoring and alerting. I’m looking for quotable anecdotes of environments where this has been done. What’s most important are the numbers, including the reduction in tickets, the improvement in response, etc.

Example:

A certain company “enjoyed” an average of 327 disk full alerts each and every month, each of which required roughly 20 minutes of staff time to investigate. After implementing a simple VBScript to clear the TEMP directory before opening a ticket, the number of alerts dropped to 57 per month. Now each of those tickets required one hour to resolve, but that’s because they were REAL events that required action, versus the spurious (and largely ignored) quantity before automation was introduced.

If you have any stories like this, I’d love to hear them. Feel free to pass this along to other colleagues as you see fit."
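The VBScript from that example isn't included here. As a rough illustration of the pattern it describes (attempt a cheap automated fix, re-check, and only then open a ticket), here is a minimal Python sketch. The function names, the free-space threshold, and the `open_ticket` callback standing in for a real ticketing system are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def disk_free_ratio(path: str) -> float:
    """Fraction of the volume holding `path` that is currently free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def clear_temp_dir(temp_dir: Path) -> int:
    """Delete the contents of temp_dir (not the directory itself); return how many items went."""
    removed = 0
    for entry in list(temp_dir.iterdir()):
        try:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed += 1
        except OSError:
            # Files locked by running processes are skipped rather than aborting the cleanup.
            continue
    return removed


def handle_disk_full_alert(
    free_ratio: Callable[[], float],
    temp_dir: Path,
    threshold: float,
    open_ticket: Callable[[str], None],
) -> bool:
    """Try the cheap fix first; open a ticket only if the disk is still too full.

    Returns True if a ticket was opened.
    """
    if free_ratio() >= threshold:
        return False  # condition already cleared on its own
    removed = clear_temp_dir(temp_dir)
    if free_ratio() >= threshold:
        return False  # the TEMP cleanup fixed it, so no human needs to look
    open_ticket(
        f"Disk still below {threshold:.0%} free after removing {removed} TEMP items"
    )
    return True
```

Wired into a monitor, `free_ratio` might be something like `lambda: disk_free_ratio("C:\\")`, and `open_ticket` whatever call your ticketing system exposes; both are placeholders here. Only the alerts that survive the cleanup reach a human, which is exactly what drove the drop from 327 to 57.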

I received back some incredible examples of automation - actual scripts and recipes, but that's not what I wanted. I wanted anecdotes about the results. So I tried again.

"What I am looking for is more like examples that speak to how monitoring can justify its own existence to the bean counters who don’t care if technology actually works as long as it’s cheap."

I got back more examples of monitoring automation. Some even more jaw-droppingly awesome than before. But it's still not what I wanted. I started cornering people one-on-one to see if maybe email was the wrong medium.

What I discovered upon intense interroga... discussion was that IT pros are very familiar with how automation is accomplished. We remember those details, down to individual lines of code in some cases. But pressed for information that describes how that automation affected the business (saved money, reduced actionable tickets, averted specific downtimes which would have cost X dollars per minute), all I got was, "Well, I know it helped. The users said it was a huge benefit to them. But I don't have those numbers."

That was the answer in almost every case.

I find this interesting for a couple of reasons. First, because you'd think we would not only be curious, we would positively bask in the glory that our automation was saving the company money. We are, after all, IT professionals. The job practically comes with a cape and tights.

Second, we're usually the first ones to shout "Prove it!" when someone makes a claim about a particular fact, event, effect, or even opinion. Did someone say Star Trek IV was the highest-grossing movie of the franchise? Show me the BoxOfficeMojo.com numbers for that!* At a dinner party and someone says sharks are a major cause of death in the summer? You're right there to list out 25 things that are actually more likely to kill you than sharks.**

But despite our fascination with facts and figures when it comes to ephemera and trivia, we seem to have a blind spot with business. Which is a shame.

It's a shame because being able to prove the impact that monitoring and alerting has on the business is the best way to get more of it. More staff, more servers, more software, more time, and most importantly, more buy-in.

Imagine providing your CEO with data on how one little alert saved $250 each and every time it triggered, and then opening up the ticket logs to show that the alert triggered 327 times last month. That's a savings of $81,750 in one month alone!!

Put those kinds of numbers against a handful of automated responses, and you could feel like Scrooge McDuck diving into his pool of money every time you opened the ticket system!
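The arithmetic behind that pitch is a single multiplication. The sketch below (function names are illustrative, not from any tool) also applies the same approach to the disk-full example from the email earlier: staff time falls from 109 hours a month (327 alerts × 20 minutes) to 57 hours (57 tickets × 60 minutes), even though each remaining ticket takes three times as long.

```python
def monthly_savings(savings_per_trigger: float, triggers_per_month: int) -> float:
    """Dollar value of an automated response: value per trigger times number of triggers."""
    return savings_per_trigger * triggers_per_month


def staff_hours(tickets: int, minutes_per_ticket: float) -> float:
    """Total staff time, in hours, spent working a month's tickets."""
    return tickets * minutes_per_ticket / 60


print(monthly_savings(250, 327))  # 81750
print(staff_hours(327, 20))       # 109.0  (before automation)
print(staff_hours(57, 60))        # 57.0   (after automation)
```

Multiplying the hours saved by a loaded hourly labor rate turns the second calculation into dollars as well, which is the form the bean counters will want.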

So prove me wrong. In the comments below, give me some examples of the VALUE and business impact that monitoring has had.

More than just giving me grist for the mill (which, I'll be honest, I'm totally going to use in an upcoming eBook, with credit given where credit is due, plus THWACK points!), what we'll all gain is insight into the formulae that work for you. Hopefully we can adapt them to our own environments.

* In actuality, Star Trek IV ranked 4th, unless you adjust for inflation, in which case it was 3rd. First place is held in both categories by the original motion picture. The more you know.

**Sharks kill about 5 people annually***. Falling out of bed kills 450. Heck, bee stings claim 53 lives each year. So go ahead and dive in. The water's fine and Bruce the shark probably will leave you alone.

*** Unless you are watching "Sharknado". Then the number is closer to 16 people.

  • You put your finger on the heart of the issue when you discovered technical people saying "I don't have those numbers."

    We, as I.T. professionals, resolve issues and move on to the next task or fire without knowing the cost benefit to the organization, because we are not in Management.  Management works with those statistics, translates outages into lost revenue or customers, converts Help Desk incidents into costs against a product or department, and reports to the C-Level executives.  We aren't allowed to ask what our coworkers in other teams and departments earn, therefore we don't know what an hour of wasted time for them costs.

    And I.T. Managers in my organization have limited knowledge of Orion's worth to the organization.  They only know how much it costs for the licenses, hardware, and support contract, which doesn't equip them to be powerful advocates for purchasing software products.

    Some decades ago I learned about organizations that provided bonuses to employees who made useful suggestions that ended up decreasing costs or increasing sales or both.  The bonuses I heard about ranged from 10 to 25 percent of the savings, so if you came up with a time saver that saved the company $400K a year, you'd see a bonus of between $40K and $100K in your check.

    Wouldn't it be great to have the detailed info that was shared with those employees to calculate their bonuses?  You save the company $1M and walk away with serious bonus money.

    If we had access to the actual costs saved, to justify products like Orion, we'd feel more like the heroes we actually are.  Part of why we remain unsung is that we're out of the loop on the significant numbers that translate into dollars.

    Once upon a time my organization hired out a time & motion / efficiency study to determine where time was being lost.  Our IT staff are housed several blocks away from our data centers, and a number of us needed to be in that facility daily.  The study revealed we were losing the equivalent of two full-time I.T. employees every year in time spent walking between the office and the data center.  The logical response might have been to move our team to the new facility that houses the data center.  Two employees at $100K/year, times the fourteen years I've been here, would have saved $2,800,000.  But we didn't move because it was too expensive.  Management knows more than I do, and I have to trust them on this.  They have the big picture.  Or do they?  Without the statistics you're looking for, they're flying blind on how network monitoring can be a money-maker instead of a liability with endless support contract costs.

    For fun, and for easy math, let's say an employee's wasted time costs $50/hour.  If we have 15,000 employees who rely on computers and network and the data center, and we prevent an hour of data center down time, we just saved $750,000.  And we continue doing so for EVERY HOUR of downtime that we reduce or prevent.

    That doesn't include intangible costs for Public Relations and customers lost and negative impressions made.  A company might never climb out of that hole with some customers.
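    The back-of-the-envelope model above ($50 per hour of wasted time, 15,000 affected employees) can be written as a small Python function. It assumes, as the estimate does, that every affected employee is fully idle for the whole outage; the function name is illustrative.

```python
def downtime_cost(employees: int, cost_per_hour: float, hours: float) -> float:
    """Lost-productivity cost of an outage, assuming every affected employee is fully idle."""
    return employees * cost_per_hour * hours


print(downtime_cost(15_000, 50, 1))    # 750000
print(downtime_cost(15_000, 50, 0.5))  # 375000.0
```

    Partial outages (one site, one application) can be modeled by passing only the headcount actually affected.  The intangible costs mentioned above are left out because they don't reduce to a simple hourly rate.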

    This is what you're looking for: real-world examples of how Network Monitoring prevented or reduced network down time, and the associated costs to employees and customers.  We KNOW Orion saves down time.  But we don't know how much money it saves the organization.

    I've seen several cases where human errors caused major outages that impacted the entire organization.  With NPM and Kiwi at my side, showing me real-time traps, syslog messages, and systems/sites going offline, I was alerted to the issue in less than two seconds.  Because Orion monitoring was in place, we responded correctly and significantly faster, cutting data center outage time by at least 30 minutes, maybe an hour.

    Extend this into WAN services, server outages, and database problems, all of which NPM and its peers monitor and alert on, putting us onto the root cause quickly.  I see WAN outages every month: sometimes they're caused by backhoe fade, sometimes by rodents, sometimes by storms, sometimes by human error on the providers' side, and sometimes by hardware failure.  NPM lets us know there's an issue faster than the end users can call the Help Desk.  By the time they DO call the Help Desk, we're already working on restoring services and have notified the Help Desk, the WAN providers, and so on.

    Just having NCM prevents untold outages by letting us review running configurations without having to SSH into a production network appliance, where we might make a mistake and bring it down.  And when a device DOES get misconfigured, correcting the problem is MUCH faster when we can compare the current faulty running configuration against last night's known-good configuration and back out whatever changes were applied.  That saves many hours of lost production time and frustration.
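    How NCM performs its comparison isn't described here, and the sketch below is not its implementation.  As a generic illustration of comparing a running configuration against last night's known-good backup, Python's standard `difflib.unified_diff` produces the kind of line-by-line change list that makes a bad change easy to spot and back out.  The device configuration snippets are hypothetical.

```python
import difflib


def config_changes(known_good: str, running: str) -> list[str]:
    """Return unified-diff lines showing what changed since the known-good backup."""
    return list(
        difflib.unified_diff(
            known_good.splitlines(),
            running.splitlines(),
            fromfile="last-night",
            tofile="running",
            lineterm="",  # splitlines() already dropped the newlines
        )
    )


# Hypothetical switch configs: someone moved an access port to the wrong VLAN.
known_good = """hostname core-sw1
interface Gi0/1
 switchport access vlan 10
"""
running = """hostname core-sw1
interface Gi0/1
 switchport access vlan 99
"""

for line in config_changes(known_good, running):
    print(line)
```

    Lines prefixed with `-` are what the known-good config had; lines prefixed with `+` are what is running now, so the `-` lines are what you would restore.  Identical configs produce an empty list.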

    But as for finding the specific amount of dollars saved, it may be that you're targeting the wrong audience, adatole.  I wouldn't be surprised to learn that many organizations don't have a "right" audience for you to target.  Accounting and H.R. may know how much each employee earns, but they don't know how much lost production time a network monitor like Orion prevents.  In the same manner, I.T. staff can't determine how many dollars in lost wages/productivity are saved, and we don't track improved uptime because we don't have a good solution for comparing uptime over the years.  We're not even in a great place to determine how much down time is prevented, because we don't have a great data warehouse solution for our statistics from SolarWinds.

    If we had such a tool, it would be sweet to compare statistics year-to-year so that we could easily see whether lost production time decreased after installing another poller or another module.  It could show trends in down time while accounting for increased employee numbers and increased applications deployed to the organization.  Not to mention being able to track bandwidth utilization on WAN or Internet circuits over years, allowing us to better forecast future budgetary needs for growth . . .

    Hmm.  Data Warehousing for Orion . . .   Hmmm . . .


    That one item might be the enabler for us to get the information you seek . . .
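    Once outage records are kept year over year, the comparison wished for above (trends in down time that account for a growing headcount) is mostly arithmetic.  The minimal Python sketch below assumes a hypothetical `Outage` record holding a year, a duration in hours, and the number of employees affected; it is not an Orion feature or API, and any real comparison would draw on your own retained data.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    year: int
    hours: float             # how long the outage lasted
    employees_affected: int  # how many people it idled


def lost_hours_per_employee(outages: list[Outage], headcount: dict[int, int]) -> dict[int, float]:
    """Employee-hours lost to outages each year, divided by that year's headcount,
    so a growing organization doesn't look worse just because it is bigger.
    Outages in years missing from `headcount` are ignored."""
    lost: dict[int, float] = {}
    for o in outages:
        lost[o.year] = lost.get(o.year, 0.0) + o.hours * o.employees_affected
    return {year: lost.get(year, 0.0) / headcount[year] for year in sorted(headcount)}


def year_over_year_change(series: dict[int, float]) -> dict[int, float]:
    """Percent change from each year to the next (negative means less lost time).
    Years following a zero baseline are skipped to avoid dividing by zero."""
    years = sorted(series)
    return {
        y: (series[y] - series[prev]) / series[prev] * 100
        for prev, y in zip(years, years[1:])
        if series[prev]
    }
```

    Fed with before-and-after figures around a new poller or module, a negative year-over-year change is the kind of number that could be handed to the bean counters.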
