Application Apocalypse

Apologies for the doomsday reference, but I think it’s important to draw attention to the fact that business-critical application failures are creating apocalyptic scenarios in many organizations today. As businesses become increasingly reliant on IT infrastructure for hosting applications and Web services, the tolerance for downtime and degraded performance has become almost nil. Everyone wants 100% uptime and superior performance from their websites. Whether applications are hosted on-premises or in the cloud, and whether they are managed by your internal IT teams or outsourced to managed service providers, maintaining high availability and application performance is a top priority for all organizations.

Amazon Web Services® (AWS) experienced a massive disruption of its services in Australia last week. Across the country, websites and platforms powered by AWS, including some major banks and streaming services, were affected. Even Apple® experienced a widespread outage in the United States last week, causing popular services, including iCloud®, iTunes®, and the iOS® App Store, to go offline for several hours.

I’m not making a case for or against hosting applications in the cloud versus on-premises. Regardless of where applications are running (on private, public, or hybrid cloud, or in co-location facilities), it is important to understand the impact and aftermath of downtime. Take a look at these statistics, which give a glimpse of the dire impact of downtime and poor application performance:

  • The average hourly cost of downtime is estimated to be $212,000.
  • 51% of customers say slow site performance is the main reason they would abandon a purchase online.
  • 79% of shoppers who are dissatisfied with website performance are less likely to buy from the same site again.
  • 78% of consumers worry about security if site performance is sluggish.
  • A one-second delay in page response can result in a 7% reduction of conversions.

*All statistics above sourced from Kissmetrics, Brand Perfect, Ponemon Institute, Aberdeen Group, Forrester Research, and IDG.
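
To make these figures concrete, here is a rough back-of-the-envelope sketch in Python. It uses the $212,000 average hourly cost and the 7% conversion drop for a one-second delay from the list above. The function names, the example inputs, and the scaling assumptions (cost accrues linearly with outage time, and the 7% loss compounds per extra second of delay) are illustrative assumptions of this sketch, not part of the cited research.

```python
# Back-of-the-envelope estimates based on the statistics quoted above.
# Function names and scaling assumptions are illustrative only.

AVG_HOURLY_DOWNTIME_COST = 212_000   # USD per hour, from the list above
CONVERSION_LOSS_PER_SECOND = 0.07    # 7% fewer conversions per 1 s delay


def downtime_cost(outage_minutes: float,
                  hourly_cost: float = AVG_HOURLY_DOWNTIME_COST) -> float:
    """Estimate outage cost, assuming cost accrues linearly with time."""
    if outage_minutes < 0:
        raise ValueError("outage_minutes must be non-negative")
    return hourly_cost * outage_minutes / 60


def conversions_after_delay(baseline_conversions: float,
                            delay_seconds: float = 1.0) -> float:
    """Estimate conversions remaining after a page delay.

    Assumes the 7% loss compounds for each additional second of delay.
    This is a simplification: the source only quotes the one-second case.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    remaining = (1 - CONVERSION_LOSS_PER_SECOND) ** delay_seconds
    return baseline_conversions * remaining


if __name__ == "__main__":
    # Hypothetical inputs: a 90-minute outage, and 1,000 daily orders.
    print(f"90-minute outage: ${downtime_cost(90):,.0f}")  # $318,000
    print(f"Orders after a 1 s delay: {conversions_after_delay(1000):,.0f}")
```

Swapping in your own hourly cost and order volume gives a quick, if crude, sense of what an outage or a slow page might cost your organization.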


Understanding the Cost of Application Downtime

  • Financial losses: As seen in the stats above, customer-facing applications that perform unsatisfactorily affect online business and potential purchases, often resulting in customers taking their business to competitors.
  • Productivity loss: Overall productivity suffers when applications are down and employees are unable to do their jobs or provide customer service.
  • Cost of fixing problems and restoring services: IT departments can spend hours or even days identifying and resolving application issues, which incurs labor costs on top of the time and effort spent.
  • Dent in brand reputation: When there is a significant application failure, customers start to form a negative perception of your organization and its services, and lose trust in your brand.
  • Penalty for non-compliance: MSPs with penalty clauses in their service level agreements will incur additional financial losses when they miss the agreed service levels.

Identifying and Mitigating Application Problems

Applications are the backbone of most businesses. Having them run at peak performance is vital to the smooth execution of business transactions and service delivery. Every organization needs an IT policy and strategy to:

  1. Implement continuous monitoring to proactively identify performance problems and their early indicators.
  2. Identify the root cause of application problems, apply a fix, and restore services as soon as possible, while minimizing the magnitude of damage.

It is important to have visibility into application health (on-premises and in the cloud) and end-user experience on websites, but it is equally important to monitor the infrastructure that supports those applications, such as servers, virtual machines, and storage systems. Applications often perform sluggishly because of server resource congestion or storage IOPS issues.
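
As a rough illustration of why application and infrastructure metrics belong side by side, the Python sketch below takes one monitoring sample and suggests where to look first when an application is slow. The `Sample` fields, the threshold values, and the `likely_bottleneck` function are all hypothetical; a real monitoring tool would collect these values itself and compare them against baselines tuned to your environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hypothetical monitoring sample for an application and its host."""
    response_ms: float         # application response time
    cpu_percent: float         # server CPU utilization
    memory_percent: float      # server memory utilization
    storage_iops: float        # observed storage IOPS
    storage_iops_limit: float  # IOPS the storage system can sustain


# Illustrative thresholds only; tune these to your own baselines.
SLOW_RESPONSE_MS = 2000
HIGH_UTILIZATION_PERCENT = 90
IOPS_SATURATION_RATIO = 0.9


def likely_bottleneck(s: Sample) -> List[str]:
    """Return the areas to check first when the application is slow."""
    if s.response_ms < SLOW_RESPONSE_MS:
        return []  # application is responding acceptably
    suspects = []
    if s.cpu_percent >= HIGH_UTILIZATION_PERCENT:
        suspects.append("server CPU")
    if s.memory_percent >= HIGH_UTILIZATION_PERCENT:
        suspects.append("server memory")
    if (s.storage_iops_limit > 0
            and s.storage_iops / s.storage_iops_limit >= IOPS_SATURATION_RATIO):
        suspects.append("storage IOPS")
    # Nothing saturated: the problem is more likely in the application itself.
    return suspects or ["application code or dependencies"]


if __name__ == "__main__":
    slow_on_storage = Sample(response_ms=3500, cpu_percent=40,
                             memory_percent=50, storage_iops=950,
                             storage_iops_limit=1000)
    print(likely_bottleneck(slow_on_storage))  # ['storage IOPS']
```

The point of the sketch is the correlation step: a slow response time on its own only tells you that something is wrong, while the infrastructure readings taken at the same moment help narrow down where.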

Share your experience of dealing with application failures and how you found and fixed them.

Check out this ebook "Making Performance Pay - How To Turn The Application Management Blame Game Into A New Revenue Stream" by NEXTGEN, an Australia-based IT facilitator, in collaboration with SolarWinds.


  • You're likely better off to recommend solar-flare-proof storage and great fire suppression systems, and quick-trip circuit breakers and insulated/isolated UPS's when that day comes.

    At least if you recommend them, you'll be a hero for being right, even if management chooses to ignore your best advice.

    And if the flare doesn't happen while you're there, you'll still have given the best advice you could.

    I'd rather see a way for the electrical grid to receive and store, or even bleed off, all the excess juice that will be generated through the lines by a massive flare.  If it were only affordable and safe to do, who knows how long a city or state or nation or continent could run off the stored energy from a single massive flare?

  • This is a great post!

    You mention the importance of continuous monitoring, which I agree is key.  It's also important to view the monitoring practice itself as one of continuous improvement.  With each failure we can usually find ways to improve our monitoring to better identify the issue before it becomes an apocalypse.  You have item #2 being to identify root cause and fix; item #3 should be to take what you learned in item #2 and improve monitoring.

  • $212,000 an hour.... that does not sound like a good time!

  • Any chance we can just rearrange things so we can leave the mission critical systems alone, and only lose the social media outlets (minus Thwack, of course) and cruddy TV lineups...?

    I suppose, when it does happen, we will all have new jobs that day... lol