Application Apocalypse

Apologies for the doomsday reference, but I think it’s important to draw attention to the fact that business-critical application failures are creating apocalyptic scenarios in many organizations today. As businesses become increasingly reliant on IT infrastructure for hosting applications and Web services, the tolerance for downtime and degraded performance has dropped to almost nil. Everyone wants 100% uptime and superior website performance. Whether applications are hosted on-premises or in the cloud, and whether they are managed by internal IT teams or outsourced to managed service providers, maintaining high availability and application performance is a top priority for every organization.

Amazon Web Services® (AWS) experienced a massive disruption of its services in Australia last week. Across the country, websites and platforms powered by AWS, including some major banks and streaming services, were affected. Even Apple® experienced a widespread outage in the United States last week, causing popular services, including iCloud®, iTunes®, and the iOS® App Store, to go offline for several hours.

I’m not making a case against hosting applications in the cloud versus hosting them on-premises. Regardless of where applications run (private, public, or hybrid cloud, or a co-location facility), it is important to understand the impact and aftermath of downtime. Take a look at these statistics, which give a glimpse of the dire impact of downtime and poor application performance:

  • The average hourly cost of downtime is estimated to be $212,000.
  • 51% of customers say slow site performance is the main reason they would abandon a purchase online.
  • 79% of shoppers who are dissatisfied with website performance are less likely to buy from the same site again.
  • 78% of consumers worry about security if site performance is sluggish.
  • A one-second delay in page response can result in a 7% reduction in conversions (a worked example below shows what that can mean in dollars).

*All statistics above sourced from Kissmetrics, Brand Perfect, Ponemon Institute, Aberdeen Group, Forrester Research, and IDG.
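
To put that last statistic in concrete terms, here is a minimal back-of-the-envelope sketch in Python. The daily revenue figure is hypothetical and chosen purely for illustration; only the 7% conversion-loss rate comes from the statistics above.

```python
# Back-of-the-envelope cost of a one-second page delay, using the 7%
# conversion-loss figure cited above. The daily revenue is a hypothetical
# number chosen purely for illustration.

daily_revenue = 100_000    # assumed online revenue per day, in dollars
conversion_loss = 0.07     # 7% reduction in conversions per one-second delay

daily_loss = daily_revenue * conversion_loss
annual_loss = daily_loss * 365

print(f"Estimated loss per day:  ${daily_loss:,.0f}")     # $7,000
print(f"Estimated loss per year: ${annual_loss:,.0f}")    # $2,555,000
```

Even under these modest assumptions, a single second of added latency compounds into millions of dollars a year.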

 

Understanding the Cost of Application Downtime

  • Financial losses: As the statistics above show, customer-facing applications that perform poorly hurt online business and potential purchases, often driving customers to competitors.
  • Productivity loss: Overall productivity suffers when applications are down and employees cannot do their jobs or serve customers.
  • Cost of fixing problems and restoring services: IT departments spend hours, sometimes days, identifying and resolving application issues, and that labor and time carries a real cost.
  • Dent in brand reputation: A significant application failure gives customers a negative perception of your organization and its services, eroding trust in your brand.
  • Penalty for non-compliance: MSPs whose service level agreements include penalty clauses incur additional financial losses when those SLAs are breached.

Identifying and Mitigating Application Problems

Applications are the backbone of most businesses. Having them run at peak performance is vital to the smooth execution of business transactions and service delivery. Every organization needs an IT policy and strategy to:

  1. Implement continuous monitoring to proactively identify performance problems and early warning indicators (a minimal probe is sketched after this list).
  2. Identify the root cause of application problems, apply a fix, and restore services as soon as possible, while minimizing the magnitude of damage.
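
Here is the minimal sketch of the continuous probe referenced in point 1, in Python. The endpoint URL, latency threshold, and check interval are all hypothetical placeholders; a real monitoring tool adds alert routing, dashboards, and historical baselines on top of a loop like this.

```python
# Minimal availability/latency probe. The endpoint, thresholds, and
# interval below are hypothetical placeholders.
import time
import urllib.error
import urllib.request

URL = "https://www.example.com/health"  # hypothetical health-check endpoint
LATENCY_THRESHOLD_S = 2.0               # assumed acceptable response time
CHECK_INTERVAL_S = 60                   # probe once per minute

def check_once(url: str) -> None:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10):
            elapsed = time.monotonic() - start
        if elapsed > LATENCY_THRESHOLD_S:
            print(f"WARN: {url} responded in {elapsed:.2f}s (degraded)")
        else:
            print(f"OK: {url} responded in {elapsed:.2f}s")
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        print(f"ALERT: {url} returned HTTP {exc.code}")
    except Exception as exc:               # timeouts, DNS failures, refusals
        print(f"ALERT: {url} unreachable: {exc}")

if __name__ == "__main__":
    while True:
        check_once(URL)
        time.sleep(CHECK_INTERVAL_S)
```

The value of a probe like this lies in the trend, not any single check: a slow creep in response times is exactly the kind of early indicator point 1 is about.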

It is important to have visibility into application health (on-premises and in the cloud) and end-user experience on websites, but it is equally important to monitor the infrastructure that supports applications: servers, virtual machines, storage systems, and so on. Applications often perform sluggishly not because of the code itself, but because of server resource contention or storage IOPS bottlenecks.
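
As one illustration of that point, here is a minimal sketch of a host-level resource check using the third-party psutil library (pip install psutil). The thresholds are assumptions; the point is simply that application slowness often traces back to the host rather than the application.

```python
# Minimal host-resource check with psutil. Thresholds are hypothetical.
import psutil

CPU_THRESHOLD = 90.0   # assumed percent-busy alert level
MEM_THRESHOLD = 90.0   # assumed percent-used alert level

cpu = psutil.cpu_percent(interval=1)    # sample CPU usage over one second
mem = psutil.virtual_memory().percent   # current memory utilization
disk = psutil.disk_io_counters()        # cumulative disk I/O counters

print(f"CPU {cpu:.1f}% | MEM {mem:.1f}% | "
      f"disk reads {disk.read_count:,} / writes {disk.write_count:,}")

if cpu > CPU_THRESHOLD or mem > MEM_THRESHOLD:
    print("WARN: host resource contention may be degrading application performance")
```

Correlating a check like this with the application probe above is often what separates "the app is slow" from "the app is starved of IOPS."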

Share your experiences dealing with application failures, and tell us how you found and fixed them.

Check out this ebook "Making Performance Pay - How To Turn The Application Management Blame Game Into A New Revenue Stream" by NEXTGEN, an Australia-based IT facilitator, in collaboration with SolarWinds.


Anonymous
  • I like the thoughts here, and will inject one of my own, which others find trite:  Those who ignore history are doomed to repeat it.

    Putting our trust in WAN providers, ISPs, and especially the application service providers we call "the cloud" is an extremely risky proposition, given how vulnerable those services are to things completely out of our control.

    We can defend against hackers, malware, and human error.  We can build in redundancy to account for natural disasters like earthquakes and hurricanes.  All at a huge price.

    But we've done nothing to protect our users and data from space.  I know--"this guy's off his rocker--aliens?  Give me a break."

    No, I'm referring to events like the Carrington Flare (the solar storm of 1859), which shocked telegraph operators and damaged their transmission lines and utility poles.  As the Wikipedia article puts it:  "Telegraph systems all over Europe and North America failed, in some cases giving telegraph operators electric shocks. Telegraph pylons threw sparks. Some telegraph operators could continue to send and receive messages despite having disconnected their power supplies."

    In 2011, National Geographic published thoughts about the impact of such a solar storm happening today:  http://news.nationalgeographic.com/news/2011/03/110302-solar-flares-sun-storms-earth-danger-carrington-event-science/


    Lloyd's of London reportedly estimated that a single Carrington-class flare occurring in 2014 would cost $2.6 trillion, and that it would take ten years to recover from it.

    Happily, the United States government is working to coordinate the creation of a power grid that could recover more quickly than ten years.


    Even as we worry about dollars lost in minutes or a single hour of network downtime, we must also plan for serious disaster recovery: a scenario with a Hurricane Katrina-sized impact on electrical resources, satellites, and data transfer.

    Are you prepared?  What would it cost to become prepared?

    Would such a flare damage backed-up files on a thumb drive in your desk?  Would induced currents on copper lines damage NICs and switch ports?

    Are SATA and SSD media at risk of data loss or physical damage?  How about tape backups stored off-site: are they safe, or would they be erased or corrupted?

    I'm not advocating putting on your aluminum-foil helmet and heading for the 1950s fallout shelter buried in the back yard.  But being aware of actual risks can put you ahead of the game.  Just as governments advocate keeping an earthquake kit if you live in that kind of area, or a survival kit where tornadoes, hurricanes, or ice storms can cause extended electrical outages, so too should we have a "data survival mode"--a downtime procedure on steroids.

    This happened to North American telegraphs in 1859.  It happened to Hydro-Québec's power grid in 1989.  We're foolish not to learn from history, and we're ridiculous if we build an electronic IT infrastructure that remains vulnerable to something history has proven happens repeatedly.
