Application Troubleshooting: Making Molehills Out of Mountains

When:           August 2005

Where:          Any random organization experimenting with e-commerce

Employee:         I can’t access the CRM! We might lose a deal today if I can’t send that quote.

Admin:               Okay, check the WAN, check the LAN, and the server crm1.primary.

Junior:                All fine.

Admin:               Great. Restart the application service. That should solve it.

Junior:                Yes, Boss! The app service was down. I restarted it and now CRM is back up.

When:             August 2015

Where:            Any random organization that depends on e-commerce

Sales Rep:           Hey! The CRM is down! I can’t see my data. Where are my leads! I’m losing deals!

Sales Director:     Quick! Raise a ticket, call the help desk, email them, and cc me!

Help desk:           The CRM is down again! Let me assign it to the application team.

App team:            Hmm. I’ll reassign the ticket to the server guys and see what they say.

SysAdmin:           Should we check the physical server? Or the VM instance? Maybe the database is down.

DB Admin:           Array, disk, and LUN are okay. There are no issues with queries. I think we might be fine.

Systems team:     Alright, time to blame the network!

Net Admin:           No! It’s not the network. It’s never the network. And it never will be the network!

Systems team:     Okay, where do we start? Server? VM? OS? Apache®? App?

See the difference?

App deployment today

Today’s networks have changed a lot. There are no established points of failure like there were when networks were flat. Today’s enterprise networks are bigger, faster, and more complex than ever before. While current networks deliver more services to more users more efficiently, that complexity has also increased the time it takes to pinpoint the cause of a failure, let alone resolve it.

For example, let’s say a user complains about failed transactions. Where would you begin troubleshooting? You’ll need to check for Web transaction failures, make sure the server is not misbehaving, and confirm that the database is healthy. Don’t forget the hypervisors, VMs, OS, and the network. On top of that, you’ll be switching between multiple monitoring tools, windows, and tabs, trying to correlate the information, working out what depends on what, and collaborating with various teams. All of this increases the mean time to repair (MTTR), which means more service downtime and lost revenue for the enterprise.

Troubleshoot the stack

Applications are not standalone entities installed on a Windows® server anymore. Application deployment relies on a system of components that must perform in unison for the application to run optimally. A typical app deployment in most organizations looks like this:

[Figure: app stack.png]

When an application fails to function, any of these components could be blamed for the failure. When a hypervisor fails, you must troubleshoot multiple VMs and the multiple apps they host that may have also failed. Where would troubleshooting begin under these circumstances?

Ideally, the process would start with finding out which entity in the application stack failed or is in a critical state. Next, you determine the dependencies of that entity with other components in the application stack. For example, let’s say a Web-based application is slow. A savvy admin would begin troubleshooting by tracking Web performance, move to the database, and on to the hosting environment, which includes the VM, hypervisor, and the rest of the physical infrastructure.
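The two steps above (find the entity in a critical state, then follow its dependencies) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names, the dependency map, and the status values are hypothetical, and a real monitoring tool would discover them rather than hard-code them.

```python
# Hypothetical application stack: each component maps to what it depends on.
DEPENDS_ON = {
    "web app": ["database", "vm-web"],
    "database": ["vm-db"],
    "vm-web": ["hypervisor"],
    "vm-db": ["hypervisor"],
    "hypervisor": ["physical host"],
    "physical host": [],
}


def reachable(component, depends_on):
    """Return the component and everything it depends on, directly or indirectly."""
    seen = []
    stack = [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(depends_on.get(node, []))
    return seen


def is_healthy(component, status):
    # Assumption: a component with no reported status is treated as unhealthy.
    return status.get(component, "unknown") == "ok"


def root_cause_candidates(component, status, depends_on):
    """Unhealthy components in the stack whose own dependencies are all healthy."""
    candidates = []
    for node in reachable(component, depends_on):
        if is_healthy(node, status):
            continue
        below = [d for d in reachable(node, depends_on) if d != node]
        if all(is_healthy(d, status) for d in below):
            candidates.append(node)
    return candidates


# Hypothetical readings: the app and both VMs look bad, but so does the hypervisor.
status = {
    "web app": "critical",
    "database": "ok",
    "vm-web": "critical",
    "vm-db": "critical",
    "hypervisor": "critical",
    "physical host": "ok",
}
print(root_cause_candidates("web app", status, DEPENDS_ON))
```

With these made-up readings, the only candidate returned is the hypervisor: everything above it is unhealthy, but it is the lowest failing layer, so that is where troubleshooting would start.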

To greatly reduce MTTR, troubleshoot the application stack as a whole rather than each component in isolation. This will help move your organization closer to the magic three nines (99.9%) for availability. To make stack-based troubleshooting easier, admins can adopt monitoring tools that support correlation and dependency mapping, an approach also known as the AppStack model of troubleshooting.
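To put "three nines" in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes a 365-day year and treats availability simply as the fraction of the year the service is up; the MTTR values passed in are hypothetical examples, not measurements.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Yearly downtime budget, in hours, for a given availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def incidents_within_budget(availability_percent, mttr_hours):
    """How many outages of average length `mttr_hours` fit in the yearly budget."""
    return allowed_downtime_hours(availability_percent) / mttr_hours


budget = allowed_downtime_hours(99.9)
print(f"Downtime budget at 99.9%: {budget:.2f} hours per year")

# Hypothetical MTTR values, to show how repair time eats into the budget.
for mttr in (4, 1):
    fits = incidents_within_budget(99.9, mttr)
    print(f"MTTR of {mttr} h: about {fits:.1f} incidents per year fit in the budget")
```

Three nines leaves under nine hours of downtime per year, so the shorter each repair, the more incidents the organization can absorb before missing the target.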

Learn more at Microsoft Convergence 2015.

If you would like to learn more, or see the AppStack demo, SolarWinds will be at booth 116 at Microsoft Convergence in Barcelona.

