For the purpose of this discussion of alerting, let’s assume a business with a very small IT staff: two techs and the IT manager. Besides receiving alerts on network events, including device failures and excessive use of disk space or bandwidth, the team needs a way to act as efficiently as possible when multiple concurrent events require simultaneous intervention.


IT systems provide state-related alerts in three standard ways: through SNMP polling that a monitoring application initiates; through SNMP traps that, when triggered, send an alert code to the monitoring console for parsing and user-friendly display; and through syslog message forwarding.
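To make the "parsing and user-friendly display" step concrete for the syslog case, here is a minimal sketch of decoding the priority value that prefixes a forwarded syslog message. The encoding (priority = facility × 8 + severity) comes from the syslog RFCs; the function name `parse_priority` and the lowercase severity labels are illustrative choices, not part of any monitoring product.

```python
import re

# Severity names in numeric order (0-7), as defined by the syslog RFCs.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

def parse_priority(message):
    """Return (facility, severity_name) from a leading <PRI> field, or None."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]

print(parse_priority("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))
```

A console could use the decoded severity to decide whether a forwarded message should raise an alert at all or only be logged.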


The IT manager’s challenge in this case is to define the best troubleshooting and resolution workflow.


Engineering the Right Process

 

The IT manager and team need a monitoring application that both provides timely information about problems and begins to organize and coordinate a response. At a minimum, the team depends on the monitoring setup to do these things:

  1. As soon as the application recognizes an alert condition, it sends an email to the entire team, pages the tech who is on call, and writes an entry into an event log.
  2. If the alert is not acknowledged within 20 minutes, the application fires a second alert: another team email, another page (this one targeted at both techs), and another entry in the event log.
  3. If the second alert is not acknowledged within 20 minutes, the application fires a third alert: a third team email and another page, this one targeted at both techs and the IT manager.
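The three escalation steps above can be sketched as a small function that, for a given alert and point in time, lists which alerts should already have fired and whom each one pages. This is only an illustration of the workflow, not how any particular monitoring product implements it. The names `Alert`, `PAGE_TARGETS`, and `fired_alerts`, and the recipient labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ESCALATION_INTERVAL_MIN = 20  # wait between escalation levels, per the workflow

# Page recipients per escalation level (hypothetical labels).
# The team email goes to everyone at every level, so it is not modeled here.
PAGE_TARGETS = [
    ["on_call_tech"],                                # first alert
    ["on_call_tech", "second_tech"],                 # second alert
    ["on_call_tech", "second_tech", "it_manager"],   # third alert
]

@dataclass
class Alert:
    name: str
    raised_at: int                         # minutes on some shared clock
    acknowledged_at: Optional[int] = None  # None while unacknowledged

def fired_alerts(alert, now):
    """Return (minute, level, page_targets) for every alert fired by `now`."""
    fired = []
    for level, targets in enumerate(PAGE_TARGETS):
        fire_time = alert.raised_at + level * ESCALATION_INTERVAL_MIN
        if fire_time > now:
            break
        acknowledged = (alert.acknowledged_at is not None
                        and alert.acknowledged_at <= fire_time)
        if level > 0 and acknowledged:
            break  # an acknowledgment stops further escalation
        fired.append((fire_time, level + 1, targets))
    return fired

for entry in fired_alerts(Alert("disk space", raised_at=0), now=45):
    print(entry)
```

An alert acknowledged at minute 15 would produce only the first entry, because the acknowledgment arrives before the 20-minute escalation point.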

With escalated alerts, the team can triage concurrent events efficiently because no time is lost to confusion over who is handling what. Acknowledgments let everyone know that a team member is fielding a specific issue. Assuming that everyone is available as expected, all three members of the small team would be engaged simultaneously in addressing three different issues within 60 minutes.
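One way to read the 60-minute figure is as a worst case under simple assumptions: three events start at the same moment, each team member acknowledges exactly one event, and each responds within the 20-minute window after first being paged. The sketch below works through that arithmetic. The function name, the recipient labels, and the `response_delay_min` parameter are hypothetical.

```python
ESCALATION_INTERVAL_MIN = 20

# Minute at which each team member is first paged about an
# unacknowledged event (hypothetical labels).
FIRST_PAGED_AT = {"on_call_tech": 0, "second_tech": 20, "it_manager": 40}

def acknowledgment_times(events, response_delay_min=0):
    """Map each event to (responder, minute acknowledged).

    Assumes each person takes one event and acknowledges it
    response_delay_min minutes after first being paged.
    """
    if not 0 <= response_delay_min < ESCALATION_INTERVAL_MIN:
        raise ValueError("model assumes a response within the escalation window")
    responders = sorted(FIRST_PAGED_AT.items(), key=lambda item: item[1])
    return {
        event: (person, paged_at + response_delay_min)
        for event, (person, paged_at) in zip(events, responders)
    }

events = ["router down", "disk full", "bandwidth spike"]
for event, (person, minute) in acknowledgment_times(events, 19).items():
    print(f"{event}: {person} acknowledges at minute {minute}")
```

Even with each person taking 19 minutes to respond, the last acknowledgment lands at minute 59, which fits the 60-minute bound.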


SolarWinds Network Configuration Manager supports the escalated alert features needed to implement this workflow.