Level 21

Managing Alerts - Tracking & Accountability in a NOC

I am curious how other folks out there are managing their alerts in a NOC environment?

I need a way to do the following...

  • Present my alerts in such a way that they are obvious to my NOC Techs
  • Provide a way to track how the alert was handled
  • Provide a way to track who handled the alert for accountability

I know that NPM has an alarm acknowledgment mechanism, but that only accounts for Advanced Alerts, not alerts generated by Syslog and Traps.  I would prefer a consolidated "ALARM" interface through which to aggregate all alerts and manage them in one place, one that provides everything on my bullet list above.  I am considering sending all of my alerts to our ticketing system to achieve this.

What are other folks out there doing to accomplish this? I would love to hear from you!

0 Kudos
23 Replies
Level 17

I use a bunch of custom properties and a custom alerts page built on a table that a stored procedure refreshes every 30 seconds and outputs to a view (there are some pros and cons to this).

I use 3 tiers of alerts named for email and other filtering "CriticalSW", WarningSW and InfoSW.

I use several "Realms" to further narrow down, like "Production", Staging, Development, QA, LAB, Initial, Decommissioned, etc...

I use a bit custom property (CP) for business-hours-only alerting: when it is enabled, the alert fires only from 7am to 7pm, and the CriticalSW alerts allow for this.

I have CPs for Mute<Object>, which prevents the alert from firing if true, and Mute<Object>Notes for documentation purposes.

The options above in " " are the only ones that show up on the operations center's report/view and they are to call an on-call engineer if it shows up here, regardless of time.

My alert structure is such that I have the 3 levels of criticality for each alert. 

We structured it like so:

  • "CriticalSW" and "Production" => Shows on view and Ops makes a call
  • WarningSW - an email goes out to all emails in "AlertEmail" custom property and if populated, "TicketEmail" emails our ticketing system and opens a ticket
  • InfoSW - writes to eventlog only and is reported on if needed.
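The three-tier routing above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not the actual Orion alert configuration: the `Alert` class, the `route` function and the action strings are hypothetical names, and what happens to a CriticalSW alert outside "Production" is not stated here, so the sketch returns no actions for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    tier: str               # "CriticalSW", "WarningSW" or "InfoSW"
    realm: str              # e.g. "Production", "Staging", "LAB"
    alert_email: str = ""   # value of the "AlertEmail" custom property
    ticket_email: str = ""  # value of the "TicketEmail" custom property


def route(alert: Alert) -> List[str]:
    """Return the actions for one alert (action names are hypothetical)."""
    if alert.tier == "CriticalSW" and alert.realm == "Production":
        # Shows on the operations view and Ops calls the on-call engineer.
        return ["show_on_ops_view", "ops_calls_on_call_engineer"]
    if alert.tier == "WarningSW":
        actions = []
        if alert.alert_email:
            actions.append(f"email:{alert.alert_email}")
        if alert.ticket_email:
            # Only when "TicketEmail" is populated does a ticket get opened.
            actions.append(f"open_ticket_via:{alert.ticket_email}")
        return actions
    if alert.tier == "InfoSW":
        return ["write_event_log"]
    return []  # not specified in the post, e.g. CriticalSW outside Production
```

For example, `route(Alert("WarningSW", "Staging", alert_email="noc@example.com"))` yields a single email action and no ticket, because "TicketEmail" is empty.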

The CriticalSW <level> alert logic takes into consideration

  • the business hours only bit.
  • the mute node/object bits
  • the name of the component check or app must NOT contain <InfoSw> or <WarningSW> in the name (allows granular control)
  • the alert reset criteria, which include logic so that for "Production" the ACK note must contain a ticket number (5 numbers) and the operator's initials, and which tie in the AlertActive table to only consider active alerts (see component reset logic example below).
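The first three checks in this list (business hours, mute bits, name filter) amount to a gate in front of the CriticalSW trigger. Here is a minimal Python sketch of that gate. The function and parameter names are hypothetical. It also assumes the 7am to 7pm window is half-open (07:00 inclusive, 19:00 exclusive) and that the name match ignores case, neither of which is spelled out in this post.

```python
from datetime import datetime, time

BUSINESS_START = time(7, 0)   # 7am, per the business-hours custom property
BUSINESS_END = time(19, 0)    # 7pm; treated as exclusive (an assumption)


def critical_may_fire(now: datetime,
                      component_name: str,
                      business_hours_only: bool,
                      node_muted: bool,
                      object_muted: bool) -> bool:
    """Return True if a CriticalSW alert is allowed to fire right now."""
    # Mute bits on the node or the object suppress the alert outright.
    if node_muted or object_muted:
        return False
    # Components tagged InfoSW or WarningSW in their name are excluded.
    lowered = component_name.lower()
    if "infosw" in lowered or "warningsw" in lowered:
        return False
    # The business-hours bit limits alerting to the 7am-7pm window.
    if business_hours_only and not (BUSINESS_START <= now.time() < BUSINESS_END):
        return False
    return True
```

With the business-hours bit set, a call at 22:00 returns False, while the same call with the bit cleared returns True.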

RESET CUSTOM SQL Criteria for Component

JOIN AlertObjects o ON ('AM:' + CAST(APM_AlertsAndReportsData.ComponentID AS nvarchar) = o.EntityNetObjectId)
JOIN AlertActive aa ON (o.AlertObjectID = aa.AlertObjectID AND TriggeredMessage LIKE '%Crit%Compo%crit%do%')
JOIN AlertHistory a ON (o.AlertObjectID = a.AlertObjectID AND a.EventType IN (0,2))
JOIN Nodes n ON (APM_AlertsAndReportsData.NodeId = n.NodeID)
JOIN APM_ApplicationCustomProperties cp ON (APM_AlertsAndReportsData.ApplicationId = cp.ApplicationID)
WHERE
(
  n.MuteNode <> 0 OR
  cp.MuteApp <> 0 OR
  APM_AlertsAndReportsData.ComponentStatus NOT IN ('Critical','Down')
)
-- Only look at the most recent history row for this alert object
AND a.AlertActiveID IN (SELECT TOP 1 AlertActiveID FROM AlertHistory WHERE AlertObjectID = o.AlertObjectID ORDER BY AlertObjectID, AlertActiveID DESC, TimeStamp DESC)
AND
  -- Prod Critical Ticket Entry
  (CASE WHEN (ISNULL(cp.realm, n.realm) = 'Production')
    THEN CASE
      WHEN a.[Message] LIKE '%[0-9][0-9][0-9][0-9][0-9]%' AND
           a.EventType IN (2)
        THEN 1
      ELSE 0
    END
    ELSE 1 --Clear b/c not production
    END
  )=1
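For anyone who finds the SQL hard to follow, the decision it encodes can be restated as a small Python function. This is a sketch of the WHERE clause only, under a few assumptions: the function and parameter names are made up for illustration, EventType 2 is assumed to mean "acknowledged", and the inputs are taken to come from the most recent AlertHistory row for the alert. As in the SQL above, only the five consecutive digits are checked and the operator's initials are not. Unlike SQL Server with a case-insensitive collation, the comparison with "Production" here is case-sensitive.

```python
import re
from typing import Optional

ACKNOWLEDGED = 2  # assumed meaning of EventType 2 in AlertHistory
TICKET_PATTERN = re.compile(r"[0-9]{5}")  # mirrors LIKE '%[0-9][0-9][0-9][0-9][0-9]%'


def may_reset(node_muted: bool,
              app_muted: bool,
              component_status: str,
              app_realm: Optional[str],
              node_realm: Optional[str],
              last_event_type: int,
              last_message: Optional[str]) -> bool:
    """Return True if the critical component alert is allowed to reset."""
    # First condition: the object is muted or no longer Critical/Down.
    cleared = node_muted or app_muted or component_status not in ("Critical", "Down")
    if not cleared:
        return False
    # ISNULL(cp.realm, n.realm): the app realm falls back to the node realm.
    realm = app_realm if app_realm is not None else node_realm
    if realm != "Production":
        return True  # clear because it is not production
    # Production: the latest event must be an ACK whose note has a ticket number.
    has_ticket = TICKET_PATTERN.search(last_message or "") is not None
    return last_event_type == ACKNOWLEDGED and has_ticket
```

So a recovered Production component whose last history row is an acknowledgment with the note "INC 12345 jd" may reset, while the same component with the note "fixed" stays active until someone adds a ticket number.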

Wow, I just saw the original post date. Oh well, this is still valid and may help people. HERE's an old content exchange post around some of this.

Have fun!

0 Kudos
Level 11

I have a complete redesign of the SolarWinds alerts interface that I wrote in PHP. It requires a ticket number to acknowledge an alert, on top of several visual and functional enhancements:

 

I plan on packaging and releasing on thwack soon, after some more testing 😄

Level 12

Three years later, same question, ticketing to Numara Footprints.  On version 11.5 so would be great if this has been worked out.

0 Kudos
Level 7

This is great. If you can, I would like to see a column that shows all the unassigned/assigned tickets in the Remedy queue next to the alert. Now that would be nice to have. Good job julrich!!

0 Kudos
Level 7

Great Work!

0 Kudos
Level 11

Working on packaging latest version now....

Hopefully something coming out soon.

0 Kudos
Level 9

Any news yet on a release for this?

0 Kudos
Level 9

Sorry to resurrect an old thread, but any news on your SolarWinds alerts interface?

0 Kudos
Level 8

That could prove very useful for us as well.  Looks great.  Have you made any progress in unleashing it to the masses?

 

-Kevin

0 Kudos
Level 8

Any updates on this?

0 Kudos
Level 14

Well I'm certainly impressed...

I await the arrival of the package. 🙂

0 Kudos
Level 8

This is a great topic.  Alert management is not currently a strong suit of Orion (this is my opinion).

We are in the process of integrating alerts from Solarwinds into a ticket management system (Frontrange ITSM).  Within ITSM, all alerts will be treated as incidents, where we can define SLAs and track acknowledgements and actions.

0 Kudos
Level 12

We are using Frontrange HEAT and will go to Frontrange ITSM in a couple of weeks, I would be interested in how this worked for you and how you integrated with ORION.

 

Thanks

0 Kudos
Level 11

We use HEAT for ticketing as well, and for integration we just send alert emails to the HEAT auto ticket generator which assigns the tickets to our NOC. Other than the emails we don't use much of the alerting functions native to Orion. It's worked great this way for years but after the upgrade to 10.1 it's been broken as the ATG can't properly read the html emails the advanced alerts send. At least we still get the alerts, it's just the tickets generated have very little detail.

0 Kudos
Level 14

Just got word yesterday we're getting NUMARA.

So, if anyone has any tips, etc., about how to make it work with Orion, please let me know.

0 Kudos
Level 21



Just got word yesterday we're getting NUMARA.

So, if anyone has any tips, etc., about how to make it work with Orion, please let me know.



We use Numara Footprints for our ticketing system.  I have Orion configured to open tickets in Footprints by sending email to it.  You can add specific data in the body of the email that will cause Footprints to automatically set fields on the ticket it opens based on the details that you set, that data will then be stripped out of the body of the email when the ticket is created.  It all seems to work very well.

0 Kudos
Level 14

Have you done anything by programming ORION Advanced Alerts' ESCALATION into Footprints? Or are you only sending basic variables into Footprints?

We are about to have a group meeting on requirements for Footprints and I am hesitant to suggest doing escalations...

I did a proof of concept a few years ago with another product which offered escalation and it worked out well, but I don't know Footprints or have an SDK for it.

0 Kudos
Level 21

Right now we are just sending alerts into Footprints via email and setting basic variables with specified things in the body of the email such as "Status=Alert".
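As a rough illustration of the email approach, here is a Python sketch that builds such a message with the standard library. The one-pair-per-line `name=value` layout is inferred only from the "Status=Alert" example above and the exact syntax Footprints expects may differ. The function name, addresses and subject are hypothetical, and in practice the body would be filled in by the Orion email alert action rather than by a script.

```python
from email.message import EmailMessage
from typing import Dict


def build_ticket_email(sender: str,
                       recipient: str,
                       subject: str,
                       description: str,
                       fields: Dict[str, str]) -> EmailMessage:
    """Build a plain-text email whose body starts with field assignments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # One "name=value" line per ticket field (format assumed, see above),
    # then a blank line, then the human-readable alert description.
    field_lines = [f"{name}={value}" for name, value in fields.items()]
    msg.set_content("\n".join(field_lines + ["", description]))
    return msg


ticket = build_ticket_email(
    sender="orion@example.com",           # hypothetical addresses
    recipient="footprints@example.com",
    subject="Node down: core-sw-01",
    description="Node core-sw-01 is down.",
    fields={"Status": "Alert"},
)
```

Sending the body as plain text also keeps it easy for the ticketing system's mail parser to read.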

0 Kudos
Level 21



This is a great topic.  Alert management is not currently a strong suit of Orion (this is my opinion).

We are in the process of integrating alerts from Solarwinds into a ticket management system (Frontrange ITSM).  Within ITSM, all alerts will be treated as incidents, where we can define SLAs and track acknowledgements and actions.



Thanks!  Hopefully this will drum up some good ideas for this entire community and potentially some good ideas for SolarWinds for future releases.

The ticket system we use is Footprints by Numara which sounds like it has similar functionality.

0 Kudos
Level 14

i need this function-feature today!

we are about to migrate from using CA (Computer Associates) products to only using Solarwinds - CA manages this to the highest degree but it really is bad with Orion.

and also - i'm hearing we may very soon be getting Numara as well - how do you like it?

0 Kudos