Currently we have dozens of different alerts which are 'shared' across all of our different clients.
Background: Keeping it ultra simple, the node down alert is the same for client A as it is for client Z, and so on. We also have [and let me state up front that I don't agree with it] what has been dubbed "intelligent alerting". An NPM alert triggers a batch file under its trigger actions. The batch file generates an event file, which is collected by BEM (part of the BMC suite), which in turn routes it into a One ITSM incident.
The intelligent part is handled by BEM, and it essentially silences all alarms for a client for 15 minutes after an alert is triggered.
A scenario could be: an interface down alert triggers, successfully alerts and generates an incident. For the next 15 minutes BEM logs any other alert for that client internally but 'silently' drops it - as in, no incident is raised. So SolarWinds continues to generate alerts as expected, but our man in the middle doesn't pass them on. **
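To make the behaviour concrete, here is a rough Python model of the suppression as I understand it. This is not how BEM is actually implemented or configured. The class and method names are made up for illustration, and it assumes the 15-minute window starts at the alert that raised the incident and is not extended by the alerts dropped during it.

```python
from datetime import datetime, timedelta

SUPPRESSION_WINDOW = timedelta(minutes=15)


class ClientSuppressor:
    """Toy model of per-client suppression (hypothetical, not BEM code)."""

    def __init__(self, window=SUPPRESSION_WINDOW):
        self.window = window
        self._last_incident = {}  # client -> time the last incident was raised

    def handle(self, client, alert_name, when):
        """Return 'incident' if an incident is raised, 'dropped' if only logged."""
        last = self._last_incident.get(client)
        if last is not None and when - last < self.window:
            # Still inside this client's window: log only, no incident.
            return "dropped"
        # Assumption: the window is anchored to the alert that raised an incident.
        self._last_incident[client] = when
        return "incident"


if __name__ == "__main__":
    s = ClientSuppressor()
    t0 = datetime(2024, 1, 1, 9, 0)
    print(s.handle("Client A", "WAP down", t0))                                 # incident
    print(s.handle("Client A", "Firewall down", t0 + timedelta(minutes=5)))     # dropped
    print(s.handle("Client Z", "Node down", t0 + timedelta(minutes=5)))         # incident
    print(s.handle("Client A", "Node down", t0 + timedelta(minutes=16)))        # incident
```

Note that the second call is exactly the case that worries me: the firewall alert arrives inside the window opened by the WAP alert and never becomes an incident.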
The Requirement: However, one of our internal client teams now wants to bypass this and receive an eMail for every*** single alert. I don't think this is the approach to take either, and we are trying to educate them out of their "we want everything to alert just in case" mindset.
So this is where you come in - the only method I can currently "see" is to duplicate every alert we have and edit all of them.
- The original set will have additional logic added that basically says 'if client = Client Z then ignore'
- whilst the duplicated set will be client Z specific - we can then add a trigger action of 'send an eMail'.
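In logic terms, the duplicate-and-edit plan boils down to something like the sketch below. It is plain Python to show the intended outcome, not SolarWinds alert configuration. The function names and action labels are invented, and it assumes the duplicated set keeps the existing batch file action and adds the eMail on top.

```python
BYPASS_CLIENT = "Client Z"  # the team that wants an eMail for every alert


def original_set_actions(client):
    """Original alerts: extra condition makes them ignore Client Z."""
    if client == BYPASS_CLIENT:
        return []
    return ["run_bem_batch_file"]


def duplicated_set_actions(client):
    """Duplicated alerts: fire only for Client Z, with the extra eMail action."""
    if client != BYPASS_CLIENT:
        return []
    # Assumption: the duplicate keeps the batch file action and adds an eMail.
    return ["run_bem_batch_file", "send_email"]


def actions_for(client):
    """Combined effect of both alert sets for one triggering client."""
    return original_set_actions(client) + duplicated_set_actions(client)


if __name__ == "__main__":
    print(actions_for("Client A"))  # ['run_bem_batch_file']
    print(actions_for("Client Z"))  # ['run_bem_batch_file', 'send_email']
```

Seen this way, the whole change is a single per-client branch, which is why repeating it by hand across every alert feels so heavy.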
My (current) Answer: I have recently got into modern dashboards, and see them as a fantastic way of re-creating the built-in Top 10 views, but client specific. The opening view is this: [screenshot of the dashboard's opening view]
I see this as a reasonable response to their "log everything" request and my "No" response. They can see at a glance whether they have a flood of node downs, etc. This can clearly be expanded to include other info such as interface downs, nodes in maintenance, nodes muted, etc.
So, bottom line, is there any other easy[ish] answer that doesn't involve duplicating ~40 alerts and editing all of them?
** I have disagreed with this policy from day one on the basis that a simple alert (e.g. a WAP down) could mask a serious outage (e.g. a FW failure or similar), but this change has been driven by management who "just want to see a reduction in incident numbers" - as if reducing numbers fixes problems. Grrrr.
*** I also fundamentally disagree with alerting on absolutely everything, including items that are managed and monitored by other toolsets.