Open for Voting
over 1 year ago

Mute Alerts and SLA Reporting

The device maintenance function within Orion currently provides two options to control alerting and monitoring during maintenance windows.

Unmanage Object

This works by stopping all polling of the object. With no data in the database, alerts cannot fire. And with no records of the object being down, SLA availability reports cannot include that downtime, so (assuming the device was up the whole time outside of the Unmanage schedule) SLA reports will show 100% uptime.

All good, with both primary functions (alert suppression and accurate SLA reporting) provided.

Mute Alerts

This option still allows the alert to be evaluated, but if any object that would generate an alert appears in the Muted schedule, the alert is suppressed. All polling continues, so during a maintenance window created with this method, performance metrics are still collected and object status is still recorded.

This is more often used, as our customers indicate they still wish to see utilisation data during these maintenance windows. They would also like to see exactly when devices/services go down during the window, as this confirms expected activity and gives them a record of the timings of service loss events. Mute Alerts simply means that alerts are not generated.

The problem occurs when a customer has to supply an SLA availability report that excludes scheduled maintenance windows, and those windows were created with the Mute method. Any downtime during the window is still included in the downtime percentage, because the device status is still recorded as down and the values in the database used to calculate availability are not affected by the Mute state.
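
To make the effect concrete, here is a minimal Python sketch of the arithmetic. It is not how Orion calculates availability; the function names, the example dates and the choice to remove muted time from both the downtime and the measured window are all assumptions made for illustration. Mute periods are assumed not to overlap each other.

```python
from datetime import datetime, timedelta

def overlap(a_start, a_end, b_start, b_end):
    """Duration shared by two intervals (zero if they do not intersect)."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta(0))

def availability(report_start, report_end, down_periods, mute_periods=()):
    """Percentage availability over the report window.

    Hypothetical rule: time inside a mute period is removed from both the
    downtime and the measured window, similar to what Unmanage achieves
    by not polling at all.
    """
    window = report_end - report_start
    muted = sum(
        (overlap(report_start, report_end, s, e) for s, e in mute_periods),
        timedelta(0),
    )
    down = timedelta(0)
    for d_start, d_end in down_periods:
        # Clip the outage to the report window.
        d_start, d_end = max(d_start, report_start), min(d_end, report_end)
        if d_end <= d_start:
            continue
        counted = d_end - d_start
        # Subtract the part of the outage that falls inside a mute period.
        for m_start, m_end in mute_periods:
            counted -= overlap(d_start, d_end, m_start, m_end)
        down += counted
    measured = window - muted
    if measured <= timedelta(0):
        return 100.0  # nothing left to measure
    return 100.0 * (1 - down / measured)

# Example data (made up): a 30-day report, a 4-hour muted maintenance
# window, and a 2-hour outage that falls entirely inside that window.
start, end = datetime(2024, 6, 1), datetime(2024, 7, 1)
mutes = [(datetime(2024, 6, 10, 1), datetime(2024, 6, 10, 5))]
downs = [(datetime(2024, 6, 10, 2), datetime(2024, 6, 10, 4))]

raw = availability(start, end, downs)              # Mute state ignored
adjusted = availability(start, end, downs, mutes)  # Mute periods excluded
print(f"{raw:.2f}")       # 99.72
print(f"{adjusted:.2f}")  # 100.00
```

With these example figures, two hours of planned downtime cost the customer 0.28 percentage points on the report, even though no alert ever fired.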


Provide a mechanism that allows for Mute periods to not affect SLA availability report output

Either build a data structure that allows historic mute periods to be recorded easily, or provide a status and data structure change that allows these time periods to be queried relatively easily for exclusion from availability calculations. Currently, Auditing needs to be enabled, and the data structure is not very efficient for retrieving mute periods over a long period of time.
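
As a rough illustration of the first option, the sketch below records mute periods in a dedicated history table and retrieves the ones that intersect a report window. It uses Python with an in-memory SQLite database purely for demonstration; the table name, column names, object IDs and dates are hypothetical and do not reflect the Orion schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical history table: one row per completed mute period.
# Timestamps are ISO 8601 strings in UTC, which sort correctly as text.
conn.execute("""
    CREATE TABLE mute_history (
        object_id   INTEGER NOT NULL,
        muted_from  TEXT    NOT NULL,
        muted_until TEXT    NOT NULL
    )
""")
# An index on (object_id, muted_from) keeps per-object range lookups
# cheap even when the table covers a long period of time.
conn.execute(
    "CREATE INDEX idx_mute_history ON mute_history (object_id, muted_from)"
)

conn.executemany(
    "INSERT INTO mute_history VALUES (?, ?, ?)",
    [
        (42, "2024-06-10T01:00:00", "2024-06-10T05:00:00"),
        (42, "2024-07-08T01:00:00", "2024-07-08T05:00:00"),
        (7, "2024-06-15T22:00:00", "2024-06-16T02:00:00"),
    ],
)

def mute_periods(conn, object_id, report_start, report_end):
    """Return the mute periods for one object that intersect the window."""
    rows = conn.execute(
        """
        SELECT muted_from, muted_until
        FROM mute_history
        WHERE object_id = ?
          AND muted_from < ?
          AND muted_until > ?
        ORDER BY muted_from
        """,
        (object_id, report_end, report_start),
    )
    return rows.fetchall()

june = mute_periods(conn, 42, "2024-06-01T00:00:00", "2024-07-01T00:00:00")
print(june)  # [('2024-06-10T01:00:00', '2024-06-10T05:00:00')]
```

A report could then subtract these periods from its availability calculation without relying on audit events.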

Top Comments

  • Oh, I like this one m_roberts! +1

    Mini rant: Whilst you can shoehorn MSP-like behaviour into Orion via custom properties and fancy reporting, out of the box the platform simply assumes it's just monitoring a single monolithic estate. There is nothing stopping Orion from being used in a multi-tenanted environment, but it's clear, by the hoops you have to jump through, that the platform is simply not designed with this in mind.

    I have long believed that if SWI came up with 'Orion MSP', with a different licensing structure and which includes all the features of a fully licensed Orion instance, it would fly off the shelves. After all, Orion now supports up to 100 polling engines with NPM 12.4. That's a healthy number of MSP clients, if you assume one polling engine per customer. NOM/NAM were baby steps in that direction, but they didn't go nearly far enough.

    (Note I'm talking full MSP customers here, not the on-demand, break/fix-type commoditised customers that solutions like SolarWinds RMM and N-central are designed to handle. This offering would be for enterprise-grade MSP clients.)

    People want it, SWI, so if you build it (and price it sensibly), they will come!