
Hardware Alerts > Event Log

Is there a way to display only non-OK events from Hardware Health of Servers?

With this new S&A Monitor, one server reload creates nearly 50 events once all of the hardware and application events are included.

This makes the home page event log almost unusable from an operations perspective.

Ideally, I'd like more control, so that events reach the event log only when there is something operations needs to know about and act on.

Example:

  [server node name] reloaded.

After a time delay of nn minutes

  [server node name]  DHCP failed to load.

  [server node name]  WINS failed to load.

  [server node name] Hardware sensor Physical Disk 1 of 4 Critical

 

No one pays attention to every event; we might as well display only those that are worth noting.

  • SAM 5.0 does not include a hardware health summary resource. This is something we're currently working on. In the meantime, some customers have opted to create hardware reports and embed them as resources on their summary view. There are also out-of-the-box alerts for hardware monitoring, which should bubble up any hardware issues under the Active Alerts resource. This information is also included in the events resource, which can be searched, filtered, and displayed on any summary view.

    As for your question regarding events, you can disable some of these events by editing the alert action that triggered them using the Advanced Alert Manager. You can also search and filter these events in the Events view for easier viewing. I would recommend configuring your alerts to write to the Events log using a standard structure that can be easily sorted and filtered. This should make using the events resource easier and more effective at bubbling up problems.
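As a rough illustration of what a "standard structure" for event messages could look like, here is a minimal Python sketch. The pipe-delimited layout, the field names, and the helper functions are all assumptions made up for this example; nothing here is an Orion-defined format or API. The idea is only that putting the severity first, in a fixed position, makes messages easy to sort and filter.

```python
# Hypothetical field layout for event messages; not an Orion-defined format.
FIELDS = ("severity", "node", "component", "detail")


def format_event(severity, node, component, detail):
    """Build a pipe-delimited message with the severity first."""
    return " | ".join((severity.upper(), node, component, detail))


def parse_event(message):
    """Split a message produced by format_event back into named fields."""
    parts = [p.strip() for p in message.split("|", len(FIELDS) - 1)]
    return dict(zip(FIELDS, parts))


def actionable(messages, ignore=("OK", "UP")):
    """Keep only messages whose severity is not in the ignore list."""
    return [m for m in messages if parse_event(m)["severity"] not in ignore]


# Sample messages; node and component names are placeholders.
events = [
    format_event("ok", "server01", "Fan A", "status returned to normal"),
    format_event("critical", "server01", "Physical Disk 1 of 4", "sensor critical"),
    format_event("warning", "server02", "Temperature C", "above threshold"),
]

# Because the severity leads each message, a plain sort groups by severity.
for line in sorted(actionable(events)):
    print(line)
```

With a consistent prefix like this, a text search on the severity in the Events view would surface only the messages of interest.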

  • I'm glad work is in progress on the hardware health summary, because the number of positive events generated by a server reload is disruptive for ops to wade through.

    The alert triggers are fine; they appear only when there is an issue, which is great. One comment there: the alert notes the hardware component, but the node name should be added as well.

    The suggestion to modify the advanced alerts is not realistic when you are monitoring many servers, each with its own 'unique' behavior.

    Example:

    on [Server 1] no alert on [Fan A]

    on [Server 2] no alert on [Temperature C]

    ...

    on [Server 101] no alert on [Chassis Intrusion]

    I have pondered the recommendation "I would recommend configuring your alerts to write to the Events log using a standard structure that can be easily sorted and filtered. This should make using the events resource easier and more effective at bubbling up problems" for several days now, and I cannot figure out why or how this would help. An example for the less deft of mind may help.

  • Maybe I am overcomplicating this.

    How can I stop OK EVENTS being reported by "Hardware Health of Servers"?

    Did someone really put in a change request to forward all positive and negative hardware events to the display log? 

  • This is the behavior of all Orion managed elements. Anything that triggers a negative (Down/Warning/Critical) status event also triggers a positive (Up/Ok) status event, so that when auditing events you can easily see, for instance, when a device was exhibiting problems and when the problem was resolved.

    Starting with SAM 5.0, we added the ability for customers to filter events. It's not a simple process, as the request is fairly infrequent. It requires making changes directly to the database, so if you're not comfortable altering data in the database directly, I recommend contacting support for assistance.

    To prevent a specific event from being logged, first identify an existing event that was generated by SAM. Then open the EventTypes table in the Orion database, using either the included Database Manager tool or SQL Management Studio. SAM events have IDs from 500 to 513. Locate the event you no longer want logged and change its Record value to 0.
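The change described above can be sketched as follows. This is only an illustration run against an in-memory SQLite stand-in, not the real Orion database (which runs on SQL Server). The table name EventTypes, the 500 to 513 ID range, and the Record value of 0 come from the reply; the column names `EventType`, `Name`, and `Record`, the sample rows, and the `disable_event_logging` helper are assumptions. Check the actual schema, and back up the database, before changing anything for real.

```python
import sqlite3

# In-memory stand-in for the Orion database; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EventTypes (EventType INTEGER PRIMARY KEY, Name TEXT, Record INTEGER)"
)
# Placeholder rows: two IDs inside the SAM range and one outside it.
conn.executemany(
    "INSERT INTO EventTypes VALUES (?, ?, ?)",
    [
        (500, "placeholder SAM event A", 1),
        (501, "placeholder SAM event B", 1),
        (1, "placeholder non-SAM event", 1),
    ],
)


def disable_event_logging(conn, event_type):
    """Set Record to 0 for one event type; refuse IDs outside 500-513."""
    if not 500 <= event_type <= 513:
        raise ValueError("not a SAM event type ID")
    cur = conn.execute(
        "UPDATE EventTypes SET Record = 0 WHERE EventType = ?", (event_type,)
    )
    conn.commit()
    return cur.rowcount  # number of rows changed


# Review the SAM event types first, then disable exactly one of them.
for row in conn.execute(
    "SELECT EventType, Name, Record FROM EventTypes WHERE EventType BETWEEN 500 AND 513"
):
    print(row)

changed = disable_event_logging(conn, 501)
```

Listing the rows before updating, and changing one ID at a time, keeps the edit easy to reverse by setting the Record value back to what it was.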

  • Thank you for the reply.

    This enhancement is yet another in a recent series of changes that do not give the customer a way to tailor the product to their network.

    With earlier versions of Orion, you'd get a simple up/down server event notice.

    Then, with the earlier versions of Alert Manager, the up/down server notices included services.

    Then, with APM, the up/down notices were followed by a host of other events.

    Now, with hardware events being added, we are receiving an additional 15 - 25 OK events.

    So over the past several years, the number of events captured in an up/down situation has grown from 1 to nearly 50 (per situation), with no mechanism to choose what is captured or displayed.

    I am aware that discrete monitoring provides significant benefit; however, displaying events simply because that follows past practice should not be the goal. As above, I would recommend providing a solution that lets users choose which events (OK, Warning, Down, Critical) they wish to receive, potentially with an interface similar to the Custom Property Editor.

  • In this scenario, I would recommend disabling all out-of-the-box events using the method I provided above. Then use the Advanced Alert Manager to create your own alerts with the "Log the Alert to the NetPerfMon Event Log" trigger action. This lets you define precisely which events are logged, as well as the exact format of the event message, including what information it contains. Below is an example.

    CustomOrionEvents.png