3 Replies. Latest reply: Apr 5, 2012 2:14 PM by drose2878

Filter Out Non-Critical Hardware

kylebrandt

Dell made the extremely obnoxious choice to list as "non-critical" any hardware that either:

  • Needs a firmware update
  • It doesn't recognize

To go nearly all SSD, which Dell has been very slow to adopt, we use Intel SSDs in most of our servers. As a result, almost all of our servers show a warning rollup and are listed under "nodes with problems".

Is there a way I can filter this out? I don't want all the rollup icons to show as warning for just this. When it comes to nodes with problems, I am concerned that if I use the SQL filtering functionality I will somehow mask actual warnings I care about, either because the filter targets other warning statuses directly, or because the host does have other hardware in a warning state that the filter also applies to.

How can I work around this?

 
  • Re: Filter Out Non-Critical Hardware
    aLTeReGo

    Hardware monitoring in SAM is simply a centralized reflection of the hardware health status that your servers report themselves. Dell doesn't expose via SNMP or WMI the reason these components are in this state, so it's not possible to filter out driver-related issues. The intelligence is built into Server Administrator, which checks driver and firmware versions and forces the state of that sensor into a non-critical warning state. The recommended way to avoid this situation in the future is to not update Server Administrator unless you're also willing to update the drivers and firmware. A potential workaround may in fact be to downgrade Server Administrator to a version that's unaware of these later firmware and driver versions.

    • Re: Filter Out Non-Critical Hardware
      kylebrandt

      The problem is that this isn't limited to out-of-date firmware; it also includes non-certified drives (i.e. Intel SSDs). The reason they do this is an over-reaction to people buying broken drives on eBay and then calling Dell to get them fixed, so yes, it is Dell's fault.

       

      As for not being able to filter this out on Orion's side, that is to an extent incorrect. The problem is also common enough that the check_openmanage Nagios plugin includes an option to do just this: "pdisk_cert Ignore warnings for non-certified physical drives". I imagine it is based on the multi-level status codes already reflected in Orion:

       

      [Image: MultipleLevels.png]

      So although it might not be possible to filter on the reason for being "non-critical" or "Warning", I don't see why there couldn't be an option to ignore "non-critical" / "Warning" states for the array and disk status, and to roll up the overall hardware status for these based only on "Up" vs "Critical" or "Down".
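
      To make the idea concrete, here is a rough sketch of the rollup I have in mind. It is not how Orion actually computes status; the category names, the `Status` enum, and the `rollup` function are all made up for illustration. Warnings are dropped only for the categories you opt out of, so a fan or power supply in warning would still surface.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical severity levels, ordered so that max() picks the worst."""
    UP = 0
    WARNING = 1
    CRITICAL = 2


# Assumption: sensor categories whose WARNING state should not affect the rollup.
IGNORE_WARNING_FOR = {"array", "disk"}


def rollup(sensors):
    """Return the worst status across (category, Status) pairs.

    WARNING is treated as UP for categories in IGNORE_WARNING_FOR;
    CRITICAL is never suppressed.
    """
    worst = Status.UP
    for category, status in sensors:
        if status is Status.WARNING and category in IGNORE_WARNING_FOR:
            status = Status.UP
        worst = max(worst, status)
    return worst
```

      With this, a node whose only warning comes from a non-certified disk rolls up as "Up", while a node that also has a fan in warning still rolls up as "Warning".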

  • Re: Filter Out Non-Critical Hardware
    drose2878

    Here is what I did: I noticed there is a field called "Original Status" under the "APM Hardware Sensor", which appears to carry Dell's severity rating. So I set up my alert rule to filter out all "Non Critical". It doesn't remove the hardware warning status from the node view, but at least you can avoid alerts being generated.
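
    In case it helps, the logic of the alert condition is roughly the following. This is only a sketch in Python, not the actual alert definition; the dictionary keys and the `should_alert` function are hypothetical stand-ins for the sensor fields, and the exact status strings in your install may differ.

```python
def should_alert(sensor, suppressed=("Non Critical",)):
    """Decide whether a hardware sensor should raise an alert.

    `sensor` is a dict with hypothetical keys:
      'status'          - the status shown in the node view (e.g. "Up", "Warning")
      'original_status' - the severity string as reported by the vendor
    """
    if sensor["status"] == "Up":
        return False
    # Skip sensors whose vendor-reported severity is in the suppressed list.
    return sensor["original_status"] not in suppressed
```

    Keep in mind this suppresses every sensor that reports "Non Critical", not just the non-certified drives, so a genuine non-critical warning would be silenced as well.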