How to get hardware failure alerts

I'm running NPM and want to set up alerts for hardware failures. I was thinking of setting up the SolarWinds syslog server, hoping I could find event logs about hardware failures and then have the syslog server send me email alerts. I'm also running other things like UCS and a few different kinds of SANs, so I'm hoping I'll be able to forward logs from those systems as well. Does anyone have a good solution for setting up hardware failure alerts in NPM?

  • If the devices are capable of SNMP, then you could send traps and use the Trap Server software to send alerts when the type of trap you are looking for comes in. You could also see if there are any OIDs you could have NPM poll on the device, then have NPM alert on the custom poller value.

    Your syslog solution could also work if the devices send the syslog messages you are looking for.

    Zak Kahl

    http://www.loop1systems.com
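
Before building a poller, it can help to confirm by hand that the device actually answers SNMP for the OID you have in mind. A minimal sketch, assuming the net-snmp command-line tools are installed on the machine you run it from and that the device speaks SNMP v2c. The function names are made up for illustration, and this is a manual check, not something NPM does for you:

```python
import subprocess

def build_getnext_command(host, community, oid):
    """Build a net-snmp `snmpgetnext` command line (SNMP v2c assumed)."""
    # -Oqv prints only the returned value, without the OID or type prefix.
    return ["snmpgetnext", "-v", "2c", "-c", community, "-Oqv", host, oid]

def poll_once(host, community, oid, timeout=10):
    """Run the command and return the device's answer as text."""
    result = subprocess.run(
        build_getnext_command(host, community, oid),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()
```

Calling `poll_once` with your device's address, community string, and the OID you are considering returns whatever text the device sends back, which is the same kind of value a custom poller would end up alerting on.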

  • How do I set up alerts for the custom pollers? I've set up several custom pollers, but I don't see them in the drop-downs when I'm in the Advanced Alert Manager.

  • From the Alert Editor, Trigger Condition tab, "Type of Property to Monitor" drop-down:

    For alerts on UnDP (custom OID pollers) you would use either a "Custom Node Poller" (for a UnDP that collects a "get" or "get next" value) or a "Custom Node Table Poller" (for a UnDP that does a "get table" operation). If your UnDP is interface-related, then you would use "Custom Interface Poller".

    On the other hand, if you are still talking about a trap or syslog, you would have to set up the alert in that utility (Trap or Syslog).

    Hope that helps.

    - Leon

  • This sounds like what I'm looking for. I downloaded the NetApp UnDP from thwack and I can see the disk drive status on my Orion web page. I tried all the drop-downs like you said for the Custom Node Poller and Custom Node Table Poller, but I do not see an option for a hard drive status set to failed. Can someone post a screenshot of a hardware failure alert from a UnDP? Thank you for your help.

  • Ah... I see where you are confused. You won't see something that says "Hard drive" (or whatever). The alert setup will look something like:

    • "Poller Name" is equal to <your poller name>
    • Value/Rate/Total is greater than (or equal to) <your threshold>

    Depending on the specifics of the display, you may also need to add lines for Column number, Row ID, etc.

    You can get specifics (i.e., find out whether you need Value, Rate, etc.) by creating an email to yourself, putting ALL the variables in the email

    (e.g., "Value is ${Value}"),

    and seeing which one(s) have the actual information you want.

  • OK, I'm getting closer, thanks for your help. I'm using a custom poller named diskFailedMessage with OID = 1.3.6.1.4.1.789.1.6.4.10

    I set my alert to "Custom Node Poller" in the drop-down because it is a "get next" type.

    I set the condition to

    Poller Name is equal to diskfailedMessage

    Status contains fail

    The alert sent me an email right away even though I don't have any failed hard drives right now. How do I find out what the condition needs to be for a failed hard drive?

  • Make sure the poller name matches in the UnDP system and the alert (it is case-sensitive, etc.).

    Now in your alert email, add a whole butt-load of variables so you can see what is getting detected and returned:

    Assignmentname is ${AssignmentName}

    OID is ${CustomPollers.OID}

    Uniquename is ${CustomPollers.UniqueName}

    Rate is ${CustomPollerStatus.Rate}

    Rawstatus is ${CustomPollerStatus.RawStatus}

    status is ${CustomPollerStatus.Status}

    total is ${CustomPollerStatus.Total}

    That way you can see EXACTLY what is being triggered, and then re-formulate your actual triggers based on what you are seeing.
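
If you end up doing this for several pollers, the debug lines above are easy to generate rather than retype. A small sketch that only builds the text to paste into the email action; the variable names are the ones listed above, while the helper name is hypothetical:

```python
DEBUG_VARIABLES = [
    ("Assignmentname", "AssignmentName"),
    ("OID", "CustomPollers.OID"),
    ("Uniquename", "CustomPollers.UniqueName"),
    ("Rate", "CustomPollerStatus.Rate"),
    ("Rawstatus", "CustomPollerStatus.RawStatus"),
    ("status", "CustomPollerStatus.Status"),
    ("total", "CustomPollerStatus.Total"),
]

def build_debug_body(variables=DEBUG_VARIABLES):
    """Return one 'label is ${Variable}' line per alert variable."""
    # The doubled braces emit literal { and } around the variable name.
    return "\n".join(f"{label} is ${{{name}}}" for label, name in variables)

print(build_debug_body())
```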

  • I copied and pasted what you had in the last post and the email returned these results.

    OID is 1.3.6.1.4.1.789.1.6.4.10

    Uniquename is diskFailedMessage

    Rate is 

    Rawstatus is 

    status is There are no failed disks.

    total is

    How do I know what the status will be when a disk fails? If I'm understanding this correctly, the alert is triggering because the status contains "fail", even though the status says there are no failed disks?
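
To make the reading of that email concrete, here is a short sketch that parses the "label is value" lines returned above, lists which fields actually came back populated, and checks a "contains fail" condition against the returned status. The parsing helper is hypothetical, and the substring test only imitates the alert condition; it is not Orion's own matching code:

```python
EMAIL_BODY = """\
OID is 1.3.6.1.4.1.789.1.6.4.10
Uniquename is diskFailedMessage
Rate is
Rawstatus is
status is There are no failed disks.
total is
"""

def parse_debug_email(body):
    """Split each 'label is value' line into a {label: value} dict."""
    fields = {}
    for line in body.splitlines():
        label, sep, value = line.partition(" is")
        if sep:
            fields[label.strip()] = value.strip()
    return fields

fields = parse_debug_email(EMAIL_BODY)
populated = [label for label, value in fields.items() if value]
matches_fail = "fail" in fields["status"].lower()

print(populated)     # ['OID', 'Uniquename', 'status']
print(matches_fail)  # True
```

Only the status field carries usable text, and the healthy message itself contains the substring "fail" (in "failed"), which is why the alert fired with no disk down.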

  • So your alert should be tweaked so it triggers when:

    UniqueName = "diskFailedMessage"

    (no threshold yet)

    Now the hard part. You need to make something actually break so you can get a positive confirmation on the error. You *might* be able to figure out the message from the MIB document, but in reality, until you see it live all bets are off.

    We have a policy at my company that no alert goes live (ie: cuts a ticket, etc) until we've seen at least one alert "in the wild" so we have that level of comfort that we know exactly what we'll be seeing.

    If you absolutely can't fabricate an alert, then your next best option is to set the second trigger as:

    ${CustomPollerStatus.Status} IS NOT EQUAL TO "There are no failed disks."

    Then you won't get "I'm OK" messages, but you won't know what you WILL get until you start getting it. I'd leave those other variables in until you've completely worked it out.
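
Put together, the fallback trigger amounts to two conditions ANDed: the poller name matches exactly, and the status is anything other than the known healthy message. A sketch of that logic in Python, purely to show how the rule behaves; the function name is hypothetical and this is not how the alert engine itself is implemented:

```python
HEALTHY_STATUS = "There are no failed disks."
POLLER_NAME = "diskFailedMessage"

def should_alert(unique_name, status):
    """True when this poller reports anything other than the healthy message."""
    # The name comparison is case-sensitive, as noted earlier in the thread.
    if unique_name != POLLER_NAME:
        return False
    return status.strip() != HEALTHY_STATUS
```

With this rule the healthy message stays quiet and any other text fires. In this sketch that includes an empty status, so a poll that returns nothing would also raise an alert.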