How to get hardware failure alerts

I'm running NPM and want to set up alerts for hardware failures. I was thinking of setting up the SolarWinds Syslog Server, hoping I could find event logs about hardware failures and then have the Syslog Server send me email alerts. I'm also running other things like UCS and a few different kinds of SANs, so I'm hoping I'll be able to forward logs from those systems too. Does anyone have a good solution for setting up hardware failure alerts in NPM?

Re: How to get hardware failure alerts

Does anyone have any ideas?

Re: How to get hardware failure alerts

If the devices support SNMP, you could have them send traps and use the Trap Server software to send alerts when it receives the type of trap you're looking for. You could also check whether there are any OIDs that NPM could poll on the device, and then have NPM alert on the custom poller value.
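When traps are filtered by OID, it usually makes sense to match a whole enterprise subtree rather than one exact OID. The sketch below is plain Python, not Trap Server configuration; `parse_oid` and `oid_in_subtree` are hypothetical helpers, and the example reuses the NetApp OID that appears later in this thread. It compares OIDs arc by arc, because a naive string-prefix test would wrongly treat `1.3.6.1.4.1.7890...` as if it were under `1.3.6.1.4.1.789`.

```python
def parse_oid(oid: str) -> tuple[int, ...]:
    """Convert a dotted OID such as '1.3.6.1.4.1.789' into a tuple of ints."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def oid_in_subtree(oid: str, subtree: str) -> bool:
    """Return True if `oid` equals `subtree` or sits underneath it (arc-by-arc match)."""
    oid_arcs, subtree_arcs = parse_oid(oid), parse_oid(subtree)
    return oid_arcs[: len(subtree_arcs)] == subtree_arcs


print(oid_in_subtree("1.3.6.1.4.1.789.1.6.4.10", "1.3.6.1.4.1.789"))  # True
print(oid_in_subtree("1.3.6.1.4.1.7890.1", "1.3.6.1.4.1.789"))        # False
# A plain string prefix test gets the second case wrong:
print("1.3.6.1.4.1.7890.1".startswith("1.3.6.1.4.1.789"))             # True
```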

Your syslog solution could also work, provided the devices actually send the syslog messages you're looking for.
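Whether the syslog route works depends on what each device actually logs. As a rough sketch of the filtering idea (plain Python, not Syslog Server configuration), the snippet below decodes the standard `<PRI>` header of RFC 3164/5424 syslog messages (PRI = facility * 8 + severity) and flags messages at severity `err` (3) or worse that contain a failure keyword. The keyword list and sample messages are hypothetical; real UCS and SAN messages vary by vendor and firmware, so check actual device logs first.

```python
import re

# Hypothetical keywords: replace with phrases taken from your devices' real logs.
FAILURE_KEYWORDS = ("disk failed", "fan failure", "power supply failure")

# "<PRI>" header at the start of the line, followed by the rest of the message.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_pri(line: str) -> tuple[int, int, str]:
    """Split a syslog line into (facility, severity, message) using its <PRI> header."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError(f"no <PRI> header in: {line!r}")
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


def looks_like_hardware_failure(line: str, max_severity: int = 3) -> bool:
    """Flag messages at severity max_severity (3 = err) or worse that mention a keyword."""
    _facility, severity, message = parse_pri(line)
    text = message.lower()
    return severity <= max_severity and any(k in text for k in FAILURE_KEYWORDS)


print(parse_pri("<187>Disk failed on shelf 1"))                    # (23, 3, 'Disk failed on shelf 1')
print(looks_like_hardware_failure("<187>Disk failed on shelf 1"))  # True
print(looks_like_hardware_failure("<190>Disk failed on shelf 1"))  # False (severity 6, info)
```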

Zak Kahl

http://www.loop1systems.com

Re: How to get hardware failure alerts

How do I set up alerts for the custom pollers? I've set up several custom pollers, but I don't see them in the drop-downs in the Advanced Alert Manager.

Re: How to get hardware failure alerts

From the Alert Editor, on the Trigger Condition tab, use the "Type of Property to Monitor" drop-down:

For alerts on UnDPs (custom OID pollers), you would use either "Custom Node Poller" (for a UnDP that does a "get" or "get next") or "Custom Node Table Poller" (for a UnDP that does a "get table" operation). If your UnDP is interface-related, use "Custom Interface Poller".
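The rule above is essentially a small lookup from a UnDP's poll operation to the drop-down choice. Encoded as a hypothetical Python helper (it covers only the cases named in this reply and is not an Orion API):

```python
def property_type_for(operation: str, interface_related: bool = False) -> str:
    """Return the 'Type of Property to Monitor' choice for a UnDP, per the rule above."""
    if interface_related:
        # Interface-related UnDPs use the interface poller type.
        return "Custom Interface Poller"
    op = operation.strip().lower()
    if op in ("get", "get next"):
        return "Custom Node Poller"
    if op == "get table":
        return "Custom Node Table Poller"
    raise ValueError(f"operation not covered by this rule: {operation!r}")


print(property_type_for("Get Next"))   # Custom Node Poller
print(property_type_for("get table"))  # Custom Node Table Poller
```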

On the other hand, if you are still talking about a trap or syslog message, you would have to set up the alert in that utility (Trap or Syslog).

Hope that helps.

- Leon

Leon Adato | Head Geek
------
"Measure what is measurable,
and make measurable what is not so." - Galileo

Re: How to get hardware failure alerts

This sounds like what I'm looking for. I downloaded the NetApp UnDP from thwack, and I can see the disk drive status on my Orion web page. I tried all the drop-downs you mentioned for the custom node poller and table poller, but I don't see an option for a hard drive status set to failed. Could someone post a screenshot of a hardware failure alert built from a UnDP? Thank you for your help.

Re: How to get hardware failure alerts

Ah... I see where the confusion is. You won't see anything that says "Hard drive" (or whatever). The alert setup will look something like:

  • "Poller Name" is equal to <your poller name>
  • Value/Rate/Total is greater than (or equal to) <your threshold>

Depending on the specifics of the display, you may also need to add lines for Column number, Row ID, etc.
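Conceptually, the trigger above is an AND of simple comparisons. A minimal Python sketch of that logic, assuming a hypothetical `PollerReading` record (Orion evaluates these conditions itself; the field and poller names here are illustrative only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollerReading:
    """Hypothetical stand-in for one polled UnDP result."""
    poller_name: str
    value: Optional[float]
    row_id: Optional[str] = None


def trigger_fires(reading: PollerReading, poller_name: str, threshold: float,
                  row_id: Optional[str] = None) -> bool:
    """'Poller Name is equal to <name>' AND 'Value >= <threshold>', plus an optional Row ID line."""
    if reading.poller_name != poller_name:  # exact, case-sensitive comparison
        return False
    if row_id is not None and reading.row_id != row_id:
        return False
    return reading.value is not None and reading.value >= threshold


reading = PollerReading("exampleTempPoller", 42.0)
print(trigger_fires(reading, "exampleTempPoller", 40))  # True
print(trigger_fires(reading, "exampletemppoller", 40))  # False (name case differs)
```

Treating the name comparison as case-sensitive mirrors the advice later in this thread to match the poller name exactly.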

You can get the specifics (i.e., find out whether you need Value, Rate, etc.) by creating an email to yourself and putting ALL the variables in it (e.g. "Value is ${Value}"), then seeing which one(s) have the actual information you want.
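The "email yourself every variable" trick works because each `${...}` macro is replaced with its current value, so the one worth triggering on is whichever expands to something meaningful. As an offline illustration only (this is not how Orion expands macros), Python's `string.Template` happens to use the same `${Name}` syntax; the subclass below also accepts dotted names such as `${CustomPollers.OID}`. `MacroTemplate`, `render`, and the sample values are hypothetical.

```python
from string import Template


class MacroTemplate(Template):
    """string.Template that also accepts dotted names like ${CustomPollers.OID}."""
    idpattern = r"(?a:[_a-z][_a-z0-9.]*)"  # Template matches case-insensitively


def render(template_text: str, values: dict[str, str]) -> str:
    # safe_substitute leaves unknown macros as-is instead of raising KeyError,
    # so anything without a value stays visible in the output.
    return MacroTemplate(template_text).safe_substitute(values)


body = "Value is ${Value}\nOID is ${CustomPollers.OID}\nRate is ${CustomPollerStatus.Rate}"
print(render(body, {"Value": "3", "CustomPollers.OID": "1.3.6.1.4.1.789.1.6.4.10"}))
```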

Re: How to get hardware failure alerts

OK, I'm getting closer, thanks for your help. I'm using a custom poller named diskFailedMessage with OID = 1.3.6.1.4.1.789.1.6.4.10.

I set my alert to the "Custom Node Poller" drop-down option because it is a get next type.

I set the conditions to:

Poller Name is equal to diskfailedMessage

Status contains fail

The alert sent me an email right away, even though I don't have any failed hard drives right now. How do I find out what the condition needs to be for a failed hard drive?

Re: How to get hardware failure alerts

Make sure the poller name matches in the UnDP system and in the alert (case-sensitive, etc.).

Now in your alert email, add a whole butt-load of variables so you can see what is getting detected and returned:

Assignmentname is ${AssignmentName}

OID is ${CustomPollers.OID}

Uniquename is ${CustomPollers.UniqueName}

Rate is ${CustomPollerStatus.Rate}

Rawstatus is ${CustomPollerStatus.RawStatus}

status is ${CustomPollerStatus.Status}

total is ${CustomPollerStatus.Total}

That way you can see EXACTLY what is being triggered, and then re-formulate your actual triggers based on what you are seeing.
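Once the email comes back, the job is to spot which variables actually carry data. A small Python sketch that parses the "Name is value" lines of such an email and lists the populated ones; `parse_debug_email` and `populated` are hypothetical helpers, and the sample body is made up for illustration.

```python
import re

# "<Name> is <value>", where the value may be empty (e.g. "Rate is" or "Rate is ").
LINE = re.compile(r"\s*(\S+)\s+is(?:\s+(.*))?$")


def parse_debug_email(body: str) -> dict[str, str]:
    """Turn 'Name is value' lines into a dict; blank values are kept as ''."""
    fields = {}
    for line in body.splitlines():
        match = LINE.match(line)
        if match:
            fields[match.group(1)] = (match.group(2) or "").strip()
    return fields


def populated(fields: dict[str, str]) -> list[str]:
    """Names of the variables that returned data, i.e. candidate trigger fields."""
    return [name for name, value in fields.items() if value]


sample = "OID is 1.3.6.1.4.1.789.1.6.4.10\nRate is \nstatus is Example status text\ntotal is"
print(populated(parse_debug_email(sample)))  # ['OID', 'status']
```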

Re: How to get hardware failure alerts

I copied and pasted what you had in your last post, and the email returned these results:

OID is 1.3.6.1.4.1.789.1.6.4.10

Uniquename is diskFailedMessage

Rate is 

Rawstatus is 

status is There are no failed disks.

total is

How do I know what the status will be when a disk fails? If I'm understanding this correctly, the alert is triggering because the status contains "fail", even though the status says there are no failed disks?
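That reading can be checked directly: "contains" is a substring test, and the healthy message "There are no failed disks." contains "fail" (inside "failed"), so the condition matches even when every disk is fine. A minimal Python sketch of that, plus one possible stricter rule that fires on anything other than the healthy message. The stricter rule assumes the healthy text is always exactly "There are no failed disks."; the text reported for an actual failure is not shown in this thread, so verify it before relying on this. `disk_alert_should_fire` is a hypothetical helper.

```python
# Status text returned by the diskFailedMessage poller in the email above.
status = "There are no failed disks."

# "Status contains fail" is a substring test; "failed" contains "fail",
# so the healthy message satisfies the condition.
print("fail" in status)  # True

# Assumption: this exact string is what the poller reports when all disks are OK.
HEALTHY_MESSAGE = "There are no failed disks."


def disk_alert_should_fire(status_text: str) -> bool:
    """Fire on any status other than the known-healthy message.

    An empty status (e.g. a failed poll) also fires, which errs on the side of alerting.
    """
    return status_text.strip() != HEALTHY_MESSAGE


print(disk_alert_should_fire(status))  # False
```

In alert-editor terms, that would correspond to something like "Status is not equal to There are no failed disks.", if that operator is available in your version.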
