
UnDP for Cisco fault persists after fault has cleared

Hey all,

I'm designing a UnDP to alert us to faults from our Cisco IMCs.  I've been using the following link as a general guide.

https://communities.cisco.com/docs/DOC-37197

The guide describes two methods for collecting this info: SNMP traps, or standard SNMP polling of the faults table.  I decided to try the table approach to avoid messing with SNMP traps.
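For anyone who wants to sanity-check the table outside of Orion first, here's a rough sketch of a walk using pysnmp.  The host and community string are placeholders, and the OID is my assumption for cucsFaultTable, so verify it against the CISCO-UNIFIED-COMPUTING-FAULT-MIB for your CIMC firmware.

    # Rough sketch (pysnmp 4.x, pip install pysnmp): walk the CIMC fault table.
    # FAULT_TABLE_OID is an assumed value for cucsFaultTable; check your MIB.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, nextCmd)

    CIMC_HOST = '10.0.0.50'        # placeholder CIMC address
    COMMUNITY = 'public'           # placeholder read community
    FAULT_TABLE_OID = '1.3.6.1.4.1.9.9.719.1.1.1'

    for err_ind, err_status, err_idx, var_binds in nextCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY),
            UdpTransportTarget((CIMC_HOST, 161)),
            ContextData(),
            ObjectType(ObjectIdentity(FAULT_TABLE_OID)),
            lexicographicMode=False):      # stop at the end of the subtree
        if err_ind or err_status:
            print('SNMP error:', err_ind or err_status.prettyPrint())
            break
        for oid, value in var_binds:
            print(oid.prettyPrint(), '=', value.prettyPrint())

Each active fault should show up as a set of rows in the output.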

To test, I set up a RAID array on a lab server and then yanked one of the disks.  This produced the expected fault, and I was able to poll the fault information from the table.  So far, so good.

[screenshot: pastedImage_3.png]

The problem surfaced after the disk was reinserted.  During the RAID rebuild, the fault updated to reflect the new status.  Ok, cool.  But then the rebuild completed and the fault in the CIMC cleared... except that the UnDP is still reporting the fault data.  The following screen cap is current, and the fault has definitely cleared on the CIMC side.

[screenshot: pastedImage_4.png]

I thought the CIMC might still be reporting the data, so I did an SNMP walk against the OIDs.  Nothing.  (I had used the same walk while the fault was active and it returned the fault data, as expected.)

Does anyone know why the UnDPs are still reporting the previous status?  Is there a way to automatically clear it when the SNMP agent stops reporting it?

Thanks!

  • Hello,

    Make sure you set the polling interval when you add the Universal Device Poller.  It's under Show Advanced Options; expand that section and enter a polling interval.  Hope this helps.

  • Hey Rick, I like your thinking, but I don't think it's an issue with the UnDP waiting to poll.  As I understand it, leaving the polling interval blank makes the UnDP fall back to the default polling interval in NPM.  Additionally, the UnDP did report the updated status when the disk was reinserted and the RAID was rebuilding.

    Digging into this a little further, I noticed that the OIDs I polled this information from are no longer responding.  My theory is that the UnDP doesn't know what to update because it's no longer getting a response, and apparently the OID only responds when there's an active fault.

    Unless there's an option to clear a UnDP when it stops getting a response (which would introduce its own set of problems), I think I'm stuck here.  May need to go the trap route, after all.
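    If I do end up scripting around it instead, the logic I have in mind is roughly the sketch below: walk the fault table myself and explicitly treat an empty result as "no active faults" instead of holding on to the last polled value.  (Hypothetical sketch only; the host, community, and assumed cucsFaultTable OID would all need to be checked.)

        # Hypothetical workaround (pysnmp 4.x): clear the status ourselves when
        # the fault table walk comes back empty, which the UnDP never does.
        from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                                  ContextData, ObjectType, ObjectIdentity, nextCmd)

        FAULT_TABLE_OID = '1.3.6.1.4.1.9.9.719.1.1.1'   # assumed cucsFaultTable

        def get_active_faults(host, community, oid=FAULT_TABLE_OID):
            """Walk the fault table; return rows, [] if empty, None on SNMP error."""
            rows = []
            for err_ind, err_status, err_idx, var_binds in nextCmd(
                    SnmpEngine(), CommunityData(community),
                    UdpTransportTarget((host, 161)), ContextData(),
                    ObjectType(ObjectIdentity(oid)), lexicographicMode=False):
                if err_ind or err_status:
                    return None          # SNMP problem: status unknown, not "cleared"
                rows.extend((o.prettyPrint(), v.prettyPrint()) for o, v in var_binds)
            return rows

        faults = get_active_faults('10.0.0.50', 'public')   # placeholders
        if faults is None:
            print('SNMP error, leaving the current status alone')
        elif faults:
            print('%d active fault rows' % len(faults))
        else:
            print('Fault table is empty, treating the fault as cleared')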

  • I see, I understand your situation now.  The OID is only populated while the device has an active alarm; once the alarm clears, the OID is no longer available, so the UnDP has no way to poll the information that the alarm has cleared.  Are there any other OIDs from cucsFaultEntry that give you the status of the RAID?

  • There are some other OIDs we are currently polling for disk status, but I was hoping to avoid creating individual alerts for every possible hardware health OID.  Monitoring faults should cover all hardware issues, but it doesn't look like there's a good way to do it without relying on traps.
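    For anyone who lands here later, the trap route basically means having something listen for the fault raise/clear traps the CIMC sends.  A bare-bones listener sketch with pysnmp is below, purely to illustrate the mechanism; in a real Orion deployment you would point the CIMC at the SolarWinds Trap Service rather than a script, and the community string and port here are placeholders.

        # Bare-bones SNMP trap listener sketch (pysnmp 4.x), illustration only.
        from pysnmp.entity import engine, config
        from pysnmp.carrier.asyncore.dgram import udp
        from pysnmp.entity.rfc3413 import ntfrcv

        snmp_engine = engine.SnmpEngine()

        # Listen for traps on UDP/162 (needs privileges to bind a low port).
        config.addTransport(snmp_engine, udp.domainName,
                            udp.UdpTransport().openServerMode(('0.0.0.0', 162)))
        config.addV1System(snmp_engine, 'cimc-area', 'public')   # placeholder community

        def on_trap(snmp_engine, state_reference, context_engine_id,
                    context_name, var_binds, cb_ctx):
            # Each fault raise/clear notification arrives here as a set of varbinds.
            for oid, value in var_binds:
                print(oid.prettyPrint(), '=', value.prettyPrint())

        ntfrcv.NotificationReceiver(snmp_engine, on_trap)
        snmp_engine.transportDispatcher.jobStarted(1)   # keep the dispatcher alive
        try:
            snmp_engine.transportDispatcher.runDispatcher()
        finally:
            snmp_engine.transportDispatcher.closeDispatcher()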