
Cisco UCS | Advice on configuring alerts

Hi,

First post so here goes.....

I'm monitoring my Cisco UCS environment via NPM and am pleased to say that I can see what I would expect to see: Cluster IP, Fabric Interconnects A & B, Blades, PSUs and Fans :-)

The problems start when I use the Advanced Alert Manager to create alerts on specific events, such as a fan module being removed. I pulled a fan module earlier and can see that the trap was received, but my alert didn't fire! I've attached my alert.

Would anyone be happy to share with me how they have the alerts configured or would anyone from Solarwinds like to get in touch to discuss further?

Things I know that I'm going to need alerts for include....

PSU status

Fan status

UCSM availability

Fabric Interconnect availability

DIMMs that have ECC errors or are degraded or inoperable

Any help greatly appreciated.

Cheers,

Greg

UCS Fan Alert.AlertDefinition
  • Capture.JPG

    You need to limit the highlighted area (or just remove it entirely, as the UCSStatus will only apply to UCS devices, which is what I assume you want).

  • Thanks for that. I have a custom Group that all my UCS devices live in. Should I back out of that?

    UCS Fans2.PNG

    UCS Fans4.PNG

    I think it's possible that the fan wasn't pulled for long enough, so I've asked my colleague to pull the fan again.

    Do you have any other UCS Alerts you would be happy to share with me?

    Cheers,

    Greg

  • What PSU did you pull: chassis or fabric interconnect?

  • It was a fan in a Chassis that I pulled.

    The odd thing is that we pulled the fan at 11:19 ... we got the SNMP trap ... looked at the FAN status in NPM and it was still showing as operable ... looked at UCSM and it shows as missing ... opened a support case 555565 .... did a poll and it was still showing as operable ... a few minutes ago my email alert was received!

    UCS Fans5.PNG

    UCS Fans6.PNG

    It seems like there was some massive lag. I've uploaded to Leapfile so we'll see what happens.

    Cheers,

    Greg

  • I just came off a call with Support and have been advised to adjust my alert from 1 second to 2 minutes.

    'Do not trigger this action until condition exists for more than 2 minutes'

    I'll pop the fan back in and see what happens.

    Cheers,

    Greg
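For anyone reading later, here is a rough Python sketch of what a "condition must exist for more than 2 minutes" setting means in practice. It is only an illustration of the hold-down idea, not how Orion evaluates alerts internally; the function name, the sample data, and the date are all made up for the example.

```python
from datetime import datetime, timedelta


def first_fire_time(samples, hold=timedelta(minutes=2)):
    """Return the timestamp at which the alert would fire, or None.

    samples: list of (timestamp, condition_is_true) tuples in time order.
    The alert fires at the first sample where the condition has been
    continuously true for strictly more than `hold`.
    """
    true_since = None
    for ts, active in samples:
        if not active:
            true_since = None  # condition cleared, so the clock restarts
            continue
        if true_since is None:
            true_since = ts  # first sample of this continuous true run
        if ts - true_since > hold:
            return ts
    return None


# Hypothetical poll results, one per minute, starting at 11:19.
t0 = datetime(2000, 1, 1, 11, 19)  # the date is arbitrary
steady = [(t0 + timedelta(minutes=i), True) for i in range(4)]
flapping = [
    (t0, True),
    (t0 + timedelta(minutes=1), False),
    (t0 + timedelta(minutes=2), True),
    (t0 + timedelta(minutes=3), True),
]

print(first_fire_time(steady))    # fires on the sample 3 minutes in
print(first_fire_time(flapping))  # never held long enough, so None
```

A longer hold-down only filters out short blips. It would not by itself explain a polled status that stays "operable" long after the trap arrives.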

  • How long between the 11:19 event and the email alert?

    And no, you don't have to remove that custom property, that is fine.

  • Alert received at 01:01, approx 1:40 delay.

    Spoke to support again and the settings are back to where they were.

    Also, the custom property is working fine:-)

    Cheers,

    Greg

  • Just going back to my original point about UCS monitoring...... Would anyone be happy to share with me how they have the alerts configured or would anyone from Solarwinds like to get in touch to discuss further?

    Happy Friday:-)

  • I think you're not getting a lot of responses because you're on the right track already.

    UCS monitoring is really straightforward and centers on PSUs, fans, up/down on blades and a little more with SNMP. All of this would be fairly simple to set up through the Advanced Alert Manager, just like you did with your first alert.

    If you're SQL-oriented, you can look through the UCS tables in the database and alert on any column there. And, like always, you can alert on SNMP Traps that fire only when a specific event occurs.

    Do you have anything specific you want to alert on that you are having trouble with?

  • Sorry, but I disagree completely.

    Has anyone pulled a power supply in their UCS to see if it alarms somewhere on the SolarWinds UCS summary page?

    It did not for me, so I had to build a resource and an alert just for that. I also found that the active fabric interconnect was returned as the ${nodeid} for the alert, so I had to add a SQL subselect to get the actual alarming component.

    Hopefully everyone proves me wrong and the time was wasted.

    Here's the alert

    Screen Shot 2013-12-13 at 6.04.55 PM.png

    and the added SQL in the email action:

    ${SQL:SELECT c.name FROM npm_ucschassis c inner join [dbo].[NPM_UCSPsus] p on p.parentid = c.id where p.parenttype = 0 and p.parentid = ${ParentID}}
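If you want to see what that subselect does without touching the Orion database, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are cut-down stand-ins containing only the columns the query uses, the sample rows are invented, and reading ParentType = 0 as "parent is a chassis" is inferred from the queries in this thread rather than from any schema documentation. DISTINCT is my own addition so that a chassis with several PSUs yields a single name.

```python
import sqlite3

# In-memory stand-in for the two Orion tables used by the subselect.
# Column lists are assumptions limited to what the query references.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE NPM_UCSChassis (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE NPM_UCSPsus (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        ParentID INTEGER,
        ParentType INTEGER,  -- assumed: 0 = chassis, 1 = fabric interconnect
        Status TEXT
    );
    INSERT INTO NPM_UCSChassis VALUES (1, 'chassis-1'), (2, 'chassis-2');
    INSERT INTO NPM_UCSPsus VALUES
        (10, 'psu-1', 1, 0, 'operable'),
        (11, 'psu-2', 2, 0, 'inoperable'),
        (12, 'psu-3', 2, 0, 'operable');
    """
)


def chassis_name_for(parent_id):
    """Resolve a PSU's ParentID to the owning chassis name(s)."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.Name
        FROM NPM_UCSChassis c
        INNER JOIN NPM_UCSPsus p ON p.ParentID = c.ID
        WHERE p.ParentType = 0 AND p.ParentID = ?
        """,
        (parent_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(chassis_name_for(2))  # name of the chassis that owns the failed PSU
print(chassis_name_for(3))  # no such chassis, so an empty list
```

The `?` placeholder plays the role that the ${ParentID} alert variable plays in the email action.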

    And here are my custom query resources added to the UCS summary page:

    SELECT n.caption, p.Power, p.Status, p.Model, p.Name FROM Orion.NPM.UCSPsus p
    inner join Orion.npm.UCSFabrics u on p.parentid = u.id
    inner join Orion.Nodes n on u.nodeid = n.nodeid
    where p.parenttype = 1 and p.status <> 'operable'
    UNION ALL
    (
    SELECT n.caption, f.Power, f.Status, f.Model, f.Name FROM Orion.NPM.UCSfans f
    inner join Orion.npm.UCSFabrics u on f.parentid = u.id
    inner join Orion.Nodes n on u.nodeid = n.nodeid
    where f.parenttype = 1 and f.status <> 'operable'
    )

    --------&---------

    SELECT n.caption, p.Power, p.Status, p.Model, p.Name FROM Orion.NPM.UCSPsus p
    inner join Orion.npm.UCSFabrics f on p.parentid = f.id
    inner join Orion.Nodes n on f.nodeid = n.nodeid
    where parenttype = 1 and
    (
    hostnodeid = ${nodeid} or
    f.nodeid = ${nodeid}
    )

    --------&---------

    SELECT n.caption, p.Power, p.Status, p.Model, p.Module, p.Name FROM Orion.NPM.UCSfans p
    inner join Orion.npm.UCSFabrics f on p.parentid = f.id
    inner join Orion.Nodes n on f.nodeid = n.nodeid
    where parenttype = 1 and
    (
    hostnodeid = ${nodeid} or
    f.nodeid = ${nodeid}
    )