Alert when Orion HA Failover occurs

Have an alert trigger condition for when an Orion High Availability (HA) failover event occurs. This should be a configurable alert that can email team members to notify them that a failover has occurred, and that can execute programs/scripts like any other alert. The built-in email setting in the HA settings only uses the default mail send action. This would be useful when other things need to happen outside of Orion in the event of a failover.

12 Comments
Product Manager

This is already possible today. There's even an out-of-the-box example pictured below which can be completely customized to satisfy the needs you described.

[Screenshot: out-of-the-box example alert]

Level 10

Thanks for the reply, AlterEgo. I have tried that, and the alert criteria don't appear to work. Can you please explain how it works? I followed the KB article "Configure alerts for other DNS types", and when I enabled the alert, all it seemed to do was alert every minute and then reset itself.

I contacted Orion support, and they stated that there was no alert that notifies of a failover event and that our only option would be some type of custom SQL query. I may not be understanding it correctly, but this appears to be more of a heartbeat than something that notifies of a failover event.

Level 13

Hello,

It looks like the reset condition is wrong. The alert resets after 1 minute and triggers again if the active node is still the same one. (This keeps DNS up to date every minute.)

1. I recommend making a copy of that DNS alert.

2. Go to the reset condition and change it from "Reset this alert automatically after" to "Reset this alert when trigger condition is no longer true (Recommended)".

The alert will trigger after a failover, but it will stay in the triggered state until another failover occurs. You can acknowledge that alert.
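To make the difference between the two reset options concrete, here is a minimal Python simulation. It is a simplified model, not the actual Orion alert engine: it assumes the trigger condition is evaluated once a minute and is true while the member's role is "MainPoller". The names `AlertSim` and `run` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertSim:
    """Hypothetical, simplified alert: trigger while the member is MainPoller."""
    reset_mode: str            # "timer" = reset automatically after N minutes
                               # "condition" = reset when trigger condition is no longer true
    reset_after_min: int = 1
    active: bool = False
    triggered_at: int = -1
    fired: list = field(default_factory=list)  # minutes at which trigger actions ran

    def evaluate(self, minute: int, condition_true: bool) -> None:
        # Reset step: decide whether an active alert goes back to normal.
        if self.active:
            if self.reset_mode == "timer" and minute - self.triggered_at >= self.reset_after_min:
                self.active = False
            elif self.reset_mode == "condition" and not condition_true:
                self.active = False
        # Trigger step: a non-active alert whose condition holds triggers again,
        # which is when actions such as a DNS update script would run.
        if not self.active and condition_true:
            self.active = True
            self.triggered_at = minute
            self.fired.append(minute)


def run(reset_mode: str, is_main_poller: list[bool]) -> list[int]:
    alert = AlertSim(reset_mode)
    for minute, condition in enumerate(is_main_poller):
        alert.evaluate(minute, condition)
    return alert.fired


# Node2 becomes MainPoller after a failover at minute 5.
node2_is_main = [False] * 5 + [True] * 5
print(run("timer", node2_is_main))      # [5, 6, 7, 8, 9]
print(run("condition", node2_is_main))  # [5]
```

With the timer reset, the trigger actions run every minute while the node stays active; with the condition-based reset, they run once per failover.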

Lada

Level 10

Hello LadaVarga, thanks for the reply. Can you explain what the trigger conditions are and how they signal only in the event of a failover? It looks like they just state that a pool member is equal to "MainPoller" and the Pool ID is not empty. Wouldn't these conditions be true at any time, regardless of which HA server is currently running? I was expecting to see something more like "Failover event occurred" or "Preferred node is not active".

Level 13

Hello,

It triggers when any member of the active pool becomes "MainPoller":

Node1 = MainPoller (alert triggered)
Node2 = MainPollerStandby

After failover:

Node1 = MainPollerStandby (alert reset)
Node2 = MainPoller (alert triggered)
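The per-member sequence above can be sketched as follows, assuming (as the sequence suggests) that each HA pool member gets its own alert instance and that the reset condition is "trigger condition is no longer true". The function `evaluate_instances` and the input dictionaries are hypothetical stand-ins for what Orion evaluates internally.

```python
def evaluate_instances(active: dict[str, bool], roles: dict[str, str]) -> list[str]:
    """Update per-member alert state in place and return the transitions.

    Hypothetical model: trigger condition is role == "MainPoller",
    reset condition is "trigger condition is no longer true".
    """
    events = []
    for member, role in roles.items():
        condition = role == "MainPoller"
        was_active = active.get(member, False)
        if condition and not was_active:
            events.append(f"{member}: alert triggered")
        elif was_active and not condition:
            events.append(f"{member}: alert reset")
        active[member] = condition
    return events


state: dict[str, bool] = {}
print(evaluate_instances(state, {"Node1": "MainPoller", "Node2": "MainPollerStandby"}))
# ['Node1: alert triggered']
print(evaluate_instances(state, {"Node1": "MainPollerStandby", "Node2": "MainPoller"}))
# ['Node1: alert reset', 'Node2: alert triggered']
print([m for m, is_active in state.items() if is_active])
# ['Node2']
```

At any point exactly one instance (the current MainPoller) is active, and the trigger actions run only when that instance changes.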

Unfortunately, we don't have a trigger such as "failover event occurred" or anything similar. What we know for certain is that after a failover the active member has changed, so alerting on that state is reliable; an event-based alert would not be.
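Without a failover event to alert on, a failover can still be derived from state by comparing the active member seen on each poll with the one seen on the previous poll. The sketch below is a hypothetical illustration: the poll list is made-up data, and `None` marks a poll that returned nothing. It shows why state-based detection tolerates a missed poll, whereas an event that is missed once is simply lost.

```python
from typing import Optional


def detect_failovers(samples: list[Optional[str]]) -> list[tuple[int, str, str]]:
    """Return (poll_index, old_active, new_active) whenever the active member changes.

    None means the poll failed; it is skipped, and the change is still
    detected on the next successful poll.
    """
    failovers = []
    last = None
    for i, current in enumerate(samples):
        if current is None:
            continue
        if last is not None and current != last:
            failovers.append((i, last, current))
        last = current
    return failovers


polls = ["Node1", "Node1", None, "Node2", "Node2", "Node1"]
print(detect_failovers(polls))
# [(3, 'Node1', 'Node2'), (5, 'Node2', 'Node1')]
```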

You can also have HA send email directly when something happens: Settings -> http://oriondemo.solarwinds.com/ui/ha/settings

There are three events that send email to the default email address.

Lada

Level 10

Thanks, I will give it a try and let you know.

Level 10

It looks like I got it working.  Step 3 is absolutely wrong and should be updated.  I think what I couldn't picture at first was that there would always be an "active alert" for whichever HA node was active.  I have modified the severity to "Notice" since we don't use that for any other alerts so we can filter it out.  I have tested it with my custom PowerShell script and the DNS change is happening automatically now.  Thanks so much!

Level 8

solarwinded  wrote:

It looks like I got it working.  Step 3 is absolutely wrong and should be updated.  I think what I couldn't picture at first was that there would always be an "active alert" for whichever HA node was active.  I have modified the severity to "Notice" since we don't use that for any other alerts so we can filter it out.  I have tested it with my custom PowerShell script and the DNS change is happening automatically now.  Thanks so much!

The recommendation to reset the alert automatically was meant for the DNS update script, so that the script action repeats periodically. There could be an environmental network problem, so it is desirable to repeat the update attempts, and the example script for AWS Route 53 is designed to handle repeated updates correctly.

However, your scenario is different, so the automatic reset does not make sense in your case.
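For reference, a repeatable update against Route 53 can be expressed as an UPSERT in the `ChangeBatch` passed to the ChangeResourceRecordSets API (`change_resource_record_sets` in boto3). This is not the SolarWinds example script; it is a minimal sketch that only builds the request body, and the record name, IP address and 60-second TTL are placeholder assumptions.

```python
import json


def build_upsert_batch(record_name: str, ip_address: str, ttl: int = 60) -> dict:
    """Build a ChangeBatch for Route 53 ChangeResourceRecordSets.

    UPSERT creates the record if it is missing and overwrites it otherwise,
    so sending the same batch repeatedly leaves the record unchanged.
    """
    return {
        "Comment": "Point the Orion DNS name at the active HA member",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


# Placeholder name and address for illustration only.
batch = build_upsert_batch("orion.example.com.", "10.0.0.12")
print(json.dumps(batch, indent=2))

# With boto3 installed and AWS credentials configured, the batch would be sent as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your-zone-id>", ChangeBatch=batch)
```

Because the request is idempotent, retrying it after a transient network error (or simply every minute) is safe.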

Level 10

My scenario is a DNS update script that only needs to run when a failover event occurs. Running a DNS script every minute to update the DNS entry of your current Orion server seems excessive, but I am not sure how AWS Route 53 works.

Product Manager

A minute is used to redirect users to the 'active' member in near-real-time when a failover occurs. Updating DNS once a minute may sound excessive, but honestly any DNS server should be able to handle this without breaking a sweat.
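If the once-a-minute run is still a concern, one option is to have the script compare the record with the active member's address and write only when they differ. The sketch below is hypothetical: `sync_record` and the `records` dictionary stand in for a real DNS lookup and API call, and the addresses are made up.

```python
MINUTES_PER_DAY = 24 * 60  # a once-a-minute alert action runs 1440 times a day


def sync_record(current_records: dict[str, str], name: str, active_ip: str) -> bool:
    """Return True if an update was written, False if the record was already correct."""
    if current_records.get(name) == active_ip:
        return False
    current_records[name] = active_ip  # a real script would call its DNS API here
    return True


records = {"orion.example.com": "10.0.0.11"}
# Simulate one day of once-a-minute runs with a single failover at minute 600.
writes = sum(
    sync_record(records, "orion.example.com", "10.0.0.11" if m < 600 else "10.0.0.12")
    for m in range(MINUTES_PER_DAY)
)
print(MINUTES_PER_DAY, writes)  # 1440 1
```

In a real script, reading the record through a caching resolver can return a stale value, so querying the authoritative server (or simply relying on an idempotent UPSERT) may be safer.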