Product Manager

Re: HA host going into down(red)state

I have reached out to support management and requested that your case be escalated.


Re: HA host going into down(red)state

aLTeReGo, I had a ticket open a while back for mine, but the problem still recurs.

Case # 00237473 - HA Service on multiple pools reports failed

Level 10

Re: HA host going into down(red)state

The only thing we have done so far with support is pull diagnostic files. All our servers are now up to date with current Microsoft patches, so we have eliminated that as a possibility.

Level 10

Re: HA host going into down(red)state

I'm still working with support on the case. They did give me a cool alert to set up. Basically, it runs a SWIS query against the HA pool members table. I'll upload the XML file they gave me as well.

INNER JOIN HA_PoolMembers
ON Nodes.Sysname = HA_PoolMembers.Hostname
WHERE HA_PoolMembers.status != '2'


Level 10

Re: HA host going into down(red)state

I changed the query a bit. It looks like Status = 1 means healthy in Orion.HA.PoolMembers, so we wrote an alert that fires on any other value.

SELECT HostName, Status, LastHeartbeatTimestamp
FROM Orion.HA.PoolMembers
WHERE Status != '1'

Level 10

Re: HA host going into down(red)state

Update on this case: I am still working with support. We have not found any evidence pointing to a root cause, so I cannot say what the origin of the problem is at this point, but I will update the post. I did work with my contact at SolarWinds to write a query that shows when LastHeartbeatTimestamp is more than 1 hour older than the current UTC time. This alerts us when the condition occurs. We do see some cases where Status = 1 and HA is still in a critical state.
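For anyone who wants to reason about the two checks together, here is a rough Python sketch of the logic. It is not the query support gave me. It assumes rows shaped like the output of the Orion.HA.PoolMembers query earlier in the thread (HostName, Status, LastHeartbeatTimestamp), that the timestamps are in UTC, and that Status = 1 means healthy. The function name and the sample host names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumptions (not confirmed by SolarWinds): Status 1 means healthy and
# LastHeartbeatTimestamp is stored in UTC.
HEALTHY_STATUS = 1
MAX_HEARTBEAT_AGE = timedelta(hours=1)


def find_unhealthy_members(rows, now=None):
    """Return (HostName, reason) pairs for pool members that look unhealthy.

    The heartbeat age is checked first, so a member that still reports
    Status = 1 but has gone quiet for over an hour is flagged as well.
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for row in rows:
        age = now - row["LastHeartbeatTimestamp"]
        if age > MAX_HEARTBEAT_AGE:
            flagged.append((row["HostName"], "stale heartbeat"))
        elif int(row["Status"]) != HEALTHY_STATUS:
            flagged.append((row["HostName"], "status not healthy"))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the input.
    now = datetime(2019, 6, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"HostName": "poller-a", "Status": 1,
         "LastHeartbeatTimestamp": now - timedelta(minutes=5)},
        {"HostName": "poller-b", "Status": 1,
         "LastHeartbeatTimestamp": now - timedelta(hours=2)},
    ]
    for host, reason in find_unhealthy_members(sample, now):
        print(host, reason)
```

Checking the heartbeat age before the status is deliberate: it covers the case where a member still shows Status = 1 while the pool is actually critical.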



Re: HA host going into down(red)state

Mine had been quiet for a few weeks, and then today I got in and, bang, all but two pools are red again...


Support's answer was to recreate the pools because of an erroneous record in the database, but it still happens.

Product Manager

Re: HA host going into down(red)state

Have you tried restarting services on one of those pollers? The lack of a heartbeat for that long sounds like connectivity to the database was lost or a service stopped.


Re: HA host going into down(red)state

Restarting the services is the only way to get them back to green, although it is 50/50 whether a service will actually stop when restarted, and it generally requires killing the process on each node in a pair to bring the pool back to green.

Next time it occurs, I'll try to grab some logs to see if they give any further clues.

Product Manager

Re: HA host going into down(red)state

A couple of similar issues reported by other customers were addressed in Orion Platform 2019.2, which is included in NPM 12.5 and can be downloaded now from your Customer Portal.
