
Re: HA host going into down(red)state

Nice

I just need to get my maintenance sorted (today, hopefully) and then I can download the next RC to take a look.

Was there anything in particular that was addressed that you can share details on? It would be good to understand what the cause was and whether our environment was a contributing factor.

Product Manager

Re: HA host going into down(red)state

The fix is a little too deep in the details to explain, and quite honestly, without a lot of Orion development context, it would probably be even more difficult to understand. If you're uncomfortable upgrading without validation that the issue you're experiencing is the same as the ones addressed in Orion Platform 2019.2, then I recommend opening a case with support. They will be able to investigate the issue and work with engineering to determine whether the issue you're encountering is related.


Re: HA host going into down(red)state

Not a problem. To be honest, hearing that the issue should be resolved is enough.

If it still occurs, I will raise a case to see whether our issue is a different one.

Thank you


Re: HA host going into down(red)state

Hi aLTeReGo,

We upgraded to 2019.2 HF2 a couple of weeks back and so far so good; we haven't had a single issue with HA.

Email alerts have been a little bit noisier on the main poller, but I can live with that. Thanks for the work on this.

Level 10

Re: HA host going into down(red)state

Update on our case. We need to run validation for a few more days, but I think we found the root cause. We are running SQL HA across 2 separate sites, and our DBAs had mistakenly set the replication to synchronous. Since the database VMs are in separate sites, we need asynchronous replication. After configuring asynchronous replication in SQL, we have not seen the HA nodes change to a critical state or the HA table stop updating. We also set up the Configuration Manager to point to the FQDN of the SQL listener. I think the replication configuration may be the culprit. Will update the case once we are sure.

The key error we saw, and that support pointed out, was in the SolarWinds application event log:

Log Name: SolarWinds.Net
Source: SolarWinds.SyslogTraps.SyslogService
Date: 4/23/2019 3:51:49 AM
Event ID: 4001
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: HOST-our-host
Description:
Service was unable to open new database connection when requested.
SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The remote computer refused the network connection.)
Connection string - Data Source=OUR_SQL_Instance ;Initial Catalog=SolarWindsOrion;Persist Security Info=False;User ID=SQL-dummy-user ;X;Max Pool Size=1000;Connect Timeout=20;Load Balance Timeout=120;Packet Size=4096;Application Name="Orion Syslog Service@SyslogService.exe";Workstation ID=Ourhost;MultiSubnetFailover=True

Event Xml:
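
If you want to compare the connection string from an event like this against what you expect (for example, confirming that Data Source points at the listener and that MultiSubnetFailover is set), a rough Python sketch along these lines may help. The function name `parse_connection_string` is my own invention and not part of any SolarWinds tooling, and the parser assumes values do not contain semicolons inside quotes:

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split a 'key=value;key=value' connection string into a dict.

    Hypothetical helper for illustration only. Keys are lower-cased so
    lookups do not depend on how the string was capitalised.
    """
    settings: dict[str, str] = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if "=" not in part:
            # Skip empty pieces and redacted fragments with no key, e.g. 'X'.
            continue
        key, _, value = part.partition("=")
        # Trim stray whitespace and surrounding double quotes from the value.
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings
```

Calling it on the string copied from the event and then reading `settings["data source"]`, `settings["connect timeout"]`, and `settings["multisubnetfailover"]` gives a quick view of the values the service was actually using.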

Level 10

Re: HA host going into down(red)state

We did finally close our case. I have seen our HA environment stay in a healthy state for over 10 days now. So yes, if your HA pool goes into a critical state, look for any SQL errors in the SolarWinds event log and make sure SQL connectivity is not the root cause.
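
As a quick first check of SQL connectivity from a polling engine, something like the following Python sketch can tell you whether a TCP connection to the listener is accepted or refused. It is only a rough illustration: `can_reach` is a made-up helper name, 1433 is the SQL Server default port and may not match your instance, and a successful TCP connect does not prove that logins or queries work.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is accepted.

    Hypothetical helper for illustration. A refused connection, a
    timeout, or a name-resolution failure all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS lookup failures.
        return False
```

For example, `can_reach("sql-listener.example.local")` (a placeholder FQDN, substitute your own listener name) returning False at the time of an HA state change would point toward the same kind of network-level refusal shown in the event above.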