Force HA Failover when a certain percentage of SolarWinds Agent per polling engine reach an unknown state

Overview

We have an unusual scenario in which a number of SolarWinds Agents go into an unknown state. While we are still investigating the root cause, we noticed that if the Polling Engine assigned to those agents fails over using HA, the agents eventually reconnect.

This alert uses a fairly extensive SWQL query to look for certain conditions before it triggers. You will need to update three items in this query to adapt this to your deployment.

Update the following items

  • Percentage of agents in an unknown state (Yellow): In the screenshot this is set to 80% of all agents assigned to that APE.
  • Minimum count of agents per APE (Blue): The number of agents assigned to a single APE can vary widely. This value sets a minimum agent count that an APE must have before the alert is considered. This is important because the percentage above (Yellow) can be greatly skewed in environments with a low agent-per-APE ratio.
  • Minimum amount of time after a failover event before the alert can re-trigger (Green): This is the number of minutes since the last failover for the respective HA pool. 720 minutes = 12 hours. It might take some time for all agents to reconnect, and we did not want to trigger another HA failover prematurely.
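
To make the interaction of the three values easier to follow, here is a minimal Python sketch of the decision they describe. It is not the SWQL query from the alert. The names (`ApeAgentStats`, `should_force_failover`), the minimum agent count of 10, and the use of "greater than or equal to" for the comparisons are all assumptions made for illustration. Only the 80% and 720 minute values come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical names; only the 80% and 720-minute values come from the post.
UNKNOWN_PERCENT_THRESHOLD = 80.0   # Yellow: percent of agents in an unknown state
MIN_AGENTS_PER_APE = 10            # Blue: illustrative value, not from the original query
FAILOVER_COOLDOWN_MINUTES = 720    # Green: minutes since the last failover (12 hours)


@dataclass
class ApeAgentStats:
    total_agents: int                   # agents assigned to this polling engine
    unknown_agents: int                 # of those, how many are in an unknown state
    last_failover: Optional[datetime]   # None if the HA pool has never failed over


def should_force_failover(stats: ApeAgentStats, now: datetime) -> bool:
    """Return True only when all three conditions are satisfied."""
    # Blue: ignore engines with too few agents, where the percentage is easily skewed.
    # max(..., 1) also protects the division below from a zero agent count.
    if stats.total_agents < max(MIN_AGENTS_PER_APE, 1):
        return False

    # Yellow: percentage of this engine's agents that are in an unknown state.
    unknown_percent = 100.0 * stats.unknown_agents / stats.total_agents
    if unknown_percent < UNKNOWN_PERCENT_THRESHOLD:
        return False

    # Green: do not trigger again until the cooldown since the last failover has passed.
    if stats.last_failover is not None:
        cooldown = timedelta(minutes=FAILOVER_COOLDOWN_MINUTES)
        if now - stats.last_failover < cooldown:
            return False

    return True
```

With these assumed values, an APE with 4 agents that are all unknown (100%) does not trigger because it is below the minimum count, while an APE with 100 agents and 85 unknown does trigger, provided its HA pool has not failed over in the last 12 hours.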

Details of Alert Actions

  • 1st Alert Action: NetPerfMon Event. This logs the details of the conditions that triggered the alert to the Events/Message Center, which provides a reference point for historical purposes.

  • 2nd Alert Action: This is a PowerShell script that calls the SolarWinds API to force a failover. The script can be downloaded from the link below and saved to your main SolarWinds server. If you run HA on your main server, download and save it to both the main and the main standby servers.

https://thwack.solarwinds.com/content-exchange/the-orion-platform/m/scripts/3767

Misc

As an additional source of monitoring, I created this SAM Application Template so I can track unknown agent status in my environment.

https://thwack.solarwinds.com/content-exchange/server-application-monitor/m/application-monitor-templates/3766

chad.every
  • 26 May 2023