SolarWinds Agent Issue RCA & Troubleshooting

Dear THWACK forum members,

I seek your guidance on handling SolarWinds agent-related issues, from identifying the cause to troubleshooting them.

Our environment currently monitors 10,000 servers with SAM using the agent-based approach. We see about 100 agent issues per day on average, seemingly at random. Below are our troubleshooting guidelines.

1. Restart the SolarWinds service (via automation tools).

2. (If step 1 doesn't help) Re-initialize agents (manually).

While the above steps resolve most of the issues, we still get random agent issues every day, so we would like to understand the RCA procedure for them. Below are the areas where we seek your help:

- SolarWinds agent log-based analysis: which logs should we ideally look at on the agent side?

- Is there an automated or scripted way to re-initialize agents on both Windows and Linux?

     

 

  • It may also be good to involve your networking team to see whether any related events are occurring, or at least to rule out the basics (if not done already): ping response time and other basic network triage.

    Maybe even set up a TCP port check on 17778 from each agent to its assigned polling engine?
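
    For the port check suggested above, a minimal Python sketch could look like the following. It assumes Python 3 is available on the agent host and that the polling engine hostnames are passed on the command line. The function name, the 3-second timeout, and the output format are illustrative choices, not anything provided by SolarWinds.

```python
import socket
import sys
import time
from typing import Optional, Tuple

# Port named in the reply above; adjust if your deployment differs.
AGENT_PORT = 17778


def check_tcp_port(host: str, port: int = AGENT_PORT,
                   timeout: float = 3.0) -> Tuple[bool, Optional[float]]:
    """Attempt a TCP connect and return (reachable, connect_time_ms).

    connect_time_ms is None when the connection fails (refused,
    timed out, or the hostname could not be resolved).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000.0
            return True, elapsed_ms
    except OSError:
        # Covers refusals, timeouts, and name-resolution errors.
        return False, None


if __name__ == "__main__":
    # Hostnames come from the command line (hypothetical usage):
    #   python check_port.py poller01 poller02
    for engine in sys.argv[1:]:
        ok, ms = check_tcp_port(engine)
        if ok:
            print(f"{engine}:{AGENT_PORT} reachable in {ms:.1f} ms")
        else:
            print(f"{engine}:{AGENT_PORT} NOT reachable")
```

    Run on a schedule from a few affected servers, something like this would help show whether the agent issues line up with connectivity drops or slow connects to the polling engine, which is the kind of evidence the networking team can act on.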